Script to Parse and Change Numbers
06-07-2019
Question
I am working with numbers a lot when editing a particular type of file, and it's mostly tedious work. The file has a format like this:
damagebase = 8.834
"abc_foo.odf" 3.77
"def_bar.odf" 3.77
"ghi_baz.odf" 3.77
"jkl_blah.odf" 4.05
...
What would you recommend for writing a script that parses this and lets me programmatically change each number?
Language: I use C#, some F# (I'm a beginner), and Lua. If you suggest regexes, could you provide specific ones, as I am not familiar with them?
Solution
You can match runs of non-whitespace and punt to Double.Parse:
// Needs: using System; using System.Text.RegularExpressions;
int multiplier = 3;
string input =
    "damagebase = 8.834\n" +
    " \"abc.odf\" 3.77\n" +
    " \"def.odf\" 3.77\n" +
    " \"ghi.odf\" .77\n" +
    " \"jkl.odf\" -4.05\n" +
    " \"mno.odf\" 5\n";
// Group 1/2: header name and value. Groups 3/4 repeat once per
// quoted file name and its number, so read them via Captures.
Regex r = new Regex(@"^(\w+)\s*=\s*(\S+)" +
                    @"(?:\s+""([^""]+)""\s+(\S+))+",
                    RegexOptions.Compiled | RegexOptions.Multiline);
Match m = r.Match(input);
if (m.Success) {
    double header = Double.Parse(m.Groups[2].Value);
    Console.WriteLine("{0} = {1}", m.Groups[1].Value,
                      header * multiplier);
    CaptureCollection files = m.Groups[3].Captures;
    CaptureCollection nums = m.Groups[4].Captures;
    for (int i = 0; i < files.Count; i++) {
        double val = Double.Parse(nums[i].Value);
        Console.WriteLine(@" ""{0}"" {1}", files[i].Value,
                          val * multiplier);
    }
}
else
    Console.WriteLine("no match");
Running it gives
damagebase = 26.502
"abc.odf" 11.31
"def.odf" 11.31
"ghi.odf" 2.31
"jkl.odf" -12.15
"mno.odf" 15
OTHER TIPS
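Perl is pretty good for stuff like this. Here's a Perl script that will do what you want.
#!/usr/bin/env perl
$multiplier = 2.0;
while (<>)
{
    # The header line ("name = value") has its number in the third token;
    # every other line has it in the second.
    $n = /=/ ? 2 : 1;
    @tokens = split;
    $tokens[$n] *= $multiplier;
    print "\t" if not /=/;   # put the leading tab back on entry lines
    print join(' ', @tokens) . "\n";
}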
Usage:
./file.pl input_file > output_file
If that's really all you want to do, use awk:
awk '{$NF *= 2.5 ; print }' < input_file > output_file
EDITED: All right, if you want to keep the whitespace as you describe, this should work (although it's getting inelegant).
awk '{$NF *= 2.5} /^\"/{print "\t" $0} !/^\"/{print}' < input_file > output_file
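The one-liner above rebuilds each line from its fields, which is why the leading tab has to be printed back by hand. Here is a minimal sketch of an alternative, assuming the number is always the last whitespace-delimited token on a line: leave the line alone and swap out only that trailing token, so the tabs and spaces in front of it survive untouched. The function name scale_last_field is made up for illustration.

```shell
# scale_last_field FACTOR
# Reads lines on stdin and multiplies the last token of each line by FACTOR,
# copying everything before that token (including leading tabs) verbatim.
# Lines whose last token is not a plain decimal number pass through unchanged.
scale_last_field() {
    awk -v factor="$1" '
    {
        # Locate the final run of non-blank characters on the line.
        if (match($0, /[^ \t]+[ \t]*$/)) {
            head = substr($0, 1, RSTART - 1)   # text before the token
            tail = substr($0, RSTART)          # the token itself
            sub(/[ \t]+$/, "", tail)           # drop trailing blanks
            if (tail ~ /^[-+]?([0-9]+\.?[0-9]*|\.[0-9]+)$/) {
                printf "%s%s\n", head, tail * factor
                next
            }
        }
        print
    }'
}

# Usage (same shape as the commands above):
#   scale_last_field 2.5 < input_file > output_file
```

Note that awk turns non-integer results into text with its default conversion format (%.6g), so a result that needs more than six significant digits will be rounded.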
You can use AWK like this (note how easily the formatting is converted for the purpose):
sed 's/damagebase =/damagebase=/g' input.txt |\
awk '{printf "\t%s %s\n",$1,3.1*$2}' |\
sed 's/.*damagebase=/damagebase =/g'
This sample script multiplies the second column by 3.1. To restore your formatting, the printf inserts a TAB at the start of each line, and the two sed commands translate your format into one that suits the AWK command and back again.
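The two sed passes exist only so that the header line has the same two-column shape as the rest. As a rough sketch of the same idea without them, awk can treat the header as its own case by checking whether the second field is an equals sign. This assumes file names contain no spaces and that every entry line should start with a single tab; scale_columns is a hypothetical name.

```shell
# scale_columns FACTOR
# Header lines ("name = value") are rewritten as "name = value*FACTOR".
# Entry lines ("file" value) are rewritten with one leading tab.
# Anything else (blank lines, single-token lines) is passed through as-is.
scale_columns() {
    awk -v factor="$1" '
        $2 == "=" { printf "%s = %s\n", $1, $3 * factor; next }
        NF >= 2   { printf "\t%s %s\n", $1, $2 * factor; next }
                  { print }
    '
}

# Usage:
#   scale_columns 3.1 < input.txt > output.txt
```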
I tried
static void Main(string[] args)
{
    Console.WriteLine("Please enter the multiplier:");
    string stringMult = Console.ReadLine();
    int multiplier;
    Int32.TryParse(stringMult, out multiplier);
    StreamReader sr = new StreamReader(@"C:\Users\[obscured]\Desktop\Fleetops Mod\Data To Process.txt", true);
    string input = sr.ReadToEnd();
    sr.Close();
    StreamWriter sw = new StreamWriter(@"C:\Users\[obscured]\Desktop\Fleetops Mod\Data To Process.txt", false);
    Regex r = new Regex(@"^(\w+)\s*=\s*(\S+)" +
                        @"(?:\s+""([^""]+)""\s+(\S+))+",
                        RegexOptions.Compiled | RegexOptions.Multiline);
    Match m = r.Match(input);
    if (m.Success) {
        double header = Double.Parse(m.Groups[2].Value);
        sw.WriteLine("{0} = {1}", m.Groups[1].Value,
                     header * multiplier);
        CaptureCollection files = m.Groups[3].Captures;
        CaptureCollection nums = m.Groups[4].Captures;
        for (int i = 0; i < files.Count; i++) {
            double val = Double.Parse(nums[i].Value);
            sw.WriteLine(@" ""{0}"" {1}", files[i].Value,
                         val * multiplier);
        }
    }
    else
        Console.WriteLine("no match");
    sw.Close();
    Console.WriteLine("Done!");
    Console.ReadKey();
}
(thanks, gbacon), but it comes back with "no match" even when I put in the right data. Why does it do this? Here's the test data:
damagebase = 8.098
"bor_adaptor_03.odf" 3.77
"bor_adaptor_13.odf" 3.77
"bor_adaptor_23.odf" 3.77
"bor_adaptor_33.odf" 4.05
"bor_adaptor_R3.odf" 3.77
"bor_adaptor_T3.odf" 3.77
"bor_cube_BHHHMM.odf" 6.48
"bor_cube_BRHHHM.odf" 4.52
"bor_cube_BRHHMM.odf" 6.48
"bor_cube_BTHHHM.odf" 4.52
"bor_cube_BTHHMM.odf" 6.48
"bor_cube_BTRHHM.odf" 4.52
"bor_cube_BTRHMM.odf" 6.48
"bor_cube_BTTHHM.odf" 4.52
"bor_cube_BTTHMM.odf" 6.48
"bor_cube_BTTRHM.odf" 4.52
"bor_cube_BTTRMM.odf" 6.48
"bor_cube_BTTTHM.odf" 4.52
"bor_cube_BTTTMM.odf" 6.48
"bor_cube_BTTTRM.odf" 4.52
"bor_cube_RHHHMM.odf" 6.48
"bor_cube_THHHMM.odf" 6.48
"bor_cube_TRHHHM.odf" 4.52
"bor_cube_TRHHMM.odf" 6.48
"bor_cube_TTHHHM.odf" 4.52
"bor_cube_TTHHMM.odf" 6.48
"bor_cube_TTRHHM.odf" 4.52
"bor_cube_TTRHMM.odf" 6.48
"bor_cube_TTTHHM.odf" 4.52
"bor_cube_TTTHMM.odf" 6.48
"bor_cube_TTTRHM.odf" 4.52
"bor_cube_TTTRMM.odf" 6.48
"dom_battle_cruiserY2r6.odf" 4.123
"dom_battle_cruiserYr6.odf" 4.123
"dom_battle_cruiserZ2r6.odf" 4.123
"dom_battle_cruiserZr6.odf" 4.123
"dom_battle_cruiser_fed2r6.odf" 4.123
"dom_battle_cruiser_fedr6.odf" 4.123
"dom_defenderr4.odf" 7.775
"dom_defenderr5.odf" 7.452
"dom_defenderr6.odf" 3.793
"dom_dreadnought_borr4.odf" 3.77
"dom_dreadnought_borr5.odf" 3.77
"dom_dreadnought_borr6.odf" 3.77
"dom_dreadnought_fedr4.odf" 3.77
"dom_dreadnought_fedr5.odf" 3.77
"dom_dreadnought_fedr6.odf" 3.77
"dom_dreadnought_klir4.odf" 3.77
"dom_dreadnought_klir5.odf" 3.77
"dom_dreadnought_klir6.odf" 3.77
"dom_dreadnought_romr4.odf" 3.77
"dom_dreadnought_romr5.odf" 3.77
"dom_dreadnought_romr6.odf" 3.77
"dom_intercept_destr4.odf" 5.346
"dom_intercept_destr5.odf" 2.673
"dom_intercept_destr6.odf" 2.673
"dom_intercept_dest_romr4.odf" 5.346
"dom_intercept_dest_romr5.odf" 2.673
"dom_intercept_dest_romr6.odf" 2.673
"fed_ambassadorMr6.odf" 5.67
"fed_ambassadorr6.odf" 5.67
"fed_intrepidYr6.odf" 5.67
"fed_intrepidZr6.odf" 5.67
"fed_intrepid_borr6.odf" 5.67
"fed_mirandaii.odf" 5.905
"fed_mirandaiiM.odf" 5.905
"fed_mirandaiiMr2.odf" 5.905
"fed_mirandaiiMr3.odf" 5.905
"fed_mirandaiiMr4.odf" 5.905
"fed_mirandaiiMr5.odf" 5.905
"fed_mirandaiiMr6.odf" 5.905
"fed_mirandaiir2.odf" 5.905
"fed_mirandaiir3.odf" 5.905
"fed_mirandaiir4.odf" 5.905
"fed_mirandaiir5.odf" 5.905
"fed_mirandaiir6.odf" 5.905
"fed_monsoonr4.odf" 4.782
"fed_monsoonr5.odf" 2.31
"fed_monsoonr6.odf" 3.726
"fed_monsoonZr4.odf" 4.782
"fed_monsoonZr5.odf" 2.31
"fed_monsoonZr6.odf" 3.726
"fed_monsoon_bor.odf" 4.52
"fed_monsoon_borr2.odf" 4.52
"fed_monsoon_borr3.odf" 4.52
"fed_monsoon_borr4.odf" 6.32
"fed_monsoon_borr5.odf" 3.315
"fed_monsoon_borr6.odf" 2.916
"fed_monsoon_klir4.odf" 4.782
"fed_monsoon_klir5.odf" 2.31
"fed_monsoon_klir6.odf" 3.726
"fed_sovereignr4.odf" 6.69
"fed_sovereignr5.odf" 5.51
"fed_sovereignr6.odf" 5.51
"fed_sovereignYr4.odf" 6.69
"fed_sovereignYr5.odf" 5.51
"fed_sovereignYr6.odf" 5.51
"kli_brelr4.odf" 7.452
"kli_brelr5.odf" 6.69
"kli_brelr6.odf" 6.69
"kli_brelZr4.odf" 7.452
"kli_brelZr5.odf" 6.69
"kli_brelZr6.odf" 6.69
"kli_brel_borr4.odf" 7.452
"kli_brel_borr5.odf" 6.69
"kli_brel_borr6.odf" 6.69
"kli_brel_romr4.odf" 7.452
"kli_brel_romr5.odf" 6.69
"kli_brel_romr6.odf" 6.69
"kli_edjenr4.odf" 7.452
"kli_edjenr5.odf" 6.69
"kli_edjenr6.odf" 6.69
"kli_kvortr6.odf" 6.69
"kli_kvortZr6.odf" 6.69
"kli_kvort_fedr6.odf" 6.69
"rom_generix_dreadr4.odf" 7.723
"rom_generix_dreadr5.odf" 7.21
"rom_generix_dreadr6.odf" 7.21
"rom_generix_dreadYr4.odf" 7.723
"rom_generix_dreadYr5.odf" 7.21
"rom_generix_dreadYr6.odf" 7.21
"rom_generix_dread_klir4.odf" 7.723
"rom_generix_dread_klir5.odf" 7.21
"rom_generix_dread_klir6.odf" 7.21
My theory is that because the whitespace preceding each non-header line is a tab (and it won't show up that way here), the regex doesn't match. In case you're wondering, the whitespace IS important.
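One way to test that theory is to look at the invisible characters directly. The sketch below uses sed's l command, which prints each line in an unambiguous form: a tab shows up as \t, a carriage return as \r, and the end of the line is marked with $. The wrapper name show_whitespace is just for illustration. For what it's worth, \s in .NET regular expressions does match a tab, so a leading tab on its own would not be expected to stop the pattern from matching; seeing what is really in the file may point to something else.

```shell
# show_whitespace [FILE...]
# Prints each input line with tabs as \t, carriage returns as \r,
# and a $ marking where the line ends. Reads stdin if no file is given.
show_whitespace() {
    sed -n 'l' "$@"
}

# Usage:
#   show_whitespace "Data To Process.txt" | head
```

An empty result means the file itself is empty, which is also worth ruling out, since the program above reopens the same file for writing before it attempts the match.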