awk and extracting a specific field more than once

https://stackoverflow.com/questions/4571154

awk
gawk

14-10-2019

Question

I've got many files with variables in them, like this:

{$var1} some text {$var2} some other text

I'd like to pass them to awk so that it extracts the variables and prints a result like this:

file_name.htm - 8 : {$title}
file_name.htm - 10 : {$css_style}
file_name.htm - 33 : {$img_carte_image_02_over}

This is a piece of cake with this awk script:

#!/usr/bin/gawk -f
BEGIN { }
match($0, /({.*\$.+})/, tab) {
  for (x=1; tab[x]; x++) {
    print FILENAME" - "FNR" : "substr($0, tab[x, "start"], tab[x, "length"])
  }
}
END { }

I'm calling it like this:

find website/ | grep -E '(html|htm)$' | xargs ./myh.sh | more

Everything works fine except when multiple variables are on the same line. In that case I get:

file_name.htm - 59 : {$var1}<br/>{$var2}

whereas I want:

file_name.htm - 59 : {$var1}
file_name.htm - 59 : {$var2}

Any idea how I could or should do this? Of course, if you have another solution (with sed or whatever), that's fine with me!

Thanks a lot!

Solution

Try this one:

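awk '{
    line = $0
    # match() returns 0 (ending the loop) when no "{...$...}" string is left
    while (match(line, /\{[^{}$]*\$[^}]+\}/)) {
        print FILENAME, "-", FNR, ":", substr(line, RSTART, RLENGTH)
        # continue with the text that starts right after this match
        line = substr(line, RSTART + RLENGTH)
    }
}'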

The loop ends when match() returns 0, that is, when line no longer contains any "{foo$bar}" string. After each match, substr() removes the part of the line that has already been scanned, so the next match() call only searches what remains.

OTHER TIPS

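Try using a non-greedy regex in the match (http://www.exampledepot.com/egs/java.util.regex/Greedy.html). It probably won't work, since that page describes Java regexes and awk's POSIX extended regular expressions have no non-greedy quantifiers, but it's just an idea.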
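Since awk has no lazy quantifier such as .*?, the usual substitute is a negated bracket expression: [^}]* cannot run past the first }, so in this case it behaves like a non-greedy match. The sketch below compares the two on the question's sample line using only POSIX awk features (match, RSTART, RLENGTH); compare_patterns is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: greedy ".*" versus a negated bracket expression in POSIX awk.
# compare_patterns is a hypothetical helper used only for this comparison.
compare_patterns() {
    printf '%s\n' "$1" | awk '{
        # Greedy: ".*" runs to the LAST "}" on the line.
        if (match($0, /\{\$.*\}/))
            print "greedy:  " substr($0, RSTART, RLENGTH)
        # Negated class: "[^}]*" stops at the FIRST "}" after "{$".
        if (match($0, /\{\$[^}]*\}/))
            print "negated: " substr($0, RSTART, RLENGTH)
    }'
}

compare_patterns '{$var1}<br/>{$var2}'
```

The negated pattern still finds only one occurrence per match() call, so it has to be combined with a loop like the one in the solution to report every variable on the line.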

Licensed under: CC-BY-SA with attribution
