Ignorecase in awk creates unintentional output

https://stackoverflow.com/questions/22402872

14-06-2023

Question

I'm pretty new to scripting (2 months) and have run into an issue using IGNORECASE in awk that I don't understand. I already have a solution using sed, but I'd still like to know why the behaviour below occurs and how to avoid it.

From this line:

echo foo.BAZ.bar | awk 'IGNORECASE = 1;{gsub(/'.baz.'/,"'.'")};{print}'

I get the output:

foo.BAZ.bar
foo.bar

but I expect the output foo.bar only. The behaviour above can be avoided by removing IGNORECASE = 1; from the line, but then .BAZ. will, of course, not be removed from foo.BAZ.bar. This behaviour seems odd to me and very undesirable!

Thanks for any input, it's greatly appreciated :)

Vince

Solution

You need to do:

echo foo.BAZ.bar | awk 'BEGIN{IGNORECASE = 1}{gsub(/[.]baz[.]/,".")}1'

When you put an explicit ; after the IGNORECASE = 1 assignment, awk treats the assignment as a pattern with no action of its own. The assignment evaluates to 1, which is true, so awk applies the default action and prints the line as is, before any modification. gsub then modifies the line, and the explicit print prints the modified version.
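The pattern-without-action rule can be seen in isolation with a small sketch. IGNORECASE is a gawk extension, so a plain variable named x (made up for this illustration) stands in for it here; the shell variable names are likewise only illustrative.

```shell
# A pattern with no action block gets the default action: print the record.
# "x = 1" is an assignment whose value (1) is true, so the line is printed.
pattern_only=$(echo foo.BAZ.bar | awk 'x = 1')
echo "$pattern_only"

# With the semicolon, awk sees two separate rules: the bare pattern
# (prints the line unchanged) and the action block (prints a marker).
two_rules=$(echo foo.BAZ.bar | awk 'x = 1; { print "second rule" }')
echo "$two_rules"
```

The second command prints two lines for one input line, which is the same doubling seen in the question.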

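Also, notice how the regexp and the replacement string are written in the gsub call. The inner single quotes from the original command are gone: they only closed and reopened the shell's quoting, so awk never saw them. The dots are written as [.] so that they match a literal dot, since an unescaped . matches any character.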
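Both points about the gsub arguments can be checked with a short sketch. The input fooXbazYbar and the shell variable names are made up for the illustration; IGNORECASE is left out so the commands run on any awk.

```shell
# What awk actually receives from the original command line: the inner
# single quotes end and restart the shell quoting and are not passed on.
seen_by_awk=$(printf '%s\n' 'IGNORECASE = 1;{gsub(/'.baz.'/,"'.'")};{print}')
echo "$seen_by_awk"

# An unescaped dot matches any single character, so "XbazY" is replaced.
loose=$(echo fooXbazYbar | awk '{ gsub(/.baz./, ".") } 1')
echo "$loose"

# [.] matches only a literal dot, so the same input is left alone.
strict=$(echo fooXbazYbar | awk '{ gsub(/[.]baz[.]/, ".") } 1')
echo "$strict"
```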

Update: As Ed pointed out in the comments, the previous solution would evaluate the IGNORECASE assignment for every line. Putting it in the BEGIN section assigns it once, and that setting then applies to the entire file.
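How often each kind of rule runs can be sketched with two counters. The names begin_runs and main_runs and the three-line input are hypothetical, chosen only to make the counts visible.

```shell
# BEGIN runs once before any input is read; a rule without BEGIN runs
# once per input record. Count both over three input lines.
counts=$(printf 'a\nb\nc\n' | awk '
  BEGIN { begin_runs++ }
        { main_runs++ }
  END   { print begin_runs, main_runs }')
echo "$counts"
```

An assignment such as IGNORECASE = 1 placed in the main rule would therefore be repeated for every line, while the same assignment in BEGIN happens once.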

OTHER TIPS

kent$ echo foo.BAZ.bar | awk -v IGNORECASE=1 '{gsub(/.baz./,".")}7'
foo.bar
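Two details of this one-liner may be worth spelling out. The trailing 7 plays the same role as the more common 1: any nonzero constant is a true pattern, so the default print action runs. The -v option assigns the variable before the program starts, so no BEGIN block is needed. The sketch below uses a made-up variable named greeting instead of IGNORECASE so that it runs on any awk.

```shell
# Any nonzero constant pattern is true, so both commands print the line.
with_one=$(echo foo.bar | awk '1')
with_seven=$(echo foo.bar | awk '7')
echo "$with_one" "$with_seven"

# -v assigns before the program runs, so the value is already set in BEGIN.
from_v=$(awk -v greeting=hello 'BEGIN { print greeting }')
echo "$from_v"
```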

Licensed under: CC-BY-SA with attribution
