Question

I am trying to create a regex substitution command in a bash script, but after playing with single quotes, double quotes and escape characters, I am pulling my hair out. I use RegExr to compose patterns and came up with the following:

I search for:

/\.icon-(.*) {\n\t/gm

and want to replace it with:

if(strpos(\$embedicons,'$1') !== false) { \$svgicons .= <<<'EOD'\n\.$1 {

This replaces this:

.icon-basket-14-icon {
    background-image: url('data:image/svg+xml;charset=US-ASCII,blahblah');
    background-repeat: no-repeat;
}

with this:

if(strpos($embedicons,'basket-14-icon') !== false) { $svgicons .= <<<'EOD'
.basket-14-icon {background-image: url('data:image/svg+xml;charset=US-ASCII,blahblah');
    background-repeat: no-repeat;
}

However, I have had no success making this work within a bash script. I am not sure it is relevant, but I am on OS X Mavericks and use the Terminal app. While the patterns in RegExr make sense to me, once I start escaping them, I completely lose track of what is going on.

  1. How do I make sure the regex is global and multiline?
  2. Is it better to use some other option than the built-in regex?
  3. Is there a resource online that teaches (bash) regex to complete noobs?
  4. Is there a group pattern that matches everything (including whitespace and newlines) up to a given string of characters, such as an asterisk (*)?

Obviously I am not looking for a handout; this problem haunts me regularly, and I would love to learn regex and solve my problems myself. I have studied some examples online, but they seem far too advanced. Maybe there is an online generator like RegExr that translates patterns into bash-compatible and PHP-compatible versions?


UPDATE/SOLUTION:

The following seems to work for me in the OS X Mavericks Terminal:

sed "s|\.icon-\(.*\) {|if(strpos(\$embedicons,'\1') !== false) { \$svgicons \.= <<<'EOD' \.\1 {|g"

Solution

I would suggest using sed for this kind of replacement; with GNU sed, this line will do what you want:

sed "s/^\.icon-\(.*\) {\$/if(strpos(\$embedicons,'\1') !== false) { \$svgicons .= <<<'EOD'\n.\1 {/g" input_file.txt

input_file.txt:

.icon-basket-14-icon {
    background-image: url('data:image/svg+xml;charset=US-ASCII,blahblah');
    background-repeat: no-repeat;
}

Output:

if(strpos($embedicons,'basket-14-icon') !== false) { $svgicons .= <<<'EOD'
.basket-14-icon {
    background-image: url('data:image/svg+xml;charset=US-ASCII,blahblah');
    background-repeat: no-repeat;
}

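With the -r (extended regex) flag set, the grouping parentheses need no backslashes; you only need to escape the literal dot and braces, plus the dollar signs of the PHP variables so that bash does not expand them. Be aware that \n in the replacement is a GNU sed extension: the BSD sed that ships with OS X outputs a literal n there, and it spells the extended regex flag -E.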
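A minimal sketch of a version that should also work with the BSD sed on OS X, assuming bash (for the $'\n' quoting) and a sed that accepts -E (recent GNU sed does as well). A backslash followed by an actual newline is the POSIX way to put a line break into the replacement, and keeping the sed script in a variable lets you print it to see exactly what sed receives once bash has processed the quotes. The names nl, script and embed_icon_headers are made up for this illustration.

```shell
#!/usr/bin/env bash
# Sketch only: nl, script and embed_icon_headers are hypothetical names.

nl=$'\n'   # a literal newline character (bash ANSI-C quoting)

# Inside double quotes bash turns \$ into $ and \\ into \; other backslashes
# are passed through unchanged. sed therefore sees:
#   \. and \{       a literal dot and a literal brace (required with -E)
#   (.*)            capture group 1, no backslashes needed with -E
#   \ + newline     a line break in the replacement (POSIX, works on BSD sed)
script="s/^\.icon-(.*) \{\$/if(strpos(\$embedicons,'\1') !== false) { \$svgicons .= <<<'EOD'\\${nl}.\1 {/"

embed_icon_headers() {
  sed -E "$script" "$@"
}

# Uncomment to see exactly what sed receives after bash has processed the quotes:
# printf '%s\n' "$script"

printf '%s\n' '.icon-basket-14-icon {' '    background-repeat: no-repeat;' '}' | embed_icon_headers
```

Keeping the quoting in one variable is only a convenience; the commented-out printf is often the quickest way to spot where an escape went missing.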

Regarding your questions:

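  • sed already applies the substitution to every line; the g flag makes it replace every match on a line rather than only the first (echo aaa | sed 's/a/b/' prints baa, while echo aaa | sed 's/a/b/g' prints bbb). What exactly do you mean by "multiline"? Outputting newlines is easy using \n (in GNU sed), but matching across lines is more complex because sed operates line by line. A common technique is to replace all newlines in the file/data with a placeholder, perform the regex/substitution with the placeholder in mind, then replace the placeholder with newlines again; pick a placeholder that never occurs in the data.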
  • sed is probably your best bet for regex-type stuff. You can find documentation online; this guide is fairly comprehensive: http://www.grymoire.com/Unix/Sed.html

For the last part, (.*) will capture everything; you then just have to handle the newlines and escape your terminating string properly. Keep in mind that .* is greedy, so it captures up to the last occurrence of the terminator, not the first.

testfile:

testing data with space -
and newlines /'\ *** ends
there

Command (tr swaps newlines for tildes and back again):

tr '\n' '~' < testfile | sed -E 's/(.*)\*\*\*.*/\1/g' | tr '~' '\n'

Output:

testing data with space -
and newlines /'\ 
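The placeholder can also be avoided by having sed collect the whole input in its pattern space before substituting. A minimal sketch, assuming a POSIX-style sed: the loop is split into separate -e options because BSD sed would treat a ; after a label name as part of the label, . also matches the embedded newlines, and since sed takes the leftmost match, \*\*\*.* starts at the first *** in the text. The whole input is held in memory, which is fine for files like this one. cut_at_stars is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: cut_at_stars is a hypothetical name.
cut_at_stars() {
  # :a, $!{ N; ba }  while the current line is not the last one, append the
  #                  next line (joined with a newline) and loop back to label a,
  #                  so the whole input ends up in the pattern space
  # s/\*\*\*.*//     delete from the first *** to the end; . also matches the
  #                  embedded newlines
  sed -e ':a' -e '$!{' -e 'N' -e 'ba' -e '}' -e 's/\*\*\*.*//' "$@"
}

printf '%s\n' "testing data with space -" "and newlines /'\\ *** ends" "there" | cut_at_stars
```

Guarding N with $! also means a single-line input still gets the substitution instead of being printed unchanged.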
Licensed under: CC-BY-SA with attribution