Question

I'm trying to parse some content (no DOM available, nor a DOM parser such as jQuery or Cheerio) to replace some words/symbols (basically emoticons) with images, but I would like to ignore everything between <code></code> and <pre></pre>. This example works great at replacing all the emoticons, but doesn't ignore code and pre tags: http://jsbin.com/odARehI/5/edit?js,console

If you run the script, you will see the first printout before the code tag and the second after it.

I would appreciate another set of eyes on that pattern. Thanks.

// see link for a list of the emoticons to parse
var pattern = /&gt;:\)|\([\w~]+\)|\\[:]?[od]\/|[:;\|bBiIxX8\(\)\]][=\-"^:]?[)>$&|\w\(\)*@#?]?[)>$&|\w\(\)*@#?]/g;

I tried a few things, but none of them worked without messing up the original match.

For the Don't-parse-HTML-with-regex police department: this is running server-side and I do not have the luxury of a DOM parser at the moment.

Thank you.

UPDATE: for a RegExp solution that ignores <code> tags, see this neat solution, thanks to github/frissdiegurke, in this commit:

/(^|<\/code>)([^<]*|<(?!code>))*(<code>|$)/g
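The update does not show how the regex is applied. One plausible reading is that each match is a stretch of text outside a <code> block, so the emoticon replacement can run inside a replace callback. The sketch below assumes that reading; `OUTSIDE_CODE` and `replaceOutsideCode` are made-up names, and the regex as written only recognizes a bare <code> tag (no attributes) and does nothing for <pre>.

```javascript
// The regex from the update. Each match is a stretch of text outside
// <code>...</code>, including the </code> before it and the <code> after it.
var OUTSIDE_CODE = /(^|<\/code>)([^<]*|<(?!code>))*(<code>|$)/g;

// Hypothetical helper: run the emoticon replacement only on those stretches.
function replaceOutsideCode(html, pattern, replacement) {
  return html.replace(OUTSIDE_CODE, function (segment) {
    return segment.replace(pattern, replacement);
  });
}

var out = replaceOutsideCode('a :) <code>:)</code> b :)', /:\)/g, 'SMILE');
console.log(out); // a SMILE <code>:)</code> b SMILE
```

Because the matched stretch includes the tag text itself, an emoticon pattern loose enough to match characters inside "<code>" or "</code>" would need extra care.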

Solution 3

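// parse(text, pattern) and pattern1 are not defined in this answer; they are
// assumed to be the asker's emoticon-replacing helper and emoticon pattern.
var result = '', start = 0, co, ce;
while ((co = testString.indexOf('<code', start)) !== -1) {
  // Replace emoticons in the text that precedes the <code> tag.
  result += parse(testString.substring(start, co), pattern1);
  ce = testString.indexOf('</code>', co + 5);
  // Copy the code block untouched; an unclosed <code> protects the rest of the string.
  start = (ce === -1) ? testString.length : ce + 7;
  result += testString.substring(co, start);
}
// Replace emoticons in whatever follows the last code block.
result += parse(testString.substring(start), pattern1);

console.log(result);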
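The loop above depends on a `parse` helper that the answer does not show, and it only skips <code> blocks. As a rough sketch of how the same indexOf scan could cover <pre> as well, assuming a small hypothetical emoticon table (`EMOTICONS`), a stand-in `parse`, and a made-up function name `replaceOutside`:

```javascript
// Hypothetical emoticon table; the real list lives in the asker's jsbin.
var EMOTICONS = { ':)': 'smile.gif', ":'(": 'cry.gif' };

// Hypothetical stand-in for the parse() helper the answer relies on.
function parse(text, pattern) {
  return text.replace(pattern, function (match) {
    var file = EMOTICONS[match];
    return file ? '<img src="' + file + '" />' : match;
  });
}

// Find the nearest opening tag among `tags`, copy that block through
// untouched, and run parse() on everything in between.
function replaceOutside(input, pattern, tags) {
  var result = '', start = 0;
  while (start < input.length) {
    var open = -1, tag = null;
    for (var i = 0; i < tags.length; i++) {
      var at = input.indexOf('<' + tags[i], start);
      if (at !== -1 && (open === -1 || at < open)) { open = at; tag = tags[i]; }
    }
    if (open === -1) break; // no more protected blocks
    result += parse(input.substring(start, open), pattern);
    var closeTag = '</' + tag + '>';
    var close = input.indexOf(closeTag, open);
    // An unclosed block protects the rest of the input.
    var end = (close === -1) ? input.length : close + closeTag.length;
    result += input.substring(open, end);
    start = end;
  }
  return result + parse(input.substring(start), pattern);
}

var sample = 'Hello :) <pre>Wassup :)</pre> Maybe :) <code>:) Foo</code> :) Bar';
console.log(replaceOutside(sample, /:\)|:'\(/g, ['code', 'pre']));
```

Like the original, this matches on the tag prefix ('<code', '<pre'), so tags with attributes are skipped too, but nested or malformed markup is still an edge case.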

OTHER TIPS

Without DOM parsing there will be edge cases that fail, but this should work for you.

Given this HTML:

Hello :) <pre>Wassup :)</pre> Maybe :) <code>:) Foo</code> :) Bar

Use this code:

var blocks = [];
html = html.replace(/(?:<pre>.*?<\/pre>|<code>.*?<\/code>)/g, function (match) {
    blocks.push( match );
    return '__BLOCK__';
});

html = html.replace(/:\)/g, 'SMILE');

html = html.replace(/__BLOCK__/g, function () {
    return blocks.shift();
});

Which produces:

Hello SMILE <pre>Wassup :)</pre> Maybe SMILE <code>:) Foo</code> SMILE Bar

Just adjust the /:\)/g replace to work however you need it.
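As one way to do that adjustment, here is a sketch that builds the emoticon regex from a lookup table and wraps the three steps in a function. The table `EMOTICON_IMAGES`, its file names, and the names `escapeRegExp` and `replaceEmoticons` are all illustrative assumptions. It also swaps `.` for `[\s\S]` so that multi-line <pre> blocks are protected, which the original pattern would miss.

```javascript
// Hypothetical emoticon table; the file names are placeholders.
var EMOTICON_IMAGES = { ':)': 'smile.gif', ':(': 'sad.gif', ';)': 'wink.gif' };

// Escape regex metacharacters so each emoticon is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Longer emoticons go first so a longer one is never cut short by a shorter prefix.
var emoticonPattern = new RegExp(
  Object.keys(EMOTICON_IMAGES)
    .sort(function (a, b) { return b.length - a.length; })
    .map(escapeRegExp)
    .join('|'),
  'g'
);

function replaceEmoticons(html) {
  var blocks = [];
  // Step 1: stash <pre> and <code> blocks behind a placeholder.
  html = html.replace(/<pre>[\s\S]*?<\/pre>|<code>[\s\S]*?<\/code>/g, function (match) {
    blocks.push(match);
    return '__BLOCK__';
  });
  // Step 2: replace emoticons in what is left.
  html = html.replace(emoticonPattern, function (match) {
    return '<img src="' + EMOTICON_IMAGES[match] + '" />';
  });
  // Step 3: put the stashed blocks back in their original order.
  return html.replace(/__BLOCK__/g, function () {
    return blocks.shift();
  });
}

console.log(replaceEmoticons('Hello :) <pre>Wassup\n:)</pre> ;) <code>:( Foo</code> :('));
```

If the input can itself contain the literal text __BLOCK__, the restore step would misfire, so a less guessable placeholder may be safer.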

I guess you're using Node.js or a recent JavaScript engine (for the "map" and "split" implementations), so you can do this:

function replaceSpecial(str, pattern, replacement) {
  var REG = /(<code>.*?<\/code>)|(<pre>.*?<\/pre>)/i;
  return str.split(REG).map(function(s) {
    if ('' + s !== s)
      return '';
    if (s.match(REG))
      return s;
    return s.replace(pattern, replacement);
  }).join('');
}
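One caveat with this function: it hands `pattern` straight to String.prototype.replace, which replaces only the first occurrence when given a string or a regex without the `g` flag. A text segment containing several emoticons therefore needs a global regex. A minimal illustration, using a made-up sample string:

```javascript
var text = 'hey :) there :)';

// String pattern: only the first occurrence is replaced.
var first = text.replace(':)', 'SMILE');
// Regex with the g flag: every occurrence is replaced.
var all = text.replace(/:\)/g, 'SMILE');

console.log(first); // hey SMILE there :)
console.log(all);   // hey SMILE there SMILE
```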

Example:

replaceSpecial("hey :) <code>:)</code> :'( <pre> :'( :)</pre>", /(:\))|(:'\()/g, function(s) {
  switch(s) {
    case ":)":
      return '<img src="smile.gif" />';
    case ":'(":
      return '<img src="cry.gif" />';
  }
})

Will return:

"hey <img src="smile.gif" /> <code>:)</code> <img src="cry.gif" /> <pre> :'( :)</pre>"

Or, more simply, if you just want to replace a single emoticon:

replaceSpecial("hey :) <code>:)</code>", ":)", '<img src="smile.gif" />')

=>

"hey <img src="smile.gif" /> <code>:)</code>"
Licensed under: CC-BY-SA with attribution