Adding a negative lookahead to a URL matching Regex

https://stackoverflow.com/questions/13669485

04-12-2021

Question

I'm trying to replace every plain-text URL in multiple elements on my page with an anchor tag pointing to that URL:

http://google.com => <a target="_blank" href="http://google.com">http://google.com</a>

var titles = document.querySelectorAll(".title");
var l = titles.length, i, title;
for (i = 0; i < l; i++) {
    title = titles[i];
    // Wrap every URL found in the element's markup in an anchor tag
    title.innerHTML = title.innerHTML.replace(/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig, "<a target='_blank' href='$1'>$1</a>");
}

The problem is that I have to run the regex again after some AJAX comes back, and it re-applies the anchor tags to the URLs inside the href="" attributes of the existing anchor tags. So I need to add a negative lookahead to this regex that prevents it from matching any URL with a trailing " or '.

Do match http://google.com but don't match "http://google.com"

/(\b(https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig

Solution

The general lookahead technique to assert that something is not inside double quotes is to check that there is an even number of them until the end of the string:

yourPatternHere(?=[^"]*(?:"[^"]*"[^"]*)*$)
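As a rough illustration, here is a minimal sketch of that lookahead attached to the URL pattern from the question. The lookahead is written with [^"]* so that each run cannot cross a double quote. The helper name linkifyUnquoted and the sample string are made up for this example, and the replacement uses double-quoted attributes so that the quote counting applies to the generated href.

```javascript
// URL pattern from the question, followed by a lookahead that only succeeds
// when an even number of double quotes remains before the end of the string.
const urlOutsideQuotes =
  /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|](?=[^"]*(?:"[^"]*"[^"]*)*$)/gi;

// Hypothetical helper: wrap URLs that are not inside double quotes.
function linkifyUnquoted(text) {
  // $& inserts the whole match into the replacement string.
  return text.replace(urlOutsideQuotes, '<a target="_blank" href="$&">$&</a>');
}

const sample = 'see http://google.com and "http://example.com"';
console.log(linkifyUnquoted(sample));
```

Note that this only guards the quoted href value. On a second pass the URL in the link text, which sits between > and <, is followed by an even number of quotes and would be wrapped again, which is one more reason to prefer the DOM approach.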

But you are operating on HTML. This may cause all sorts of problems (unmatched quotes in text nodes, comments, single-quote-delimited attribute values and so on). Don't use regular expressions to parse HTML. Instead, use JavaScript's built-in DOM manipulation capabilities as far as possible. Don't just find the .title elements; traverse their text nodes and apply the replacement only to those.
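The text-node approach could be sketched roughly as follows. This is one possible shape, not a definitive implementation: the names splitByUrls and linkifyTextNodes are invented for this example, and skipping text that already sits inside an a element is an added assumption so that repeated runs after AJAX updates stay harmless.

```javascript
// URL pattern from the question, without capture groups or lookahead.
const URL_RE = /\b(?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|]/gi;

// Hypothetical pure helper: split a string into plain-text and URL pieces.
function splitByUrls(text) {
  const parts = [];
  let last = 0;
  for (const m of text.matchAll(URL_RE)) {
    if (m.index > last) parts.push({ url: false, value: text.slice(last, m.index) });
    parts.push({ url: true, value: m[0] });
    last = m.index + m[0].length;
  }
  if (last < text.length) parts.push({ url: false, value: text.slice(last) });
  return parts;
}

// Hypothetical DOM helper: replace URLs in the text nodes below `root` with links.
function linkifyTextNodes(root) {
  const walker = document.createTreeWalker(root, NodeFilter.SHOW_TEXT);
  const nodes = [];
  // Collect first, then mutate, so the walker is not disturbed by replacements.
  while (walker.nextNode()) {
    const parent = walker.currentNode.parentElement;
    // Assumption: text already inside a link is left untouched.
    if (!(parent && parent.closest("a"))) nodes.push(walker.currentNode);
  }
  for (const node of nodes) {
    const parts = splitByUrls(node.nodeValue);
    if (!parts.some((p) => p.url)) continue;
    const fragment = document.createDocumentFragment();
    for (const p of parts) {
      if (p.url) {
        const a = document.createElement("a");
        a.href = p.value;
        a.target = "_blank";
        a.textContent = p.value;
        fragment.appendChild(a);
      } else {
        fragment.appendChild(document.createTextNode(p.value));
      }
    }
    node.parentNode.replaceChild(fragment, node);
  }
}
```

In a browser this would be called as document.querySelectorAll(".title").forEach(linkifyTextNodes); and could be called again after each AJAX update. Because attribute values are never text nodes, the URLs inside href are not seen by the pattern at all.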

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow