Question

I need a regex to search some HTML and find all <img> tags that have the attribute class="lazy" but do not have the attribute data-original="...".

Here is my sample test markup:

<!-- Must match : -->
<img class="lazy" src="http://lorempicsum.com/futurama/350/200/1" alt="Lorem ipsum" />
<img class="lazy" src="http://placehold.it/640x360/abd125/fff" />
<img class="lazy" src="http://placehold.it/640x360/000/fff"
alt="Blabla" />

<!-- Must not match : -->
<img class="lazy" src="http://placehold.it/255x200/111/fff&text=loading" data-original="http://lorempicsum.com/futurama/255/200/2" width="255" height="200" alt="" />
<img src="http://placehold.it/640x360/111/fff" alt="Blabla" />
<img src="http://placehold.it/640x360/333/fff"
alt="Blabla" />

I wrote this: <img[^>]*class\s*=\s*["']lazy["'][^>]*(?!data-original)[^>]*>

This does not work: it matches the 4th tag, which it must not.

Can you help me? Thanks.

P.S. Don't worry dudes, I'm not attempting to parse HTML the Cthulhu way. I just need to find these tags quickly to fix a large number of web templates; this is a one-shot trick...


Solution 2

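You need to anchor the lookahead. In your pattern, the [^>]* in front of (?!data-original) can consume any number of characters, so the engine can always backtrack to a position where data-original does not immediately follow, and the 'fail if match' part never triggers. Put the negative lookahead right after <img and let it scan the rest of the tag itself. It is also a good idea to put the class='lazy' check in a lookahead, for example like this: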

<img(?=[^>]*class\s*=\s*(["'])lazy\1)(?![^>]*data-original)[^>]*>

That way, the order in which data-original and class='lazy' appear in the tag does not matter either.
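As a quick check, here is a minimal Python sketch that runs the pattern above against the sample markup from the question. The helper name find_lazy_without_original and the two extra one-line tags at the end are made up for illustration; only the regex and the sample tags come from this thread. Note that [^>]* also matches newlines, so tags split over two lines are handled.

```python
import re

# Pattern from the answer: both checks are lookaheads anchored right after "<img".
PATTERN = re.compile(
    r"""<img(?=[^>]*class\s*=\s*(["'])lazy\1)(?![^>]*data-original)[^>]*>"""
)

# Sample markup from the question (first three tags should match).
SAMPLE = '''<img class="lazy" src="http://lorempicsum.com/futurama/350/200/1" alt="Lorem ipsum" />
<img class="lazy" src="http://placehold.it/640x360/abd125/fff" />
<img class="lazy" src="http://placehold.it/640x360/000/fff"
alt="Blabla" />
<img class="lazy" src="http://placehold.it/255x200/111/fff&text=loading" data-original="http://lorempicsum.com/futurama/255/200/2" width="255" height="200" alt="" />
<img src="http://placehold.it/640x360/111/fff" alt="Blabla" />
<img src="http://placehold.it/640x360/333/fff"
alt="Blabla" />
'''


def find_lazy_without_original(html):
    """Return the full text of every matching <img> tag (hypothetical helper)."""
    # finditer is used because findall would return only the captured quote.
    return [m.group(0) for m in PATTERN.finditer(html)]


matches = find_lazy_without_original(SAMPLE)
for tag in matches:
    print(repr(tag))

# Attribute order does not matter: data-original before class is still rejected.
print(find_lazy_without_original("<img data-original='x' class='lazy'>"))
print(find_lazy_without_original("<img src='a' class='lazy'>"))
```

For a one-off cleanup of template files this is probably enough; for anything more permanent, an HTML parser would be the safer choice.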

regex101 demo

OTHER TIPS

The negative lookahead (?![^>]*data-original) has to be checked immediately after <img, before anything else in the tag is consumed:

<img(?![^>]*data-original)[^>]*class\s*=\s*["']lazy["'][^>]*>
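
To see why the position of the lookahead matters, the following Python sketch compares the pattern from the question with the one just above on the 4th sample tag (the one that must not match). The variable names are illustrative only.

```python
import re

# Pattern from the question: the lookahead sits after a greedy [^>]*.
ORIGINAL = re.compile(
    r"""<img[^>]*class\s*=\s*["']lazy["'][^>]*(?!data-original)[^>]*>"""
)
# Pattern with the lookahead moved directly after "<img".
FIXED = re.compile(
    r"""<img(?![^>]*data-original)[^>]*class\s*=\s*["']lazy["'][^>]*>"""
)

# 4th tag from the question's sample markup: must NOT match.
TAG = ('<img class="lazy" src="http://placehold.it/255x200/111/fff&text=loading" '
       'data-original="http://lorempicsum.com/futurama/255/200/2" '
       'width="255" height="200" alt="" />')

# In ORIGINAL, [^>]* swallows the whole tag body, so the lookahead is tested
# just before ">" where "data-original" cannot follow, and it passes.
original_hit = ORIGINAL.search(TAG) is not None

# In FIXED, the lookahead is tested once, right after "<img", and scans ahead
# through the tag; it finds data-original and rejects the tag.
fixed_hit = FIXED.search(TAG) is not None

print(original_hit, fixed_hit)  # True False

# A lazy tag without data-original is still accepted by the fixed pattern.
plain_hit = FIXED.search('<img class="lazy" src="x" />') is not None
print(plain_hit)  # True
```

Unlike the pattern that uses two lookaheads, this one still consumes class="lazy" in the main match, but since the negative lookahead scans the whole tag from the start, data-original is rejected whether it comes before or after class.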
Licensed under: CC-BY-SA with attribution