Question

I'm using the following expression in Classic ASP that successfully grabs any image tag with a .jpg or .png suffix.

re.Pattern = "<img[^>]*src=[""'][^ >]*(jpg|png)[""']"

The problem I've found is that many of the sites I need to use do not actually use a suffix. So I need a new regex that finds an image tag and grabs whatever is in the src attribute.

As simple as this sounds, finding a regular expression to accomplish this in Classic ASP seems impossible without writing it myself (which IS impossible).

Please advise.

Solution

To match plainly on the img src you can do:

\<img src\=\"(\w+\.(gif|jpg|png)\")

And then if you only want the value that's in the img src, you can match any file name ending in a picture extension (but this may get you false positives, since it matches anywhere in the text and not only inside an img src):

\w+\.(gif|jpg|png)

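But to match just the value while tying it to img src, one approach is a negative lookahead that rejects any position still followed by <img src=" on the same line, so a match can only start after the last such tag (note that I added a matching group there). VBScript's RegExp has no lookbehind, so this is only an approximation: anything later on the line that looks like an image file name will match too.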

(?!.*\<img src\=\")(\w+\.(gif|jpg|png))
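If the lookahead feels too loose, a capturing group is another way to get just the value while still anchoring on the img tag, and it does not depend on a file suffix at all. Here is a minimal sketch in Python, assuming the src value is quoted and contains no quote characters itself; the helper name extract_img_srcs is made up for illustration. The pattern sticks to constructs that most regex flavors share, so the same idea should carry over to other engines by reading the first captured group.

```python
import re

# Capture whatever sits between the quotes of the src attribute of an <img> tag.
# Assumption: the value is quoted and contains no quote characters itself.
IMG_SRC = re.compile(r"""<img\b[^>]*?\bsrc\s*=\s*["']([^"']*)["']""", re.IGNORECASE)

def extract_img_srcs(html):
    """Return the src value of every <img> tag found in html (hypothetical helper)."""
    # With a single capturing group, findall returns the captured strings.
    return IMG_SRC.findall(html)

html = '<a href="page.htm"><img id="t" src="/photos/1234"></a><img src=\'pic.jpg\'>'
print(extract_img_srcs(html))
```

Note that this is still pattern matching, not parsing: an attribute such as data-src would also satisfy the \bsrc part, so treat it as a starting point.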

Now to include the possibility of having image links in your image source:

(?!.*\<img src\=\")([\/\.\-\:\w]+\.(gif|jpg|png)?[\?\w+\%]+)

And then let's remove the false positives we get by moving the optional quantifier (the ?) from right after (gif|jpg|png) to the next set (which matches the extra data you may get in a JS link, etc.) and by making sure an end quote follows:

(?!.*\<img src\=\")([\/\.\-\:\w]+\.(gif|jpg|png)([\?\w+\%]+)?)(?=\")

Note: This matches the sample data below, but regular expressions don't parse HTML, and I personally don't recommend using them to look through HTML unless you're doing it on a case-by-case basis. If you want to do URL/image scraping via a script, look into an XML/HTML parser.
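To give an idea of what the parser route looks like, here is a rough sketch using Python's standard html.parser module. The class name ImgSrcCollector and the function img_sources are invented for this example; the point is only that the parser handles attribute order, quoting and letter case, so no suffix list is needed.

```python
from html.parser import HTMLParser

class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical class name)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value is not None:
                    self.sources.append(value)

def img_sources(html):
    """Return all <img> src values in document order (hypothetical helper)."""
    parser = ImgSrcCollector()
    parser.feed(html)
    parser.close()
    return parser.sources

print(img_sources('<a href="index.htm"><IMG class="x" SRC="pic859.jpg"></a>'))
```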

Sample data:

<a href="myfile.htm"><img src="picture.gif"></a>
<a href="index.htm"><img src="pic859.jpg"></a>
<a href="page-57.htm"><img src="859.png"></a>
<img id="test1" class="answer1" src="text.jpg">
<img src="http://media.site.com/media/img/staff/2013/ROTHBARD-350_s90x126.jpg?e3e29f4a7131cd3bc7c4bf334be801215db5e3c2%22%3E">
<img src="yahoo.com/images/imagename.gif">
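For anyone who wants to check the final pattern against these lines, here is a small sketch in Python. The pattern is copied verbatim from above; the variable names are only for illustration, and the multi-line string works because . does not cross line breaks by default.

```python
import re

# The final pattern from the answer, copied verbatim into a raw string.
PATTERN = re.compile(r'(?!.*\<img src\=\")([\/\.\-\:\w]+\.(gif|jpg|png)([\?\w+\%]+)?)(?=\")')

SAMPLE = '''\
<a href="myfile.htm"><img src="picture.gif"></a>
<a href="index.htm"><img src="pic859.jpg"></a>
<a href="page-57.htm"><img src="859.png"></a>
<img id="test1" class="answer1" src="text.jpg">
<img src="http://media.site.com/media/img/staff/2013/ROTHBARD-350_s90x126.jpg?e3e29f4a7131cd3bc7c4bf334be801215db5e3c2%22%3E">
<img src="yahoo.com/images/imagename.gif">
'''

# Group 1 holds the full value, including any trailing query string.
matches = [m.group(1) for m in PATTERN.finditer(SAMPLE)]
for value in matches:
    print(value)
```

On this data it yields one value per line. It is not strict about where the value sits, though: for a tag like <img src="a.png" alt="b.png"> it returns both a.png and b.png, because the lookahead only checks that no further <img src=" follows on the line.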

Licensed under: CC-BY-SA with attribution