Question

I am scraping a website and trying to pull certain elements out of the HTML. The pages I am scraping contain script tags with a lot of information in them, but I am only interested in one part. The line looks roughly like this:

'image':'http://ut5.example.com/t/231/3_b_643435.jpg',

There is other content above and below it. The URL is different on each page, except for the domain and some of the subfolders that store the images.

How would I go about searching the source for this specific line and extracting just the URL? I think I need regular expressions, since the URLs are dynamic.

The "gsub" method does something similar to what I want, since it can search with a /regex/. However, I do not want to replace anything; I just want to find that URL in the source with a /regex/ and copy it.


Solution

According to your comments, I think this is what you're looking for:

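// Match from "http" up to (but not including) the closing single quote,
// so the trailing ', on the line is not part of the result.
var regex = /http[^']+/;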

Example http://jsfiddle.net/Km9ZB/
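As a rough sketch of how the match could be pulled out of a larger page source, the snippet below anchors on the 'image' key and uses a capture group to copy only the URL. The surrounding script content, the variable names, and the extractImageUrl helper are all made up for illustration; only the 'image' line itself comes from the question. It also assumes the value is always wrapped in single quotes.

```javascript
// Hypothetical page source; only the 'image' line is taken from the question.
var source = [
  "var item = {",
  "  'title':'placeholder',",
  "  'image':'http://ut5.example.com/t/231/3_b_643435.jpg',",
  "  'price':'0'",
  "};"
].join("\n");

// Anchor on the 'image' key, then capture everything up to the closing quote.
var imageRegex = /'image'\s*:\s*'(http[^']+)'/;

// Hypothetical helper: returns the captured URL, or null when nothing matches.
function extractImageUrl(text) {
  var match = imageRegex.exec(text);
  return match ? match[1] : null;
}

console.log(extractImageUrl(source));
// prints "http://ut5.example.com/t/231/3_b_643435.jpg"
```

Anchoring on the key name avoids picking up other URLs that may appear earlier in the same script tag, and the [^'] character class stops the match at the closing quote instead of running to the end of the line.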

Licensed under: CC-BY-SA with attribution