Question

I'm trying to parse an HTML response that contains this element:

<img src="https://pbs.twimg.com/media/...." alt="Embedded image permalink"</a>

I want to get the value of the img src attribute. All I want is the URL.

At this point I'm probably going overboard: I'm using Request and Cheerio to try to accomplish this.

Of the 20 different ways I've tried to do this, here's my current code:

var dummy;
request('http://t.co/....', function (error, response, body) {
  if (!error && response.statusCode == 200) {
    $ = cheerio.load(response.body);
    dummy = $('img[alt=Embedded image permalink]').attr('html');
    console.dir(dummy);
  }
});

I get the error message:

selector = selector.substr(data[0].length);
TypeError: Cannot read property '0' of null

As I said, I'm probably overcomplicating this. What's the simplest (or just a functional) way to do this?

Solution

Use regexp!

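Something like this should do the trick:

html.match(/<img [^>]*src="([^"]*)"/)

Without the g flag, the captured URL is element 1 of the returned array. With g, match returns only the full matches and drops the capture groups.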

See a working example here: http://www.rubular.com/r/f89Y9fHGtN (Caution: Ruby regexes differ slightly from JS ones, but I don't know of a comparable tool for the latter.)

Regexp explained:

<img – matches the beginning of the tag.

[^>]* – a bit tricky. This skips over anything in front of the src attribute (an alt attribute, for example). This version fails when there's a > character inside an attribute value, which probably should not happen. You could try replacing this part with .*, which handles that case but, being greedy, skips ahead to the last src=" on the line, so it fails when, for example, an attribute value ends with src=.

src=" – finds the src attribute.

([^"]*) – captures the URL inside.

" – finds the end of the value.
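
To tie the pieces of the pattern together, here is a minimal sketch of pulling the captured URL out in plain JavaScript. The helper names firstImgSrc and allImgSrcs and the example.com URL are placeholders made up for illustration; the sketch also assumes the src value is double-quoted, as in the question.

```javascript
// Hypothetical helper: returns the first <img> src URL in an HTML string, or null.
function firstImgSrc(html) {
  // No g flag here, so match() returns [fullMatch, captureGroup1].
  const m = html.match(/<img [^>]*src="([^"]*)"/);
  return m ? m[1] : null;
}

// Hypothetical helper: returns every <img> src URL in an HTML string.
function allImgSrcs(html) {
  const re = /<img [^>]*src="([^"]*)"/g;
  const urls = [];
  let m;
  // With the g flag, each exec() call resumes from the previous match.
  while ((m = re.exec(html)) !== null) {
    urls.push(m[1]);
  }
  return urls;
}

// Placeholder markup standing in for the fetched page body.
const sample = '<a href="#"><img alt="Embedded image permalink" src="https://example.com/a.png"></a>';
console.log(firstImgSrc(sample)); // https://example.com/a.png
```

In the question's request callback, body would take the place of sample.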

 

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

 

Other tips

So if I understand correctly, you want to extract the URL immediately following the src string in a string of text?

Why don't you put all the text in a variable and then double split it?

For example:

    var arrayOfElements = $("#txt").val().split("src=");
    var replacing = arrayOfElements[1].replace(/"/g, "'");
    var url = replacing.split("'");

    // You can now access the URL as url[1]

You can see a working example HERE. Good luck!
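
The snippet above reads its input from a jQuery text field. If the HTML is already in a string, a DOM-free variant of the same double split might look like the sketch below. The function name srcBySplit and the example.com URL are hypothetical, and it assumes the src value is wrapped in single or double quotes.

```javascript
// Hypothetical helper: double split without jQuery.
// Returns the text between the quotes after the first "src=", or null.
function srcBySplit(text) {
  const parts = text.split('src=');
  if (parts.length < 2) return null; // no src= in the text

  const rest = parts[1];
  const quote = rest[0]; // expected to be " or '
  if (quote !== '"' && quote !== "'") return null;

  // Second split: everything before the closing quote is the URL.
  return rest.slice(1).split(quote)[0];
}

console.log(srcBySplit('<img src="https://example.com/a.png" alt="x">')); // https://example.com/a.png
```

Like the original, this only looks at the first src= in the text, so it is best suited to input that contains a single image tag.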

License: CC-BY-SA with attribution
Not affiliated with StackOverflow