Objective-C HTML parsing. Get all text between tags

https://stackoverflow.com/questions/16376982

14-04-2022
|

Question

I am using hpple to try and grab a torrent description from ThePirateBay. Currently, I'm using this code:

NSString *path = @"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/node()";
NSArray *nodes = [parser searchWithXPathQuery:path];
for (TFHppleElement * element in nodes) {
    NSString *postid = [element content];
    if (postid) {
        [texts appendString:postid];
    }
}

This returns just the plain text, and not any of the URL's for screenshots. Is there anyway to get all links and other tags, not just plain text? The piratebay is fomratted like so:

<pre>
    <a href="http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg" rel="nofollow">
    http://img689.imageshack.us/img689/8292/itskindofafunnystory201.jpg</a>
More texts about the file
</pre>

Solution

That's an easy job and you did it almost correctly!

What you want is the content (or an attribute) of the a-tag, so you need to tell the parser that you want it.

Just change your XPath to

@"//div[@id='content']/div[@id='main-content']/div/div[@id='detailsouterframe']/div[@id='detailsframe']/div[@id='details']/div[@class='nfo']/pre/a"

(You missed the a at the very end and you do not need node())

Output:

http://www.imdb.com/title/tt1904996/
http://leetleech.org/images/65823608764828593230.png
http://leetleech.org/images/44748070481477652927.png
http://leetleech.org/images/42024611449329122742.png

If you only want the screenshot URLs you can do something like

NSMutableArray *screenshotURLs = [[NSMutableArray alloc] initWithCapacity:0];
for (int i = 1; i < nodes.count; i++) {
    [screenshotURLs addObject:nodes[i]];
}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow