Question

I am using hpple to parse an HTML document. I followed Ray Wenderlich’s tutorial and have everything working fine for their example file. However, I need to adapt it to read an HTML file from my friend's blog, which is more complex than the example I have used so far. The relevant part of the file (the full file is uploaded as a gist) is:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">
<!-- snip -->
<div id="content" class="hfeed">
            <div class="post-21443 post type-post status-publish format-standard hentry category-about-catherine">

      <div class="postdate">
      Apr          <br />
      6            <br />
      2013         
      </div>
    <h2 class="entry-title"><a href="http://catherinepooler.com/2013/04/stampnation-live-retreat-updates/" title="StampNation LIVE Retreat Updates" rel="bookmark">StampNation LIVE Retreat Updates</a></h2>

    <div class="post-info"></div>       <div class="entry-content">
        <p><a href="http://catherinepooler.com/wp-content/uploads/2013/04/IMG_0560.jpg" ><img class="aligncenter size-large wp-image-21444" alt="StampNation LIVE" src="http://catherinepooler.com/wp-content/uploads/2013/04/IMG_0560-450x337.jpg" width="450" height="337" /></a></p> <p>StampNation LIVE is in full swing!  We are having a wonderful time.  I am taking a quick break from stamping and chatting to share a few photos with you.</p> <p>I think my favorite thing in getting ready for the retreat was setting up the Accessory Bar.  Each attendee received a small galvanized bucket with their fully glittered initial on it to fill up at the bar.  Awesome!</p>
<!-- snip -->

There are several of these sections within the file, and I need to collect the title from every

<h2 class = "entry-title"> 

element (here, title="StampNation LIVE Retreat Updates") in an array. I have successfully placed the contents of each

<div class = "entry-content"> 

into an array by using the XPath query //div[@class = 'entry-content']/p. However, I can’t seem to get the titles: the query returns an empty array and the code crashes. Obviously my XPath query is incorrect. This is what I tried:

//h2[@class = 'entry-title']  (: this crashed :)

//div[@class = 'post-21443.....']//h2[@class = 'entry-title']  (: this crashed too :)

Along with a slew of other attempts!

Does anyone have any advice for me? I looked into many SO answers and the examples that came with hpple, but I cannot piece it together.

UPDATE: With Jens's help I have changed the query to
NSString *postsXpathQueryString = @"//h2[@class = 'entry-title']/a";

This gets me an array, but I get this error as well now.

2013-04-08 10:26:30.604 HTML[12408:11303] * Terminating app due to uncaught exception 'NSRangeException', reason: '* -[__NSArrayM objectAtIndex:]: index 4 beyond bounds [0 .. 3]' * First throw call stack: (0x210a012 0x1203e7e 0x20ac0b4 0x3852 0x2028fb 0x2029cf 0x1eb1bb 0x1fbb4b 0x1982dd 0x12176b0 0x2706fc0 0x26fb33c 0x2706eaf 0x2372bd 0x17fb56 0x17e66f 0x17e589 0x17d7e4 0x17d61e 0x17e3d9 0x1812d2 0x22b99c 0x178574 0x17876f 0x178905 0x9733ab6 0x181917 0x14596c 0x14694b 0x157cb5 0x158beb 0x14a698 0x2065df9 0x2065ad0 0x207fbf5 0x207f962 0x20b0bb6 0x20aff44 0x20afe1b 0x14617a 0x147ffc 0x1d2d 0x1c55) libc++abi.dylib: terminate called throwing an exception

UPDATE 2

Fixed the index-beyond-bounds error by adding an if statement where I call reloadData. The array shows up in my NSLog output, but it is not appearing in my table view. The table view comes up empty!! But no more crash!!!

FINAL UPDATE

It is now working. Jens helped me get the query correct, and then I just had to fill in the table view. I had set the array count to 20 because Ray's tutorial had a zillion entries; my friend's blog only had four! Thanks for all the help.
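
The count mismatch described above can be sketched outside of UIKit. The following is a minimal illustration in Python rather than Objective-C; the titles list, HARD_CODED_ROWS, and title_for_row are hypothetical names made up for this sketch, not part of the tutorial or the asker's project. It assumes four parsed titles and a leftover row count of 20, as in the updates.

```python
# Hypothetical stand-in for the four titles parsed from the blog.
titles = ["Post A", "Post B", "Post C", "Post D"]

# The row count left over from the tutorial; larger than the data.
HARD_CODED_ROWS = 20


def title_for_row(row):
    """Return the title for a table row, or None if the row is out of range."""
    if 0 <= row < len(titles):
        return titles[row]
    return None


# Asking for row 4 with only rows 0..3 available is the out-of-bounds case;
# the guard turns it into None instead of an exception.
guarded = [title_for_row(r) for r in range(HARD_CODED_ROWS)]

# Deriving the row count from the data makes the guard unnecessary.
row_count = len(titles)
visible = [title_for_row(r) for r in range(row_count)]
```

Taking the row count from the array itself keeps the table and the data in step when the number of posts changes.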

Solution

Problem:

Your document contains namespaces:

<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US" xml:lang="en-US">

Solution:

I'm not familiar with hpple or Objective-C, so I can't validate the code below, which I adapted from an hpple GitHub issue, but it looks reasonable. I guess all you have to do is change the first parameter to your XPath context variable.

xmlXPathRegisterNs(xpathCtx, BAD_CAST "xhtml", BAD_CAST "http://www.w3.org/1999/xhtml");

Then, prefix this namespace every time you access an element:

//xhtml:h2[@class = 'entry-title']
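
To see why the prefix matters, here is a minimal sketch in Python rather than Objective-C, using the standard library's xml.etree.ElementTree instead of hpple. The trimmed document and the variable names are made up for illustration; only the namespace URI and the h2 markup come from the question. It shows how a namespace-aware XML parser behaves and may not reflect how hpple's HTML parser treats the same page.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the blog page (illustrative, not the full file).
DOC = """<html xmlns="http://www.w3.org/1999/xhtml" lang="en-US">
<body>
<div id="content" class="hfeed">
<h2 class="entry-title"><a href="http://catherinepooler.com/2013/04/stampnation-live-retreat-updates/" title="StampNation LIVE Retreat Updates" rel="bookmark">StampNation LIVE Retreat Updates</a></h2>
</div>
</body>
</html>"""

root = ET.fromstring(DOC)

# Map the prefix used in the query to the document's default namespace.
namespaces = {"xhtml": "http://www.w3.org/1999/xhtml"}

# Without a prefix the query looks for h2 in no namespace and finds nothing.
unprefixed = root.findall(".//h2[@class='entry-title']")

# With the prefix, the same query matches the XHTML elements.
links = root.findall(".//xhtml:h2[@class='entry-title']/xhtml:a", namespaces)
titles = [a.get("title") for a in links]

print(len(unprefixed), titles)
```

The prefix in the query does not have to match anything in the document; only the namespace URI it is bound to has to match.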

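If you do not want to register a prefix (there is little need to here, since the document uses only one namespace), you could match the element in any namespace with a wildcard instead:

//*:h2[@class = 'entry-title']

Note that *:h2 is XPath 2.0 syntax. An XPath 1.0 engine such as libxml2 needs the local-name() form instead:

//*[local-name() = 'h2'][@class = 'entry-title']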
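
The namespace-agnostic match can also be pictured as a filter on local names, much like comparing local-name() in XPath. The sketch below is in Python with the standard library's xml.etree.ElementTree, not hpple. The local_name helper, the shortened document, and the second h2 with class "widget-title" are all hypothetical and exist only to show that the class filter still applies.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative document; the "widget-title" heading is made up.
DOC = """<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<h2 class="entry-title"><a title="StampNation LIVE Retreat Updates">StampNation LIVE Retreat Updates</a></h2>
<h2 class="widget-title">Archives</h2>
</body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' part ElementTree puts in front of tag names."""
    return tag.rsplit("}", 1)[-1]


root = ET.fromstring(DOC)

# Keep every h2 with the right class, whatever namespace it lives in.
entry_titles = [
    "".join(el.itertext()).strip()
    for el in root.iter()
    if local_name(el.tag) == "h2" and el.get("class") == "entry-title"
]

print(entry_titles)
```

Ignoring the namespace is convenient for a single-namespace page like this one, but it would also match an h2 from any other vocabulary embedded in the document.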
Licensed under: CC-BY-SA with attribution