Question

I'm trying to pull specific data from an HTML page using XPath queries, but I'm having trouble extracting some parts of it.

Using //div[@id='main']/h2 as my XPath query, I can pull the "View Current" text with the following:

exampleSite.title = [[element firstChild] content];

However I would also like to pull in the following:

1. <b>5/9/2013<nbsp><nbsp> 10:58:45 PM</b>
2. <b>6.32</b>
3. <b>5  Total Points</b>
4. <b>3.72</b>

So far I've got this: //div[@id='main']/table[@class='bodytext']/tr but that's where I get stuck. Any help would be greatly appreciated! Thank you!

Here is the html I'm attempting to scrape:

<div id="main">
<h2>View Current</h2>

      <table width="96%" border="0" cellpadding="4" cellspacing="0" bordercolor="#eeeeee" align="center" height="276" valign="top" class="bodytext">
        <tr valign="top" >
          <td colspan = 2 height="13" valign="top" align="left" width="54%" class="headerblue" >Balances <br>
          </td>
        </tr>
        <tr valign="top" > 
          <td colspan = 2 height="13" valign="top" align="left" width="54%" class="text" >Balances 
            as of: <b>5/9/2013<nbsp><nbsp> 10:58:45 PM</b></td>
        </tr>
        <tr valign="top" > 
          <td colspan = 2 height="13" valign="top" align="left" width="46%" class="text" >Account 
            Number: <b>101010123</b></td>
        </tr>
        <tr valign="top" > 
          <td colspan = 2 height="13" valign="top" align="left" width="46%" class="text" ></td>
        </tr>

        <tr valign="top" > 
          <td height="13" valign="top" align="left" width="46%" class="text" >Example Card Amount: 
            <b>6.32</b></td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance.asp?">View Details</a></td>
        </tr>

        <tr valign="top" > 
          <td height="13" valign="top" align="left" width="46%" class="text" >Example Dining Plans:<b>5  Total Points</b>

</td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance2.asp?">View Details</a></td>
        </tr>

        <tr valign="top" > 
          <td height="13" valign="top" align="left" width="46%" class="text" >Credit For Printing: 
            <b>3.72</b></td>
<td height="13" valign="top" align="left" width="46%" class="text" ><a href="balance1.asp?">View Details</a></td>
        </tr>

          <td colspan = 2 height="13" valign="top" align="CENTER"  class="text">For 
            questions contact Cashiers at<BR> (000)000-0011 or <a href="mailto:example@example.com">example@example.com</a></td>
        </tr>
        <tr valign="top"> 
          <td colspan = 2 height="13" valign="top" align="CENTER"  class="text" > 

<a href="balance1.asp">All Plan Usage for last 90 days is available here</a>
            </td>
        </tr>
        <tr valign="top"> 
          <td colspan = 2 height="13" valign="top" align="CENTER"  class="text" > 

<a href="balance.asp?pln=Full">All Usage for last 365 days is available here</a>
            </td>
        </tr>

      </table>



</div>

Solution

//div[@id='main']/table[@class='bodytext']/tr/td/b should give you a list of all <b>s in your table cells.
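
If you only need one of the values, a predicate on the cell text can narrow the query. This is a minimal sketch, assuming the TFHpple parser from the question; the helper name BoldElementsForLabel is hypothetical, and the label you pass in is whatever text precedes the <b> in the HTML above:

```objc
#import <Foundation/Foundation.h>
#import "TFHpple.h"

// Hypothetical helper: returns the <b> elements inside the cell whose text contains `label`.
// Assumes `label` contains no single quotes, because it is spliced into the query string.
static NSArray *BoldElementsForLabel(TFHpple *parser, NSString *label) {
    NSString *query = [NSString stringWithFormat:
        @"//div[@id='main']/table[@class='bodytext']/tr/td[contains(., '%@')]/b", label];
    return [parser searchWithXPathQuery:query];
}
```

For example, BoldElementsForLabel(parser, @"Credit For Printing") should match only the cell holding <b>3.72</b>. Matching on the label rather than on position keeps the lookup working if rows are added or reordered.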

OTHER TIPS

Here is an extension to Mennny's answer, which is actually right, so you should accept it. I'll try to answer the additional questions you asked in the comments:

Parse the page like this (htmlData is my demo data):

NSData *htmlData = [NSData dataWithContentsOfFile:[@"/Users/dennis/Desktop/demo.html" stringByStandardizingPath]];
TFHpple *parser = [[TFHpple alloc] initWithHTMLData:htmlData];
NSArray *bTags = [parser searchWithXPathQuery:@"//div[@id='main']/table[@class='bodytext']/tr/td/b"];

After that, put the contents of the parsed <b> tags into an NSMutableArray.

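// Collect the text of each <b> tag, in document order.
NSMutableArray *stringsInBTag = [NSMutableArray arrayWithCapacity:bTags.count];
for (TFHppleElement *element in bTags) {
    // addObject: throws on nil, so store an empty string to keep the indexes stable.
    // If content is nil here, try [[element firstChild] content] as in the question.
    [stringsInBTag addObject:element.content ?: @""];
}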

This is what you get (logged output of the array):

( "5/9/2013", 101010123, "6.32", "5 Total Points", "3.72" )

Now you want to set your labels:

// Set label 1 to third <b>
self.label1.text = stringsInBTag[2];

// Set label 2 to first <b> 
self.label2.text = stringsInBTag[0];
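
Subscripting an NSArray past its end throws an NSRangeException, so the assignments above crash if the page ever returns fewer <b> tags than expected. A bounds-checked lookup avoids that. This is only a sketch: StringAtIndex is a hypothetical helper, not part of TFHpple, and the @"n/a" fallback is an arbitrary choice.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of TFHpple): returns the string at `index`,
// or `fallback` when the array is shorter than expected.
static NSString *StringAtIndex(NSArray *strings, NSUInteger index, NSString *fallback) {
    return index < strings.count ? strings[index] : fallback;
}

// Usage with the array built earlier:
// self.label1.text = StringAtIndex(stringsInBTag, 2, @"n/a");
```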
Licensed under: CC-BY-SA with attribution