Question

This question takes a bit of time to introduce, so bear with me. It will be fun to solve if you can get there. This scrape would be replicated over thousands of pages on this website using a loop.

I'm trying to scrape the website http://www.digikey.com/product-detail/en/207314-1/A25077-ND/, looking to capture the data in the table with Digi-Key Part Number, Quantity Available, etc., including the right-hand side with Price Break, Unit Price, Extended Price.

Using the R function readHTMLTable() doesn't work and only returns NULL values. The reason for this (I believe) is that the website has hidden its content using the tag "aspNetHidden" in the HTML code.

For this reason I also had difficulty using htmlTreeParse() and xmlTreeParse(), with the whole section of interest not appearing in the results.

Using the R function scrape() from the scrapeR package

require(scrapeR)

URL<-scrape("http://www.digikey.com/product-detail/en/207314-1/A25077-ND/")

does return the full html code including the lines of interest:

<th align="right">Digi-Key Part Number</th>
<td id="reportpartnumber">
<meta itemprop="productID" content="sku:A25077-ND">A25077-ND</td>

<th>Price Break</th>
<th>Unit Price</th>
<th>Extended Price
</th>
</tr>
<tr>
<td align="center">1</td>
<td align="right">2.75000</td>
<td align="right">2.75</td>

However, I haven't been able to select the nodes out of this block of code, with the following error returned:

no applicable method for 'xpathApply' applied to an object of class "list"

I've received that error using different functions such as:

xpathSApply(URL,'//*[@id="pricing"]/tbody/tr[2]')

getNodeSet(URL,"//html[@class='rd-product-details-page']")

I'm not the most familiar with XPath, but I have been identifying the XPath expressions using Inspect Element on the webpage and Copy XPath.

Any help you can give on this would be much appreciated!


The solution

You've not read the help for scrape, have you? It returns a list; you need to get parts of that list (if parse=TRUE) and so on.
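To make the cause of the error concrete, here is a minimal sketch (it assumes the page is still reachable, and the second class shown is not asserted since it depends on the XML package version):

    library(scrapeR)
    # scrape() returns a list of parsed documents, one per URL supplied,
    # which is why passing its result straight to an XPath function fails:
    page <- scrape("http://www.digikey.com/product-detail/en/207314-1/A25077-ND/")
    class(page)       # "list" -- no xpathApply method for this class
    class(page[[1]])  # the parsed HTML document that XPath functions expect

Indexing with [[1]] pulls out the parsed document for the first (and here only) URL, which is the object xpathSApply() and getNodeSet() know how to work with.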

Also, I think that web page is doing some heavy browser detection. If I try to wget the page from the command line, I get an error page; the scrape function gets something usable (but seemingly different from what you got), and Chrome gets the full junk with all the encoded stuff. Yuck. Here's what works for me:

> URL<-scrape("http://www.digikey.com/product-detail/en/207314-1/A25077-ND/")
> tables = xpathSApply(URL[[1]],'//table')
> tables[[2]]
<table class="product-details" border="1" cellspacing="1" cellpadding="2">
  <tr class="product-details-top"/>
  <tr class="product-details-bottom">
    <td class="pricing-description" colspan="3" align="right">All prices are in US dollars.</td>
  </tr>
  <tr>
    <th align="right">Digi-Key Part Number</th>
    <td id="reportpartnumber"><meta itemprop="productID" content="sku:A25077-ND"/>A25077-ND</td>
    <td class="catalog-pricing" rowspan="6" align="center" valign="top">
      <table id="pricing" frame="void" rules="all" border="1" cellspacing="0" cellpadding="1">
        <tr>
          <th>Price Break</th>
          <th>Unit Price</th>
          <th>Extended Price&#13;
</th>
        </tr>
        <tr>
          <td align="center">1</td>
          <td align="right">2.75000</td>
          <td align="right">2.75</td>

Adjust this to your use case. Here I'm getting all the tables and showing the second one, which has the info you want; some of it is in the pricing table, which you can get directly with:

pricing = xpathSApply(URL[[1]],'//table[@id="pricing"]')[[1]]

> pricing
<table id="pricing" frame="void" rules="all" border="1" cellspacing="0" cellpadding="1">
  <tr>
    <th>Price Break</th>
    <th>Unit Price</th>
    <th>Extended Price&#13;
</th>
  </tr>
  <tr>
    <td align="center">1</td>
    <td align="right">2.75000</td>
    <td align="right">2.75</td>
  </tr>

and so on.
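From here, the pricing node can be turned into a data frame: readHTMLTable() works when handed an already-parsed table node, even though calling it on the raw URL returned NULL. A sketch, where part_urls is a hypothetical character vector of product-page URLs for the loop mentioned in the question:

    library(XML)
    library(scrapeR)

    # Convert the extracted <table id="pricing"> node to a data frame.
    price_df <- readHTMLTable(pricing, header = TRUE,
                              stringsAsFactors = FALSE)

    # To replicate across many pages, wrap the scrape/extract steps:
    # results <- lapply(part_urls, function(u) {
    #   doc  <- scrape(u)[[1]]
    #   node <- xpathSApply(doc, '//table[@id="pricing"]')[[1]]
    #   readHTMLTable(node, stringsAsFactors = FALSE)
    # })

Note the numeric columns will come back as character strings, so convert them with as.numeric() before doing arithmetic on the prices.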

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow