Question

I'm scraping a page using DOMDocument and DOMXPath and can only partially get the data, and I'm not sure what I'm doing wrong.

The HTML looks like:

<script src="/MercuryGate/util/validate.js" type="text/javascript"></script>


<html>
<head>
<title>Title</title>

<script language="javascript" type="text/javascript" src="../util/popup_window.js">    </script>

</head>
<body background="white" style="margin: 0 0 0 0; padding: 0 0 0 0;">
<form onSubmit="return false;"> 
<table cellpadding="0" cellspacing="1" class="bids">
    <tr>
        <td class="headerTD">
            Respond By
        </td>
        <td class="headerTD">
           Load
        </td>
        <td class="headerTD">
            Origin
        </td>
        <td class="headerTD">
            State
        </td>
        <td class="headerTD">
            Pickup Range
        </td>
    </tr>
    <tr>
        <td class="oddTD">Data1</td>
        <td class="oddTD"><a href="xxx">2568103S</a></td>
        <td class="oddTD">Data3</td>
        <td class="oddTD">WA</td>
        <td class="oddTD">Data4</td>
    </tr>

    <input type="hidden" id="xxxxx" name="xxxx" value="false" />

    <tr>
        <td class="evenTD">Data1</td>
        <td class="evenTD"><a href="xxx">2568103S</a></td>
        <td class="evenTD">Data3</td>
        <td class="evenTD">WA</td>
        <td class="evenTD">Data4</td>
    </tr>
    <input type="hidden" id="xxxx" name="xxxx" value="false" />

</table>
<br>


<input type="button" value=" Refresh " onclick="refresh()" style="font-size: 8pt; font-family: Arial;">

</form>

My script, after the scrape, looks like this:

$dom = new DOMDocument('1.1');
$dom->preserveWhiteSpace = FALSE;
$dom->loadHTML(strtolower($html));
$xpath = new DOMXPath($dom);
$trNodes = $xpath->query("//table/tr");
$counter = -1;
foreach ($trNodes as $tr) {
    $counter++;
    $tdNodes = $xpath->query(".//td[contains(concat(' ',normalize-space(@class),' '),' oddTD ')]", $tr);
    print_r($tdNodes);
}

On print_r($tdNodes); I'm getting a length of 0 on every tr iteration, and I don't know why. Can anyone spot an error in my XPath queries?


Solution 2

I found my problem. Since I'm calling $dom->loadHTML(strtolower($html)), the class attribute values are lowercased along with everything else, so this won't match:

$tdNodes = $xpath->query(".//td[contains(concat(' ',normalize-space(@class),' '),' oddTD ')]", $tr);

but this will:

$tdNodes = $xpath->query(".//td[contains(concat(' ',normalize-space(@class),' '),' oddtd ')]", $tr);
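An alternative, sketched here on the assumption that lowercasing the whole document is not needed for anything else, is to load the HTML untouched and match the class as written. If case-insensitive matching is wanted, XPath 1.0's translate() can lowercase just the attribute inside the query. The sample markup and the variable names below are illustrative, not taken from the real page.

```php
<?php
// Illustrative sample; the real $html would come from the scrape.
$html = '<table class="bids">
  <tr><td class="headerTD">Load</td></tr>
  <tr><td class="oddTD">Data1</td><td class="oddTD">WA</td></tr>
  <tr><td class="evenTD">Data1</td><td class="evenTD">WA</td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);              // no strtolower(): attribute values keep their case
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Case-sensitive match against the class exactly as it appears in the markup.
$exact = $xpath->query(
    "//td[contains(concat(' ', normalize-space(@class), ' '), ' oddTD ')]"
);

// Case-insensitive match: translate() lowercases only the class attribute value.
$upper = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
$lower = 'abcdefghijklmnopqrstuvwxyz';
$anyCase = $xpath->query(
    "//td[contains(concat(' ', translate(normalize-space(@class), '$upper', '$lower'), ' '), ' oddtd ')]"
);

echo $exact->length, "\n";
echo $anyCase->length, "\n";
```

Both queries should select the same two cells here. The translate() form is more verbose, but it leaves the cell text and link URLs in their original case.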

OTHER TIPS

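If you only need the td tags with a particular class on each tr, a plain contains() test will do the trick (use the lowercase class name if the HTML was passed through strtolower()):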

foreach ($trNodes as $tr) {
    $tdNodes = $xpath->query('.//td[contains(@class, "headerTD")]', $tr);
    print_r($tdNodes);
}

You can also append //text() to the XPath query if you want the text content of each td rather than the td elements themselves.
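As a rough sketch of that idea, the loop below collects the text of each matching cell into one array per row. The sample markup, the $rows variable, and the choice of the oddTD class are assumptions made for illustration; the HTML is loaded without strtolower(), so the class name keeps its original case.

```php
<?php
// Illustrative sample; the real $html would come from the scrape.
$html = '<table class="bids">
  <tr><td class="oddTD">Data1</td><td class="oddTD"><a href="xxx">2568103S</a></td></tr>
</table>';

$dom = new DOMDocument();
libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$rows = [];
foreach ($xpath->query('//table/tr') as $tr) {
    $cells = [];
    // //text() also descends into nested elements such as the <a> inside a cell.
    foreach ($xpath->query('.//td[contains(@class, "oddTD")]//text()', $tr) as $textNode) {
        $value = trim($textNode->nodeValue);
        if ($value !== '') {        // skip whitespace-only text nodes
            $cells[] = $value;
        }
    }
    $rows[] = $cells;
}

print_r($rows);
```

Reading $td->textContent on each td element is an equivalent option when the concatenated text of the whole cell is enough.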

Licensed under: CC-BY-SA with attribution