質問

I have a database table which consists the following format of data in one column.

<table cellspacing="1" cellpadding="0" border="0" width="395">
    <tbody>
        <tr>
            <td valign="top" width="135">
                <p>Calories (kcal)<br>Energy (kj)<br>Fats<br>Carbohydrates<br>Protein<br></p>
            </td>
            <td valign="top">
                <p>178<br>748<br>0 g<br>9.6 g<br>0.1 g<br></p>
            </td>
            <td valign="top" width="135">
                <p>Fiber<br>Sugars<br>Cholesterol<br>Sodium<br>Alcohol<br></p>
            </td>
            <td valign="top">
                <p>0 g<br>-<br>0 mg<br>-<br>26.2 g<br></p>
            </td>
        </tr>
    </tbody>
</table>

I want to make another database which has separate columns for Calories, Fats, Carbohydrates and Protein. To separate this data, I need to fetch data from the old database and parse it like this.

$qry = "SELECT * FROM table";
$res = $mysqli->query($qry);

// new dom object
$dom = new DOMDocument();

while ($row = $res->fetch_assoc()) {

    $html = @$dom->loadHTML($row['columndata']);
    //the table by its tag name
    $tables = $dom->getElementsByTagName('table');
    $rows = $tables->item(0)->getElementsByTagName('tr');

    foreach ($rows as $row)
    {
        $cols = $row->getElementsByTagName('td');
        echo $cols->item(0)->nodeValue.'<br />';
        echo $cols->item(1)->nodeValue.'<br />';
    }
}

This outputs the following:

Calories (kcal)Energy (kj)FatsCarbohydratesProtein  
1787480 g9.6 g0.1 g

I am unable to separate the output string to have correct column values in the new database.

For example, I want to have value 178 in the Calories column, 0 g in the Fats column, etc.

役に立ちましたか?

解決

Try iterating over the child nodes of the P elements:

foreach ($rows as $row)
{
    $paragraphs = $row->getElementsByTagName('p');
    //ensure that all the text between <br> is in one text node
    $paragraphs->item(0)->normalize();
    foreach($paragraphs->item(0)->childNodes as $node) {
        if ($node->nodeType == XML_TEXT_NODE) {
            echo $node->nodeValue . '<br/>;
        }
    }

}

It is important to call normalize() on the p element, to ensure that the texts between br elements are in one text node each, and not separated, eg <p>Calories (kcal)<br>Energy (kj)<br>...</p> will have text nodes of Calories (kcal) and Energy (kj), not Cal, ories (, kcal) and so on, which they might be without normalizing.

他のヒント

If you want to get innerHTML of your td element, you may use following construction:

$tdElement = $row->getElementsByTagName('td')->item(0);
$tdElement->ownerDocument->saveHTML( $tdElement );

It should return you inner html of that node as a string.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top