문제

I have a database table which consists the following format of data in one column.

<table cellspacing="1" cellpadding="0" border="0" width="395">
    <tbody>
        <tr>
            <td valign="top" width="135">
                <p>Calories (kcal)<br>Energy (kj)<br>Fats<br>Carbohydrates<br>Protein<br></p>
            </td>
            <td valign="top">
                <p>178<br>748<br>0 g<br>9.6 g<br>0.1 g<br></p>
            </td>
            <td valign="top" width="135">
                <p>Fiber<br>Sugars<br>Cholesterol<br>Sodium<br>Alcohol<br></p>
            </td>
            <td valign="top">
                <p>0 g<br>-<br>0 mg<br>-<br>26.2 g<br></p>
            </td>
        </tr>
    </tbody>
</table>

I want to make another database which has separate columns for Calories, Fats, Carbohydrates and Protein. To separate this data, I need to fetch data from the old database and parse it like this.

$qry = "SELECT * FROM table";
$res = $mysqli->query($qry);

// new dom object
$dom = new DOMDocument();

while ($row = $res->fetch_assoc()) {

    $html = @$dom->loadHTML($row['columndata']);
    //the table by its tag name
    $tables = $dom->getElementsByTagName('table');
    $rows = $tables->item(0)->getElementsByTagName('tr');

    foreach ($rows as $row)
    {
        $cols = $row->getElementsByTagName('td');
        echo $cols->item(0)->nodeValue.'<br />';
        echo $cols->item(1)->nodeValue.'<br />';
    }
}

This outputs the following:

Calories (kcal)Energy (kj)FatsCarbohydratesProtein  
1787480 g9.6 g0.1 g

I am unable to separate the output string to have correct column values in the new database.

For example, I want to have value 178 in the Calories column, 0 g in the Fats column, etc.

도움이 되었습니까?

해결책

Try iterating over the child nodes of the P elements:

foreach ($rows as $row)
{
    $paragraphs = $row->getElementsByTagName('p');
    //ensure that all the text between <br> is in one text node
    $paragraphs->item(0)->normalize();
    foreach($paragraphs->item(0)->childNodes as $node) {
        if ($node->nodeType == XML_TEXT_NODE) {
            echo $node->nodeValue . '<br/>;
        }
    }

}

It is important to call normalize() on the p element, to ensure that the texts between br elements are in one text node each, and not separated, eg <p>Calories (kcal)<br>Energy (kj)<br>...</p> will have text nodes of Calories (kcal) and Energy (kj), not Cal, ories (, kcal) and so on, which they might be without normalizing.

다른 팁

If you want to get innerHTML of your td element, you may use following construction:

$tdElement = $row->getElementsByTagName('td')->item(0);
$tdElement->ownerDocument->saveHTML( $tdElement );

It should return you inner html of that node as a string.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top