Question

I'm really new to PHP, so please bear with my ignorance.

I'm trying to code a web app that reads certain dishes out of an XML feed generated by our university's canteen homepage. Their menu is overloaded and badly designed, so I'm building a mobile-optimized web app as a project for my web design class. The app will read only the name of each dish and its price and leave the rest behind. I'm familiar with HTML/CSS/JavaScript and have started reading a bit about PHP, but unfortunately I can't figure out how to get only the important information out of their RSS feed.

Their RSS is here: RSS Feed of the canteen

The code I have until now:

<?php 
$xmlfile='http://www.studentenwerk-berlin.de/speiseplan/rss/htw_wilhelminenhof/tag/lang/0000000000000000000000000';
// Pass the URL unchanged: rawurlencode() would also encode ':' and '/', so the URL could no longer be opened.
$xml = simplexml_load_file($xmlfile);

$result = $xml->channel->item->description;
?>

(I know this isn't much...) So I figured out how to load the XML and found the path where the dishes are: they're in "description". But the problem is that these dishes are not neatly ordered in child elements; they are all on one line inside "description" (see the XML linked above). How can I access, for example, all salads (Salate) and put them into an array so that I can format them into a new table later?

This is how the original table looks on their website: Canteen

(I know you have to ask the owner before scraping content from a website. This app is only for a university exercise.)


Solution

Instead of an array, you can also approach this with an iterator that encapsulates the logic for traversing the description's HTML for the meals. It is simple to use because it hides the complexity of the parsing.

Here is an example followed by the output:

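$uri = 'http://www.studentenwerk-berlin.de/speiseplan/rss/htw_wilhelminenhof/tag/lang/0000000000000000000000000';
$rss = simplexml_load_file($uri);
// The <description> of the first <item> holds the menu as an HTML fragment.
$meals = new MealIterator($rss->channel->item->description, 'Salate');
foreach ($meals as $entry) {
    // $entry is ['name' => ..., 'price' => ...]; vprintf() consumes its values in order.
    vprintf("%s - %s\n", $entry);
}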

Output:

Große Salatschüssel mit gekochtem Ei - EUR 1.55 / 2.50 / 3.25
Kleine Salatschale - EUR 0.55 / 0.90 / 1.15
Doppelt-Große Salatschale - EUR 2.95 / 4.70 / 6.20
Große Salatschale - EUR 1.55 / 2.50 / 3.25

The iterator makes use of PHP's built-in DOM functionality, namely DOMDocument and DOMXPath. The first step is to obtain the table rows, one meal per row. This is already done with XPath in the constructor:

public function __construct($html, $meal)
{
    $doc   = $this->createHtmlDoc($html);
    $xpath = new DOMXPath($doc);
    // Find the <th> whose text equals $meal, go up two levels and take every following <tr>.
    $expr  = sprintf('//th[.=%s]/../../following-sibling::tr', $this->xpathString($meal));
    $items = $xpath->query($expr);
    if ($items === FALSE) {
        throw new UnexpectedValueException('Failed to query the HTML document');
    }
    parent::__construct($items);
}

The key tool here is XPath. The query returns a list of <tr> elements, each containing one meal.
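To see what the expression selects, here is a small sketch run against hypothetical markup. The feed's real HTML is not reproduced in this answer, so the table below (a <thead> holding the category header, followed by one <tr> per meal) is only an assumption about a structure that //th[.=...]/../../following-sibling::tr would match.

```php
<?php
// Hypothetical markup: a header block with the category, then one <tr> per meal.
$html = '<table>'
      . '<thead><tr><th>Salate</th></tr></thead>'
      . '<tr><td>Kleine Salatschale</td><td>EUR 0.55 / 0.90 / 1.15</td></tr>'
      . '<tr><td>Doppelt-Große Salatschale</td><td>EUR 2.95 / 4.70 / 6.20</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // do not print warnings about imperfect HTML
// The meta tag tells loadHTML() that the bytes are UTF-8.
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);

$xpath = new DOMXPath($doc);
// th -> its <tr> (..) -> the <thead> (../..) -> every <tr> after the <thead>.
$rows = $xpath->query("//th[.='Salate']/../../following-sibling::tr");

echo $rows->length, " rows\n";
foreach ($rows as $row) {
    // First cell of each selected row: the meal name.
    echo $row->getElementsByTagName('td')->item(0)->textContent, "\n";
}
```

If the real markup nests the header differently, only the path in the expression has to change; the rest of the iterator stays the same.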

The data of each meal still needs to be extracted. This is done in the iterator's current() method:

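// #[\ReturnTypeWillChange] silences the PHP 8.1+ deprecation notice for the untyped override.
#[\ReturnTypeWillChange]
public function current()
{
    $entry = parent::current();                       // the current <tr> element
    $tds   = $entry->getElementsByTagName('td');
    $name  = $this->childTextContent($tds->item(0));  // 1st cell: its own text only
    $price = trim($tds->item(1)->textContent);        // 2nd cell: full text
    return compact('name', 'price');                  // ['name' => ..., 'price' => ...]
}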

This uses only DOMElement traversal methods (documented in the PHP manual). Because the name cell was a bit harder to parse, one more small helper method collects only the content of the cell's direct child text nodes, which gives the name of the meal:

private function childTextContent(DOMNode $node)
{
    $buffer = '';
    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMText) {
            $buffer .= $child->textContent;
        }
    }
    return trim($buffer);
}
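The difference from plain textContent shows up as soon as the name cell contains nested markup. In this sketch the <span> inside the cell is a hypothetical example of such markup; the real feed may contain something else (or nothing) there.

```php
<?php
// Same logic as childTextContent() above, as a standalone function.
function childTextContent(DOMNode $node): string
{
    $buffer = '';
    foreach ($node->childNodes as $child) {
        // Keep only text nodes sitting directly under $node; nested elements are skipped.
        if ($child instanceof DOMText) {
            $buffer .= $child->textContent;
        }
    }
    return trim($buffer);
}

// Hypothetical cell with a nested element next to the name.
$doc = new DOMDocument();
$doc->loadXML('<td>Kleine Salatschale <span>(vegan)</span></td>');
$td = $doc->documentElement;

echo trim($td->textContent), "\n"; // Kleine Salatschale (vegan)
echo childTextContent($td), "\n";  // Kleine Salatschale
```

textContent concatenates all descendant text, while the helper stops at the first level, so extra markup inside the cell does not leak into the name.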

(You can see the full code of the iterator.)
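The full code is not reproduced here. As a rough, self-contained stand-in, the sketch below assembles the three methods shown above into one class. Several parts are assumptions: that MealIterator extends IteratorIterator (suggested by the parent:: calls), the bodies of createHtmlDoc() and xpathString(), and the sample HTML, which only imitates the structure the XPath expression expects.

```php
<?php
// Hypothetical reconstruction, not the original code: the base class (IteratorIterator)
// and the bodies of createHtmlDoc() and xpathString() are assumptions.
class MealIterator extends IteratorIterator
{
    public function __construct($html, $meal)
    {
        $doc   = $this->createHtmlDoc((string) $html); // cast also accepts a SimpleXMLElement
        $xpath = new DOMXPath($doc);
        $expr  = sprintf('//th[.=%s]/../../following-sibling::tr', $this->xpathString($meal));
        $items = $xpath->query($expr);
        if ($items === false) {
            throw new UnexpectedValueException('Failed to query the HTML document');
        }
        parent::__construct($items); // DOMNodeList is Traversable
    }

    #[\ReturnTypeWillChange]
    public function current()
    {
        $entry = parent::current();
        $tds   = $entry->getElementsByTagName('td');
        $name  = $this->childTextContent($tds->item(0));
        $price = trim($tds->item(1)->textContent);
        return compact('name', 'price');
    }

    private function createHtmlDoc(string $html): DOMDocument
    {
        $doc      = new DOMDocument();
        $previous = libxml_use_internal_errors(true); // collect warnings about sloppy HTML
        // Declare UTF-8, otherwise loadHTML() reads the bytes as ISO-8859-1.
        $doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html);
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
        return $doc;
    }

    // Quotes $value as an XPath 1.0 string literal (no escape syntax exists there).
    private function xpathString(string $value): string
    {
        if (strpos($value, "'") === false) {
            return "'" . $value . "'";
        }
        if (strpos($value, '"') === false) {
            return '"' . $value . '"';
        }
        return "concat('" . str_replace("'", "', \"'\", '", $value) . "')";
    }

    private function childTextContent(DOMNode $node): string
    {
        $buffer = '';
        foreach ($node->childNodes as $child) {
            if ($child instanceof DOMText) {
                $buffer .= $child->textContent;
            }
        }
        return trim($buffer);
    }
}

// Hypothetical sample markup standing in for the feed's <description>.
$html = '<table><thead><tr><th>Salate</th></tr></thead>'
      . '<tr><td>Kleine Salatschale <span>vegan</span></td><td>EUR 0.55 / 0.90 / 1.15</td></tr>'
      . '<tr><td>Große Salatschale</td><td>EUR 1.55 / 2.50 / 3.25</td></tr>'
      . '</table>';

foreach (new MealIterator($html, 'Salate') as $entry) {
    vprintf("%s - %s\n", $entry);
}
```

With the live feed you would pass $rss->channel->item->description instead of $html; the (string) cast in the constructor turns the SimpleXMLElement into its text content.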

Key points in this solution:

  • Encapsulate the parsing in an iterator: if the source changes, the parsing may have to change as well, but the rest of the program does not.
  • Reuse existing libraries such as SimpleXML and its sister extension, DOM.
  • Solve the problem by breaking it down from big into small.

If you want an array instead of an iterator, you are only one step away: convert the iterator into an array with iterator_to_array() (passing false discards the iterator's keys and numbers the entries from 0):

print_r(iterator_to_array($meals, false));

Array
(
    [0] => Array
        (
            [name] => Große Salatschüssel mit gekochtem Ei
            [price] => EUR 1.55 / 2.50 / 3.25
        )

    [1] => Array
        (
            [name] => Kleine Salatschale
            [price] => EUR 0.55 / 0.90 / 1.15
        )

    [2] => Array
        (
            [name] => Doppelt-Große Salatschale
            [price] => EUR 2.95 / 4.70 / 6.20
        )

    [3] => Array
        (
            [name] => Große Salatschale
            [price] => EUR 1.55 / 2.50 / 3.25
        )

)
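Since the original goal was to format the dishes into a new table, here is a minimal sketch of that last step. The function name renderMealTable() and the column headings are hypothetical; the only assumption about the input is the name/price shape printed above. htmlspecialchars() escapes each value so that a stray & or < in a dish name cannot break the generated markup.

```php
<?php
// Hypothetical helper: renders rows shaped like the print_r() output above as an HTML table.
function renderMealTable(array $meals): string
{
    $html = "<table>\n  <tr><th>Dish</th><th>Price</th></tr>\n";
    foreach ($meals as $meal) {
        // Escape the values so they are always treated as text, never as markup.
        $html .= sprintf(
            "  <tr><td>%s</td><td>%s</td></tr>\n",
            htmlspecialchars($meal['name'], ENT_QUOTES, 'UTF-8'),
            htmlspecialchars($meal['price'], ENT_QUOTES, 'UTF-8')
        );
    }
    return $html . "</table>\n";
}

// In the app this array would come from iterator_to_array($meals, false).
$meals = [
    ['name' => 'Kleine Salatschale', 'price' => 'EUR 0.55 / 0.90 / 1.15'],
    ['name' => 'Große Salatschale',  'price' => 'EUR 1.55 / 2.50 / 3.25'],
];
echo renderMealTable($meals);
```

Keeping the rendering in a separate function means the mobile layout can change without touching the parsing code.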

The routine that creates the XPath string literal (xpathString()) is taken from: Mitigating XPath Injection Attacks in PHP
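For reference, here is a sketch of the usual approach such a routine takes (the cited article's code may differ in details). XPath 1.0 string literals have no escape sequences, so a value is wrapped in single quotes, in double quotes if it contains a single quote, and split into a concat() call if it contains both. The sample strings are hypothetical.

```php
<?php
// Quotes $value as an XPath 1.0 string literal that is safe to splice into an expression.
function xpathString(string $value): string
{
    if (strpos($value, "'") === false) {
        return "'" . $value . "'";
    }
    if (strpos($value, '"') === false) {
        return '"' . $value . '"';
    }
    // Both quote types present: close the literal at every ', emit "'" and reopen it.
    return "concat('" . str_replace("'", "', \"'\", '", $value) . "')";
}

echo xpathString('Salate'), "\n";       // 'Salate'
echo xpathString("Chef's Salat"), "\n"; // "Chef's Salat"
echo xpathString('a\'b"c'), "\n";       // concat('a', "'", 'b"c')
```

Because user input never ends up as raw XPath syntax, a category name like ' or '1'='1 is matched literally instead of changing the query.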

Licensed under: CC-BY-SA with attribution