Question

I've got to parse and flatten an XML file that consists of a lot of single products. The XML is thoroughly documented, and it's easy to parse in PHP using SimpleXML. Please see the code below for how I'm creating an array from a single product. I then access all the required keys and store the data in an SQL database.

My problem now is how to deal with a varying number of child nodes. As you can see in the XML snippets below, there may be a single "Name" node, but sometimes there are two or even more of them. When there is more than one such node, I have to use the "NameType" to decide which "NameText" to use. The same happens with the "Price" nodes.

<Product>
  <Id>123</Id>
  <Name>
    <NameType>3</NameType>
    <NameText>Hello World</NameText>
  </Name>
  <Price>
    <Country>US</Country>
    <Amount>9.90</Amount>
  </Price>
</Product>

<Product>
  <Id>124</Id>
  <Name>
    <NameType>1</NameType>
    <NameText>Goodbye Cruel World</NameText>
  </Name>
  <Name>
    <NameType>3</NameType>
    <NameText>Goodbye Cruel World, I'm Leaving You Today</NameText>
  </Name>
  <Name>
    <NameType>9</NameType>
    <NameText>Goodbye</NameText>
  </Name>
  <Price>
    <Country>CAN</Country>
    <Amount>27.90</Amount>
  </Price>
  <Price>
    <Country>US</Country>
    <Amount>19.90</Amount>
  </Price>
</Product>

Here's my code to deal with this problem: I transform the XML to an associative array and then use a lot of if-magic to get the data I need. The provided code prints out "Hello World" for the first product example and "Goodbye Cruel World" for the second.

$xml = simplexml_load_string($product);
$json = json_encode($xml);
$arr = json_decode($json, True);
// $arr['Name']['NameText'] contains the single NameText for this product in example one
// $arr['Name'][0]['NameText'] contains the first of three NameTexts in example two

if( array_key_exists(0, $arr['Name']) ) {
  foreach( $arr['Name'] as $n) {
    if( $n['NameType'] == 1 ) {
      echo $n['NameText']."\n";
      break;
    } elseif ( $n['NameType'] == 3 ) {
      echo $n['NameText']."\n";
      break;
    }
  }
} else {
  echo $arr['Name']['NameText']."\n";
}

While this code works, I'm not very happy with the case-by-case analysis for every node that may occur multiple times. I even have to depend on the "correct" order of the child nodes, assuming that NameType "1" always comes before NameType "3". So I hope there is a smarter solution out there.

The question "XML with varying amount of child nodes for each parent node" seems similar, but it doesn't really address the varying number of child nodes or the task of selecting one particular child node.


Solution

I'm not entirely clear what you're trying to do (you give no clear explanation of the desired output), but I will give you a few pointers:

  • Ditch the conversion to an array (the json_decode(json_encode()) hack). All you're doing is throwing away the extra functionality that SimpleXML provides, and potentially throwing away part of the XML data.
  • One of the nice features of SimpleXML is that $xml->Product->Name means the first (0th, if you like) Name of the first Product, exactly like $xml->Product[0]->Name[0], regardless of whether there are actually multiple Products and Names.
  • You can also use foreach ( $xml->Product as $product ) in much the way you'd expect - again, it works whether or not there are multiple Product nodes in that particular document.
  • If you don't mind learning a new syntax, XPath can be useful for hunting down nodes based on their value. In SimpleXML, you can start at any node (say, a particular Product) and use the ->xpath() method to get a plain array of "search results" starting at that node.
  • Your code also has some unnecessary duplication: the elseif branch runs the same code as the if branch, so you could just combine the two conditions with an or (||). (I'm not sure if this is just a result of anonymization.)
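To make the second and third pointers concrete, here is a minimal sketch using your two products. The <Products> root element is an assumption, since your snippets don't show one. It shows that foreach, ->Name, ->Name[0] and count() behave the same whether a product has one Name or several:

```php
<?php
// The question's two products, wrapped in an assumed <Products> root.
$data = <<<XML
<Products>
  <Product>
    <Id>123</Id>
    <Name><NameType>3</NameType><NameText>Hello World</NameText></Name>
  </Product>
  <Product>
    <Id>124</Id>
    <Name><NameType>1</NameType><NameText>Goodbye Cruel World</NameText></Name>
    <Name><NameType>3</NameType><NameText>Goodbye Cruel World, I'm Leaving You Today</NameText></Name>
    <Name><NameType>9</NameType><NameText>Goodbye</NameText></Name>
  </Product>
</Products>
XML;

$xml = simplexml_load_string($data);
$summary = [];

// foreach works the same whether there is one Product or many.
foreach ($xml->Product as $product) {
    // ->Name and ->Name[0] both mean "the first Name child",
    // whether the product has one Name or several.
    $first = (string) $product->Name->NameText;
    $same  = (string) $product->Name[0]->NameText;
    // count() reports how many Name children this product has.
    $summary[] = sprintf('%s: %d name(s), first = %s, same = %s',
        (string) $product->Id, count($product->Name), $first, $same);
}

echo implode("\n", $summary), "\n";
```

No special case is needed for the single-Name product: the same two expressions work for both.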

For comparison, here's a live demo of your code, with the XML snippets combined into one XML document.

Using SimpleXML itself, rather than just parsing to an array, you can simplify it down to this (Live Demo):

$xml = simplexml_load_string($xml_data);

foreach ( $xml->Product as $product )
{
    foreach ( $product->Name as $name )
    {
        if ( $name->NameType == 1 || $name->NameType == 3 )
        {
            echo $name->NameText."\n";
            break;
        }
    }
}

Using a simple XPath expression in place of the inner if gives this version (Live Demo):

$xml = simplexml_load_string($xml_data);

foreach ( $xml->Product as $product )
{
    foreach ( $product->xpath('Name[NameType=1 or NameType=3]') as $name )
    {
        echo $name->NameText."\n";
        break;
    }
}

Or you can go all the way and put all the logic into an XPath expression. Note the [1] on the end, which is the equivalent of the break; in the inner loop: it stops multiple names from being echoed for one product (Live Demo):

$xml = simplexml_load_string($xml_data);

foreach ( $xml->xpath('Product/Name[NameType=1 or NameType=3][1]') as $name )
{
    echo $name->NameText."\n";
}

OTHER TIPS

I can't find a suitable method using SimpleXML. I'm more familiar with DOMDocument and its loadXML() and load() methods.

Instead of converting the XML to an array, just get the elements you want with getElementsByTagName().

Nest foreach loops where needed, and each loop will iterate over however many matching elements there are. This removes the case-by-case analysis and the reliance on the document providing the elements in a specific order.
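A minimal sketch of that DOMDocument approach, assuming a <Products> root element around your two products. Nested loops alone still return whichever match comes first, so this version collects every NameText keyed by its NameType and then chooses from a preference list (the $preferred variable is an illustrative assumption):

```php
<?php
// The question's two products, wrapped in an assumed <Products> root.
$data = <<<XML
<Products>
  <Product>
    <Id>123</Id>
    <Name><NameType>3</NameType><NameText>Hello World</NameText></Name>
  </Product>
  <Product>
    <Id>124</Id>
    <Name><NameType>1</NameType><NameText>Goodbye Cruel World</NameText></Name>
    <Name><NameType>3</NameType><NameText>Goodbye Cruel World, I'm Leaving You Today</NameText></Name>
    <Name><NameType>9</NameType><NameText>Goodbye</NameText></Name>
  </Product>
</Products>
XML;

$doc = new DOMDocument();
$doc->loadXML($data);

$preferred = ['1', '3'];   // NameType values, most preferred first
$result = [];

// getElementsByTagName() searches all descendants, not just direct children.
foreach ($doc->getElementsByTagName('Product') as $product) {
    // Collect every NameText keyed by its NameType, however many Names exist.
    $byType = [];
    foreach ($product->getElementsByTagName('Name') as $name) {
        $type = $name->getElementsByTagName('NameType')->item(0)->textContent;
        $text = $name->getElementsByTagName('NameText')->item(0)->textContent;
        $byType[$type] = $text;
    }

    // Choose by preference, so the order of the Name nodes no longer matters.
    $id = $product->getElementsByTagName('Id')->item(0)->textContent;
    $result[$id] = null;
    foreach ($preferred as $type) {
        if (isset($byType[$type])) {
            $result[$id] = $byType[$type];
            break;
        }
    }
}
print_r($result);
```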

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow