Question

I was successfully using the following code to merge multiple large XML files into a new (larger) XML file. I found at least part of it on StackOverflow.

    $docList = new DOMDocument();

    $root = $docList->createElement('documents');
    $docList->appendChild($root);

    $doc = new DOMDocument();

    foreach ($xmlfilenames as $xmlfilename) {

        $doc->load($xmlfilename);

        $xpath = new DOMXPath($doc);
        $query = self::getQuery();  // this is the name of the ROOT element

        $nodelist = $xpath->evaluate($query, $doc->documentElement);

        if ($nodelist->length > 0) {

            // copy the matched node (and its subtree) into the merged document
            $node = $docList->importNode($nodelist->item(0), true);

            $xmldownload = $docList->createElement('document');

            if (self::getShowFileName()) {
                $xmldownload->setAttribute("filename", $xmlfilename);
            }

            $xmldownload->appendChild($node);

            $root->appendChild($xmldownload);
        }

    }

    $newXMLFile = self::getNewXMLFile();
    $docList->save($newXMLFile);

I started running into out-of-memory errors as both the number of files and their sizes grew.

I found an article here which explained the issue and recommended using XMLWriter.

So now I am trying to use PHP's XMLWriter to merge multiple large XML files into a new (larger) XML file. Later, I will run XPath queries against the new file.

Code:

$xmlWriter = new XMLWriter();
$xmlWriter->openUri('mynewFile.xml');
$xmlWriter->setIndent(true);
$xmlWriter->startDocument('1.0', 'UTF-8');

$xmlWriter->startElement('documents');

foreach($xmlfilenames as $xmlfilename) 
{
    $fileContents = file_get_contents($xmlfilename);
    $xmlWriter->writeElement('document',$fileContents);
}

$xmlWriter->endElement();
$xmlWriter->endDocument();
$xmlWriter->flush();

However, the resulting (new) XML file is no longer correct because the elements are escaped, e.g.:

<?xml version="1.0" encoding="UTF-8"?>

&lt;CONFIRMOWNX&gt;
&lt;Confirm&gt;
&lt;LglVeh id=&quot;GLE&quot;&gt;
&lt;AddrLine1&gt;GLEACHER &amp;amp; COMPANY&lt;/AddrLine1&gt;
&lt;AddrLine2&gt;DESCAP DIVISION&lt;/AddrLine2&gt;

Can anyone explain how to take the contents of each XML file and write them properly to the new file?

I'm burnt on this and I KNOW it'll be something simple I'm missing.

Thanks. Robert


Solution

The problem is that XMLWriter::writeElement is intended to write a complete XML element whose content is plain text. That is why it automatically escapes (replaces & with &amp;, for example) whatever is passed to it as the second parameter.
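To make the escaping concrete, here is a minimal sketch that writes a markup string through writeElement into an in-memory writer. The sample string is made up for illustration; it is not taken from the question's files.

```php
<?php
// writeElement() treats its second argument as text, so markup in it is escaped.
$writer = new XMLWriter();
$writer->openMemory();

// Hypothetical sample content that itself contains markup and an ampersand.
$writer->writeElement('document', '<Confirm>A & B</Confirm>');

$xml = $writer->outputMemory();
echo $xml, PHP_EOL;
// Prints: <document>&lt;Confirm&gt;A &amp; B&lt;/Confirm&gt;</document>
```

The inner tags end up as text content of the document element, which matches the escaped output shown in the question.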

One possible solution is to use the XMLWriter::writeRaw method instead, as it writes the contents as is, without any escaping. It doesn't validate its input, but in your case that does not seem to be a problem (as you're working with an already checked source).
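A rough sketch of how the merge loop might look with writeRaw is below. The function name mergeXmlFiles and its parameters are hypothetical, and the sketch makes a few assumptions: every source file is well-formed, encoded as UTF-8, and has no DOCTYPE. One detail to watch is that each source file probably starts with its own XML declaration (the question's output shows one), and a declaration is only allowed at the very start of a document, so it is stripped before the raw write.

```php
<?php
/**
 * Sketch only: merge several XML files under one <documents> root using writeRaw().
 * mergeXmlFiles(), $xmlFilenames and $target are illustrative names, not from the question.
 */
function mergeXmlFiles(array $xmlFilenames, string $target): void
{
    $writer = new XMLWriter();
    $writer->openUri($target);               // write to the target file, not to memory
    $writer->startDocument('1.0', 'UTF-8');
    $writer->startElement('documents');

    foreach ($xmlFilenames as $xmlFilename) {
        // Only one source file is held in memory at a time.
        $contents = file_get_contents($xmlFilename);
        if ($contents === false) {
            continue; // unreadable file: skipped in this sketch
        }

        // Remove an optional UTF-8 BOM and the leading XML declaration, if present.
        $contents = preg_replace('/^(?:\xEF\xBB\xBF)?\s*<\?xml\s.*?\?>\s*/s', '', $contents);

        $writer->startElement('document');
        // Mirrors the optional filename attribute from the DOM version in the question.
        $writer->writeAttribute('filename', basename($xmlFilename));
        $writer->writeRaw($contents);         // written verbatim, no escaping
        $writer->endElement();

        $writer->flush();                     // push buffered output to disk after each file
    }

    $writer->endElement();
    $writer->endDocument();
    $writer->flush();
}
```

It could be called as mergeXmlFiles($xmlfilenames, 'mynewFile.xml'). Since writeRaw performs no checks, a single malformed source file would make the whole merged file malformed, so validating the inputs beforehand may be worthwhile if they are not trusted.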

OTHER TIPS

I'm not sure why it's converting the markup to HTML entities, but you can decode it like so:

htmlspecialchars_decode($data);

It converts special HTML entities back to characters.

Licensed under: CC-BY-SA with attribution