Speed up XML parsing with PHP

https://stackoverflow.com/questions/11730860

23-06-2021

Question

Hi, I have an XML file with approximately 12,000 records in it. The code is written and works fine; it just takes a while to parse the XML file and return the content. Is there any way to speed this process up?

My Code:

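<?php
$dom = new DOMDocument();
$dom->load('comics.xml');  // parses the whole file into an in-memory tree

// getElementsByTagName() searches every descendant, not just direct children
foreach ($dom->getElementsByTagName('record') as $entry)
{
    $titleNode = $entry->getElementsByTagName('title')->item(0);
    if ($titleNode !== null) {  // skip records without a <title>
        echo $titleNode->textContent, PHP_EOL;
    }
}
?>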

XML file (just one sample record shown; I can't post them all):

<?xml version='1.0' encoding='utf-8'?>
<calibredb>
  <record>
    <id>1</id>
    <uuid>991639a0-7cf6-4a34-a863-4aab8ac2921d</uuid>
    <publisher>Marvel Comics</publisher>
    <size>6109716</size>
    <title sort="Iron Man v1 101">Iron Man v1 101</title>
    <authors sort="Unknown">
      <author>Unknown</author>
    </authors>
    <timestamp>2012-04-15T18:49:22-07:00</timestamp>
    <pubdate>2012-04-15T18:49:22-07:00</pubdate>
    <cover>M:/Comics/Unknown/Iron Man v1 101 (1)/cover.jpg</cover>
    <formats>
      <format>M:/Comics/Unknown/Iron Man v1 101 (1)/Iron Man v1 101 - Unknown.zip</format>
    </formats>
  </record>
</calibredb>

Solution

The answer depends a lot on the data. Possible solutions include moving the data into a relational database such as MySQL, or converting it into a simpler format such as CSV, which is easier to parse, takes up less room, and can be read line by line.
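To illustrate the CSV option, here is a minimal sketch that assumes the records have already been exported to a hypothetical `comics.csv` whose header row contains a `title` column (the export step itself is not shown). `fgetcsv()` reads one row per call, so only the current row is held in memory; `csvTitles()` is a hypothetical helper name.

```php
<?php
// Sketch: stream titles from a CSV export of comics.xml, one row at a time.
// Assumption: the (hypothetical) CSV file has a header row with a "title" column.
function csvTitles(string $path): Generator
{
    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // First row holds the column names; locate the "title" column.
        $header = fgetcsv($fh, 0, ',', '"', '');
        $titleCol = ($header === false) ? false : array_search('title', $header, true);
        if ($titleCol === false) {
            throw new RuntimeException("No 'title' column in $path");
        }
        // Each call reads one row, so memory use does not grow with file size.
        while (($row = fgetcsv($fh, 0, ',', '"', '')) !== false) {
            if ($row === [null]) {
                continue; // fgetcsv() returns [null] for a blank line
            }
            yield $row[$titleCol] ?? '';
        }
    } finally {
        fclose($fh);
    }
}
```

Usage: `foreach (csvTitles('comics.csv') as $title) { echo $title, PHP_EOL; }`. Returning a generator keeps the caller's loop as simple as the original DOM version while still reading the file incrementally.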

Other tips

The DOM approach is fine for small data sets, but it parses the entire XML structure and holds it in memory, which becomes expensive as the file grows.

In your situation, you should use a SAX approach to parse the large XML file: a SAX parser reads the file as a stream and fires an event for each element as it is encountered, instead of loading the whole document at once.
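PHP's SAX-style parser is the `xml` extension (`xml_parser_create()`, `xml_set_element_handler()`, `xml_parse()`). The following is a sketch, not a tuned implementation, assuming the `<calibredb><record><title>` layout from the question; the function name `saxTitles()` and the 8 KB chunk size are illustrative choices. It feeds the file to the parser in chunks and calls `$onTitle` once per record title.

```php
<?php
// Sketch: SAX-style (event-based) parsing with PHP's xml extension.
// Assumes the layout from the question: <calibredb><record><title>...</title>...</record>...
function saxTitles(string $path, callable $onTitle, int $chunkSize = 8192): void
{
    $parser = xml_parser_create('UTF-8');
    // Report element names as written; by default they are upper-cased.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    $stack = [];   // names of the currently open elements
    $buffer = '';  // text collected for the current <title>

    xml_set_element_handler(
        $parser,
        function ($p, string $name, array $attrs) use (&$stack, &$buffer): void {
            $stack[] = $name;
            if ($name === 'title') {
                $buffer = '';
            }
        },
        function ($p, string $name) use (&$stack, &$buffer, $onTitle): void {
            // Only report <title> elements that sit directly inside <record>.
            $n = count($stack);
            if ($name === 'title' && $n >= 2 && $stack[$n - 2] === 'record') {
                $onTitle($buffer);
            }
            array_pop($stack);
        }
    );
    // Text may arrive in several callbacks (e.g. around entities), so append.
    xml_set_character_data_handler(
        $parser,
        function ($p, string $data) use (&$stack, &$buffer): void {
            if (end($stack) === 'title') {
                $buffer .= $data;
            }
        }
    );

    $fh = fopen($path, 'r');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // Feed the file in fixed-size chunks; only one chunk is in memory at a time.
        while (!feof($fh)) {
            $chunk = fread($fh, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            if (!xml_parse($parser, $chunk, feof($fh))) {
                throw new RuntimeException(sprintf(
                    'XML error: %s at line %d',
                    xml_error_string(xml_get_error_code($parser)),
                    xml_get_current_line_number($parser)
                ));
            }
        }
    } finally {
        fclose($fh);
    }
}
```

For the question's output: `saxTitles('comics.xml', function (string $t) { echo $t, PHP_EOL; });`. Passing a callback lets each title be printed as soon as its closing tag is seen, rather than after the whole file has been read.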

Google has some examples: https://www.google.lv/search?q=php+SAX+XML

I'm not familiar with the PHP implementation specifically; however, using the following approach in C++ with Xerces, I've seen huge performance improvements in a scenario like yours.

Instead of requesting all the elements by name and waiting for an entire NodeList to be returned, I found it much faster to get the first child node under the root node and then its NextSibling. Using each sibling as the new current node, keep getting the NextSibling until there are none left.

Hopefully this provides a performance improvement in PHP similar to how it did in C++.
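Translated to PHP's DOM extension, the sibling walk described above could look like the sketch below (`siblingTitles()` is a hypothetical name). Two caveats: `DOMDocument::load()` still builds the full tree in memory, so this only avoids the `getElementsByTagName()` searches, and whether it is actually faster in PHP is an assumption you would need to measure. Unlike `getElementsByTagName()`, which searches all descendants, it only visits direct children, which matches the file layout in the question.

```php
<?php
// Sketch: PHP DOM version of the firstChild/nextSibling walk.
// DOMDocument::load() still parses the whole file into memory; this only
// replaces the getElementsByTagName() searches. Any speed-up is unmeasured.
function siblingTitles(string $path): array
{
    $dom = new DOMDocument();
    if (!$dom->load($path)) {
        throw new RuntimeException("Cannot parse $path");
    }
    $titles = [];
    // Children of the root (<calibredb>) include whitespace text nodes,
    // so only element nodes named "record" are examined.
    for ($rec = $dom->documentElement->firstChild; $rec !== null; $rec = $rec->nextSibling) {
        if ($rec->nodeType !== XML_ELEMENT_NODE || $rec->nodeName !== 'record') {
            continue;
        }
        // Walk the record's children until the first <title>,
        // mirroring item(0) in the question's code.
        for ($child = $rec->firstChild; $child !== null; $child = $child->nextSibling) {
            if ($child->nodeType === XML_ELEMENT_NODE && $child->nodeName === 'title') {
                $titles[] = $child->textContent;
                break;
            }
        }
    }
    return $titles;
}
```

Usage: `foreach (siblingTitles('comics.xml') as $title) { echo $title, PHP_EOL; }`.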

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow