Question

I am getting data from a web service, 100 `<row>` elements per page. My script joins these pages into one Nokogiri::XML::NodeSet. Searching the merged set via XPath is extremely slow.

This code replaces the web service call and XML parsing, but the symptom is the same:

rows = []
(1..500).to_a.each_slice(100) { |slice|
  rows << Nokogiri::XML::Builder.new { |xml|
    xml.root {
      xml.rows {
        slice.each { |num|
          xml.row {
            xml.NUMBER {
              xml.text num
            }
          }
        }
      }
    }
  }.doc.at('/root/rows')
}

rows = rows.map { |a| a.children }.inject(:+)

The resulting NodeSet contains nodes from five documents. This seems to be a problem:

rows.map { |r| r.document.object_id }.uniq
  => [21430080, 21732480, 21901100, 38743080, 40472240]

The Problem: The following code takes about ten seconds to run. Against a NodeSet whose nodes all come from a single document, it finishes in the blink of an eye:

(1..500).to_a.sample(100).each do |sample|
  rows.at('//row[./NUMBER="%d"]' % sample)
end

Does somebody have a better way to merge the NodeSets, or a way to merge the underlying documents?

I would like to keep the behaviour of only one nodeset, as this data is practically one big nodeset, which was split by the web service for technical reasons.

Solution

The key to merging the NodeSets is to detach each node from its original document with Node#remove and reparent it into the first document:

nodeset = nil
rows.each do |slice|
  if nodeset.nil?
    # Keep the first slice (and its document) as the merge target.
    nodeset = slice
  else
    # Detach each row from its own document and reparent it
    # into the target document.
    slice.children.each do |row|
      nodeset.add_child(row.remove)
    end
  end
end
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow