Question

I am getting data from a web service, 100 `<row>` elements per page. My script joins these pages into a Nokogiri::XML::NodeSet. Searching the NodeSet via XPath is extremely slow.

The following code stands in for the web service call and the XML parsing; the symptom is the same:

rows = []
(1..500).to_a.each_slice(100) { |slice|
  rows << Nokogiri::XML::Builder.new { |xml|
    xml.root {
      xml.rows {
        slice.each { |num|
          xml.row {
            xml.NUMBER {
              xml.text num
            }
          }
        }
      }
    }
  }.doc.at('/root/rows')
}

rows = rows.map { |a| a.children }.inject(:+)

The resulting NodeSet contains nodes from five different documents, which seems to be the problem:

rows.map { |r| r.document.object_id }.uniq
  => [21430080, 21732480, 21901100, 38743080, 40472240]

The problem: the following code takes about ten seconds to run. With a NodeSet whose nodes all belong to a single document, it finishes in the blink of an eye:

(1..500).to_a.sample(100).each do |sample|
  rows.at('//row[./NUMBER="%d"]' % sample)
end

Does anybody have a better way to merge the NodeSets, or to merge the underlying documents?

I would like to keep the single-NodeSet behaviour, since this data is logically one big set of rows that the web service split into pages for technical reasons.


The solution

The key to merging the NodeSets is to detach each node with Node#remove and reattach it to the first set with Node#add_child:

# Reparent every row into the first page's node, so the merged
# result spans a single document.
nodeset = nil
rows.each do |slice|
  if nodeset.nil?
    nodeset = slice
  else
    slice.children.each do |row|
      nodeset.add_child(row.remove)
    end
  end
end
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow