I am getting data from a web service that returns 100 <row> elements per page. My script concatenates these pages into a single Nokogiri::XML::NodeSet. Searching the NodeSet via XPath is extremely slow.
The following code replaces the web service call and the XML parsing, but the symptom is the same:
rows = []
(1..500).to_a.each_slice(100) { |slice|
  rows << Nokogiri::XML::Builder.new { |xml|
    xml.root {
      xml.rows {
        slice.each { |num|
          xml.row {
            xml.NUMBER {
              xml.text num
            }
          }
        }
      }
    }
  }.doc.at('/root/rows')
}
rows = rows.map { |a| a.children }.inject(:+)
The resulting NodeSet contains nodes from five different documents, which seems to be the problem:
rows.map { |r| r.document.object_id }.uniq
=> [21430080, 21732480, 21901100, 38743080, 40472240]
The problem: the following code runs in about ten seconds. Against a single, non-merged NodeSet it finishes in the blink of an eye:
(1..500).to_a.sample(100).each do |sample|
  rows.at('//row[./NUMBER="%d"]' % sample)
end
Does anybody have a better way to merge the NodeSets, or a way to merge the underlying documents?
I would like to keep the single-NodeSet behaviour, because this data is effectively one big set of rows that was only split into pages by the web service for technical reasons.