Nokogiri: Sort Array of IDs according to order in HTML document
06-07-2019
Question
I have an unsorted Array holding the following IDs:
@un_array = ['bar', 'para-3', 'para-2', 'para-7']
Is there a smart way to use Nokogiri (or plain JavaScript) to sort the array according to the order in which the IDs appear in the example HTML document below?
require 'rubygems'
require 'nokogiri'
value = Nokogiri::HTML.parse(<<-HTML_END)
<html>
<head>
</head>
<body>
<p id='para-1'>A</p>
<div id='foo'>
<p id='para-2'>B</p>
<p id='para-3'>C</p>
<div id='bar'>
<p id='para-4'>D</p>
<p id='para-5'>E</p>
<p id='para-6'>F</p>
</div>
<p id='para-7'>G</p>
</div>
<p id='para-8'>H</p>
</body>
</html>
HTML_END
In this case the resulting, sorted array should be:
['para-2', 'para-3', 'bar', 'para-7']
Solution 3
This is the solution a coworker and I came up with:
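parent = value.css('body').first
# css('[id]') returns every descendant that has an id attribute, in document order.
# (parent.children would only see the direct children of <body>, so nested
# IDs such as 'bar' or 'para-2' would be missing from the list.)
indexes = parent.css('[id]').map { |node| node['id'] }
puts @un_array.sort! { |x, y| indexes.index(x) <=> indexes.index(y) }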
First I collect all IDs of the HTML document into an Array, then I sort @un_array
according to each ID's position in that Array.
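The sort above calls indexes.index for every comparison, which rescans the ID list each time. A minimal sketch of the same sort with a precomputed position lookup follows. It assumes the IDs have already been collected: doc_order is a hard-coded stand-in for that list, and position is a name introduced here for illustration.

```ruby
# Stand-in for the IDs collected from the document, in document order.
doc_order = %w[para-1 foo para-2 para-3 bar para-4 para-5 para-6 para-7 para-8]

# Map each ID to its position once, so every lookup is a hash access.
position = doc_order.each_with_index.to_h

un_array = ['bar', 'para-3', 'para-2', 'para-7']

# IDs missing from the document sort to the end instead of raising.
sorted = un_array.sort_by { |id| position.fetch(id, doc_order.size) }

p sorted
```

For a handful of IDs the difference is negligible; the lookup table mostly helps when the document or the array is large.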
OTHER TIPS
I don't know what Nokogiri is, but if you have the HTML code as a String, then you can get the order with regexp matching, for example:
var str = '<html>...</html>'; // the HTML code to check
var ids = ['bar', 'para-3', 'para-2', 'para-7']; // the array with all IDs to check
var reg = new RegExp('(?:id=[\'"])(' + ids.join('|') + ')(?:[\'"])', 'g'); // the regexp
var result = [], tmp; // array holding the result and a temporary variable
while((tmp = reg.exec(str))!==null)result.push(tmp[1]); // matching the IDs
console.log(result); // ['para-2', 'para-3', 'bar', 'para-7']
With this code you have to be careful with IDs containing regexp meta-characters (such as "." or "+"): they must be escaped before they are joined into the pattern.
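For comparison, here is a rough Ruby sketch of the same regexp approach with the escaping handled. It assumes the markup is available as a plain String; the html fragment and the extra 'a.b' and 'axb' IDs are made up here to show why escaping matters. Regexp.union escapes each string before joining them with "|".

```ruby
# Hypothetical fragment; 'a.b' and 'axb' are illustrative IDs only.
html = <<~HTML
  <div id='foo'>
    <p id='para-2'>B</p>
    <p id="para-3">C</p>
    <div id='bar'><p id='para-4'>D</p></div>
    <p id='para-7'>G</p>
    <p id='a.b'>H</p>
    <p id='axb'>I</p>
  </div>
HTML

ids = ['bar', 'para-3', 'para-2', 'para-7', 'a.b']

# Regexp.union escapes each string, so "a.b" matches only a literal dot.
pattern = /id=['"](#{Regexp.union(ids)})['"]/

# scan returns one capture array per match, in order of appearance.
found = html.scan(pattern).flatten

p found
```

Without the escaping, the "." in 'a.b' would match any character and 'axb' would be picked up as well. Like the JavaScript version, this only looks at the text, so it can be fooled by markup inside comments or by attributes whose names end in "id=".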
Here's one way to do it in Nokogiri - there may be others which are more efficient, as this ends up walking the entire DOM.
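require 'set'
# A Set makes each lookup O(1); the initial order of the IDs doesn't matter.
id_set = ['bar', 'para-3', 'para-2', 'para-7'].to_set
sorted = []
# Note: traverse yields a node after its children, so a container's ID is
# collected after any requested IDs nested inside it.
value.root.traverse do |node|
  node_id = node['id']
  # delete? returns nil when the ID is not in the set, so each ID is added once
  sorted << node_id if node_id && id_set.delete?(node_id)
end
# sorted is now ['para-2', 'para-3', 'bar', 'para-7']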
EDIT: Here's a one-liner that gets the same results, but I haven't done benchmarking to see which is faster.
ids = ['bar', 'para-3', 'para-2', 'para-7']
value.xpath("//*[@id]").collect {|node| node['id']} & ids
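The one-liner relies on Array#& keeping the order of its left-hand operand. A small sketch of that behaviour, with doc_ids as a hard-coded stand-in for the list the XPath query would return:

```ruby
# Stand-in for value.xpath("//*[@id]").collect { |node| node['id'] }
doc_ids = %w[para-1 foo para-2 para-3 bar para-4 para-5 para-6 para-7 para-8]
ids = ['bar', 'para-3', 'para-2', 'para-7']

# The intersection follows the order of the receiver (the left operand).
in_doc_order = doc_ids & ids
in_input_order = ids & doc_ids

p in_doc_order
p in_input_order
```

So the document's ID list has to be on the left; swapping the operands returns the IDs in their original, unsorted order.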