Getting elements in the order they appear in the document
Question
I have a document and want to extract a couple of elements which are direct descendants of the parent element, but leave out others. The problem is that I don't get the elements in the order they appear in the document. The reason might actually be that the CSS selector I am using is wrong...
require 'rubygems'
require 'nokogiri'
require 'open-uri'
html = <<END
<content>
<p>Lorem</p>
<div>
FOO
<p>BAR</p>
</div>
<h1>Ipsum</h1>
<p>Dolor</p>
<div>
BAR
<h2>FOO</h2>
</div>
<h2>Sit</h2>
<p>Amet</p>
</content>
END
Nokogiri::HTML(html).css('content > p, content > h1, content > h2').to_html # "<p>Lorem</p><p>Dolor</p><p>Amet</p><h1>Ipsum</h1><h2>Sit</h2>"
What I want is
<p>Lorem</p><h1>Ipsum</h1><p>Dolor</p><h2>Sit</h2><p>Amet</p>
Solution
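Try a single XPath union instead of the comma-separated CSS selector. The union is evaluated as one query, so the matching nodes come back in document order: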
//content/p|//content/h1|//content/h2
OTHER TIPS
You want the elements listed in the order they appear in the document, but as you can see, they come back grouped in the order of the CSS selectors.
To solve this, add a class attribute to the elements you want and select on that class. Then you need only one CSS selector, so the elements come back in the right order.