How do I use the Nokogiri gem to select a div according to nested span class?

Question

If I’ve understood you correctly, you have some HTML that looks something like this:

<div>
  This is the div we want.
  <span class="test1">Span contents</span> Other contents
</div>
<div>
  We don't want this div.
  <span class="something else">Not this</span> one
</div>

and you want to select the first div, but not the second.

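Standard CSS 3 selectors can’t do this, because they have no way to select an element based on what it contains. Nokogiri’s CSS support does accept the jQuery-style :has() pseudo-class, so div:has(span.test1) may work for you, but XPath expresses this directly and gives you more control over the match.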

A simple XPath query that would select the div we want could look like this:

//div[span[@class = 'test1']]

This can be read as “all div elements that have a span as a direct child, where that span’s class attribute has exactly the value test1”.

This query only tests the class attribute for an exact match against test1, so it won’t match if the class is something like “test1 otherclass”. To make it behave like a CSS class selector, which treats the attribute as a whitespace-separated list of class names, you need to change the test to something like:

[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]

Additionally, the original query only matches spans that are direct children of the div. If the spans you want to match may be nested inside other elements within the div, you will need to use the descendant axis in your query.

Putting it all together:

//div[descendant::span[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]]

This can be read as “all div elements that have a span descendant that is in the test1 class (in the CSS sense)”.

To use this you need to call the xpath method rather than the css method:

divs = @page.xpath("//div[descendant::span[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]]")