Question

I am new to Ruby on Rails. I want to use Nokogiri's CSS selectors to select an entire div on an HTML page, rather than just an element nested inside it. For example, if I write:

@sel = @page.css("div.test")        # NodeSet of divs with class "test"
@sel_net = @sel.css("span.test1")   # spans with class "test1" inside those divs

This selects all the spans with class test1 inside the divs with class test. But what if I want to select the entire outer div that contains a span with class test1? Is that possible?


Solution

If I’ve understood you correctly, you have some HTML that looks something like this:

<div>
  This is the div we want.
  <span class="test1">Span contents</span> Other contents
</div>
<div>
  We don't want this div.
  <span class="something else">Not this</span> one
</div>

and you want to select the first div, but not the second.

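Plain CSS selectors have traditionally had no way to do this, because a selector always matches the last element in the chain and never one of its ancestors. Nokogiri does accept the :has() pseudo-class as an extension (for example div:has(span.test1)), but XPath expresses the condition directly and gives you finer control over how the match is made.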

A simple XPath query that would select the div we want could look like this:

//div[span[@class = 'test1']]

This can be read as “all div elements that have a span element as a direct child whose class attribute has the value test1”.

This query only tests the class attribute for an exact match against test1, so it won’t match if the class is something like “test1 otherclass”. To make it behave like a CSS class selector, you need to change the test to something like:

[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]
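To see why the padding with spaces matters, here is a plain-Ruby imitation of that predicate. The has_class? helper is hypothetical, written only for illustration; it is not part of Nokogiri, and its whitespace handling only approximates XPath's normalize-space.

```ruby
# Plain-Ruby imitation of:
#   contains(concat(' ', normalize-space(@class), ' '), ' test1 ')
# has_class? is a hypothetical helper, not a Nokogiri method.
def has_class?(class_attr, name)
  normalized = class_attr.to_s.strip.gsub(/\s+/, " ")  # roughly normalize-space(@class)
  padded = " #{normalized} "                            # concat(' ', ..., ' ')
  padded.include?(" #{name} ")                          # contains(..., ' test1 ')
end

puts has_class?("test1", "test1")             # exact value
puts has_class?("test1 otherclass", "test1")  # one class among several
puts has_class?("test10", "test1")            # different class sharing a prefix
puts has_class?(nil, "test1")                 # no class attribute at all
```

Padding both the attribute value and the class name with spaces means the name must appear as a whole whitespace-separated token, so the first two calls should be true and the last two false.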

Additionally, the original query only matches spans that are direct children of the div. If the spans you want to match are nested inside other elements, you will need to use the descendant axis in your query.

Putting it all together:

//div[descendant::span[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]]

This can be read as “all div elements that have a span descendant that is in the test1 class (in the CSS sense)”.

To use this, you need to call the xpath method rather than the css method:

divs = @page.xpath("//div[descendant::span[contains(concat(' ', normalize-space(@class), ' '), ' test1 ')]]")
Licensed under: CC-BY-SA with attribution