Try:
tids = doc.xpath("//div[contains(concat(' ', @class, ' '),' thing ')]").collect {|node| node['data-thing-id']}
terms = doc.xpath("//div[contains(concat(' ', @class, ' '),' col_b ')]").collect {|node| node.text.strip }
tids.zip(terms).each do |tid, term|
  # Interpolation also copes with a nil tid (a div with no data-thing-id).
  puts "#{tid} #{term}"
end
# => 29966403 foobar desc
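The code above runs two XPath queries against doc: one finds every div whose class list includes thing, the other every div whose class list includes col_b. From each thing div it reads the data-thing-id attribute, and from each col_b div it takes the displayed text with surrounding whitespace stripped, collecting the results into two arrays. zip then pairs the arrays element by element, so this relies on each thing div having exactly one matching col_b div, in the same document order.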
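If the concat(' ', @class, ' ') part looks odd: it pads the class attribute with spaces so that only whole class names match. Here is a rough plain-Ruby sketch of the same test. The helper name class_token? is made up for illustration and is not part of Nokogiri.

```ruby
# Hypothetical helper (not a Nokogiri API): mirrors the XPath test
#   contains(concat(' ', @class, ' '), ' thing ')
# by padding the class attribute with spaces before searching it.
def class_token?(class_attr, token)
  " #{class_attr} ".include?(" #{token} ")
end

puts class_token?("thing", "thing")          # whole class name: matches
puts class_token?("row thing odd", "thing")  # one of several classes: matches
puts class_token?("something", "thing")      # only a substring: no match

# Without the padding, a plain substring test gives a false positive:
puts "something".include?("thing")
```

A bare contains(@class, 'thing') in XPath would behave like the last line and also pick up classes such as something.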
Nokogiri supports both XPath and CSS selectors, and you can learn how to make full use of them from their respective documentation.