How to parse HTML using Nokogiri if the required content doesn't have a class or id? [closed]

https://stackoverflow.com/questions/22599583

19-06-2023

Question

Closed. This question needs details or clarity. It is not currently accepting answers.


Closed 9 years ago.

I am trying to scrape some content and parse it with Nokogiri, but I'm stuck: the text I need is not enclosed in any tags. Some of it is bare text, and some of it is inside tags that have no class or id.

Can I locate content by searching for the text it starts and ends with, and get everything in between?

<body>
text <br/>
<ul>
<li>some more text </li>
</body>

CSS selectors or XPath: any solution would be great.

Solution

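require "nokogiri"

# Bare text is a text node, so it appears in .children alongside elements;
# the first child of <body> here is the text before <br/>.
Nokogiri::HTML.parse(<<_).css("body").children.first.text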
<body>
text <br/>
<ul>
<li>some more text </li>
</body>
_
# => "\ntext "


# Same lookup; .strip drops the surrounding newline and spaces.
Nokogiri::HTML.parse(<<_).css("body").children.first.text.strip
<body>
text <br/>
<ul>
<li>some more text </li>
</body>
_
# => "text"

Licensed under: CC-BY-SA with attribution
