java - org.htmlparser.Parser: need to get what's between the h3 tags
25-05-2021
Question
Using org.htmlparser.Parser, I have a snippet of HTML (see below) and I need to get the content of the h3. There are a bunch of these container divs with unique ids in my file. I can get the divs and their inner HTML just fine, but I cannot figure out how to get what's between the h3 tags.
This snippet of code works for divs but not for the h3: it finds the h3 with the correct id, but I cannot figure out how to get the inner HTML, i.e. what's between the tags.
Thanks for any help.
parser = new Parser();
parser.setInputHTML(inHTML);
parser.setEncoding("UTF-8");
lstNodes = parser.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("h3"),
                      new HasAttributeFilter("id", "h3_" + num)));
This finds the node but does not return the text between the h3 tags.
<div class="container" id="container_2">
  <h3 id="h3_2">Adding a few</h3>
  <div class="maindiv" id="div_2">
    ...new articles in here just to flesh it out.
  </div><!--end of div_2-->
</div>
Solution
I ended up creating my own tag class:
class H3Tag extends CompositeTag
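The body of that class is not shown in the post, so the following is only a minimal sketch of how such a custom tag might be written and registered. It assumes the HTML Parser 2.x library (org.htmlparser) is on the classpath. The names H3Demo, headingText and IDS are hypothetical and not taken from the original post.

```java
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.CompositeTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class H3Demo {

    /** Sketch of a custom tag: makes the parser treat h3 as a composite tag with children. */
    public static class H3Tag extends CompositeTag {
        // Tag names this prototype handles; the library expects them in upper case.
        private static final String[] IDS = {"H3"};

        @Override
        public String[] getIds() {
            return IDS;
        }
    }

    /** Hypothetical helper: returns the text inside the h3 with id "h3_" + num, or null if none matches. */
    public static String headingText(String html, int num) throws ParserException {
        Parser parser = Parser.createParser(html, "UTF-8");

        // Register the custom tag so that h3 nodes are created as H3Tag instances.
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.registerTag(new H3Tag());
        parser.setNodeFactory(factory);

        NodeList nodes = parser.extractAllNodesThatMatch(
                new AndFilter(new TagNameFilter("h3"),
                              new HasAttributeFilter("id", "h3_" + num)));
        if (nodes.size() == 0) {
            return null;
        }
        // getStringText() is inherited from CompositeTag and returns the text
        // between the start tag and the end tag.
        return ((H3Tag) nodes.elementAt(0)).getStringText();
    }

    public static void main(String[] args) throws ParserException {
        String html = "<div class=\"container\" id=\"container_2\">"
                + "<h3 id=\"h3_2\">Adding a few</h3>"
                + "<div class=\"maindiv\" id=\"div_2\">...</div>"
                + "</div>";
        System.out.println(headingText(html, 2));
    }
}
```

Registering the prototype on a PrototypicalNodeFactory replaces whatever the factory would otherwise create for h3, so the cast to H3Tag only holds for parsers that use this factory.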
OTHER TIPS
You're almost there. You can cast the node to HeadingTag manually and use getStringText() to get the text between the tags.
NodeList nodes = parser.extractAllNodesThatMatch(
        new AndFilter(new TagNameFilter("h3"),
                      new HasAttributeFilter("id", "h3_" + num)));
SimpleNodeIterator nodeIterator = nodes.elements();
while (nodeIterator.hasMoreNodes()) {
    Node node = nodeIterator.nextNode();
    // Assumes a library version whose node factory creates h3 tags as HeadingTag.
    HeadingTag tag = (HeadingTag) node;
    // getStringText() returns the text between the start and end tags.
    System.out.println(tag.getStringText());
}
Licensed under: CC-BY-SA with attribution