java - org.htmlparser.Parser: how to get what is between the h3 tags

https://stackoverflow.com/questions/9813705

25-05-2021
Question

Using org.htmlparser.Parser, I have the snippet of HTML shown below and I need to get the content of the h3. There are a bunch of these container divs with unique ids in my file. I can get the divs and their inner HTML just fine, but I cannot figure out how to get what is between the h3 tags.

This snippet of code works for divs but not for the h3: it finds the h3 with the correct id, but I cannot figure out how to get the innerHTML, that is, what is between the tags.

Thanks for any help.

    parser = new Parser();
    parser.setInputHTML(inHTML);
    parser.setEncoding("UTF-8");
    lstNodes = parser.extractAllNodesThatMatch(  new AndFilter(new TagNameFilter("h3"),
                                                  new HasAttributeFilter("id", "h3_"+num)));

This finds the h3 but does not return the text between the h3 tags.

    <div class="container" id="container_2">
        <h3 id="h3_2">Adding a few</h3>
        <div class="maindiv" id="div_2">
            ...new articles in here just to flesh it out.
        </div><!--end of div_2-->
    </div>

Solution

I ended up creating my own tag class:

    class H3Tag extends CompositeTag
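
The answer gives only the class declaration, so the following is a minimal sketch of how such a tag could be filled in and used, not the author's actual code. It assumes the custom tag is registered through HTMLParser's PrototypicalNodeFactory, so that the parser builds h3 elements as composite nodes whose inner text is available via getStringText(). The wrapper class H3Example, the helper headingText, and the body of H3Tag are hypothetical names introduced for illustration.

```java
import org.htmlparser.Parser;
import org.htmlparser.PrototypicalNodeFactory;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.tags.CompositeTag;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public class H3Example {

    /** Hypothetical body: the original answer shows only the class declaration. */
    public static class H3Tag extends CompositeTag {
        private static final String[] IDS = {"H3"};

        /** Tag names this prototype is registered for. */
        @Override
        public String[] getIds() {
            return IDS;
        }
    }

    /** Returns the raw text between <h3 id="h3_N"> and </h3>, or null if no such h3 exists. */
    static String headingText(String html, int num) throws ParserException {
        Parser parser = new Parser();
        parser.setInputHTML(html);

        // Start from the default tag set and register the custom h3 prototype (assumed approach).
        PrototypicalNodeFactory factory = new PrototypicalNodeFactory();
        factory.registerTag(new H3Tag());
        parser.setNodeFactory(factory);

        NodeList nodes = parser.extractAllNodesThatMatch(new AndFilter(
                new TagNameFilter("h3"),
                new HasAttributeFilter("id", "h3_" + num)));
        if (nodes.size() == 0) {
            return null;
        }
        // The matched node is a composite tag, so it spans start tag, children, and end tag.
        return ((CompositeTag) nodes.elementAt(0)).getStringText();
    }

    public static void main(String[] args) throws ParserException {
        String html = "<div class=\"container\" id=\"container_2\">"
                + "<h3 id=\"h3_2\">Adding a few</h3>"
                + "<div class=\"maindiv\" id=\"div_2\">...</div>"
                + "</div>";
        System.out.println(headingText(html, 2));
    }
}
```

The factory is set after setInputHTML here so that it is attached to the lexer that actually parses the input. Casting to CompositeTag rather than H3Tag keeps the helper working even if a different composite prototype ends up handling h3.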

Other tips

You're almost there. You can cast it to HeadingTag manually, and use getStringText() to get text between tags.

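    import org.htmlparser.Node;
    import org.htmlparser.filters.AndFilter;
    import org.htmlparser.filters.HasAttributeFilter;
    import org.htmlparser.filters.TagNameFilter;
    import org.htmlparser.tags.HeadingTag;
    import org.htmlparser.util.NodeList;
    import org.htmlparser.util.SimpleNodeIterator;

    NodeList nodes = parser.extractAllNodesThatMatch(new AndFilter(new TagNameFilter("h3"),
        new HasAttributeFilter("id", "h3_" + num)));
    SimpleNodeIterator nodeIterator = nodes.elements();
    while (nodeIterator.hasMoreNodes()) {
        Node node = nodeIterator.nextNode();
        HeadingTag tag = (HeadingTag) node;
        System.out.println(tag.getStringText()); // text between <h3> and </h3>
    }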

License: CC-BY-SA with attribution

Not affiliated with StackOverflow