Java with XPath and TagSoup

https://stackoverflow.com/questions/8898376

17-04-2021

Problem

I am using TagSoup with Java to extract some data, but certain XPath expressions are not working; I just get empty results.

  import java.io.BufferedReader;
  import java.io.FileReader;
  import org.jdom.input.SAXBuilder;
  import org.jdom.xpath.XPath;

  FileReader frInHtml = new FileReader("doc.html");
  BufferedReader brInHtml = new BufferedReader(frInHtml);

  // Parse the HTML through TagSoup so that JDOM receives well-formed XML
  SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
  org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);

  // This one works
  XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");

  // None of the three below worked (each tried one at a time in place of the line above)
  // XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
  // XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
  // XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");

  // Bind the "ns" prefix used in the expressions to the XHTML namespace
  xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");

Solution

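To debug this, you need to look at the "equivalent XML" that TagSoup produces. The XPath expressions are evaluated against that XML, not against the original HTML, so an element such as tbody or a particular div index may not be where you expect it. And for us to help you, you will need to show us the equivalent XML.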
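
As a rough illustration of that debugging step, here is a minimal sketch that uses only the JDK's built-in XML and XPath APIs rather than TagSoup and JDOM. The class name EquivalentXmlDebug and the SAMPLE string are hypothetical: SAMPLE is a hand-written stand-in for what the equivalent XML might look like, not the actual output for doc.html. The sketch prints the tree and then counts the matches for three expressions modeled on the ones in the question. With JDOM 1.x, the tree itself can be printed with org.jdom.output.XMLOutputter.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class EquivalentXmlDebug {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Assumption: a hand-written stand-in for the XML TagSoup might emit.
    // Every element is in the XHTML namespace and there is no <tbody>.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
      + "<div class=\"content\"><table><tr><td><h1>Title</h1></td></tr></table></div>"
      + "</body></html>";

    // Parse XML text into a namespace-aware DOM tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    // Serialize the tree back to text so its real structure can be inspected.
    static String dump(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    // Count the nodes an expression selects, with "ns" bound to the XHTML namespace.
    static int count(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            @Override public String getPrefix(String uri) { return null; }
            @Override public Iterator<String> getPrefixes(String uri) {
                return Collections.emptyIterator();
            }
        });
        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(SAMPLE);
        System.out.println(dump(doc));

        String[] expressions = {
            // Prefixed steps that follow the structure of the sample
            "/ns:html/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1",
            // Prefixed, but includes a tbody step that the sample does not contain
            "/ns:html/ns:body/ns:div/ns:table/ns:tbody/ns:tr/ns:td/ns:h1",
            // Unprefixed steps only match elements that are in no namespace
            "/html/body/div/table/tr/td/h1"
        };
        for (String expression : expressions) {
            System.out.println(count(doc, expression) + " match(es): " + expression);
        }
    }
}
```

For this particular sample, only the first expression selects the h1. The other two select nothing, one because of the extra tbody step and the other because its steps carry no namespace prefix. Whether the same reasons apply to doc.html can only be confirmed by dumping the real tree.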

License: CC-BY-SA with attribution

Not affiliated with StackOverflow