Question

I am using TagSoup with Java to extract some data, but certain XPath expressions are not working: I just get empty results.

    FileReader frInHtml = new FileReader("doc.html");
    BufferedReader brInHtml = new BufferedReader(frInHtml);

    SAXBuilder saxBuilder = new SAXBuilder("org.ccil.cowan.tagsoup.Parser");
    org.jdom.Document jdomDocument = saxBuilder.build(brInHtml);

    // This is working
    XPath xpath = XPath.newInstance("/ns:html[1]/ns:body/ns:div[@class='content']/ns:table/ns:tr/ns:td/ns:h1");

    // None of the 3 lines below worked; I tried them one at a time
    XPath xpath = XPath.newInstance("/ns:html/ns:body/ns:div[7]/ns:table/ns:tbody/ns:tr/ns:td/ns:h1");
    XPath xpath = XPath.newInstance("//html//body//div[7]//table//tbody//tr//td//h1");
    XPath xpath = XPath.newInstance("/html/body/div[7]/table/tbody/tr/td/h1");

    xpath.addNamespace("ns", "http://www.w3.org/1999/xhtml");

Solution

To debug this, you need to look at the "equivalent XML" that TagSoup produces, because your XPath expressions are evaluated against that tree and not against the original HTML. For us to help you, you will need to show us that equivalent XML.
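
As an illustration of why the equivalent XML matters, here is a minimal JDK-only sketch. It does not use TagSoup or JDOM: the `SAMPLE` string is a hypothetical stand-in for what the parser might emit, and the class and method names are made up for this example. It shows two things that commonly make an XPath come back empty against such a tree: unprefixed element names when the elements live in the XHTML namespace, and a `tbody` step when the tree contains no `tbody` element. Whether either applies to your document can only be confirmed from the real output.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathNamespaceDemo {
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the "equivalent XML": every element is in
    // the XHTML namespace, and the table has no tbody element.
    static final String SAMPLE =
        "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
      + "<div class=\"content\"><table><tr><td><h1>Title</h1></td></tr></table></div>"
      + "</body></html>";

    // Returns how many nodes the expression selects in SAMPLE.
    static int count(String expression) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // without this, namespaces are ignored
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(SAMPLE)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the prefix "ns" to the XHTML namespace, like addNamespace in JDOM.
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "ns".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) {
                return XHTML_NS.equals(uri) ? "ns" : null;
            }
            public Iterator<String> getPrefixes(String uri) {
                return Collections.<String>emptyIterator();
            }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Prefixed steps that mirror the tree: prints 1
        System.out.println(count("/ns:html/ns:body/ns:div/ns:table/ns:tr/ns:td/ns:h1"));
        // Unprefixed steps only match elements in no namespace: prints 0
        System.out.println(count("/html/body/div/table/tr/td/h1"));
        // Prefixed, but with a tbody step the tree does not contain: prints 0
        System.out.println(count("/ns:html/ns:body/ns:div/ns:table/ns:tbody/ns:tr/ns:td/ns:h1"));
    }
}
```

To see the actual tree in your setup, JDOM 1.x can print the parsed document with `new XMLOutputter(Format.getPrettyFormat()).output(jdomDocument, System.out)`. Comparing that output with each XPath step (namespace prefixes, the presence of `tbody`, and the real position of the `div` you are addressing with `div[7]`) should show where the expression stops matching.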

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow