Question

I am trying to learn how to parse HTML, but since I don't have much experience with either Java or Android, it's a little complicated. I have read the IBM XML parsing tutorial and have learned to parse an RSS feed. My problem is that I would like to get data from an HTML site. I have read a bit about HtmlCleaner, JSON, etc., but I can't find a good tutorial to help me. Can you recommend any tutorials that might be helpful?

Thanks.

Solution

Check out the following HTML parsers. There are more out there. Maybe one will work for you:

OTHER TIPS

IMO there are two easy ways to parse HTML:

  • Convert the HTML to XML (XHTML) using a library (e.g. HTML Tidy) and then use an XML parser
  • Use an existing HTML parser (e.g. the engine of a standard Web browser such as WebKit, Firefox, and/or IE) and then read the "DOM", which is a more-or-less API-friendly representation of the parsed HTML
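
To make the first option a bit more concrete, here is a minimal sketch of its second half: reading links out of markup that is already well-formed XHTML, using only the standard `javax.xml.parsers` DOM API. It assumes the clean-up step (HTML Tidy or similar) has already been run, and the class name `XhtmlLinks`, the method `extractLinks`, and the sample markup are made up for illustration. Real pages that use a DOCTYPE or entities such as `&nbsp;` would likely need extra parser configuration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: not part of any library mentioned above.
public class XhtmlLinks {

    // Parses well-formed XHTML and returns "href -> link text" for each <a> element.
    // Assumption: the input was already converted from HTML to XHTML.
    static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Walk the DOM: collect every <a> element in document order.
        NodeList anchors = doc.getElementsByTagName("a");
        List<String> links = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            links.add(a.getAttribute("href") + " -> " + a.getTextContent());
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        // Sample markup invented for this example.
        String xhtml =
            "<html><body>"
          + "<p>See <a href=\"http://example.com/\">Example</a>.</p>"
          + "</body></html>";
        for (String link : extractLinks(xhtml)) {
            System.out.println(link);
        }
    }
}
```

The same DOM calls should also be available on Android, but feeding raw, non-well-formed HTML to `DocumentBuilder.parse` will throw a parse error, which is why the conversion step comes first.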

Alternatively, you could write your own parser, though I doubt you should if this is for homework: implementing one properly and completely would be long and complicated. If you do want to, see the specs for parsing HTML.

Licensed under: CC-BY-SA with attribution