I am trying to learn how to parse HTML, but since I don't have much experience with either Java or Android, it's a little complicated. I have read the IBM XML parsing tutorial and have learned to parse an RSS feed. My problem is that I would like to get data from an HTML site. I have read a bit about HtmlCleaner, JSON, etc., but I can't find a good tutorial to help me. Do you know of any tutorials that might be helpful?

Thanks.


Solution

Check out the following HTML parsers. There are more out there. Maybe one will work for you:

Other tips

IMO there are two easy ways to parse HTML:

  • Convert the HTML to XML (XHTML) using a library (e.g. HTMLTidy) and then use an XML parser
  • Use an existing HTML parser (e.g. a standard Web browser engine like WebKit, Firefox, and/or IE) and then read the "DOM", which is a more-or-less API-friendly representation of the parsed HTML
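As a rough illustration of the second half of the first option, here is a minimal sketch that assumes the page has already been converted to well-formed XHTML (for example by HTMLTidy; that conversion step is not shown). It uses only the standard Java XML DOM API to collect the `href` of every link. The class name `XhtmlLinks`, the helper `extractLinks`, and the sample markup are made up for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XhtmlLinks {
    // Hypothetical helper: returns the href of every <a> element.
    // Assumes the input is already well-formed XHTML; raw, messy HTML
    // must first be cleaned by a separate tool.
    static List<String> extractLinks(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            // Walk all <a> elements in document order.
            NodeList anchors = doc.getElementsByTagName("a");
            List<String> links = new ArrayList<>();
            for (int i = 0; i < anchors.getLength(); i++) {
                Element anchor = (Element) anchors.item(i);
                if (anchor.hasAttribute("href")) {
                    links.add(anchor.getAttribute("href"));
                }
            }
            return links;
        } catch (Exception e) {
            // Parsing fails if the input is not well-formed XML.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample document standing in for a cleaned-up page.
        String xhtml = "<html><body><p><a href=\"a.html\">A</a> and "
                + "<a href=\"b.html\">B</a></p></body></html>";
        System.out.println(extractLinks(xhtml));
    }
}
```

The same DOM calls (`getElementsByTagName`, `getAttribute`) should be available on Android as well, so the only Android-specific work would be fetching the page and running the clean-up step.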

Alternatively, if you want to write your own parser (which I doubt you should do for homework, since implementing it properly and completely would be long and complicated), see the specs for parsing HTML.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow