Question

I have a Java String with SGML, something like this...

<misspell></misspell><plain>I</plain> <plain>know</plain> <plain>you</plain> <suggestion>ducky</suggestion> <plain>suck</plain> <plain>and</plain> <plain>I</plain> <plain>rocky</plain> <plain>rock</plain>

How do I parse it to get, for instance, the text inside <suggestion></suggestion>, i.e. "ducky"?

Would javax.swing.text.html.parser.Parser be of any help, or can it only parse HTML documents?

Solution

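The string you show is not HTML, but it can be parsed by an XML parser. As written it has several top-level elements, so wrap it in a single root element first to make it a well-formed XML document.

The SAX API is part of the JDK, and as far as I know most XML parsers implement it.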
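
A minimal sketch of the SAX approach, assuming the markup is well-formed apart from lacking a single root element. The class name SuggestionExtractor, the method extract, and the wrapper element <root> are made up for illustration; only the javax.xml.parsers and org.xml.sax classes come from the JDK. It collects the text of every element with the given name and does not handle that element nested inside itself.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of any library.
final class SuggestionExtractor {

    // Returns the text content of every <tag> element found in the fragment.
    static List<String> extract(String fragment, String tag) throws Exception {
        // Assumption: the fragment has several top-level elements, so wrap it
        // in a made-up <root> element to get a well-formed XML document.
        String xml = "<root>" + fragment + "</root>";

        List<String> results = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a <tag> element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if (qName.equals(tag)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may deliver one text node in several chunks, so append.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (qName.equals(tag) && current != null) {
                    results.add(current.toString());
                    current = null;
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return results;
    }

    public static void main(String[] args) throws Exception {
        String sgml = "<misspell></misspell><plain>I</plain> <plain>know</plain> "
                + "<plain>you</plain> <suggestion>ducky</suggestion> <plain>suck</plain>";
        System.out.println(extract(sgml, "suggestion")); // prints [ducky]
    }
}
```

If the real input contains SGML features that XML forbids (unclosed tags, unquoted attributes), the parser will throw a SAXParseException, and a more forgiving parser is the better choice.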

OTHER TIPS

Try an HTML parser. They are (by necessity) quite forgiving of malformed markup, and HTML is itself based on SGML.

e.g. http://htmlparser.sourceforge.net/

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow