Question

I am using the Jsoup API to parse an HTML fragment with the Jsoup.parse() method. However, parsing wraps the fragment in a full document structure.

For example:

<p><a href="some link">some link data</a> Some paragraph content</p>

becomes:

<html>
<head></head>
<body>
<p><a href="some link">some link data</a> Some paragraph content</p>
</body>
</html>

I don't want the document structure (the html, head, and body tags) in the parsed output. Is there a way to avoid it? Thanks in advance.

SOLUTION

I used the body() and html() methods of the Document:

Document storyBodyDoc = Jsoup.parse(body);
// Inner HTML of <body>, without the html/head/body wrapper
String storyBodyHtml = storyBodyDoc.body().html();
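
For anyone who wants to try this in isolation, here is a minimal self-contained sketch of the same idea. The class name FragmentHtml and the method innerBodyHtml are hypothetical names chosen for this example; only Jsoup.parse(), Document.body(), and Element.html() come from Jsoup itself.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, named for this example only.
class FragmentHtml {

    // Parses an HTML fragment and returns the markup inside <body>,
    // leaving out the <html>, <head>, and <body> wrapper that Jsoup adds.
    static String innerBodyHtml(String fragment) {
        Document doc = Jsoup.parse(fragment);
        return doc.body().html();
    }

    public static void main(String[] args) {
        String fragment =
            "<p><a href=\"some link\">some link data</a> Some paragraph content</p>";
        System.out.println(innerBodyHtml(fragment));
    }
}
```

Jsoup pretty-prints its output by default, so whitespace in the returned string may differ slightly from the input.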

Thanks for the suggestion.

Solution

You could select the children of the body element:

Document doc = Jsoup.parse("<p><a href=\"some link\">some link data</a> Some paragraph content</p>");
Elements content = doc.body().children();
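
One caveat with this approach, sketched below: children() returns only element children, so any text sitting directly inside body is skipped, whereas childNodes() also includes text nodes. The class and method names (BodyChildren, childrenHtml, elementCount, nodeCount) and the sample inputs are hypothetical, made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Hypothetical helper class, named for this example only.
class BodyChildren {

    // Combined outer HTML of every element directly under <body>.
    static String childrenHtml(String fragment) {
        Elements content = Jsoup.parse(fragment).body().children();
        return content.outerHtml();
    }

    // Number of element children of <body> (text nodes are not counted).
    static int elementCount(String fragment) {
        Element body = Jsoup.parse(fragment).body();
        return body.children().size();
    }

    // Number of child nodes of <body>, including text nodes.
    static int nodeCount(String fragment) {
        Element body = Jsoup.parse(fragment).body();
        return body.childNodes().size();
    }

    public static void main(String[] args) {
        // Fragment made only of elements: children() covers everything.
        System.out.println(childrenHtml("<p>one</p><p>two</p>"));

        // Fragment with loose text: children() drops the text node.
        String mixed = "loose text <b>bold</b>";
        System.out.println(childrenHtml(mixed));
        System.out.println(elementCount(mixed) + " element(s), "
                + nodeCount(mixed) + " node(s)");
    }
}
```

If the fragment can contain top-level text, body().html() is probably the safer choice, since it serializes all child nodes rather than only the elements.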
Licensed under: CC-BY-SA with attribution