Question

In Java, I am trying to read a web page. I want to print only the page's text content, but my code prints the entire HTML source. The data I want is there, buried in the markup. How can I print the text without the HTML? Here is my code:

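import java.io.BufferedReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL url = new URL("http://www.rxbd.info/Controller/Controller?action=details&drug=zorubicin&group=generic");
URLConnection con = url.openConnection();
InputStream is = con.getInputStream();
BufferedReader br = new BufferedReader(new InputStreamReader(is));
String line = null;
while ((line = br.readLine()) != null) {
    // Prints each line of the raw response, HTML tags included
    System.out.println(line);
}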

Solution

Have a look at Jericho, an HTML parser library for Java. Its Renderer class renders the original HTML as formatted text, while its TextExtractor class extracts just the text content.
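With Jericho on the classpath, the extraction should come down to roughly one call, something like new Source(url).getTextExtractor().toString(), but check the Jericho documentation for the exact usage. Since Jericho is a third-party dependency, here is a rough JDK-only sketch of the same idea (keep the text nodes, drop the tags). Note that this swaps in the Swing HTML parser that ships with the JDK; it is not Jericho's API, and the class name HtmlTextSketch, the method extractText, and the sample HTML string are made up for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of Jericho or of the original question.
class HtmlTextSketch {

    // Collects the text nodes of the given HTML, separated by single spaces.
    static String extractText(String html) throws IOException {
        final StringBuilder text = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Called once per run of text found between tags
                if (text.length() > 0) {
                    text.append(' ');
                }
                text.append(data);
            }
        };
        try (Reader reader = new StringReader(html)) {
            // true = ignore any charset declaration inside the document
            new ParserDelegator().parse(reader, callback, true);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Sample markup invented for this sketch
        String html = "<html><body><h1>Zorubicin</h1><p>Generic drug details</p></body></html>";
        System.out.println(extractText(html));
    }
}
```

To use this with the code from the question, you could pass the BufferedReader to parse() directly instead of wrapping a String in a StringReader. The Swing parser is old and fairly lenient, so for messy real-world pages a dedicated library such as Jericho is likely the more robust choice.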

Licensed under: CC-BY-SA with attribution