Problem

The problem is really that specific.

I need a library in Java that can take HTML content and generate text in the same format as the Linux lynx program.


I need to expose data provided by 3rd-party servers to end users on Android. The data format is ancient: HTML so badly formatted that my attempts to read it in Java fail occasionally (unacceptable). The data is also growing every month (which rules out preinstalling it), and I can't convince them to change to "modern" stuff (life would be great with XML etc.).

Shortest route: I wrote a class that uses the W3 html2txt online service (search for it on Google). It worked fine in the app until I got complaints and noticed that the W3 service fails occasionally. It's not that big of a deal, but the black-box logic expects the output to be in this "lynx-like" text format.

So I would like a library that does the conversion (HTML -> TXT) in "lynx style" inside the app, to avoid the outages of the W3 service. Besides, the lynx output is probably the best I've seen: the most organized and neat.
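To illustrate what "lynx style" means here, below is a minimal sketch of such a conversion using only the Java standard library. It is not a library and not a faithful reimplementation of lynx: the class name `LynxStyleText` and the method `convert` are hypothetical, the approach is regex-based rather than a real HTML parser, and it only approximates three traits of the lynx dump format (a three-space left margin, numbered `[n]` link markers, and a trailing "References" list). Line wrapping, tables, `<pre>` blocks, and unquoted `href` values are not handled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: rough approximation of lynx-style text output. */
public class LynxStyleText {
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Only quoted href values are recognised (assumption of this sketch).
    private static final Pattern ANCHOR = Pattern.compile(
            "(?is)<a\\b[^>]*?\\bhref\\s*=\\s*([\"'])(.*?)\\1[^>]*>(.*?)</a\\s*>");
    private static final Pattern LIST_ITEM = Pattern.compile("(?i)<li\\b[^>]*>");
    private static final Pattern BLOCK_BREAK = Pattern.compile(
            "(?i)</?(p|div|br|h[1-6]|tr|table|ul|ol|pre|blockquote)\\b[^>]*>");
    private static final Pattern ANY_TAG = Pattern.compile("(?s)<[^>]+>");

    public static String convert(String html) {
        // 1. Drop content that is never rendered as text.
        String s = SCRIPT_OR_STYLE.matcher(html).replaceAll("");
        s = COMMENT.matcher(s).replaceAll("");
        // 2. Source whitespace is insignificant in HTML: collapse it.
        s = s.replaceAll("\\s+", " ");

        // 3. Replace each link with "[n]label" and remember its target.
        List<String> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(s);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            links.add(m.group(2).trim());
            String label = "[" + links.size() + "]" + m.group(3);
            m.appendReplacement(sb, Matcher.quoteReplacement(label));
        }
        m.appendTail(sb);
        s = sb.toString();

        // 4. Turn structural tags into line breaks, then strip the rest.
        s = LIST_ITEM.matcher(s).replaceAll("\n* ");
        s = BLOCK_BREAK.matcher(s).replaceAll("\n");
        s = ANY_TAG.matcher(s).replaceAll("");

        // 5. Decode a handful of common entities (&amp; must come last).
        s = s.replace("&nbsp;", " ").replace("&lt;", "<").replace("&gt;", ">")
             .replace("&quot;", "\"").replace("&#39;", "'").replace("&amp;", "&");

        // 6. Indent non-empty lines and squeeze runs of blank lines.
        StringBuilder out = new StringBuilder();
        boolean lastBlank = true; // suppresses leading blank lines
        for (String line : s.split("\n")) {
            String t = line.trim();
            if (t.isEmpty()) {
                if (!lastBlank) out.append('\n');
                lastBlank = true;
            } else {
                out.append("   ").append(t).append('\n');
                lastBlank = false;
            }
        }

        // 7. Append the numbered list of link targets.
        if (!links.isEmpty()) {
            if (!lastBlank) out.append('\n');
            out.append("References\n\n");
            for (int i = 0; i < links.size(); i++) {
                out.append("   ").append(i + 1).append(". ")
                   .append(links.get(i)).append('\n');
            }
        }
        return out.toString();
    }
}
```

For example, `LynxStyleText.convert("<p>See <a href=\"http://example.com/\">this page</a>.</p>")` yields a line `   See [1]this page.` followed by a "References" section listing `1. http://example.com/`. Badly formed input of the kind described above would need a tolerant parser instead of regular expressions.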

Are you guys aware of any?


Solution

After a year, I give up. The answer is: there is no way to handle that, and no such library in Java. At least for now.

I'm closing this. Thank you for your attention.

Other tips

Not sure what you mean by lynx style, so I might be completely off by submitting this (if so, please excuse me).

I used a piece of code a while back to check HTML/XML files (at the time I was just printing it out in the logs):

    // Open the raw resource and wrap it in a buffered reader
    InputStream in = context.getResources().openRawResource(id);
    StringBuffer inLine = new StringBuffer();
    InputStreamReader isr = new InputStreamReader(in);
    BufferedReader inRd = new BufferedReader(isr);

    // Read line by line, restoring the newline that readLine() strips
    String text;
    while ((text = inRd.readLine()) != null) {
        inLine.append(text);
        inLine.append("\n");
    }
    in.close();
    return inLine.toString();

I hope it helps but I got the feeling you need something more complex :P
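The fragment above depends on an Android `context` and resource `id`. A self-contained variant might look like the sketch below; the class name `RawTextReader`, the method `readAll`, and the choice of UTF-8 are assumptions, not part of the original snippet. Note that it only returns the markup as-is and performs no HTML-to-text conversion.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: reads a whole stream into a String, line by line. */
public class RawTextReader {
    public static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader (and the wrapped stream)
        // even if readLine() throws. UTF-8 is an assumption here.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n'); // readLine() strips the newline
            }
        }
        return sb.toString();
    }
}
```

On Android it could be called as `RawTextReader.readAll(context.getResources().openRawResource(id))`.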

License: CC-BY-SA with attribution
Not affiliated with StackOverflow