Reading and terminating a stream in HttpClient 4
18-09-2019
Question
I'm reading large documents of which I only need the top 5%. Can I do the following with HttpClient 4?
- Request the page (GET or POST)
- Read the response as a stream
- Feed it into a SAX-based HTML parser "on the fly"
- When a certain HTML tag is detected, terminate the stream
Please note that HttpClient 4 is required; I cannot use version 3.
Solution
Thanks to Ken from the HttpClient mailing list, here's the answer:
Use the HttpEntity#getContent() method, which returns a
java.io.InputStream, and pass that to your SAX-based HTML parser.
http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e122
When you see the tag you need, terminate the request by invoking the HttpUriRequest#abort() method.
http://hc.apache.org/httpcomponents-client/tutorial/html/fundamentals.html#d4e285
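The two steps above (stream the content into a SAX parser, then abort once the tag shows up) can be sketched roughly like this. It is a minimal sketch that assumes well-formed XHTML, so the JDK's built-in SAX parser can stand in for a real HTML parser. For real-world HTML you would swap in a SAX-compatible HTML parser such as TagSoup, whose Parser class implements org.xml.sax.XMLReader. The names EarlyStopReader, readUntil and StopParsing are hypothetical. The handler collects text until the first start tag named stopTag, runs an onStop callback, and then throws a private SAXException subclass. Throwing from the handler is the usual way to make a SAX parser stop early.

```java
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper. The JDK SAX parser used here only accepts well-formed
// XML/XHTML; it stands in for a SAX-based HTML parser (e.g. TagSoup).
final class EarlyStopReader {

    // Thrown from the handler purely to unwind the parser at the stop tag.
    private static final class StopParsing extends SAXException {
        StopParsing() {
            super("stop tag reached");
        }
    }

    // Streams 'in' through a SAX parser and collects character data until the
    // first start tag named 'stopTag'. It then runs 'onStop' (e.g. an HTTP
    // abort) and stops parsing, leaving the rest of the stream unread.
    static String readUntil(InputStream in, String stopTag, Runnable onStop)
            throws IOException, SAXException, ParserConfigurationException {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                if (qName.equalsIgnoreCase(stopTag)) {
                    onStop.run();            // abort first, while the stream is still open
                    throw new StopParsing(); // then unwind out of parse()
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(in, handler);
        } catch (StopParsing expected) {
            // Normal early exit: the stop tag was found.
        }
        return text.toString();
    }
}
```

With HttpClient 4 the callback would be the request's abort method, roughly: HttpGet get = new HttpGet(url); HttpEntity entity = client.execute(get).getEntity(); String top = EarlyStopReader.readUntil(entity.getContent(), "div", get::abort); The callback runs before the exception unwinds, so the request is aborted before the parser gets a chance to close the stream. This ordering matters because, in HttpClient 4, closing the content stream of a request that was not aborted normally reads the remainder of the body so the connection can be reused, which is exactly the download you are trying to skip.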
Licensed under: CC-BY-SA with attribution