Question

I am running a program that downloads the source of web pages using Apache HttpComponents. I will be downloading a lot of pages (tens of thousands), so I am using multiple threads to do this.

Sometimes all threads finish (and can be joined) and sometimes they don't. Through debugging I have determined that the line

CloseableHttpResponse response = httpClient.execute(httpget,context);

is the problem. Does anybody know how I can set a timeout for this line, or why this line is blocking thread execution?

Solution

Threads can get stuck in an I/O operation for various reasons, incorrect timeout settings being the most likely cause. You can set the desired timeout values using the RequestConfig class. However, if all threads block at once inside the #execute method, a connection leak (connection pool depletion) is more likely. Make sure that you always close CloseableHttpResponse instances, even if you do not care about the response or its content. You can find out more about request execution by turning on wire / context logging as described in the logging guide.
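
As a rough illustration of the two points above, the sketch below sets timeouts through RequestConfig and closes every response with try-with-resources. It assumes HttpClient 4.3 or later on the classpath. The class name TimeoutFetch, the helper methods, the timeout values and the example URL are placeholders for illustration, not part of the original question.

```java
import java.io.IOException;

import org.apache.http.HttpEntity;
import org.apache.http.client.config.RequestConfig;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.protocol.HttpClientContext;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class TimeoutFetch {

    // Illustrative values in milliseconds; tune them for the target site.
    static RequestConfig buildConfig() {
        return RequestConfig.custom()
                .setConnectTimeout(5_000)           // time allowed to establish the connection
                .setSocketTimeout(10_000)           // maximum inactivity while waiting for data
                .setConnectionRequestTimeout(5_000) // maximum wait for a connection from the pool
                .build();
    }

    // Hypothetical helper: downloads one page and always releases the connection.
    static String fetch(CloseableHttpClient httpClient, String url) throws IOException {
        HttpGet httpget = new HttpGet(url);
        HttpClientContext context = HttpClientContext.create();
        // try-with-resources closes the response even if reading the body fails,
        // which returns the underlying connection to the pool.
        try (CloseableHttpResponse response = httpClient.execute(httpget, context)) {
            HttpEntity entity = response.getEntity();
            return entity == null ? "" : EntityUtils.toString(entity);
        }
    }

    public static void main(String[] args) throws IOException {
        try (CloseableHttpClient httpClient = HttpClients.custom()
                .setDefaultRequestConfig(buildConfig())
                .build()) {
            String body = fetch(httpClient, "http://example.com/"); // placeholder URL
            System.out.println(body.length());
        }
    }
}
```

Note that the socket timeout limits the gap between data packets, not the total duration of a request, so a server that trickles data slowly can still hold a thread for longer than the configured value. The connection request timeout is the setting that turns an indefinite wait on an exhausted pool into an exception.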

OTHER TIPS

I use the following timeout settings from HttpConnectionParams in my code (the HttpParams are passed to the HttpClient constructor):

org.apache.http.params.HttpConnectionParams.setConnectionTimeout(HttpParams, int)
org.apache.http.params.HttpConnectionParams.setSoTimeout(HttpParams, int)

One problem I discovered when connecting to the same host from multiple threads is that blocking and timeouts occur when the maxPerRoute setting is lower than the number of threads. Have a look at PoolingClientConnectionManager:

org.apache.http.impl.conn.PoolingClientConnectionManager.setDefaultMaxPerRoute(int)
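
To see why a small per-route limit starves the remaining workers, here is a toy model that uses only the JDK. It is not how HttpClient is implemented: ToyPool, countStarved and the 50 ms wait are hypothetical names and values, with a Semaphore standing in for the per-route pool. Each lease that is never released represents either a leaked response or another thread still holding its connection.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class PoolDepletionDemo {

    // Toy stand-in for a per-route connection pool: one permit per connection.
    static final class ToyPool {
        private final Semaphore permits;

        ToyPool(int maxPerRoute) {
            this.permits = new Semaphore(maxPerRoute);
        }

        // Returns true if a connection was leased within the timeout.
        boolean lease(long timeoutMillis) throws InterruptedException {
            return permits.tryAcquire(timeoutMillis, TimeUnit.MILLISECONDS);
        }

        void release() {
            permits.release();
        }
    }

    // Issues `requests` leases in turn and releases each one only if
    // closeResponses is true. Returns how many requests got no connection.
    static int countStarved(int maxPerRoute, int requests, boolean closeResponses)
            throws InterruptedException {
        ToyPool pool = new ToyPool(maxPerRoute);
        int starved = 0;
        for (int i = 0; i < requests; i++) {
            if (!pool.lease(50)) {
                starved++;       // without a timeout, a real worker would hang here
            } else if (closeResponses) {
                pool.release();  // analogous to closing the CloseableHttpResponse
            }
        }
        return starved;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(countStarved(2, 5, true));  // prints 0: every lease is returned
        System.out.println(countStarved(2, 5, false)); // prints 3: two leases are held, the rest time out
    }
}
```

With a limit of 2 and five requests, nothing is starved as long as each lease is returned, while holding the leases leaves three requests without a connection. In HttpClient 4.3 and later the corresponding class is PoolingHttpClientConnectionManager, whose setMaxTotal and setDefaultMaxPerRoute methods raise these limits.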
Licensed under: CC-BY-SA with attribution