Question

I have a Java process that hangs in a call to IOUtils.toString with the following code:

String html = "";
try {
    html = IOUtils.toString(someUrl.openStream(), "utf-8"); // process hangs on this line
} catch (Exception e) {
    return null;
}

I can't reproduce this reliably. It's part of a web crawler, so it executes this line successfully thousands of times, but eventually the process hangs here after a few days.

Output from jstack:

2013-09-25 09:09:36
Full thread dump OpenJDK 64-Bit Server VM (20.0-b12 mixed mode):

"Attach Listener" daemon prio=10 tid=0x00007f2b1c001000 nid=0x225a waiting on condition [0x0000000000000000]
   java.lang.Thread.State: RUNNABLE

"Thread-0" prio=10 tid=0x00007f2b34122000 nid=0x187b runnable [0x00007f2b30970000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:146)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
        - locked <0x00000000e3d2d160> (a java.io.BufferedInputStream)
        at sun.net.www.http.ChunkedInputStream.readAheadBlocking(ChunkedInputStream.java:552)
        at sun.net.www.http.ChunkedInputStream.readAhead(ChunkedInputStream.java:609)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:696)
        - locked <0x00000000e3d30558> (a sun.net.www.http.ChunkedInputStream)
        at java.io.FilterInputStream.read(FilterInputStream.java:133)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2582)
        at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:282)
        at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:324)
        at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:176)
        - locked <0x00000000e3d317d0> (a java.io.InputStreamReader)
        at java.io.InputStreamReader.read(InputStreamReader.java:184)
        at java.io.Reader.read(Reader.java:140)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1364)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1340)
        at org.apache.commons.io.IOUtils.copy(IOUtils.java:1315)
        at org.apache.commons.io.IOUtils.toString(IOUtils.java:525)

I can't see any way to set a timeout on the toString method. Any suggestions? Is this a bug in Apache Commons? Or perhaps in my OpenJDK?


Solution 2

I've decided to simply try Guava's IO utilities instead, since they were already on my classpath anyway:

String html = "";
try {
    InputSupplier<? extends InputStream> supplier = Resources
            .newInputStreamSupplier(metaUrl);
    html = CharStreams.toString(CharStreams.newReaderSupplier(supplier,
            Charsets.UTF_8));
} catch (Exception e) {
    return null;
}

It generally takes a few days to crash so if I don't update this answer in a few days, assume this worked!

Update: 7 days so far without hanging... :)
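
A note for anyone on a newer Guava release: InputSupplier has since been deprecated in favor of ByteSource/CharSource, and the same read can be written more directly. A minimal sketch, assuming Guava 14 or later and the same metaUrl as above:

String html = "";
try {
    // still goes through URL.openStream() internally, so this adds no timeout by itself
    html = Resources.toString(metaUrl, Charsets.UTF_8);
} catch (Exception e) {
    return null;
}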

OTHER TIPS

Your call to toString() is ultimately forwarded to copyLarge(). Here you can see that reading from the stream continues until an end-of-file (EOF) marker is detected by InputStream.read(). According to this post, read() can read 0 bytes, i.e., if the URLConnection you're reading from never signals EOF, the method will probably keep reading 0 bytes forever.
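
For illustration, the copy loop inside copyLarge() boils down to roughly this (a simplified sketch, not the actual Commons IO source):

char[] buffer = new char[4096];
int n;
while ((n = reader.read(buffer)) != -1) {  // blocks in socketRead0 if the server never closes the stream
    writer.write(buffer, 0, n);
}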

Maybe you can track down which URL causes the problem?

Anyway, to implement a timeout you could start each read in a separate thread and kill that thread after a certain amount of time has elapsed.
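
A minimal sketch of that idea using an ExecutorService (the 30-second limit is an arbitrary placeholder, and someUrl is the URL from the question):

ExecutorService executor = Executors.newSingleThreadExecutor();
Future<String> future = executor.submit(new Callable<String>() {
    @Override
    public String call() throws Exception {
        // the blocking read now happens on the worker thread
        return IOUtils.toString(someUrl.openStream(), "utf-8");
    }
});
String html;
try {
    html = future.get(30, TimeUnit.SECONDS);   // wait at most 30 seconds
} catch (TimeoutException e) {
    future.cancel(true);                       // give up on this URL
    html = null;
} catch (Exception e) {                        // InterruptedException, ExecutionException
    html = null;
} finally {
    executor.shutdownNow();
}

Note that a thread stuck in socketRead0 does not react to interruption, so cancelling the future only abandons the read; to actually free the connection you would also have to close the underlying stream from another thread.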

I had the same problem. Maybe switching to Guava solves it, but in my opinion the root of the problem is that the socket has no SO_TIMEOUT configured.

Try

socket.setSoTimeout(10000);

so that a SocketTimeoutException is thrown when no data arrives within 10 seconds.
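
URL.openStream() gives you no handle on the socket, but you can get the same effect through the URLConnection, whose read timeout is applied to the underlying socket. A sketch with illustrative values, reusing someUrl from the question:

URLConnection connection = someUrl.openConnection();
connection.setConnectTimeout(10000); // fail if the connection cannot be established within 10 s
connection.setReadTimeout(10000);    // a read that stalls for 10 s throws SocketTimeoutException
InputStream in = connection.getInputStream();
try {
    html = IOUtils.toString(in, "utf-8");
} finally {
    IOUtils.closeQuietly(in);
}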

The plain Java way:

InputStream in = new URL(url).openStream();

The Guava way:

InputSupplier<InputStream> supplier = Resources.newInputStreamSupplier(new URL(url));
InputStream in = supplier.getInput();

Both of them will throw a "connection timed out" exception, because Guava also uses URL.openStream() internally.

But some sites are so slow that each read returns only a little data, and even after many, many reads the stream never reaches the end. I can also see it hanging there with jstack.

Like this (maybe it's only slow from my host): a txt file address

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow