What is the underlying issue with sporadic connectivity issues using sun.net.www.protocol.http.HttpURLConnection.getInputStream()?

StackOverflow https://stackoverflow.com/questions/22623188

  •  20-06-2023
  •  | 
  •  

Вопрос

My question is very similar to this, as yet unanswered, StackOverflow question about mysterious connectivity issues. Sometimes (only under certain conditions in certain environments, specifically when trying to hit one particular URL from AWS) http connectivity consistently fails with no apparent reason as to why.

Background:

I've been able to reproduce this in 2 AWS EC2 server environments (though I cannot reproduce it locally), but only when trying to hit one particular customer's web services URL (all other URLs running similar services work fine).

My Java version:

# java -version
java version "1.6.0_45"
Java(TM) SE Runtime Environment (build 1.6.0_45-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.45-b01, mixed mode)

The machine I'm trying to hit runs a RESTful web service (in Tomcat, probably fronted by Apache, on a Windows machine). I can curl the same endpoints my code is trying to hit from the very instance where my code is running and get a valid response in ~48-120ms. From code, I hit my configured 10 second timeout. Running netstat from both machines shows the following for my server (that I'm making the request FROM):

$ netstat -cowtune | grep <remote_ip>
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (0.08/2/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (0.22/3/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (1.50/4/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (0.48/4/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (4.07/5/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (3.05/5/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (2.03/5/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (1.00/5/0)
tcp        0    389 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   ESTABLISHED 501        33146      on (18446744073.69/5/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (8.20/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (7.18/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (6.15/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (5.13/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (4.11/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (3.09/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (2.07/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (1.05/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (0.03/6/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (17.46/7/0)
tcp        0    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   FIN_WAIT1   0          0          on (16.44/7/0)

tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0              0          on (15.42/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (14.39/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (13.37/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (12.35/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (11.33/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (10.31/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (9.29/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (8.27/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (7.25/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (6.23/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (5.21/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (4.19/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (3.17/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (2.15/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (1.13/7/0)
tcp        1    390 ::ffff:10.91.184.202:40153  ::ffff:<remote_ip>:<port>   CLOSING     0          0          on (0.11/7/0)

and this from the remote server (that I'm making requests TO):

D:\Cygwin>netstat -ant 1 | grep 54.81.126.17
TCP    <ip_address>:<port>    54.81.126.17:40153    SYN_RECEIVED    InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    ESTABLISHED     InHost

TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost
TCP    <ip_address>:<port>    54.81.126.17:40153    FIN_WAIT_2      InHost

At my configured 10 second timeout, my server shows the transition from ESTABLISHED to FIN_WAIT_1. Some time later my server show the transition from FIN_WAIT_1 to CLOSING, and at the same moment the remote server transitions from ESTABLISHED to FIN_WAIT_2. The remote Tomcat never registers receiving the request. TShark shows:

0.000000 10.182.160.132 -> <remote_ip> TCP 74 49486 > http-alt [SYN] Seq=0 Win=14600 Len=0 MSS=1460 SACK_PERM=1 TSval=1814494 TSecr=0 WS=128
0.035580 <remote_ip> -> 10.182.160.132 TCP 70 http-alt > 49486 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1380 SACK_PERM=1 TSval=101011325 TSecr=1814494
0.035601 10.182.160.132 -> <remote_ip> TCP 66 49486 > http-alt [ACK] Seq=1 Ack=1 Win=14600 Len=0 TSval=1814503 TSecr=101011325
0.035935 10.182.160.132 -> <remote_ip> HTTP 457 POST /service/rest/security/myEndpoint HTTP/1.1
0.171137 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
0.443125 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
0.987118 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
2.079144 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
4.263141 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
8.631153 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
10.036939 10.182.160.132 -> <remote_ip> TCP 66 49486 > http-alt [FIN, ACK] Seq=392 Ack=1 Win=14600 Len=0 TSval=1817003 TSecr=101011325
10.072638 <remote_ip> -> 10.182.160.132 TCP 66 [TCP Window Update] http-alt > 49486 [ACK] Seq=1 Ack=1 Win=64296 Len=0 TSval=101012329 TSecr=1814503
17.351131 10.182.160.132 -> <remote_ip> HTTP 457 [TCP Retransmission] POST /service/rest/security/myEndpoint HTTP/1.1
20.584358 <remote_ip> -> 10.182.160.132 TCP 66 http-alt > 49486 [FIN, ACK] Seq=1 Ack=1 Win=64296 Len=0 TSval=101013380 TSecr=1814503
20.584421 10.182.160.132 -> <remote_ip> TCP 66 49486 > http-alt [ACK] Seq=393 Ack=2 Win=14600 Len=0 TSval=1819640 TSecr=1

My old code:

InputStream getResponseStream(String webServiceUrl) {
  URL server = new URL(webServiceUrl);
  HttpURLConnection connection = (HttpURLConnection) server.openConnection();
  connection.setDoInput(true);
  connection.setDoOutput(true);
  connection.setRequestMethod("GET");
  return connection.getInputStream(); // timeout happens here
}

My better code (this and below):

private Object getResponse(HttpURLConnection connection,
        SdRestResponseType respType) throws IOException, JAXBException,
        ProtocolException {
    InputStream is = null;
    try {
        // check if valid response
        int responseCode = connection.getResponseCode(); // timeout happens here
        if (responseCode == HttpURLConnection.HTTP_OK) {
            is = connection.getInputStream();
            switch (respType) {
            case BOOLEAN:
                return Boolean.valueOf(readInput(is));
            case STRING:
                return readInput(is);
            case XML:
                Unmarshaller unmarshaller = context.createUnmarshaller();
                return unmarshaller.unmarshal(is);
            default:
                return null;
            }
        }

        is = connection.getErrorStream();
        Unmarshaller unmarshaller = context.createUnmarshaller();
        Object response = unmarshaller.unmarshal(is);

        if (response instanceof Fault) {
            throw new SdFaultException((Fault) response);
        }
        throw new ProtocolException(connection.getResponseMessage());

    } finally {
        if (is != null) {
            is.close();
        }
    }
}

The bit of code that creates the HttpURLConnection object that executes the request:

private HttpURLConnection getConnection(String operation, boolean xmlContent)
        throws IOException {

    URL server = new URL(baseUrl + operation);
    HttpURLConnection connection = (HttpURLConnection) server
            .openConnection();
    connection.setDoInput(true);
    connection.setDoOutput(true);
    connection.setReadTimeout(10000);
    connection.setRequestMethod(POST); // the remote endpoint accepts this request as either a GET or POST just fine, except from this code
    connection.setRequestProperty(CONTENT_TYPE, (xmlContent ? XML_ENCODED
            : URL_ENCODED));
    // set header values
    connection.addRequestProperty(CLIENT_ID, header.getClientID());
    if (header.getLocale() != null) {
        connection.addRequestProperty(LOCALE, header.getLocale());
    }
    if (header.getSessionToken() != null) {
        connection.addRequestProperty(SESSION, header.getSessionToken());
    }
    if (this.passthrough != null) {
        connection.addRequestProperty(PASSTHRU, this.passthrough);
    }

    return connection;
}

My server (the FROM box) is running Linux, Apache, and my app in Tomcat. All DNS lookups have turned up nothing unexpected. Connectivity between the boxes seems normal in all other respects (I have not exhaustively gone through my iptables config). When I step through the code everything seems normal until execution disappears into the compiled code of sun.net.www.protocol.http.HttpURLConnection.getInputStream().

In GrepCode's OpenJDK source, line 710 shows an IOException getting swallowed, but since the Oracle version source is proprietary (and therefore not available anywhere I've found) I'm wondering if anyone knows (or could point me to) what might be happening under the hood, since I've not been able to completely rule out the possibility of issues in the server environment.

Thanks in advance for any insight!

Это было полезно?

Решение

Answering my own ?:

Never trust their IT staff.

After double/triple checking, it turns out that the remote server was fronted by an active Intrusion Detection System blocking all unknown IP addresses. Since AWS instances can change their IP when cycled, even if they white-listed a known IP, it would only work until my instance bounced. Lesson learned: be nauseatingly specific when asking "Could you be blocking us?"

Why they allowed a curl through is still a mystery until I get a reply email to update this answer with...

Лицензировано под: CC-BY-SA с атрибуция
Не связан с StackOverflow
scroll top