Question

I am experiencing an odd problem every now and then (too often actually).

I am running a server application, which is binding a socket for itself.

But once in a while, the socket is not released. The process dies, although Eclipse reports that Terminate failed, however it disappears properly from 'ps' and JConsole/JVisualVM. 'lsof' also displays nothing for the port anymore. But still, I get this error when I try to start the server again to the same port:

Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)

The problem is worst in my unit tests, which never run fully, because this will for sure occur after one of the tests (which all recreate the server).

I am running MacOSX 10.7.3

Java(TM) SE Runtime Environment (build 1.6.0_31-b04-415-11M3635) Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01-415, mixed mode)

I have also Parallels, and often the problem looks like it's caused by the Parallels network adapter, but I am not sure if it has anything to do with this problem after all (I have contacted their support without any help so far).

The only thing that helps to resolve the situation is to reboot OSX.

Any ideas?

--

This is the relevant code to open the socket:

channel = (ServerSocketChannel) ServerSocketChannel.open().configureBlocking(false);
 channel.socket().bind( addr, 0 );

and it is closed by

  channel.close();

But I assume that the process gets stuck here and then Eclipse kills it.

--

netstat -an (for port 6007):

tcp4      73      0  127.0.0.1.6007         127.0.0.1.51549        ESTABLISHED
tcp4       0      0  127.0.0.1.51549        127.0.0.1.6007         ESTABLISHED
tcp4      73      0  127.0.0.1.6007         127.0.0.1.51544        CLOSE_WAIT 
tcp4       0      0  127.0.0.1.6007         127.0.0.1.51543        CLOSE_WAIT 
tcp4       0      0  10.37.129.2.6007       *.*                    LISTEN     
tcp4       0      0  10.211.55.2.6007       *.*                    LISTEN     
tcp4       0      0  127.0.0.1.6007         *.*                    LISTEN     
tcp4       0      0  10.50.100.236.6007     *.*                    LISTEN     

--

And now I get this exception after the socket is opened for every test (netstat output from this situation):

Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.net.SocketInputStream.read(SocketInputStream.java:182)

--

Stopping the process from eclipse I got "Terminate failed", but lsof -i TCP:6007 is displaying nothing and the process is no longer found by 'ps'. netstat output did not change...

Can I somehow kill the socket without rebooting (that would help a litte bit already)?

--

UPDATE 5.5.12:

I ran the tests now in Eclipse debugger. This time the tests got stuck after 18 methods. I stopped the main thread after it was stuck around 15 minutes. This is the stack:

Thread [main] (Suspended)   
    FileDispatcher.preClose0(FileDescriptor) line: not available [native method]    
    SocketDispatcher.preClose(FileDescriptor) line: 41  
    ServerSocketChannelImpl.implCloseSelectableChannel() line: 208 [local variables unavailable]    
    ServerSocketChannelImpl(AbstractSelectableChannel).implCloseChannel() line: 201 
    ServerSocketChannelImpl(AbstractInterruptibleChannel).close() line: 97  
...

--

Hmm, it looks like the process is not killed, after all - and does not die to kill -9 either (I noticed that process 712 and probably also 710 are the TestNG processes):

$ kill -9 712
$ ps xa | grep java
  700   ??  ?E     0:00.00 (java)
  712   ??  ?E     0:00.00 (java)
  797 s005  S+     0:00.00 grep java

-- Edit: 10.5.12:

?E in the ps output above means that the process is exiting. I could not find any means to kill such a process fully without rebooting. The same issue has been noticed with some other applications. No solutions found:

http://www.google.com/search?q=ps+process+is+exiting+osx

Was it helpful?

Solution 3

So it seems that the problem lies in the implementation of Selector in the Mac version of JDK 6. Installing the new Oracle JDK 7u4 fixes the issue, independent of how the Selector is used.

OTHER TIPS

try closing the socket with http://docs.oracle.com/javase/1.4.2/docs/api/java/net/ServerSocket.html#close() after each test, in the teardown, if you're not already.

Just a shot in the dark here, but make sure that any thread that is waiting on Selector.select() has been woken up, and has exited.

I have also Parallels, and often the problem looks like it's caused by the Parallels network adapter....

I'd say that's a fair bet if this problem is not cropping up on other platforms. What have you done to exclude Parallels as the culprit?

if you think that the resources are not properly released, you can try to do the release in a shutdownhook. like this at least when its shut down the resouces will be released (not though if you hard kill)

an example for a very basic shutdownhook:

public void shutDownProceedure(){
    Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
            /* my shutdown code here */
        }
    });
}

This helped me release resources that somehow weren't entirely released before. I don't know if this works for sockets as well, i think it should.

It also allowed me to see loggings i haven't seen before

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top