Domanda

After DB dropps an idle connection or DB is down and back up I'm receiving the following error in my webapp:

javax.persistence.PersistenceException: org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode
    at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1365)
    at org.hibernate.ejb.AbstractEntityManagerImpl.convert(AbstractEntityManagerImpl.java:1293)
    at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:265)    
... 60 more
Caused by: org.hibernate.exception.JDBCConnectionException: could not inspect JDBC autocommit mode
    at org.hibernate.exception.internal.SQLStateConversionDelegate.convert(SQLStateConversionDelegate.java:131)
    at org.hibernate.exception.internal.StandardSQLExceptionConverter.convert(StandardSQLExceptionConverter.java:49)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:125)
    at org.hibernate.engine.jdbc.spi.SqlExceptionHelper.convert(SqlExceptionHelper.java:110)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:395)
    at org.hibernate.engine.transaction.internal.TransactionCoordinatorImpl.afterNonTransactionalQuery(TransactionCoordinatorImpl.java:195)
    at org.hibernate.internal.SessionImpl.afterOperation(SessionImpl.java:565)
    at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1220)
    at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
    at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:256)
... 70 more
Caused by: org.postgresql.util.PSQLException: This connection has been closed.
    at org.postgresql.jdbc2.AbstractJdbc2Connection.checkClosed(AbstractJdbc2Connection.java:712)
    at org.postgresql.jdbc2.AbstractJdbc2Connection.getAutoCommit(AbstractJdbc2Connection.java:678)
    at sun.reflect.GeneratedMethodAccessor138.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.tomcat.jdbc.pool.ProxyConnection.invoke(ProxyConnection.java:126)
    at org.apache.tomcat.jdbc.pool.JdbcInterceptor.invoke(JdbcInterceptor.java:99)
    at org.apache.tomcat.jdbc.pool.DisposableConnectionFacade.invoke(DisposableConnectionFacade.java:63)
    at $Proxy66.getAutoCommit(Unknown Source)
    at org.hibernate.engine.jdbc.internal.LogicalConnectionImpl.isAutoCommit(LogicalConnectionImpl.java:392)

When this starts I get an

SQL Error: 0, SQLState: 08006 - An I/O error occured while sending to the backend.

but after that it's only:

SQL Error: 0, SQLState: 08003 - This connection has been closed.

The problem is: I have set testOnBorrow, so I'd expect to get only open connections.

If that helps: the pool ususaly contains a mix of good and bad connections, and the problem seems to clear up over time, but I had the server running for >12h and there were still bad connections returned. After a restart evertything is working fine (for a time).

I have debugged the problem some more and it seems like pool is returning bad connections, e.g. if after I have all connections killed on DB I get:

SQL Error: 0, SQLState: 57P01

and then the usual stuff - the killed connections are returned from the pool. The question is: is it an application problem?

I tried purging the pool via JMX but that doesn't seem to have any effect. Another weird thing is that even though app is apparently not doing anything (checked via thread dump) the JMX bean is showing 7 active connections and 0 idle connections. When I execute a request requiring DB access I get a response immediately (depite there being no idle connections available), but JMX shows 7 active and 0 idle connections after that.

PS. Maybe I'm missing something obvious and this is a connection management issue on my part? I'm using JPA EntityManager configured via persistence.xml, so maybe I'm doing something wrong and the connections are not closed(returned) properly after use?

È stato utile?

Soluzione

Actually I was right when I suspected an application error.

It's all nicely decribed in Issue 730: Automatically started UnitOfWork is never ended

When using the JpaPersistService, if you attempt to access an EntityManager outside of an active UnitOfWork, Guice will automatically start one for you. However, as Guice does not (and cannot) know when to end this UnitOfWork, it never does.

Result? The offending thread will be stuck with the same EntityManager throughout the life of the application. This is a bad state for the application to run in, and ours inevitably exhaust the available memory after a while and crash.

The real killer here is that it's not at all obvious when you've made this mistake. The only real tip-off is that you're getting inconsistent data from your database between different threads (due to the EMs first-level cache) or that the applications memory consumption keeps on growing. In my case it was the active connection in the pool that got me to suspect it, and then, when I turned on detailed logging I noticed that app was not borrowing the connection from pool at all, instead it was reusing the connection already held by unclosed EntityManager.

Actually there are quite some duplicates of this issue reported: http://code.google.com/p/google-guice/issues/list?can=1&q=UnitOfWork

Altri suggerimenti

A wild shot: The code here suggests, that the validationInterval is checked for testOnBorrow also.

Since you set this value from the 30sec default up to 5min this means that for 5 minutes after the DB drops the connection you still can get that stale connection. If your DB timeout is less than 5mins ... bad luck.

In order to test this theory you can set validationInterval to a ridiculous low value.

If that helps (reading: we have found the right knob) you should set it at least to a shorter time than the DB timeout. So when the DB decides to drop an idle connection the lower validationInterval will make sure that connection gets checked before the next borrow. Closed connections due to a DB server restart (i.e. without a timeout) will not be affected by this solution but at least the time returning to a sane state is lower too.

Note: I just asked Google for the code. I have no idea whether this is actual code or ancient code.

If you have modified your META-INF/context.xml to include the validationQuery, Tomcat is likely to be caching the old definition under conf/engine/host/webapp.xml. Shut-down Tomcat, remove that file, and restart Tomcat. It wouldn't hurt to remove the work directory while Tomcat is down as well.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top