Mysql connection pooling question: is it worth it?

https://stackoverflow.com/questions/405352

03-07-2019
|

Question

I recall hearing that the connection process in mysql was designed to be very fast compared to other RDBMSes, and that therefore using a library that provides connection pooling (SQLAlchemy) won't actually help you that much if you enable the connection pool.

Does anyone have any experience with this?

I'm leery of enabling it because of the possibility that if some code does something stateful to a db connection and (perhaps mistakenly) doesn't clean up after itself, that state which would normally get cleaned up upon closing the connection will instead get propagated to subsequent code that gets a recycled connection.

Solution

There's no need to worry about residual state on a connection when using SQLA's connection pool, unless your application is changing connectionwide options like transaction isolation levels (which generally is not the case). SQLA's connection pool issues a connection.rollback() on the connection when its checked back in, so that any transactional state or locks are cleared.

It is possible that MySQL's connection time is pretty fast, especially if you're connecting over unix sockets on the same machine. If you do use a connection pool, you also want to ensure that connections are recycled after some period of time as MySQL's client library will shut down connections that are idle for more than 8 hours automatically (in SQLAlchemy this is the pool_recycle option).

You can quickly do some benching of connection pool vs. non with a SQLA application by changing the pool implementation from the default of QueuePool to NullPool, which is a pool implementation that doesn't actually pool anything - it connects and disconnects for real when the proxied connection is acquired and later closed.

OTHER TIPS

Even if the connection part of MySQL itself is pretty slick, presumably there's still a network connection involved (whether that's loopback or physical). If you're making a lot of requests, that could get significantly expensive. It will depend (as is so often the case) on exactly what your application does, of course - if you're doing a lot of work per connection, then that will dominate and you won't gain a lot.

When in doubt, benchmark - but I would by-and-large trust that a connection pooling library (at least, a reputable one) should work properly and reset things appropriately.

Short answer: you need to benchmark it.

Long answer: it depends. MySQL is fast for connection setup, so avoiding that cost is not a good reason to go for connection pooling. Where you win there is if the queries run are few and fast because then you will see a win with pooling.

The other worry is how the application treats the SQL thread. If it does no SQL transactions, and makes no assumptions about the state of the thread, then pooling won't be a problem. OTOH, code that relies on the closing of the thread to discard temporary tables or to rollback transactions will have a lot of problems with pooling.

The connection pool speeds things up in that fact that you do not have create a java.sql.Connection object every time you do a database query. I use the Tomcat connection pool to a mysql database for web applications that do a lot of queries, during high user load there is noticeable speed improvement.

I made a simple RESTful service with Django and tested it with and without connection pooling. In my case, the difference was quite noticeable.

In a LAN, without it, response time was between 1 and 5 seconds. With it, less than 20 ms. Results may vary, but the configuration I'm using for the MySQL & Apache servers is pretty standard low-end.

If you're serving UI pages over the internet the extra time may not be noticeable to the user, but in my case it was unacceptable, so I opted for using the pool. Hope this helps you.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow