Cassandra Datastax Driver - Connection Pool

Question 1

You are right, the connection is actually in the Session, and the Session is the object you should give to your DAOs to write into Cassandra.

As long as you use the same Session object, you should be reusing connections (you can see the Session as being your connection pool).
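To make the "one shared Session" advice concrete, here is a minimal plain-Java sketch. `FakeSession`, `SessionHolder`, `UserDao` and `OrderDao` are hypothetical names: `FakeSession` stands in for the driver's `Session` (which real code would obtain from `Cluster.connect()`) so the example runs without a Cassandra node. It only illustrates the wiring: one session is created lazily, and every DAO receives that same instance.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the driver's Session so the sketch runs without a
// Cassandra node; it only counts how many instances (i.e. pools) were created.
final class FakeSession {
    static final AtomicInteger CREATED = new AtomicInteger();

    FakeSession() {
        CREATED.incrementAndGet();
    }

    String execute(String cql) {
        return "ran: " + cql;
    }
}

// Application-wide holder: the nested class is initialized on first use,
// so exactly one FakeSession is created (thread-safe via JVM class init).
final class SessionHolder {
    private SessionHolder() {}

    private static final class Lazy {
        static final FakeSession INSTANCE = new FakeSession();
    }

    static FakeSession get() {
        return Lazy.INSTANCE;
    }
}

// DAOs receive the shared session instead of opening their own.
final class UserDao {
    private final FakeSession session;

    UserDao(FakeSession session) {
        this.session = session;
    }

    String findAll() {
        return session.execute("SELECT * FROM users");
    }
}

final class OrderDao {
    private final FakeSession session;

    OrderDao(FakeSession session) {
        this.session = session;
    }

    String findAll() {
        return session.execute("SELECT * FROM orders");
    }
}
```

Creating a new session per DAO (or per request) would instead open a fresh set of connections each time, which is what the answer advises against.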

Edit (2017/4/10): I clarified this answer following @William Price's answer. Please be aware that this answer is 4 years old, and Cassandra has changed a fair bit in the meantime!

Question 2

The accepted answer (at the time of this writing) gives the correct advice:

As long as you use the same Session object, you [will] be reusing connections.

However, some parts were originally oversimplified. I hope the following provides insight into the scope of each object type and their respective purposes.

Builder ≠ Cluster ≠ Session ≠ Connection ≠ Statement

A `Cluster.Builder` is used to configure and create a Cluster

A `Cluster` represents the entire Cassandra ring

A ring consists of multiple nodes (hosts), and the ring can support one or more keyspaces. You can query a Cluster object about cluster-level (ring-level) properties.

I also think of it as the object that represents the calling application to the ring. You communicate your application's needs (e.g. encryption, compression) to the builder, but it is this object that first implements them and communicates with the actual C* ring. If your application uses more than one authentication credential for different users or purposes, you likely have different Cluster objects even if they connect to the same ring.
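A sketch of the "one Cluster per credential" idea, assuming clusters are looked up by contact point and user name. `FakeCluster` and `ClusterRegistry` are hypothetical stand-ins, not driver classes; in real code each entry would be a `Cluster` built with its own credentials. Two callers using the same credential share one object, while a different credential against the same ring gets its own.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a driver Cluster: it records which ring it
// points at and which credential it authenticates with.
final class FakeCluster {
    final String contactPoint;
    final String username;

    FakeCluster(String contactPoint, String username) {
        this.contactPoint = contactPoint;
        this.username = username;
    }
}

// One cluster object per (ring, credential) pair, shared by all its users.
final class ClusterRegistry {
    private final Map<String, FakeCluster> clusters = new ConcurrentHashMap<>();

    FakeCluster forCredential(String contactPoint, String username) {
        String key = username + "@" + contactPoint;
        // Create the cluster object only the first time this pair is requested.
        return clusters.computeIfAbsent(key, k -> new FakeCluster(contactPoint, username));
    }

    int size() {
        return clusters.size();
    }
}
```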

A `Session` itself is not a connection, but it manages them

A session may need to talk to all nodes in the ring, which cannot be done with a single TCP connection except in the special case of a ring that contains exactly one node. The Session manages a connection pool, and that pool will generally have at least one connection for each node in the ring. This is why you should reuse Session objects as much as possible. An application does not directly manage or access connections.
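The per-node pooling described above can be modeled roughly as follows. This is a hypothetical simplification, not the driver's internal pool: connections are plain strings standing in for TCP sockets, and `coreConnectionsPerHost` is an assumed name for the minimum number opened to each host. It shows why a session on a three-node ring holds at least three connections, and that the application only asks the pool for a connection rather than managing one itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a session-owned pool: each host gets its own list of
// connections, opened up front with a minimum ("core") size.
final class PerHostPool {
    private final Map<String, List<String>> connectionsByHost = new LinkedHashMap<>();

    PerHostPool(List<String> hosts, int coreConnectionsPerHost) {
        if (coreConnectionsPerHost < 1) {
            throw new IllegalArgumentException("need at least one connection per host");
        }
        for (String host : hosts) {
            List<String> conns = new ArrayList<>();
            for (int i = 0; i < coreConnectionsPerHost; i++) {
                conns.add(host + "#" + i); // placeholder for a real TCP socket
            }
            connectionsByHost.put(host, conns);
        }
    }

    // The caller names a host; the pool picks one of that host's connections.
    String borrow(String host, int requestId) {
        List<String> conns = connectionsByHost.get(host);
        if (conns == null) {
            throw new IllegalArgumentException("unknown host " + host);
        }
        return conns.get(Math.floorMod(requestId, conns.size()));
    }

    int totalConnections() {
        int total = 0;
        for (List<String> conns : connectionsByHost.values()) {
            total += conns.size();
        }
        return total;
    }
}
```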

A Session is accessed from the Cluster object; it is usually "bound" to a single keyspace at a time, which becomes the default keyspace for the statements executed from that session. A statement can use a fully-qualified table name (e.g. keyspacename.tablename) to access tables in other keyspaces, so it's not required to use multiple sessions to access data across keyspaces. Using multiple sessions to talk to the same ring increases the total number of TCP connections required.
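The cost of extra sessions is simple multiplication. The sketch below assumes, hypothetically, that every session keeps the same fixed number of connections to every host; `ConnectionMath` and its methods are illustrative names, not driver API.

```java
// Back-of-the-envelope count of TCP connections an application opens to one
// ring, assuming every session holds the same number of connections per host.
final class ConnectionMath {
    static int totalConnections(int sessions, int hosts, int connectionsPerHost) {
        return sessions * hosts * connectionsPerHost;
    }

    // A statement can name its keyspace explicitly, so one session can reach
    // tables in several keyspaces without opening another session.
    static String selectAllFrom(String keyspace, String table) {
        return "SELECT * FROM " + keyspace + "." + table;
    }
}
```

For a six-node ring with one connection per host, a single session costs 6 connections; one session per keyspace for three keyspaces costs 18, whereas qualifying table names such as `billing.invoices` keeps the total at 6.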

A `Statement` executes within a Session

Statements can be prepared or not, and each one either mutates data or queries it (and in some cases, both). The fastest, most efficient statements need to communicate with at most one node, and a Session from a topology-aware Cluster should contact only that node (or one of its peers) on a single TCP connection. The least efficient statements must touch all replicas (a majority of nodes), but that will be handled by the coordinator node on the ring itself, so even for these statements the Session will only use a single connection from the application.
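How a topology-aware session can find "that node" is sketched below with a simplified token ring. Assumptions: tokens are small hand-picked integers and each range has a single owner, whereas a real cluster hashes partition keys (Murmur3 by default) and places several replicas. `TokenRing` is a hypothetical class, not the driver's load-balancing policy.

```java
import java.util.Map;
import java.util.TreeMap;

// Simplified token ring: each node owns the range (previous token, its token].
// Real clusters hash keys with a partitioner and keep several replicas; this
// sketch uses hand-picked tokens and one owner per range.
final class TokenRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String node) {
        ring.put(token, node);
    }

    // Owner is the first node whose token is >= the key's token, wrapping
    // around to the lowest token past the end of the ring.
    String ownerOf(long keyToken) {
        if (ring.isEmpty()) {
            throw new IllegalStateException("empty ring");
        }
        Map.Entry<Long, String> e = ring.ceilingEntry(keyToken);
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}
```

A token-aware client would then send the statement over a pooled connection to that owner (or another replica of the same range), so the node acting as coordinator already holds the data.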

Also, versions 2 and 3 of the Cassandra binary protocol used by the driver multiplex requests on each connection. So while a single statement requires at least one TCP connection, that single connection can potentially service up to 128 (protocol v2) or 32k+ (protocol v3) asynchronous requests simultaneously.
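Multiplexing works by tagging every request on a connection with a stream id, so responses can come back in any order. Below is a hypothetical per-connection allocator (`StreamIds` is an illustrative name, not a driver class) whose capacity follows the limits above: 128 ids for protocol v2 and 32,768 for v3. When every id is in use, a real driver would have to queue the request or use another connection; here `acquire()` simply returns -1.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Per-connection stream-id allocator: each in-flight request holds one id
// until its response arrives, then the id is returned for reuse.
final class StreamIds {
    // Protocol v1/v2 use a 1-byte signed stream id (128 usable ids);
    // v3 and later use a 2-byte signed id (32768 usable ids).
    static int maxInFlight(int protocolVersion) {
        return protocolVersion <= 2 ? 128 : 32768;
    }

    private final Deque<Integer> free = new ArrayDeque<>();

    StreamIds(int protocolVersion) {
        for (int id = 0; id < maxInFlight(protocolVersion); id++) {
            free.addLast(id);
        }
    }

    // Returns a free id, or -1 when the connection is saturated.
    int acquire() {
        Integer id = free.pollFirst();
        return id == null ? -1 : id;
    }

    // Called when the response for this id has been received.
    void release(int id) {
        free.addLast(id);
    }
}
```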

Question 3

Just an update for the community: you can configure the connection pool in the following way:

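```java
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.HostDistance;

private static Cluster cluster; // assumed to be initialized elsewhere
// Allow up to 100 connections to each host in the local datacenter
cluster.getConfiguration().getPoolingOptions().setMaxConnectionsPerHost(HostDistance.LOCAL, 100);
```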
