The accepted answer (at the time of this writing) is giving the correct advice:
As long as you use the same Session object, you [will] be reusing connections.
However, some parts were originally oversimplified. I hope the following provides insight into the scope of each object type and their respective purposes.
Builder ≠ Cluster ≠ Session ≠ Connection ≠ Statement
A Cluster.Builder
is used to configure and create a Cluster
A Cluster
represents the entire Cassandra ring
A ring consists of multiple nodes (hosts), and the ring can support one or more keyspaces. You can query a Cluster object about cluster- (ring)-level properties.
I also think of it as the object that represents the calling application to the ring. You communicated your application's needs (e.g. encryption, compression, etc.) to the builder, but it is this object that first implements/communicates with the actual C* ring. If your application uses more than one authentication credential for different users/purposes, you likely have different Cluster objects even if they connect to the same ring.
A Session
itself is not a connection, but it manages them
A session may need to talk to all nodes in the ring, which cannot be done with a single TCP connection except in the special case of rings that contain exactly one(1) node. The Session manages a connection pool, and that pool will generally have at least one connection for each node in the ring.
This is why you should re-use Session objects as much as possible. An application does not directly manage or access connections.
A Session is accessed from the Cluster object; it is usually "bound" to a single keyspace at a time, which becomes the default keyspace for the statements executed from that session. A statement can use a fully-qualified table name (e.g. keyspacename.tablename
) to access tables in other keyspaces, so it's not required to use multiple sessions to access data across keyspaces. Using multiple sessions to talk to the same ring increases the total number of TCP connections required.
A Statement
executes within a Session
Statements can be prepared or not, and each one either mutates data or queries it (and in some cases, both). The fastest, most efficient statements need to communicate with at most one node, and a Session from a topology-aware Cluster should contact only that node (or one of its peers) on a single TCP connection. The least efficient statements must touch all replicas (a majority of nodes), but that will be handled by the coordinator node on the ring itself, so even for these statements the Session will only use a single connection from the application.
Also, versions 2 and 3 of the Cassandra binary protocol used by the driver use multiplexing on the connections. So while a single statement requires at least one TCP connection, that single connection can potentially service up to 128 or 32k+ asynchronous requests simultaneously, depending on the protocol version (respectively).