Question

How are connection pooling and distribution handled across a Vertica cluster?

I am trying to understand how connections are handled in Vertica. For example, how does it compare with the way Oracle handles its connections through its listener, and how are connections balanced inside the cluster for better distribution?


Solution

Vertica's process of handling a connection is basically as follows:

  • A node receives the connection, making it the initiator node.
  • The initiator node generates the query execution plan and distributes it to the other nodes.
  • The nodes fill in any node-specific details of the execution plan.
  • The nodes execute the query.
  • (Some details omitted here.)*
  • The nodes send their result sets back to the initiator node.
  • The initiator node collects the data and performs the final aggregations.
  • The initiator node sends the data back to the client.
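The steps above amount to a scatter-gather pattern. Below is a minimal, pure-Python simulation of that flow, assuming a hypothetical three-node cluster where each node holds a slice of a sales table; it does not talk to Vertica and the node names, data, and functions are illustrative only. Each node computes a partial aggregate on its own rows, and the initiator merges the partials into the final result.

```python
from collections import defaultdict

# Hypothetical 3-node cluster: each node stores its own slice of a sales table
# as (region, amount) rows. Node names and data are illustrative only.
NODE_DATA = {
    "node01": [("east", 10), ("west", 5)],
    "node02": [("east", 7), ("north", 3)],
    "node03": [("west", 8), ("north", 1)],
}


def local_step(rows):
    """Work each node does on its own data: a partial SUM(amount) GROUP BY region."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)


def run_query(initiator, cluster):
    """Scatter-gather as coordinated by the node the client connected to."""
    # The initiator ships the plan (here just local_step) to every node,
    # and each node runs it against its own rows.
    partials = [local_step(rows) for rows in cluster.values()]
    # Each node sends its partial result back; the initiator does the final aggregation.
    final = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] += subtotal
    # The initiator returns the merged result to the client.
    return {"initiator": initiator, "result": dict(final)}


print(run_query("node01", NODE_DATA))
# {'initiator': 'node01', 'result': {'east': 17, 'west': 13, 'north': 4}}
```

Note that the initiator does its own share of the local work in addition to coordinating, which is one reason a single overloaded entry node can become a bottleneck.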

The recommended way to connect to Vertica is through a load balancer, so that no single node becomes a point of failure. Vertica itself does not distribute connections between nodes; it distributes the query to the other nodes.
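To illustrate what the load balancer contributes, here is a minimal sketch of round-robin node selection with failover, written as plain Python rather than against any real load balancer or Vertica driver. The host names are hypothetical; 5433 is Vertica's default client port. Each call hands out the next healthy node, so new connections are spread across the cluster and a down node is skipped.

```python
import itertools


class RoundRobinBalancer:
    """Hand out cluster nodes in turn, skipping nodes marked as down."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.down = set()
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node):
        self.down.add(node)

    def mark_up(self, node):
        self.down.discard(node)

    def next_node(self):
        # Try each node at most once per call so a fully-down cluster fails fast.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node not in self.down:
                return node
        raise ConnectionError("no Vertica node is available")


# Hypothetical host names.
lb = RoundRobinBalancer(["vnode1:5433", "vnode2:5433", "vnode3:5433"])
lb.mark_down("vnode2:5433")
print([lb.next_node() for _ in range(4)])
# ['vnode1:5433', 'vnode3:5433', 'vnode1:5433', 'vnode3:5433']
```

Whichever node the balancer picks becomes the initiator node for that session's queries, so spreading connections also spreads the planning and final-aggregation work.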

I'm not well versed in Oracle or in the details of how other systems handle their connection process, so hopefully I'm not too far off the mark of what you're looking for.

From /my/ experience, each node can handle only a limited number of connections. Once you try to open more than that on a single node, it rejects the connection. I ran into this with a MapReduce job that opened a connection in its map function.
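The failure mode described above can be reproduced with a toy model: a node that refuses sessions beyond a fixed cap, opened once per "map task". The sketch below is plain Python and assumes a hypothetical cap of three sessions per node (the real limit is a server setting, not shown here). It then shows one client-side mitigation: bounding concurrency with a semaphore so tasks wait for a free slot instead of being refused.

```python
import threading


class Node:
    """Toy node that rejects new sessions once it reaches a fixed limit."""

    def __init__(self, name, max_sessions):
        self.name = name
        self.max_sessions = max_sessions  # hypothetical per-node cap
        self.active = 0
        self._lock = threading.Lock()

    def connect(self):
        with self._lock:
            if self.active >= self.max_sessions:
                raise ConnectionRefusedError(f"{self.name}: too many sessions")
            self.active += 1

    def disconnect(self):
        with self._lock:
            self.active -= 1


def naive_tasks(node, n_tasks):
    """Every task opens (and keeps) its own session, like connecting inside map()."""
    opened = refused = 0
    for _ in range(n_tasks):
        try:
            node.connect()
            opened += 1
        except ConnectionRefusedError:
            refused += 1
    return opened, refused


def bounded_tasks(node, n_tasks, max_parallel):
    """Run n_tasks while never holding more than max_parallel sessions at once."""
    gate = threading.BoundedSemaphore(max_parallel)
    errors = []

    def task():
        with gate:  # wait for a free slot instead of being refused
            try:
                node.connect()
            except ConnectionRefusedError as exc:
                errors.append(exc)
                return
            try:
                pass  # query work would happen here
            finally:
                node.disconnect()

    threads = [threading.Thread(target=task) for _ in range(n_tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


print(naive_tasks(Node("vnode1", max_sessions=3), 5))  # (3, 2)
print(len(bounded_tasks(Node("vnode2", max_sessions=3), 10, 3)))  # 0
```

Keeping the number of parallel tasks at or below the per-node limit (or spreading tasks across nodes) avoids the refusals; opening one connection per record or per task without a bound is what triggers them.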

*Depending on the query, the data, and the partitioning, the nodes may need to transfer data among themselves behind the scenes to complete the query. When this happens, it slows the query down.
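A rough way to see why that transfer happens: assume rows are hash-segmented across nodes by one column. An operation keyed on that same column can run entirely locally, while one keyed on a different column needs rows moved so that matching keys end up on the same node. The sketch below is a simplified pure-Python model with a hypothetical table, node list, and hash function; it counts raw rows that would move, whereas a real planner may choose to move pre-aggregated partials or broadcast a small table instead.

```python
import zlib

NODES = ["node01", "node02", "node03"]


def owner(value):
    """Node owning a value under simple hash segmentation (illustrative hash only)."""
    return NODES[zlib.crc32(str(value).encode()) % len(NODES)]


# Hypothetical (customer_id, region) rows, stored segmented by customer_id.
ROWS = [(1, "east"), (2, "west"), (3, "east"), (4, "north"), (5, "west"), (6, "east")]
PLACEMENT = {row: owner(row[0]) for row in ROWS}


def rows_to_move(key_col):
    """Rows that must cross the network so all rows sharing a key are co-located."""
    return sum(1 for row, node in PLACEMENT.items() if owner(row[key_col]) != node)


print("keyed on customer_id:", rows_to_move(0), "rows moved")  # always 0
print("keyed on region:", rows_to_move(1), "rows moved")
```

Choosing a segmentation column that matches the most common join or grouping keys is the usual way to keep this kind of network transfer to a minimum.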

Licensed under: CC-BY-SA with attribution