How can I query a Cassandra cluster for its metadata?

https://stackoverflow.com/questions/12946170

08-07-2021

Question

We have a process creatively named "bootstrap" that sets up our Cassandra clusters for a given rev of software in an environment (Dev1, Dev2, QA, ..., PROD). This bootstrap creates or updates keyspaces and column families, and populates initial data in non-prod environments.

We are using Astyanax, but we could use Hector for bootstrapping.

Given that another team has decided that each environment will have its own datacenter names, that I want this to work in prod when we go from two datacenters to more, and that we will be using PropertyFileSnitch:

How can I ask the Cassandra cluster for its layout? (Without shelling to nodetool ring)

Specifically, I need to know the names of the datacenters so I can create or update a keyspace with the correct strategy options when using NetworkTopologyStrategy. We want 3 copies per datacenter. Some envs have one datacenter and several have two; eventually production will have more.

Is there CQL or a Thrift call that will give me info about the cluster layout?

I have looked through several TOCs in various doc sets, and googled a bit. I thought I would ask here before digging through the nodetool code.

Solution

I'm not sure how Hector or Astyanax expose this, but the basic Thrift method describeRing(keyspace) should give you what you're looking for. Part of what it returns is a list of EndpointDetails structs that look like this:

endpoint_details=[EndpointDetails(datacenter='datacenter1', host='127.0.0.1', rack='rack1')]

Along with the rest of the results from that method, you should be able to figure out tokens, DCs, racks, and so on, for each node in the cluster.
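
As a rough illustration of the last step, here is a minimal sketch of turning endpoint details into NetworkTopologyStrategy options with 3 replicas per datacenter. The Endpoint class is a hypothetical stand-in for the host, datacenter, and rack fields of EndpointDetails; copying the values out of whatever your client's describeRing call returns is left out, since that part depends on the client library.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

class StrategyOptionsSketch {

    // Hypothetical stand-in for the fields of a Thrift EndpointDetails struct.
    static final class Endpoint {
        final String host;
        final String datacenter;
        final String rack;

        Endpoint(String host, String datacenter, String rack) {
            this.host = host;
            this.datacenter = datacenter;
            this.rack = rack;
        }
    }

    // One strategy option per distinct datacenter name, in sorted order.
    static Map<String, String> strategyOptions(List<Endpoint> endpoints, int replicasPerDc) {
        TreeSet<String> datacenters = new TreeSet<>();
        for (Endpoint e : endpoints) {
            datacenters.add(e.datacenter);
        }
        Map<String, String> options = new LinkedHashMap<>();
        for (String dc : datacenters) {
            options.put(dc, Integer.toString(replicasPerDc));
        }
        return options;
    }

    public static void main(String[] args) {
        // Made-up sample data; a real run would use the describeRing results.
        List<Endpoint> ring = Arrays.asList(
                new Endpoint("10.0.1.1", "DC1", "rack1"),
                new Endpoint("10.0.1.2", "DC1", "rack2"),
                new Endpoint("10.0.2.1", "DC2", "rack1"));
        System.out.println(strategyOptions(ring, 3)); // prints {DC1=3, DC2=3}
    }
}
```

The resulting map has the shape that NetworkTopologyStrategy expects for its strategy options (datacenter name to replica count), so it can be handed to whichever keyspace-definition call your client provides.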

Since you're using a Java client, you could also use some of the JMX methods (which nodetool uses) to describe more select parts of the cluster. For example, you might look at the snitch mbean ("org.apache.cassandra.db:type=EndpointSnitchInfo"), specifically the getDatacenter(ip) and getRack(ip) methods.
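
A minimal sketch of the JMX route, using only the JDK's javax.management classes. It assumes JMX is reachable on port 7199 without authentication (adjust both for your cluster), and that you already have the node IPs from somewhere else; the class and helper names are made up for illustration.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

class SnitchJmxSketch {

    // The snitch mbean name mentioned above.
    static final String SNITCH_MBEAN = "org.apache.cassandra.db:type=EndpointSnitchInfo";

    // Standard RMI-based JMX service URL for a given host and port.
    static String jmxUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    // Ask the snitch for the datacenter of each node and collect the distinct names.
    static Set<String> datacenters(MBeanServerConnection mbs, Collection<String> nodeIps)
            throws Exception {
        ObjectName snitch = new ObjectName(SNITCH_MBEAN);
        Set<String> result = new TreeSet<>();
        for (String ip : nodeIps) {
            Object dc = mbs.invoke(snitch, "getDatacenter",
                    new Object[] {ip}, new String[] {String.class.getName()});
            result.add((String) dc);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = 7199; // assumption: adjust to your cluster's JMX port
        JMXServiceURL url = new JMXServiceURL(jmxUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // Only the contacted node is queried here; pass every node IP in practice.
            System.out.println(datacenters(mbs, Arrays.asList(host)));
        }
    }
}
```

Calling getRack the same way (same argument, different operation name) would give the rack for each node.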

OTHER TIPS

Another option (an indirect answer) is to do what PlayOrm does: route all column family creation through your own layer and save the metadata you care about, so that you can later query your own data. This only works if you and the other team go through the same middleman, so that all the information is recorded. It is probably not what you want, but it may get you thinking about other potential solutions.

Licensed under: CC-BY-SA with attribution
