Question

I recently started using Cassandra in our production environment. We have a single cross-colo cluster of 24 nodes: 12 nodes in the SLC colo and 12 nodes in the PHX colo. The replication factor is 4, meaning there are 2 copies in each datacenter.

I am using the Astyanax client to write data to Cassandra. Is there any way to make the Astyanax client discover only the nodes in one colo (PHX or SLC) rather than all of the nodes?

In my setSeeds call I pass nodes from only one datacenter, either SLC or PHX. In the code below I have specified 2 seed nodes, both of which belong to the PHX colo. I want auto-discovery enabled, but only for that particular colo, so in my case it should detect the 12 PHX nodes and not all 24.

The code below uses ConnectionPoolType TOKEN_AWARE, which by default uses NodeDiscoveryType RING_DESCRIBE. That gives me all 24 nodes across both colos/datacenters, which is what I don't want. I need only the nodes of the local colo/datacenter.

Any idea how I can achieve this with the Astyanax client? Is it possible?

private CassandraAstyanaxConnection() {

    context = new AstyanaxContext.Builder()
    .forCluster(ModelConstants.CLUSTER)
    .forKeyspace(ModelConstants.KEYSPACE)
    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(40)
        .setSeeds("cdb03.vip.phx.host.com:9160,cdb04.vip.phx.host.com:9160")
    )
    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()      
        .setCqlVersion("3.0.0")
        .setTargetCassandraVersion("1.2")
        .setConnectionPoolType(ConnectionPoolType.TOKEN_AWARE))
    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
    .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    keyspace = context.getEntity();

    emp_cf = ColumnFamily.newColumnFamily(
        ModelConstants.COLUMN_FAMILY, 
        StringSerializer.get(), 
        StringSerializer.get());
}

In short: does Astyanax support auto-discovery for local nodes only?

I am asking because RING_DESCRIBE gives me the nodes from both colos. If I run my program from the PHX colo, it may go to the SLC colo for data, and I believe that is why I am seeing very bad write performance: the ping time between PHX and SLC is 15-20 ms.


Solution

With NodeDiscoveryType RING_DESCRIBE (or TOKEN_AWARE with no HostSupplier), Astyanax will discover all nodes, so you should also use setLocalDatacenter. When setting up your AstyanaxContext, call setLocalDatacenter on the ConnectionPoolConfigurationImpl with the desired DC. That ensures that hosts from the other DCs are not in the connection pool and that your requests stay local.

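.withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("MyConnectionPool")
        .setPort(9160)
        .setMaxConnsPerHost(40)
        // Use the datacenter name exactly as the cluster reports it (for example in nodetool status).
        .setLocalDatacenter("DC1")
        // Seeds should be nodes in that same datacenter.
        .setSeeds("127.0.0.1:9160")
    )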

Again, my understanding is that NodeDiscoveryType of TOKEN_AWARE (with no HostSupplier set) or RING_DESCRIBE will both result in the RingDescribeHostSupplier being used within Astyanax. So, Astyanax will "know" about all of the nodes, but the connection pool will be limited (via setLocalDatacenter) to the DC specified.
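
To make the idea concrete, here is a minimal, library-free sketch of what limiting the pool to one DC amounts to: the full ring is known, and only hosts whose datacenter matches the configured name are kept. This is not Astyanax's actual implementation. The LocalDcFilter and Node names are hypothetical, the SLC host name is made up, and treating a null datacenter as "keep everything" is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LocalDcFilter {

    /** Hypothetical stand-in for one ring entry: a host plus the datacenter it reports. */
    static final class Node {
        final String host;
        final String datacenter;

        Node(String host, String datacenter) {
            this.host = host;
            this.datacenter = datacenter;
        }
    }

    /**
     * Returns the hosts whose datacenter equals localDc.
     * Assumption of this sketch: a null localDc means no filtering at all.
     */
    static List<String> localHosts(List<Node> ring, String localDc) {
        List<String> hosts = new ArrayList<String>();
        for (Node node : ring) {
            if (localDc == null || localDc.equals(node.datacenter)) {
                hosts.add(node.host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        // The whole ring, as a ring describe would report it (both colos).
        List<Node> ring = Arrays.asList(
            new Node("cdb03.vip.phx.host.com", "PHX"),
            new Node("cdb04.vip.phx.host.com", "PHX"),
            new Node("cdb01.vip.slc.host.com", "SLC")); // made-up SLC host

        // Only the PHX hosts remain eligible for the connection pool.
        System.out.println(localHosts(ring, "PHX"));
    }
}
```

Note that the comparison is an exact string match, so a name such as "DC1" that does not match what the cluster reports would leave the pool with no hosts in this sketch.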

OTHER TIPS

Try NodeDiscoveryType.TOKEN_AWARE. According to the Astyanax documentation, it was designed for the multi-region ring-describe problem, where a ring describe returns nodes from other regions or datacenters.

Licensed under: CC-BY-SA with attribution