Apache Cassandra Packet Loss and Delay influencing choice of the Node to read data

https://stackoverflow.com/questions/23458610

15-07-2023

Question

Hello everyone. Are there any Cassandra gurus on Stack Overflow?

I have been researching the use of Apache Cassandra for our company's monitoring system project. I wonder how Cassandra would behave in the following scenario:

A 3-node cluster: the client asks some node for data from some keyspace. Replication is configured so that this node holds no data for the requested keyspace (the other two do). Do latency and packet loss influence the selection of the node from which the data is requested? Whether or not they do, what algorithm is used to select the closest node?

I ran some experiments simulating packet loss with WANem. The tests showed that, under packet loss, Cassandra always chose the node with the lowest packet loss. I would like to understand why. Could it be that packet loss makes a node think the other node is dead?

Thanks for any help.

Edit:

I was reminded that the dynamic snitch exists. However, the available documentation on it says nothing about packet loss.

Solution

When choosing which replicas to query, the coordinator sorts the available nodes by state and snitch preference, then queries as many nodes as are required to satisfy the consistency level. Rapid read protection issues additional queries to other replicas if no reply is received within the 99th percentile (the default) of currently measured latencies.

http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
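
The selection described in this answer can be sketched roughly as follows. This is only an illustration in Python, not Cassandra's actual code: the `Replica` class, its `alive` and `score` fields, and all function names are hypothetical, and `score` simply stands in for whatever preference the snitch assigns (lower is better). If that score reflects observed read latency, a replica behind a lossy link would tend to sort later, which may be what the Wanem experiments showed.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    alive: bool    # liveness as seen by the coordinator (hypothetical field)
    score: float   # stand-in for a snitch preference score; lower is better


def order_replicas(replicas: List[Replica]) -> List[Replica]:
    """Drop replicas considered dead, then sort the rest by ascending score."""
    return sorted((r for r in replicas if r.alive), key=lambda r: r.score)


def pick_targets(replicas: List[Replica], required: int) -> List[Replica]:
    """Take the best `required` live replicas, or fail if too few are alive."""
    ordered = order_replicas(replicas)
    if len(ordered) < required:
        raise RuntimeError("not enough live replicas for the consistency level")
    return ordered[:required]


def percentile(samples: List[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list of latency samples."""
    ranked = sorted(samples)
    rank = max(1, math.ceil(pct * len(ranked) / 100.0))
    return ranked[rank - 1]


def should_speculate(elapsed_ms: float, recent_ms: List[float],
                     pct: float = 99.0) -> bool:
    """True once a read has waited longer than the chosen latency percentile,
    i.e. the point at which an extra request would go to another replica."""
    return elapsed_ms > percentile(recent_ms, pct)
```

For example, with two live replicas scored 0.9 and 0.2 and one dead replica, `pick_targets(..., 1)` returns only the replica scored 0.2, and a request for three replicas raises an error because only two are alive.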

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow