Question

I have heard that multi-row selection in Cassandra is bad because it runs a new query for each row selected. For example, if I fetch 1000 rows at once, it would be the same as running 1000 separate queries at once. Is that true?

If it is, how bad would it be to select around 50 rows each time a page is loaded, with, say, 1000 page views in a single minute? Would that severely slow Cassandra down?

P.S. I'm using phpcassa for my project.


Solution

Yes, in terms of work done by the cluster, a query for 1000 rows is the same as 1000 single-row queries (if you use the recommended RandomPartitioner), because keys are hashed across the nodes and each row is looked up individually. However, I wouldn't be overly concerned by this. In Cassandra, querying for a row by its key is a very common, very fast operation.

As to your second question, it's difficult to tell ahead of time. Build it and test it. Note that Cassandra uses in-memory caching, so if you query the same rows repeatedly, they will be served from the cache.
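To put the numbers from the question in perspective, here is a quick back-of-the-envelope calculation. It only converts the figures the asker gave (50 rows per page, 1000 page views per minute) into a per-second read rate; it says nothing about what a particular cluster can sustain, which still has to be measured.

```python
# Figures taken from the question; everything else is plain arithmetic.
ROWS_PER_PAGE = 50
PAGE_VIEWS_PER_MINUTE = 1000


def row_reads_per_second(rows_per_page, views_per_minute):
    """Average number of row reads per second for the given traffic."""
    return rows_per_page * views_per_minute / 60.0


rate = row_reads_per_second(ROWS_PER_PAGE, PAGE_VIEWS_PER_MINUTE)
print(f"{rate:.0f} row reads per second")  # 833 row reads per second
```

So the workload averages roughly 833 row reads per second, before any caching. That is the number to reproduce in a load test.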

OTHER TIPS

We are using PlayOrm for Cassandra, which has a "findAll" pattern that supports fetching many rows quickly. See https://github.com/deanhiller/playorm/wiki/Support-for-retrieving-many-entities-in-parallel for more details.

1) I have debugged the Cassandra code base a little, and from what I observed, Cassandra provides the multiget() functionality for querying multiple rows at the same time. phpcassa exposes it as well.

2) Multiget is optimized to handle a batch request, and it saves network hops. Fetching 1k rows one at a time takes 1k round trips, while a single multiget takes one, so it saves the time of 999 round trips.

3) More about multiget() in phpcassa: php cassa multiget()
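The round-trip argument in point 2 can be illustrated with a toy model. This is only a sketch: `FakeStore` is a hypothetical in-memory stand-in that counts client calls, not phpcassa or any real Cassandra client, and it models network round trips only, not the per-row work the cluster still does.

```python
class FakeStore:
    """Toy stand-in for a remote key-value store (NOT a real Cassandra client).

    Every call to get() or multiget() counts as one network round trip.
    """

    def __init__(self, rows):
        self.rows = dict(rows)
        self.round_trips = 0

    def get(self, key):
        # One request, one row.
        self.round_trips += 1
        return self.rows[key]

    def multiget(self, keys):
        # One request, many rows; missing keys are simply left out.
        self.round_trips += 1
        return {k: self.rows[k] for k in keys if k in self.rows}


keys = [f"user{i}" for i in range(1000)]
store = FakeStore((k, {"name": k}) for k in keys)

# Fetch each row with its own request.
one_by_one = {k: store.get(k) for k in keys}
single_trips = store.round_trips

# Fetch the same rows with a single batched request.
store.round_trips = 0
batched = store.multiget(keys)
batched_trips = store.round_trips

print(single_trips, batched_trips)  # 1000 1
```

Both approaches return the same rows; the batched call just pays the network latency once instead of 1000 times.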

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow