Question

I'm writing an application that uses Hector to access a Cassandra database. I have some situations where I only need to query one column, and some where I need to query multiple columns at once. Writing one method that takes an array of column names and returns a list of columns using SliceQuery would be simplest in terms of code, but I'm wondering whether there's a significant drawback to using SliceQuery for one column compared to using ColumnQuery.

In short, are there enough (or any) performance benefits of using ColumnQuery over SliceQuery for one column to make it worth the extra code to deal with a one-column case separately?

Solution

Judging by Hector's code, the difference between a ColumnQuery (ThriftColumnQuery.java) and a SliceQuery (ThriftSliceQuery.java) is the Thrift command sent to the server: "get" for the former and "get_slice" for the latter.

I didn't find exact documentation of how Cassandra's server implements each of these operations, but after a quick look at Cassandra's sources, CassandraServer.java in particular, I got the impression that the "get" operation exists more for the client's convenience than for better performance when querying a single column:

  • For a "get" request, a SliceByNamesReadCommand instance is created and executed.
  • For a "get_slice" request (assuming you're using Hector's setColumnNames method and not setRange), a SliceByNamesReadCommand instance is created for each of the wanted columns and then executed (the row is read only once though).

Bottom line: as far as I can see, the only extra cost is the (negligible) overhead of creating a few collections to handle multiple columns. If you're still worried, it shouldn't be too difficult to handle the two cases differently when wrapping Hector in your DAOs.
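
To make the DAO idea concrete, here is a minimal sketch of a wrapper where the single-column read simply delegates to the multi-column one. All names in it (ColumnFetcher, ColumnDao, InMemoryFetcher) are hypothetical and not part of Hector; the in-memory fetcher only stands in for the database so the example runs without a cluster. In a real DAO, the fetch method is where you would presumably build a SliceQuery and call setColumnNames, and it is also the one place you could later swap in a ColumnQuery for the single-column case if measurements ever justified it.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical abstraction: fetch the named columns of one row. */
interface ColumnFetcher {
    // Returns name -> value for the columns that exist; missing columns are left out.
    Map<String, String> fetch(String rowKey, List<String> columnNames);
}

/** Hypothetical DAO with a single code path for one or many columns. */
class ColumnDao {
    private final ColumnFetcher fetcher;

    ColumnDao(ColumnFetcher fetcher) {
        this.fetcher = fetcher;
    }

    /** Multi-column read (assumed to be backed by a slice-by-names query in real use). */
    Map<String, String> getColumns(String rowKey, String... columnNames) {
        return fetcher.fetch(rowKey, Arrays.asList(columnNames));
    }

    /** Single-column convenience method; returns null when the column is absent. */
    String getColumn(String rowKey, String columnName) {
        return getColumns(rowKey, columnName).get(columnName);
    }
}

/** In-memory stand-in for the database, for illustration only. */
class InMemoryFetcher implements ColumnFetcher {
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new LinkedHashMap<>()).put(column, value);
    }

    @Override
    public Map<String, String> fetch(String rowKey, List<String> columnNames) {
        Map<String, String> row = rows.getOrDefault(rowKey, Collections.<String, String>emptyMap());
        Map<String, String> result = new LinkedHashMap<>();
        for (String name : columnNames) {
            String value = row.get(name);
            if (value != null) {
                result.put(name, value); // keep only the columns that exist
            }
        }
        return result;
    }
}
```

Callers then use getColumn for one column and getColumns for several, without knowing which query type sits underneath.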

Hope I managed to help.

Licensed under: CC-BY-SA with attribution