Question

In my application, I want to get all the rows in a column family, but to ignore the rows that are temporarily unavailable (e.g. some nodes are down).

I have multiple nodes. If one of the nodes is down, get_range throws UnavailableException and I get nothing back.

What I want is to get all the rows that are currently available because, to the user, it's better than nothing. How can I do this?

  • I'm using pycassa.
  • The row keys in my column family are effectively random strings, so I cannot use get to fetch the rows one by one.

Solution

If support for get_range by token is added to pycassa, you could fetch each token range (as reported by describe_ring) separately and discard the ranges that raise an UnavailableException. Barring that, using consistency level ONE is your best option, as Dean mentions.
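The per-range idea can be sketched roughly as follows. This is only an illustration under assumptions: `fetch_range` is a hypothetical callable standing in for a token-based get_range (which, as noted, pycassa may not offer), and `token_ranges` is assumed to be a list of (start_token, end_token) pairs that you have already built from the ring description. The exception class is passed in so the helper does not depend on any particular client library.

```python
def fetch_available_rows(token_ranges, fetch_range, unavailable_exc):
    """Collect rows from every token range that can currently be read.

    token_ranges    -- iterable of (start_token, end_token) pairs
                       (assumed to come from the ring description)
    fetch_range     -- HYPOTHETICAL callable(start_token, end_token) that
                       returns an iterable of (row_key, columns) pairs
    unavailable_exc -- exception class raised when a range cannot be read
    """
    rows = {}
    skipped = []
    for start_token, end_token in token_ranges:
        try:
            # Materialize inside the try block: a lazy iterator may only
            # raise once it is actually consumed.
            fetched = list(fetch_range(start_token, end_token))
        except unavailable_exc:
            # This range is unavailable right now; remember it and move on.
            skipped.append((start_token, end_token))
            continue
        rows.update(fetched)
    return rows, skipped
```

Returning the skipped ranges alongside the rows lets the caller retry them later or tell the user that the result is partial.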

OTHER TIPS

There should be a get call that takes a list of row keys (in pycassa, this is multiget), so you don't need to fetch them one by one. An index can also help. For instance, playORM has an index for each partition of a table (and you can have as many partitions as you want). With that, you can iterate over each index and call get, passing it a list of keys.

Also, make sure your read consistency level is set to ONE as well ;).
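A rough sketch of how the two tips could be combined: fetch known keys in batches through a multiget-style call at a chosen read consistency level, and skip any batch that is unavailable. The helper name, the batch size, and the idea that you already have the keys (for example from an index you maintain) are assumptions for illustration. The only library call it relies on is `multiget(keys, read_consistency_level=...)`, which is how pycassa's ColumnFamily exposes it.

```python
def multiget_available(column_family, keys, consistency_level,
                       unavailable_exc, batch_size=100):
    """Fetch rows for known keys in batches, skipping unavailable batches.

    column_family     -- object with a pycassa-style
                         multiget(keys, read_consistency_level=...) method
    keys              -- row keys, assumed to come from an index you maintain
    consistency_level -- value passed through as read_consistency_level
    unavailable_exc   -- exception class raised when replicas are down
    batch_size        -- illustrative default, tune for your row sizes
    """
    keys = list(keys)
    rows = {}
    failed_keys = []
    for i in range(0, len(keys), batch_size):
        batch = keys[i:i + batch_size]
        try:
            result = column_family.multiget(
                batch, read_consistency_level=consistency_level)
        except unavailable_exc:
            # No live replica for at least one key in this batch.
            failed_keys.extend(batch)
            continue
        rows.update(result)
    return rows, failed_keys
```

With pycassa you would pass your ColumnFamily instance together with its ConsistencyLevel.ONE and UnavailableException. A smaller batch size loses fewer rows when a single key's replicas are down, at the cost of more round trips.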

later, Dean

Licensed under: CC-BY-SA with attribution