In my application, I want to fetch all the rows in a column family but skip any rows that are temporarily unavailable (e.g. because some nodes are down).

I have multiple nodes. If one of the nodes is down, get_range throws UnavailableException and I get nothing back.

What I want is to get all the rows that are currently available because, to the user, that's better than nothing. How can I do this?

  • I'm using pycassa.
  • The row keys in my column family are effectively random strings, so I cannot use get to fetch the rows one by one.

Solution

If support for get_range by token is added to pycassa, you could fetch each token range (as reported by describe_ring) separately and discard the ranges that raise an UnavailableException. Barring that, using consistency level ONE is your best option, as Dean mentions.
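The per-range idea can be sketched roughly as follows. This is a minimal illustration, not pycassa's API: `fetch_available_rows`, `RangeUnavailable`, and `fake_fetch` are hypothetical names, and the token-range fetch is passed in as a plain callable so the sketch runs without a cluster.

```python
class RangeUnavailable(Exception):
    """Hypothetical stand-in for pycassa's UnavailableException."""


def fetch_available_rows(token_ranges, fetch_range, unavailable_exc=RangeUnavailable):
    """Fetch every token range separately and skip the ones that are unavailable.

    token_ranges    -- iterable of (start_token, end_token) pairs
    fetch_range     -- callable(start_token, end_token) -> iterable of (key, columns)
    unavailable_exc -- exception type that signals an unavailable range
    """
    rows = {}
    skipped = []
    for start_token, end_token in token_ranges:
        try:
            # Materialize the whole range first, so a range that fails
            # partway through contributes no rows at all.
            batch = dict(fetch_range(start_token, end_token))
        except unavailable_exc:
            skipped.append((start_token, end_token))
            continue
        rows.update(batch)
    return rows, skipped


def fake_fetch(start_token, end_token):
    """Fake fetcher for the demo: the range starting at "100" is 'down'."""
    if start_token == "100":
        raise RangeUnavailable()
    return [("row-%s" % start_token, {"col": "val"})]


demo_ranges = [("0", "100"), ("100", "200"), ("200", "0")]
rows, skipped = fetch_available_rows(demo_ranges, fake_fetch)
```

In a real setup, `fetch_range` would presumably wrap a token-based `get_range` call (if your pycassa version supports one), `unavailable_exc` would be `UnavailableException`, and the token ranges would come from `describe_ring`. The `skipped` list lets the application tell the user which part of the data is missing.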

Other tips

There should be a call to get that takes a list of row keys, so you don't need to fetch them one by one. An index can also help. For instance, playORM has an index for each partition of a table (and you can have as many partitions as you want). With that, you can iterate over each index and call get, passing it a list of keys.

Also, make sure your read consistency level is set to ONE as well ;).

later, Dean
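As a rough sketch of the list-of-keys approach above: in pycassa the multi-key read is `ColumnFamily.multiget`, which accepts a `read_consistency_level` argument. The helper below assumes the keys are already known (for example, from an index) and reads them in batches, so that one unavailable batch does not discard everything. `multiget_available`, `FakeColumnFamily`, and `FakeUnavailable` are hypothetical names used only so the example runs without a cluster.

```python
def chunks(keys, size):
    """Yield successive slices of `keys`, each with at most `size` items."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


def multiget_available(cf, keys, consistency_level, unavailable_exc, batch_size=100):
    """Read keys in batches via cf.multiget, skipping batches that are unavailable."""
    rows = {}
    failed_keys = []
    for batch in chunks(list(keys), batch_size):
        try:
            rows.update(cf.multiget(batch, read_consistency_level=consistency_level))
        except unavailable_exc:
            # Remember the keys so the caller can retry or report them.
            failed_keys.extend(batch)
    return rows, failed_keys


class FakeUnavailable(Exception):
    """Hypothetical stand-in for pycassa's UnavailableException."""


class FakeColumnFamily(object):
    """Hypothetical stub with the same multiget call shape, for the demo only."""

    def multiget(self, keys, read_consistency_level=None):
        if "bad" in keys:
            raise FakeUnavailable()
        return dict((k, {"col": "v"}) for k in keys)


# "ONE" stands in for pycassa.ConsistencyLevel.ONE in this demo.
rows, failed_keys = multiget_available(
    FakeColumnFamily(), ["a", "b", "bad", "c"], "ONE", FakeUnavailable, batch_size=2
)
```

With a real `ColumnFamily`, you would pass `pycassa.ConsistencyLevel.ONE` and `pycassa.UnavailableException` instead of the stand-ins. Smaller batches lose fewer rows when a replica is down, at the cost of more round trips.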

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow