pycassa - Remove multiple rows by their secondary index?

https://stackoverflow.com/questions/14002605

11-12-2021

Question

I have a column family with a secondary index 'pointer'. How do I remove multiple rows that have the same 'pointer' value (e.g. abc)?

The only option I know is:

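from pycassa.index import create_index_clause, create_index_expression

# Find every row whose indexed 'pointer' column equals 'abc'
expr = create_index_expression('pointer', 'abc')
clause = create_index_clause([expr])
# Remove the matching rows one at a time (one request per row)
for key, user in cassandra_cf.get_indexed_slices(clause):
    cassandra_cf.remove(key)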

but I know this is very inefficient, since it sends one remove request per row, and it can take a long time if thousands of rows share the same 'pointer' value. Are there any other options?

Solution

You can queue the removes in a batch, so that many rows are removed per request:

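from pycassa.index import create_index_clause, create_index_expression

expr = create_index_expression('pointer', 'abc')
clause = create_index_clause([expr])
# Queue the removes on a batch mutator instead of sending them one by one
with cassandra_cf.batch() as b:
    for key, user in cassandra_cf.get_indexed_slices(clause):
        b.remove(key)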

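This groups the removes into batches of 100 mutations by default; pass queue_size to cassandra_cf.batch() to change that. Because the batch object is used as a context manager here, any mutations still queued are sent automatically when the with block exits. Note that create_index_clause() also takes a count argument (100 by default) that caps how many rows the query returns, so you will likely need to raise it when thousands of rows match.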

You can read more about this in the pycassa.batch API docs.
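
To make the batching behaviour described above concrete, here is a minimal pure-Python sketch. ToyBatch is a hypothetical stand-in written for illustration only; it is not pycassa's implementation and does not talk to Cassandra. It only mimics the two behaviours mentioned in the answer: flushing whenever the queue reaches queue_size, and flushing the remainder when the with block exits.

```python
class ToyBatch:
    """Hypothetical stand-in for a batch mutator (not pycassa's code)."""

    def __init__(self, send_request, queue_size=100):
        # send_request is any callable that accepts a list of row keys
        self._send_request = send_request
        self._queue_size = queue_size
        self._pending = []

    def remove(self, key):
        # Queue the removal and flush once the queue is full
        self._pending.append(key)
        if len(self._pending) >= self._queue_size:
            self.send()

    def send(self):
        # Send everything queued so far as a single request
        if self._pending:
            self._send_request(list(self._pending))
            self._pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the with block flushes whatever is still queued
        self.send()
        return False


requests = []  # each entry represents one request sent to the server
with ToyBatch(requests.append, queue_size=100) as b:
    for i in range(250):
        b.remove("row%d" % i)

batch_sizes = [len(r) for r in requests]
print(batch_sizes)  # [100, 100, 50]
```

In this sketch, 250 removes result in three requests instead of 250, which is where the saving over calling remove() once per row comes from.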

Licensed under: CC-BY-SA with attribution
