I have a column family with a secondary index 'pointer'. How do I remove multiple rows that have the same 'pointer' value (e.g. abc)?

The only option I know is:

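from pycassa.index import create_index_expression, create_index_clause

# Match every row whose indexed 'pointer' column equals 'abc'
expr = create_index_expression('pointer', 'abc')
clause = create_index_clause([expr])
# One remove call per matching row
for key, user in cassandra_cf.get_indexed_slices(clause):
    cassandra_cf.remove(key)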

but I know this is very inefficient and can take a long time if I have thousands of rows with the same 'pointer' value. Are there any other options?


Solution

You can remove multiple rows at once:

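from pycassa.index import create_index_expression, create_index_clause

expr = create_index_expression('pointer', 'abc')
clause = create_index_clause([expr])
# Queue the removes in a batch instead of sending them one by one
with cassandra_cf.batch() as b:
    for key, user in cassandra_cf.get_indexed_slices(clause):
        b.remove(key)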

This groups the removes into batches of 100 (the default). Because the batch object is used as a context manager here, any remaining mutations are sent automatically when the with block exits.
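To make the batching behaviour concrete, here is a minimal pure-Python sketch of the idea. It is not pycassa's implementation: `BatchingRemover`, its `send` callback, and the `row%d` keys are hypothetical names made up for illustration, and nothing here talks to Cassandra. It only shows removes being queued, flushed each time the queue reaches `queue_size`, and flushed one last time when the with block exits.

```python
class BatchingRemover:
    """Hypothetical stand-in (not pycassa's batch object) that queues
    removes and hands them off in groups."""

    def __init__(self, send, queue_size=100):
        self.send = send              # callable that receives one list of keys per flush
        self.queue_size = queue_size  # number of queued removes that triggers a flush
        self.pending = []

    def remove(self, key):
        # Queue the remove; flush once the queue is full.
        self.pending.append(key)
        if len(self.pending) >= self.queue_size:
            self.flush()

    def flush(self):
        # Send whatever is queued, if anything, and start a new queue.
        if self.pending:
            self.send(list(self.pending))
            self.pending = []

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Leaving the with block sends the remaining removes.
        self.flush()
        return False


sent = []  # each element is one "round trip" worth of keys
with BatchingRemover(sent.append, queue_size=100) as b:
    for i in range(250):
        b.remove("row%d" % i)

batch_sizes = [len(batch) for batch in sent]
print(batch_sizes)
```

With 250 rows and a queue size of 100, the sketch sends three groups instead of 250 individual removes. In pycassa itself, the batch size is, as far as I know, controlled by the queue_size argument to batch(); check the API docs for your version before relying on that.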

You can read more about this in the pycassa.batch API docs.

Licensed under: CC-BY-SA with attribution