Predictable Cassandra row deletion

https://stackoverflow.com/questions/17675035

03-06-2022
Question

We have a write-heavy workload on a Cassandra 1.2.5 cluster. Because disk space is limited, we occasionally have to delete older data, and this deletion starts when the amount of free disk space drops below a certain level. We have learnt the role of tombstones, i.e. they are only removed once gc_grace_seconds has expired and a compaction processes the SSTables that contain them. So we have set up a "patience delay", and when it expires we check the free space on disk again.

But we need a more predictable deletion scheme, as we cannot rely on "a minor compaction will maybe run some day". That is too vague for us to know when we should check the free space on disk again. Can you offer some ideas?

Solution

This might be a good use case for leveled compaction - if your insert rate stays constant, the time taken to remove expired tombstones will be roughly constant.
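Switching a table to leveled compaction is a schema change. As a rough sketch, the helper below only builds the CQL3 statement text, which you could then paste into cqlsh. The keyspace `metrics` and table `events` are placeholder names that do not come from the question, and the map-style `compaction` option is assumed to be available on your version, so check it against your cluster's CQL documentation first.

```python
def leveled_compaction_cql(keyspace: str, table: str) -> str:
    """Build an ALTER TABLE statement that selects LeveledCompactionStrategy."""
    # Identifiers are interpolated directly, so only pass trusted names.
    return (
        f"ALTER TABLE {keyspace}.{table} "
        "WITH compaction = {'class': 'LeveledCompactionStrategy'};"
    )


# 'metrics' and 'events' are hypothetical names used for illustration.
print(leveled_compaction_cql("metrics", "events"))
```

Existing SSTables are not rewritten by the statement itself; they are moved into levels by later compactions, so expect extra compaction I/O for a while after the change.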

Alternatively, with size-tiered compaction (the default), you can run a full (major) compaction with nodetool compact. This will delete all tombstones older than gc_grace_seconds.

However, this rewrites all your data into one large SSTable, so it takes time proportional to your total data size. You will also need to keep disk usage below half of the available space, because a later compaction that includes this large SSTable temporarily needs about as much free space as the data it rewrites.
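If you go the nodetool compact route, the "less than half your disk" rule can be checked before each run. The following is a minimal sketch, not a tested operations script: the function names, the data directory path, and the keyspace and table names are all hypothetical, and it assumes `nodetool` is on the PATH of the node where it runs.

```python
import shutil
import subprocess


def major_compaction_fits(used_bytes: int, total_bytes: int) -> bool:
    """Return True when used space is strictly under half of the disk."""
    return used_bytes * 2 < total_bytes


def compact_if_safe(data_dir: str, keyspace: str, table: str) -> bool:
    """Run a major compaction only if the half-disk rule of thumb holds."""
    usage = shutil.disk_usage(data_dir)
    if not major_compaction_fits(usage.used, usage.total):
        return False  # too full: a full rewrite might run out of space
    # Equivalent to typing: nodetool compact <keyspace> <table>
    subprocess.run(["nodetool", "compact", keyspace, table], check=True)
    return True


# Example call with placeholder values (not from the question):
# compact_if_safe("/var/lib/cassandra/data", "metrics", "events")
```

Before re-checking free space, verify on your version whether the command returns only after the compaction has finished; if it does, the check can follow immediately instead of waiting for a fixed delay.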

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow