Why am I reading many tombstones in Cassandra table although my access pattern should avoid them

https://stackoverflow.com//questions/24053860

21-12-2019
|

Question

I know this is not the best way to use Cassandra, but the type of my data requires reading all data from the last week. However when using Collection-types in CQL3, I ran into certain limitations which prevent me from doing normal date-range queries.

So I have set up Cassandra (currently single node, probably more in the future) with the following table

CREATE TABLE cache (tag text, id int, tags map<text,text>, 
  PRIMARY KEY (tag, id) );
ALTER TABLE cache WITH GC_GRACE_SECONDS = 0;

I am inserting with a TTL of one week to automatically remove the items from the Cache.

I tried to follow the suggestions mentioned in this article to avoid reading many tombstones by selecting by "minimum id", which I persist elsewhere to avoid reading old data:

SELECT * FROM cache WHERE tag = ? AND id >= ?

The id is basically some sort of timestamp which is constantly increasing, i.e. I only insert higher values over time and constantly remove older ids from the table.

But I still get warnings about thresholds being reached

WARN 08:59:06,286 Read 5001 live and 5702 tombstoned cells in cache (see tombstone_warn_threshold)

And if I do not run manual compaction/scrubbing regularly I get exceptions and queries fail.

However based on my understanding from the articles and documentation, I should be avoiding most if not all tombstones here as I query on equality for the tag, which allows Cassandra to only look for those areas and I use a minimum id which allows Cassandra to start reading only after most of the tombstones, so why are there still tombstone warnings/exceptions reported?

Solution

Map k/v pair is actually a column (name, value and timestamp): so, if you are issuing a lot of deletions of map elements (expiring by TTL is also the case) -- this is the source of this warning. Because you are still reading full maps (with lots of tombstones in them). Also, TTL setting on map is applied on per-element basis.

Second, this is multiplied by >= predicate in your select query.

If this is the case, you should remodel your data access pattern to use only EQ relations in SELECT query and bump id more often. Also, this access pattern will allow you to get rid of clustering part of your PRIMARY KEY.

So, if you do not issue lots of deletions on that map, you can try to use tag text, time timeuuid, name text, data text model and slice it precisely by time.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow