Question

Cassandra nodetool has a command called cleanup:

cleanup [keyspace][cf_name]

Triggers the immediate cleanup of keys no longer belonging to this node. This has roughly the same effect on a node that a major compaction does in terms of a temporary increase in disk space usage and an increase in disk I/O. Optionally takes a list of column family names.

My questions are:

  1. When does a node have keys that do not belong to it?
  2. When should I issue a cleanup?
  3. Should I do cleanup regularly (e.g. once per week)?

Solution

When does a node have keys that do not belong to it?

When you have added new nodes to the cluster, decreased the replication factor, or moved tokens.
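To make the ownership change concrete, here is a minimal Python sketch of a token ring under simplifying assumptions: one token per node, a toy 16-bit hash instead of Cassandra's Murmur3 partitioner, no vnodes, and SimpleStrategy-style placement (the first node whose token is at or after the key's token, then the next nodes clockwise). `token`, `replicas` and `unowned_keys` are hypothetical helpers, not part of any Cassandra API. `unowned_keys` lists the keys a node stored before a change but no longer replicates after it, which is the kind of data `nodetool cleanup` removes.

```python
import hashlib
from bisect import bisect_left
from typing import Dict, List, Set

# Toy model (assumption): one token per node, a 16-bit token space and
# SimpleStrategy-style placement. Real clusters use Murmur3 tokens and vnodes.
Ring = Dict[int, str]  # token -> node name


def token(key: str) -> int:
    """Toy partitioner: map a key to a token in 0..65535 (not Cassandra's Murmur3)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:2], "big")


def replicas(key: str, ring: Ring, rf: int) -> Set[str]:
    """First node whose token >= the key's token, then the next rf-1 nodes clockwise."""
    tokens = sorted(ring)
    start = bisect_left(tokens, token(key)) % len(tokens)
    return {ring[tokens[(start + i) % len(tokens)]] for i in range(min(rf, len(tokens)))}


def unowned_keys(node: str, keys: List[str], old_ring: Ring, old_rf: int,
                 new_ring: Ring, new_rf: int) -> List[str]:
    """Keys that `node` replicated before the change but no longer owns afterwards.

    Without cleanup, these rows stay on the node's disk."""
    return sorted(k for k in keys
                  if node in replicas(k, old_ring, old_rf)
                  and node not in replicas(k, new_ring, new_rf))


if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(1000)]
    ring = {0: "A", 21845: "B", 43690: "C"}
    grown = {**ring, 10922: "D"}  # bootstrap D between A and B
    for n in "ABC":
        print(n, "bootstrap:", len(unowned_keys(n, keys, ring, 1, grown, 1)),
              "RF 3->2:", len(unowned_keys(n, keys, ring, 3, ring, 2)))
```

In this model, bootstrapping D only takes keys away from B (the node whose range D splits), while lowering the replication factor from 3 to 2 leaves each key as stale data on exactly one node.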

When should I issue a cleanup?

After one of the above operations, if you need to save disk space. There is no harm in delaying it: cleanup has a performance impact, and the only reason to run it is to save disk space.

Should I do cleanup regularly (e.g. once per week)?

No, only if you need to save space after one of the above operations.

OTHER TIPS

When does a node have keys that do not belong to it?

When you bootstrap a new node, some of the existing nodes will lose ownership of data by transferring the ownership to the new node.

Reducing the replication factor has the same effect.

When should I issue a cleanup?

After the operations mentioned above, but before you start any other topology or replication change.

You should run it on all affected nodes in the cluster. When in doubt, run on all nodes.
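A minimal Python sketch of running cleanup across the affected nodes sequentially, so that only one node at a time pays the compaction-like disk and I/O cost. It assumes `nodetool` is on the PATH and can reach each node's JMX port remotely with `-h`; many installations allow JMX only from localhost, in which case you would run `nodetool cleanup` locally on each node instead. The host addresses, `my_keyspace`, and the helper names `cleanup_commands` and `rolling_cleanup` are placeholders. With `dry_run=True` (the default) it only prints the commands.

```python
import shlex
import subprocess
from typing import List, Optional, Sequence


def cleanup_commands(hosts: Sequence[str], keyspace: Optional[str] = None,
                     tables: Sequence[str] = ()) -> List[List[str]]:
    """One `nodetool -h <host> cleanup [keyspace [tables...]]` argv per host."""
    if tables and not keyspace:
        raise ValueError("tables can only be given together with a keyspace")
    commands = []
    for host in hosts:
        cmd = ["nodetool", "-h", host, "cleanup"]
        if keyspace:
            cmd += [keyspace, *tables]
        commands.append(cmd)
    return commands


def rolling_cleanup(hosts: Sequence[str], keyspace: Optional[str] = None,
                    tables: Sequence[str] = (), dry_run: bool = True) -> None:
    """Clean up one node at a time so only one node bears the extra disk I/O."""
    for cmd in cleanup_commands(hosts, keyspace, tables):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the rollout if nodetool exits with an error.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical host addresses and keyspace name.
    rolling_cleanup(["10.0.0.1", "10.0.0.2", "10.0.0.3"], keyspace="my_keyspace")
```

Running the nodes one after another is a design choice, not a requirement: it trades total elapsed time for a smaller simultaneous load on the cluster.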

One reason to run it is to reclaim the disk space used to store no longer owned data.

Another reason is that failing to run it may cause data consistency problems, such as deleted data being resurrected. Consider node A, which holds a live row for key k and loses ownership of k when a new node is bootstrapped. Later, key k is deleted, but the deletion (tombstone) never reaches node A because A is no longer a replica. Once gc_grace_seconds has passed, the tombstone is purged throughout the cluster. If you then change the topology so that A owns key k again, A will serve the old, deleted row.

Source: https://docs.datastax.com/en/dse/6.7/dse-admin/datastax_enterprise/tools/nodetool/toolsCleanup.html
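The resurrection scenario can be replayed with a toy Python model: one key, replication factor 1, no timestamps or read repair, and tombstone expiry after gc_grace_seconds reduced to simply dropping the tombstone. `Node`, `read` and `simulate` are hypothetical names for illustration only. Without cleanup, A's stale copy survives and is served once A owns the key again; with cleanup, the read correctly finds nothing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Node:
    name: str
    rows: Dict[str, str] = field(default_factory=dict)  # live rows on disk
    tombstones: Set[str] = field(default_factory=set)   # unpurged deletions


def read(key: str, replicas: List[Node]) -> Optional[str]:
    """A tombstone on any replica hides the row; otherwise return any live copy."""
    if any(key in n.tombstones for n in replicas):
        return None
    for n in replicas:
        if key in n.rows:
            return n.rows[key]
    return None


def simulate(run_cleanup: bool) -> Optional[str]:
    """Replay the scenario for one key `k` with replication factor 1."""
    a, d = Node("A"), Node("D")
    a.rows["k"] = "old value"            # A owns k and holds a live row
    # 1. Bootstrap D: ownership of k moves to D, which streams the row from A.
    d.rows["k"] = a.rows["k"]
    if run_cleanup:
        del a.rows["k"]                  # cleanup on A drops the no-longer-owned row
    # 2. k is deleted; only the current replica D receives the tombstone.
    del d.rows["k"]
    d.tombstones.add("k")
    # 3. gc_grace_seconds passes and compaction purges the tombstone on D.
    d.tombstones.discard("k")
    # 4. Topology changes so that A owns k again; reads now go to A.
    return read("k", [a])


if __name__ == "__main__":
    print("without cleanup:", simulate(False))  # the deleted row comes back
    print("with cleanup:   ", simulate(True))
```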

No need to run nodetool cleanup after nodetool decommission, nodetool replace, or nodetool removenode.

Should I do cleanup regularly (e.g. once per week)?

No need to.

Licensed under: CC-BY-SA with attribution