What is the ordering for Cassandra UTF8Type keys? (Cassandra 2.0)

https://stackoverflow.com/questions/21767641

cassandra-2.0

11-10-2022

Question

What is the ordering for Cassandra UTF8Type?

All the documentation led me to expect a lexicographical sort order (essentially, alphabetical order). That doesn't appear to be the order Cassandra uses, and I can't work out what order it is using.

I built a table to count interactions affecting named "applications", organized in time buckets of one day. (This is a simple example to demonstrate the cause of my confusion.) I want to be able to look for a particular application. The CQL description of the table is as follows:

CREATE TABLE "appMetrics" (app text,time timestamp,counter_val counter,
    PRIMARY KEY (app, time)) WITH COMPACT STORAGE;

I load it with data:

update "appMetrics" set counter_val = counter_val+1 WHERE app='ab' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='a' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='c' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='b' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='bc' AND time='2014-02-14 00:00:00';
update "appMetrics" set counter_val = counter_val+1 WHERE app='ca' AND time='2014-02-14 00:00:00';

I select from the table and see this result:

    select * from "appMetrics";

     app | time                     | counter_val
    -----+--------------------------+-------------
       a | 2014-02-14 00:00:00-0500 |           1
       c | 2014-02-14 00:00:00-0500 |           1
      ab | 2014-02-14 00:00:00-0500 |           1
      ca | 2014-02-14 00:00:00-0500 |           1
      bc | 2014-02-14 00:00:00-0500 |           1
       b | 2014-02-14 00:00:00-0500 |           1

    (6 rows)

So, this order is not alphabetic, not order of entry, not any order I can see. The ordering isn't random, or at least it's repeatable:

cqlsh:simplex> select * from "appMetrics" where token(app) >= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
  ab | 2014-02-14 00:00:00-0500 |           1
  ca | 2014-02-14 00:00:00-0500 |           1
  bc | 2014-02-14 00:00:00-0500 |           1
   b | 2014-02-14 00:00:00-0500 |           1

(4 rows)

cqlsh:simplex> select * from "appMetrics" where token(app) <= token('ab');

 app | time                     | counter_val
-----+--------------------------+-------------
   a | 2014-02-14 00:00:00-0500 |           1
   c | 2014-02-14 00:00:00-0500 |           1
  ab | 2014-02-14 00:00:00-0500 |           1

(3 rows)

For what it's worth, the column family is described as:

    ColumnFamily: appMetrics
      Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
      Default column value validator: org.apache.cassandra.db.marshal.CounterColumnType
      Cells sorted by: org.apache.cassandra.db.marshal.TimestampType
      GC grace seconds: 864000
      Compaction min/max thresholds: 4/32
      Read repair chance: 0.1
      DC Local Read repair chance: 0.0
      Populate IO Cache on flush: false
      Replicate on write: true
      Caching: KEYS_ONLY
      Default time to live: 0
      Bloom Filter FP chance: 0.01
      Index interval: 128
      Speculative Retry: 99.0PERCENTILE
      Built indexes: []
      Compaction Strategy: org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy
      Compression Options:
        sstable_compression: org.apache.cassandra.io.compress.LZ4Compressor

Can someone explain how these are ordered?

Solution

OK, I think I know the answer to this question now. The partition key is not stored in its own sort order: the partitioner hashes each key into a token, and the rows (partitions) are stored and returned in token order.

As a demonstration, for the same table shown above, I requested the token values for the keys, and got this.

cqlsh:simplex> select token(app), app from "appMetrics";

 token(app)           | app
----------------------+-----
 -8839064797231613815 |   a
 -8198557465434950441 |   c
 -7815133031266706642 |  ab
  -633243080167210587 |  ca
  4832945267908438539 |  bc
  8833996863197925870 |   b

(6 rows)
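
To check that token order really explains both the full scan and the token-range queries in the question, the token values above can be sorted outside of Cassandra. This is only a sketch: the token values are copied from the cqlsh output above, and the helper names scan_order and token_range are made up for illustration, not part of any Cassandra API.

```python
# Token values copied verbatim from the `select token(app), app` output above.
TOKENS = {
    "a": -8839064797231613815,
    "c": -8198557465434950441,
    "ab": -7815133031266706642,
    "ca": -633243080167210587,
    "bc": 4832945267908438539,
    "b": 8833996863197925870,
}


def scan_order(tokens):
    """Return partition keys in ascending token order (hypothetical helper)."""
    return sorted(tokens, key=tokens.get)


def token_range(tokens, lower_key):
    """Mimic `WHERE token(app) >= token(lower_key)` on the same data."""
    bound = tokens[lower_key]
    return [key for key in scan_order(tokens) if tokens[key] >= bound]


if __name__ == "__main__":
    print(scan_order(TOKENS))         # order of the unrestricted select
    print(token_range(TOKENS, "ab"))  # order of the token(app) >= token('ab') query
```

Sorting by token reproduces the a, c, ab, ca, bc, b order of the plain select, and the range filter reproduces the four rows returned for token(app) >= token('ab').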

Further info: this is because I've used the default Murmur3Partitioner. I could probably get the rows back in alphabetical order by using the ByteOrderedPartitioner instead. Unfortunately, the partitioner is set at the cluster level, so it applies to every table in the cluster. DataStax also recommends against using the ByteOrderedPartitioner (http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAntiPatterns_c.html).
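
For comparison, here is a rough sketch of the order I would expect from an order-preserving partitioner, assuming it compares the raw UTF-8 bytes of each key. The byte_order helper is hypothetical and runs no Cassandra code; it only sorts the six keys from the question by their encoded bytes.

```python
# The six partition keys inserted in the question, in insertion order.
KEYS = ["ab", "a", "c", "b", "bc", "ca"]


def byte_order(keys):
    """Sort keys by their UTF-8 byte representation (hypothetical helper).

    Assumption: an order-preserving partitioner such as ByteOrderedPartitioner
    compares the raw key bytes, so this approximates its scan order.
    """
    return sorted(keys, key=lambda key: key.encode("utf-8"))


if __name__ == "__main__":
    print(byte_order(KEYS))
```

For plain ASCII keys like these, byte order and alphabetical order coincide, which is the ordering the question originally expected.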

Licensed under: CC-BY-SA with attribution
