How much control over static and dynamic column mixing do I have in Cassandra CQL 3 table definitions?

StackOverflow https://stackoverflow.com/questions/16049461

  •  04-04-2022
  •  | 
  •  

Question

I am trying to better understand what level I can actually mix static and dynamic columns in CQL3. I creating a table in CQL 3 to store user subscription data, but I am also testing out methods to store a tenant_id (with a secondary index) in the table as well so that I can quickly identify what users belong to what tenants (a tenant being a higher-level entity to which the user belongs).

My create table statement is as follows, followed by some dummy data inserts:

cqlsh:demodb> CREATE TABLE subscription_by_user (
              user_id text,
              tenant_id uuid,
              subscription_id int,
              type text,
              distribution int,
              PRIMARY KEY (user_id, subscription_id) );

cqlsh:demodb> CREATE INDEX subscription_ids ON subscription_templates_by_user (tenant_id);


cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user1', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 2, 'MESSAGE', 4);

cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user2', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 3, 'TOPIC', 5);

cqlsh:demodb> INSERT INTO subscription_by_user (user_id, tenant_id, subscription_id, type, distribution) VALUES ('user1', f81d4fae-7dec-11d0-a765-00a0c91e6bf6, 3, 'USER', 4);

This is all well and good and I get back what I expect to using the cql 3 API:

cqlsh:demodb> SELECT * FROM subscription_by_user  ;

 user_id | subscription_id | distribution | tenant_id                            | type
---------+-----------------+--------------+--------------------------------------+------------
   user2 |               3 |            5 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | TOPIC
   user1 |               2 |            4 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | BOARD
   user1 |               3 |            4 | f81d4fae-7dec-11d0-a765-00a0c91e6bf6 | USER

However, the underlying storage of the row results in Cassandra duplicating tenant_id for every new subscription:

[default@demodb] list subscription_templates_by_user;
Using default limit of 100
Using default column limit of 100
-------------------
RowKey: user2
=> (column=3:, value=, timestamp=1366150799244000)
=> (column=3:distribution, value=00000005, timestamp=1366150799244000)
=> (column=3:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366150799244000)
=> (column=3:type, value=746573742d7479706532, timestamp=1366150799244000)
-------------------
RowKey: user1
=> (column=2:, value=, timestamp=1366150764854000)
=> (column=2:distribution, value=00000004, timestamp=1366150764854000)
=> (column=2:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366150764854000)
=> (column=2:type, value=746573742d74797065, timestamp=1366150764854000)
=> (column=3:, value=, timestamp=1366151741325000)
=> (column=3:distribution, value=00000004, timestamp=1366151741325000)
=> (column=3:tenant_id, value=f81d4fae7dec11d0a76500a0c91e6bf6, timestamp=1366151741325000)
=> (column=3:type, value=746573742d74797065, timestamp=1366151741325000)

My question is: is there a way for me (using CQL 3) to structure the table in a way where subscription_id is only listed once per row? If not, is it possible to do this using the older cassandra-cli API? It seems like this would be a viable use case and is really a question of mixing static data with dynamic data (i.e., the subscription bits), but I could very well be wrong. I know that maps and the other CQL 3 collection types are one way of doing this, but the fact that I must retrieve the entire collection at once makes me a bit nervous and I would rather not serialize some complex object into the map value slot if I can help it.

Thanks!

Was it helpful?

Solution

If you don't need to perform range queries by subscription, or query "what are all the subscriptions for user X", then you can can make it part of the partition key:

CREATE TABLE subscription_by_user (
     user_id text,
     tenant_id uuid,
     subscription_id int,
     type text,
     distribution int,
     PRIMARY KEY ((user_id, subscription_id))
);

If you do need either of those, then suck it up and take the storage space penalty; most of it will be compressed away. This is true whether you are using CQL or Thrift.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top