Data Modeling and uuid on Cassandra

https://stackoverflow.com/questions/21846928

13-10-2022

Question

I am trying to build a movie database for educational purposes, using Cassandra as the backend. The database will be queried mainly by movie title. The data I currently have fits the following model:

movie title | imdb rating | year of release | actors

While reading the CQL documentation, I found the music playlist example, which uses the following structure:

CREATE TABLE playlists (
  id uuid,
  song_order int,
  song_id uuid,
  title text,
  album text,
  artist text,
  PRIMARY KEY (id, song_order)  -- id: partition key, song_order: clustering column
);

My question is: why is a separate id column necessary? Can't the title column be used as the primary key? What are the advantages and disadvantages of not using a separate uuid field?

The table I am designing for my model is:

CREATE TABLE movies (
  title text,
  imdb_rating double,
  year int,
  actors text,
  PRIMARY KEY (title, imdb_rating)  -- title: partition key, imdb_rating: clustering column
);

I believe that in my model, title is the PRIMARY KEY and the PARTITION KEY, and imdb_rating is the CLUSTERING KEY (for ordering the output in ascending order). Is there anything wrong with my model? How will it affect the distribution of the data, and why should or shouldn't I use a uuid? I am planning to use a replication_factor of 2 because I am using only 3 nodes.
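For reference, the replication factor is set per keyspace. Below is a minimal sketch for a small single-datacenter test cluster like the one described. The keyspace name movie_db is a hypothetical placeholder.

```cql
-- Hypothetical keyspace name. SimpleStrategy places replicas without regard
-- to datacenter or rack, which is acceptable for a small test cluster.
CREATE KEYSPACE IF NOT EXISTS movie_db
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

USE movie_db;
```

With replication_factor 2, each partition is stored on 2 of the 3 nodes. A QUORUM read or write needs both of those replicas, so it fails for a partition if either replica is down. Consistency level ONE tolerates the loss of one replica.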

Also, according to the documentation:

Do not use an index in these situations:
...
• On a frequently updated or deleted column

In my database, imdb_rating is the most frequently updated column, so I am not building a secondary index on it.
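One consequence of the table above is worth spelling out. imdb_rating is a clustering column, so it is part of the primary key, and Cassandra does not allow primary key columns in the SET clause of an UPDATE. Changing a rating therefore means deleting the old row and inserting a new one. The sketch below uses the movies table from the question. The title, year and actor are illustrative, and the ratings are placeholders, not real IMDb data.

```cql
-- Rejected by Cassandra: imdb_rating is a primary key column,
-- so it cannot appear in the SET clause.
-- UPDATE movies SET imdb_rating = 7.5
--   WHERE title = 'Hamlet' AND imdb_rating = 7.0;

-- Instead, remove the row stored under the old rating and insert it again
-- under the new one. Both rows live in the same partition (title = 'Hamlet').
BEGIN BATCH
  DELETE FROM movies WHERE title = 'Hamlet' AND imdb_rating = 7.0;
  INSERT INTO movies (title, imdb_rating, year, actors)
    VALUES ('Hamlet', 7.5, 1996, 'Kenneth Branagh');
APPLY BATCH;
```

The application must know the old rating to build the DELETE. That is one more reason to keep a frequently changing value out of the primary key.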

Solution

Can't the title column be used as a primary key?

If movie titles are unique (which is not necessarily true), you could use title as the primary key.
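Uniqueness matters because in Cassandra an INSERT with an existing primary key silently overwrites the stored row instead of failing. The sketch below uses a hypothetical table keyed on title alone and inserts two different films that are both titled Hamlet. The ratings are placeholders.

```cql
-- Hypothetical table: title alone is the primary key (and partition key).
CREATE TABLE movies_by_title_only (
  title text PRIMARY KEY,
  imdb_rating double,
  year int,
  actors text
);

INSERT INTO movies_by_title_only (title, imdb_rating, year, actors)
  VALUES ('Hamlet', 7.0, 1948, 'Laurence Olivier');
INSERT INTO movies_by_title_only (title, imdb_rating, year, actors)
  VALUES ('Hamlet', 7.0, 1996, 'Kenneth Branagh');

-- Same primary key, so the second INSERT acts as an upsert on the same row:
-- this query returns a single row, not two.
SELECT * FROM movies_by_title_only WHERE title = 'Hamlet';
```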

What are the advantages and disadvantages of not using a separate uuid field?

A UUID is useful when you need an identifier that is globally unique and you don't want to check for its uniqueness. If you can find a set of columns whose combination is guaranteed to be unique, you don't have to use a UUID (assuming you don't need an id to refer to the row). But it all depends on your query pattern. If you are going to look up a movie by its id (probably coming from another table), use a UUID as the primary key. If you want to find movies with a specific title, use title as the primary key.
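Below is a minimal sketch of the id-based pattern. It assumes a hypothetical movies_by_id table and an id that arrives from some other table. The UUID literal in the last query is a placeholder.

```cql
-- Hypothetical table for lookups by id.
CREATE TABLE movies_by_id (
  id uuid PRIMARY KEY,
  title text,
  imdb_rating double,
  year int,
  actors text
);

-- uuid() generates a random (version 4) UUID on the server, so no
-- uniqueness check is needed before inserting.
INSERT INTO movies_by_id (id, title, imdb_rating, year, actors)
  VALUES (uuid(), 'Hamlet', 7.0, 1996, 'Kenneth Branagh');

-- UUID constants are written unquoted in CQL (placeholder value).
SELECT * FROM movies_by_id WHERE id = 123e4567-e89b-12d3-a456-426614174000;
```

Because uuid() is evaluated on the server, the client never sees the generated value. In practice the id is often generated in the application instead, so the same id can also be written to any other table that refers to the movie.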

In your case, since title is not unique and you will search by title, use a combination of title and a UUID as a composite key.
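Here is a sketch of that composite key, reusing the question's columns. The table name movies_by_title is hypothetical.

- title is the partition key, so every movie with the same title is stored in the same partition on the same replicas.
- Different titles are spread across the nodes by the hash of the title.
- The uuid clustering column keeps rows with the same title distinct.

```cql
CREATE TABLE movies_by_title (
  title text,
  id uuid,
  imdb_rating double,
  year int,
  actors text,
  PRIMARY KEY (title, id)  -- title: partition key, id: clustering column
);

INSERT INTO movies_by_title (title, id, imdb_rating, year, actors)
  VALUES ('Hamlet', uuid(), 7.0, 1948, 'Laurence Olivier');
INSERT INTO movies_by_title (title, id, imdb_rating, year, actors)
  VALUES ('Hamlet', uuid(), 7.0, 1996, 'Kenneth Branagh');

-- Both rows come back: same partition, different clustering keys.
SELECT * FROM movies_by_title WHERE title = 'Hamlet';
```

In this layout imdb_rating is a regular column. It can therefore be changed with a plain UPDATE that names both title and id.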

I believe that in my model, title is the PRIMARY KEY and the PARTITION KEY, and imdb_rating is the CLUSTERING KEY (for ordering the output in ascending order). Is there anything wrong with my model? How will it affect the distribution of the data, and why should or shouldn't I use a uuid?

In this case, you would have to use the rating and a UUID as the primary key, but then you would need ALLOW FILTERING in your queries.
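Below is one possible reading of that suggestion. It assumes imdb_rating as the partition key and a uuid as the clustering column, and the table name is hypothetical.

```cql
CREATE TABLE movies_by_rating (
  imdb_rating double,
  id uuid,
  title text,
  year int,
  actors text,
  PRIMARY KEY (imdb_rating, id)
);

-- title is not part of the primary key, so Cassandra refuses this query
-- without ALLOW FILTERING. With it, Cassandra scans all partitions and
-- discards non-matching rows, which gets slower as the table grows.
SELECT * FROM movies_by_rating WHERE title = 'Hamlet' ALLOW FILTERING;
```

Whether filtering on a regular column like this is accepted may depend on the Cassandra version. Older releases required a secondary index on the filtered column. Also, rows are placed by the hash of the partition key. A scan of this table therefore does not return movies in rating order, because clustering order only applies within a single partition.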

Licensed under: CC-BY-SA with attribution
