Question

I want to create an efficient compound key to serve WHERE queries with multiple conditions, such as:

SELECT * FROM playlists 

WHERE 
      album = 'We Must Obey' AND
      artist = 'Fu Manchu' AND
      title = 'Ojo Rojo'

ORDER BY song_order ASC ALLOW FILTERING

For this query, does it make sense to make album, artist and title secondary indexes? Would making all three of them secondary indexes be redundant? Would a single secondary index on the most common one (in this case, the artist column) suffice?



Solution

If most of your queries on albums and titles will come with a condition on artist, then I would say a single secondary index on artist is sufficient. An artist rarely has more than a hundred albums, so queries with an equality (EQ) condition on artist are already very selective, and the remaining conditions on album and title only have to be filtered over that small set of rows.
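A minimal sketch of the single-index approach, assuming the question's playlists table stores artist, album and title as regular (non-key) columns; the index name playlists_artist_idx is hypothetical. The ORDER BY clause from the question is dropped here, because Cassandra does not support ORDER BY on a query served by a secondary index; sorting by song_order would have to happen on the client.

```cql
-- Hypothetical index name; assumes the playlists table from the question.
CREATE INDEX IF NOT EXISTS playlists_artist_idx ON playlists (artist);

-- The artist index narrows the read to one artist's rows; album and title
-- are then filtered from that small set, which requires ALLOW FILTERING.
SELECT * FROM playlists
WHERE artist = 'Fu Manchu'
  AND album = 'We Must Obey'
  AND title = 'Ojo Rojo'
ALLOW FILTERING;
```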

If many of your queries might filter solely on album and title without identifying an artist, then I would say building all three secondary indexes is necessary.
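A sketch of the additional indexes for queries that omit the artist, again assuming the question's playlists table; the index names are hypothetical. As far as I know, Cassandra's built-in secondary indexes serve a query through one index and filter the remaining conditions, which is why ALLOW FILTERING is still needed when both album and title are restricted.

```cql
-- Hypothetical index names; assumes album and title are regular columns.
CREATE INDEX IF NOT EXISTS playlists_album_idx ON playlists (album);
CREATE INDEX IF NOT EXISTS playlists_title_idx ON playlists (title);

-- No artist in the WHERE clause: one index is used for the lookup and the
-- other condition is filtered, so ALLOW FILTERING is required.
SELECT * FROM playlists
WHERE album = 'We Must Obey'
  AND title = 'Ojo Rojo'
ALLOW FILTERING;
```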

Other answers

Secondary indexes in Cassandra work best on low-to-medium cardinality columns, and even then only in certain situations. They are not intended to let you query a table (column family) in many different ways. The best way to handle this is to model a table specifically for this query. If we follow the hierarchy of these entities (artists release albums, albums contain titles), then creating a new table with a composite primary key makes the most sense:

CREATE TABLE playlists_by_artist_album_title (
  id uuid,
  song_order bigint,
  album text,
  artist text,
  song_id uuid,
  title text,
  PRIMARY KEY (artist, album, title));

This partitions all entries by artist (the partition key) and lets you narrow your results further by album and title (the clustering columns). With a composite primary key, you must always specify the partition key, and you can then restrict the clustering columns in order from left to right, without skipping one. Note also that this approach does not sort by song_order: rows within a partition are ordered by album, then by title.
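To illustrate the left-to-right rule, here are the query shapes the playlists_by_artist_album_title table above can serve directly from its primary key. The last query is commented out because it skips the album clustering column; Cassandra rejects it as written (recent versions accept it only with ALLOW FILTERING, which gives up the benefit of the key).

```cql
-- Partition key only: every album and title by one artist.
SELECT * FROM playlists_by_artist_album_title
WHERE artist = 'Fu Manchu';

-- Partition key plus the first clustering column.
SELECT * FROM playlists_by_artist_album_title
WHERE artist = 'Fu Manchu' AND album = 'We Must Obey';

-- The full primary key.
SELECT * FROM playlists_by_artist_album_title
WHERE artist = 'Fu Manchu' AND album = 'We Must Obey' AND title = 'Ojo Rojo';

-- Not served by the key: title is restricted but album is skipped.
-- SELECT * FROM playlists_by_artist_album_title
-- WHERE artist = 'Fu Manchu' AND title = 'Ojo Rojo';
```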

Another option, if you know for sure that you will always query by artist, album and title, is to make all three of them part of a composite partition key, like this:

PRIMARY KEY ((artist, album, title), song_order);

This requires every query to supply artist, album and title, but it is the fastest way to query this data, because each query reads exactly one partition. And since song_order is a clustering column, rows are stored sorted by song_order, so the order of songs on the album is preserved. But again, it all depends on the query the table will be serving.
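A sketch of the full table definition for this second option, reusing the columns of the table above; the table name playlists_by_artist_album_title_ordered is hypothetical. Because the whole partition key is restricted by equality, the ORDER BY on the clustering column song_order is allowed here, unlike in a secondary-index query.

```cql
-- Hypothetical table name; same columns as playlists_by_artist_album_title.
CREATE TABLE IF NOT EXISTS playlists_by_artist_album_title_ordered (
  id uuid,
  song_order bigint,
  album text,
  artist text,
  song_id uuid,
  title text,
  PRIMARY KEY ((artist, album, title), song_order)
) WITH CLUSTERING ORDER BY (song_order ASC);

-- Every partition key column is restricted, so this reads a single
-- partition and can order by the clustering column.
SELECT * FROM playlists_by_artist_album_title_ordered
WHERE artist = 'Fu Manchu'
  AND album = 'We Must Obey'
  AND title = 'Ojo Rojo'
ORDER BY song_order ASC;
```

ASC is already the default clustering order, so the ORDER BY clause is redundant here; it is kept only to mirror the original query.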

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow