Cassandra: secondary indexes for queries with multiple WHERE clauses

Question 1

If most of your queries on albums and titles also come with a condition on artist, then I would say a single secondary index on artist would be sufficient, since an artist is unlikely to have more than a hundred albums. In that case, queries with an EQ on artist are very selective.
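As a rough sketch of this single-index option: the table name playlists is an assumption (inferred from the table name used further down), and the values are purely illustrative.

```cql
-- Assumed base table name: playlists
CREATE INDEX ON playlists (artist);

-- The index serves the artist restriction. album is not indexed and is
-- filtered afterwards, so Cassandra asks for ALLOW FILTERING here.
SELECT * FROM playlists
 WHERE artist = 'Fu Manchu' AND album = 'No One Rides for Free'
 ALLOW FILTERING;
```

Because the EQ on artist already narrows the candidates to a small set of rows, the extra filtering on album should be cheap.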

If many of your queries might be solely on albums and titles, without identifying an artist, then I would say building three secondary indexes is necessary.
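A sketch of the three-index option, again assuming the base table is called playlists and using made-up values:

```cql
-- Assumed base table name: playlists
CREATE INDEX ON playlists (artist);
CREATE INDEX ON playlists (album);
CREATE INDEX ON playlists (title);

-- Each column can now be queried on its own.
SELECT * FROM playlists WHERE album = 'No One Rides for Free';
SELECT * FROM playlists WHERE title = 'Ojo Rojo';
```

Combining several indexed columns in one WHERE clause will most likely still require ALLOW FILTERING, since only one index is used for the lookup and the remaining conditions are applied as filters.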

Question 2

Secondary indexes in Cassandra work best on low-to-medium cardinality columns, and even then only in certain situations. They are not intended to let you query a table (column family) in many different ways. The best way to go about this is to model a table specifically for this query. If we follow the hierarchy of these entities (artists write albums, albums have titles), then creating a new table with a composite primary key makes the most sense:

CREATE TABLE playlists_by_artist_album_title (
  id uuid,
  song_order bigint,
  album text,
  artist text,
  song_id uuid,
  title text,
  PRIMARY KEY (artist, album, title));

This will key all entries by artist (the partition key), and also allow you to narrow your results by album and title (the clustering columns). With a composite primary key, you can query by one or more primary key fields, but only in order from left to right: artist alone, artist and album, or all three. Note that this approach does not sort by song_order.
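To illustrate the left-to-right rule, here are a few queries against the table above (values are illustrative only):

```cql
-- Accepted: restrictions follow the key order artist, album, title.
SELECT * FROM playlists_by_artist_album_title
 WHERE artist = 'Fu Manchu';

SELECT * FROM playlists_by_artist_album_title
 WHERE artist = 'Fu Manchu' AND album = 'No One Rides for Free';

SELECT * FROM playlists_by_artist_album_title
 WHERE artist = 'Fu Manchu' AND album = 'No One Rides for Free'
   AND title = 'Ojo Rojo';

-- Rejected (at least without ALLOW FILTERING): the partition key is missing.
-- SELECT * FROM playlists_by_artist_album_title
--  WHERE album = 'No One Rides for Free';

-- Rejected (at least without ALLOW FILTERING): album is skipped.
-- SELECT * FROM playlists_by_artist_album_title
--  WHERE artist = 'Fu Manchu' AND title = 'Ojo Rojo';
```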

Another option applies if you know for sure that you will always be querying by artist, album and title together. If that is the case, then you could define all of them as parts of a composite partition key, like this:

PRIMARY KEY ((artist, album, title), song_order);

While this would require the presence of artist, album and title in all queries, this would be the fastest way to query this data. And song_order is a clustering column, so the rows within each (artist, album, title) partition would be stored sorted by song_order. But again, it all depends on the query it will be serving.
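Spelled out in full, that variant might look like the sketch below. The table name is hypothetical, the columns are copied from the table above, and the values are illustrative.

```cql
-- Hypothetical table name; same columns as before, different key.
CREATE TABLE playlists_by_artist_album_title_order (
  id uuid,
  song_order bigint,
  album text,
  artist text,
  song_id uuid,
  title text,
  PRIMARY KEY ((artist, album, title), song_order));

-- All three partition key columns must be supplied with EQ.
SELECT * FROM playlists_by_artist_album_title_order
 WHERE artist = 'Fu Manchu' AND album = 'No One Rides for Free'
   AND title = 'Ojo Rojo';
```

Such a query goes straight to a single partition, which is why it would be the fastest of the options discussed here.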