Cassandra/Redis: Way to create feed without Cassandra 'IN' secondary index?

Question

Without knowing the schema of the posts table (and preferably the others, as well), it's really hard to make any useful suggestions.

It's unclear to me why you need to have user_id be a secondary index, as opposed to your primary key.

In general it's quite useful to key content like posts off of the user that created it, since it allows you to do things like retrieve all posts (optionally over a given range, assuming they are chronologically sorted) very efficiently.

With Cassandra, if you find that a table can effectively answer some of the queries that you want to perform but not others, you are usually best of denormalizing that table and creating another table with a different structure in order to keep your queries to a single CQL partition and node.

CREATE TABLE posts (
  user_id int,
  post_id int,
  post_text text,
  PRIMARY KEY (user_id, post_id)
  ) WITH CLUSTERING ORDER BY (post_id DESC)

This table can answer queries such as:

 select * from posts where user_id = 1234;

 select * from posts where user_id = 1 and post_id = 53;

 select * from posts where user_id = 1 and post_id > 5321 and post_id < 5400;

The reverse clustering on post_id is to make retrieving the most recent posts the most efficient by placing them at the beginning of the partition physically within the sstable.

In that example, user_id being a partition column, means "all cql rows with this user_id will be hashed to the same partition, and hence the same physical nodes, and eventually, the same sstables. That's why it's possible to

retrieve all posts with that user_id, as they are store contiguously
retrieve a slice of them by doing a ranged query on post_id
retrieve a single post by supplying both the partition column(user_id) and the clustering column (post_id)

In effect, this become a hashmap of a hashmap lookup. The one major caveat, though, is that when using partition and clustering columns, you always need to supply all columns from left to right in your query, without skipping any. So in this case, that means you can't retrieve an individual post without knowing the user_id that the post_id belongs to. That is addressable in user-code(by storing a reverse mapping and doing the lookup when necessary, or by encoding the user_id into the post_id that is passed around your application), but is definitely something to take into consideration.