Question

I have the following table:

CREATE TABLE tab ( userID varchar, grpID varchar, itemID varchar, timestamp bigint, PRIMARY KEY());

I need to execute the following queries on this table, where:

  • userID = x
  • userID = x and grpID = y
  • userID = x and grpID IN (a,b,c)

Also, the results should be sorted by the timestamp field in descending order, which makes timestamp my clustering key.

I want to avoid duplicating data or creating the same table twice with 2 different primary keys to support my queries.

What should my primary key be so that I can execute all of these queries?

Solution

You can't achieve all of that within your restrictions:

  • PK(userID, timestamp) -- results are sorted by timestamp, but you can't put a condition on grpID
  • PK(userID, grpID) -- you can run all 3 queries, but results are ordered by grpID, not by timestamp
  • PK((userID, grpID), timestamp) -- you can run queries 2 and 3 with sorting by timestamp, but you can't query on userID = x alone

If the number of rows retrieved is not huge, you could sort client side. Alternatively, you could use a secondary index. A secondary index automatically creates and maintains a new table (so it duplicates your data, which violates your terms!) but should allow you to perform all three queries. Secondary indexes do not allow the IN operator, so you should choose userID as the lookup key.
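To give an idea of the client-side sorting option, here is a minimal Python sketch. It does not talk to Cassandra: the `rows` list is made-up sample data standing in for whatever your driver returns, and the tuple layout (userID, grpID, itemID, timestamp) and both function names are my own assumptions for illustration.

```python
import heapq
from operator import itemgetter

# Hypothetical rows as a driver might return them:
# (userID, grpID, itemID, timestamp). Sample data only.
rows = [
    ("x", "a", "item1", 1000),
    ("x", "b", "item2", 3000),
    ("x", "a", "item3", 2000),
    ("x", "c", "item4", 1500),
]


def newest_first(rows):
    """Sort rows by timestamp (index 3), newest first."""
    return sorted(rows, key=itemgetter(3), reverse=True)


def merge_newest_first(*sorted_groups):
    """Merge per-group results that are each already sorted newest first.

    With reverse=True, heapq.merge expects every input to be in
    descending order and yields one descending sequence.
    """
    return list(heapq.merge(*sorted_groups, key=itemgetter(3), reverse=True))


# Case 1: rows came back in some other order, so sort them all.
by_time = newest_first(rows)

# Case 2: one result per grpID, each already ordered by timestamp DESC.
group_a = [("x", "a", "item3", 2000), ("x", "a", "item1", 1000)]
group_b = [("x", "b", "item2", 3000)]
group_c = [("x", "c", "item4", 1500)]
merged = merge_newest_first(group_a, group_b, group_c)
```

The second function is for the case where you query each grpID separately and every result already comes back ordered by timestamp: merging pre-sorted lists is cheaper than re-sorting everything, though for small result sets a plain `sorted` call is just as good.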

By the way, denormalization is very normal in NoSQL, so data duplication is common in this context.

HTH, Carlo

Licensed under: CC-BY-SA with attribution