Question

From my understanding, in NoSQL, data should be duplicated. So, for instance, if you have a users table and a posts table, you'd store the user's info in the users table, as usual, but then you'd also store the relevant user data in the posts table.

Question 1: is my understanding correct?

Question 2: if so, does that mean that if I change a user detail I'll have to update all affected posts entries?

Solution

From the Cassandra perspective, it mostly depends on the queries that you need to support efficiently. When you query posts, do you also need user data? If so, it will generally be more efficient to include the required data where the post is stored.

So for question 1: yes, in many circumstances what you describe is common practice, but it depends on the application's needs.
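To make the tradeoff concrete, here is a minimal sketch that uses plain Python dictionaries as stand-ins for the two tables. It is not Cassandra code; the table layouts, field names, and the `render_*` helpers are all assumptions made up for illustration. It only shows that the normalized layout needs an extra user lookup per post, while the denormalized layout can display a post from the post row alone.

```python
# Hypothetical in-memory stand-ins for a users table and a posts table.
users = {"u1": {"username": "alice", "email": "alice@example.com"}}

# Normalized: each post stores only the author's id.
posts_normalized = [
    {"post_id": "p1", "user_id": "u1", "body": "Hello"},
]

# Denormalized: each post also carries a copy of the username it displays.
posts_denormalized = [
    {"post_id": "p1", "user_id": "u1", "username": "alice", "body": "Hello"},
]


def render_normalized(posts, users):
    """Display posts; needs one users lookup per post."""
    return [f'{users[p["user_id"]]["username"]}: {p["body"]}' for p in posts]


def render_denormalized(posts):
    """Display posts; everything needed is already on the post row."""
    return [f'{p["username"]}: {p["body"]}' for p in posts]


print(render_normalized(posts_normalized, users))
print(render_denormalized(posts_denormalized))
```

Both calls produce the same display text; the difference is how many reads it takes to get there.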

For question 2, this is also an application concern. If you foresee user data changing regularly, then your application should perhaps perform a lookup against the users table when displaying a post. However, if that introduces too many reads to display the required posts in a timely fashion, then including the user data in the posts data means that each change to the user data must be written in two places: the users table and every affected post. But it is important to ask whether the historical data needs to be changed at all. For example, if you change your username on Twitter, it doesn't go back and update all prior references to you to your new username. This is an application choice. What is the user data that you anticipate might change? In the case of a username change where you do want the new value to be reflected in all previous posts, how timely does that change need to be? Should it be reflected immediately, or can you wait for a batch process to handle it?
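The two choices described above (rewrite history, or leave old posts alone) can be sketched as follows. Again this uses plain Python dictionaries rather than Cassandra, and `rename_user` along with its `update_history` flag are hypothetical names invented for this example. In a real system the loop over posts would likely be the batch process mentioned above rather than an inline scan.

```python
def rename_user(users, posts, user_id, new_username, update_history=True):
    """Rename a user; optionally rewrite the copies stored on their posts.

    Returns the number of post rows that were rewritten.
    """
    users[user_id]["username"] = new_username  # the authoritative copy
    if not update_history:
        return 0  # old posts keep the name they were written under
    rewritten = 0
    for post in posts:
        if post["user_id"] == user_id and post["username"] != new_username:
            post["username"] = new_username
            rewritten += 1
    return rewritten


users = {"u1": {"username": "alice"}, "u2": {"username": "bob"}}
posts = [
    {"post_id": "p1", "user_id": "u1", "username": "alice", "body": "Hello"},
    {"post_id": "p2", "user_id": "u2", "username": "bob", "body": "Hi"},
    {"post_id": "p3", "user_id": "u1", "username": "alice", "body": "Bye"},
]

# Rewrites the duplicated username on p1 and p3; p2 is untouched.
print(rename_user(users, posts, "u1", "alicia"))
```

Passing `update_history=False` gives the Twitter-style behavior, where only the users entry changes. Because the loop skips rows that already hold the new name, rerunning it after a partial failure is harmless.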

The important thing is to understand how to perform efficient queries, and to understand the referential integrity tradeoff we make when we denormalize to achieve high-performance applications. Always consider the application's query patterns when designing the data model.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow