Question

I will do microblogging web service (for school, so don't blast me for lack of new idea) and I worry that DB could be often be overloaded (user could following other users or even tag so I suppouse that SELECT will be heavy - check 20 latest messages which contains all observing tags and user).

My idea is create another table, and store in it only statusID and userID (who should pick up message). Danger of that is, if some tag or user has many followers there will be a lot of record with that status ID. So, is it good idea? Or maybe better is used M2M relation? (one status -> many receivers)

Was it helpful?

Solution

I think most databases can easily handle large record sets. The responsibility to have it preform lies in your design with properly setting up the indexes. If you create the right indexes the select clauses should perform really well.

OTHER TIPS

I'd go with a users table, a table to have the m2m relationship between users and messages table.

You can then do one select to find all of the users a user is following and then a second select in to get all of the messages of interest (sorting and limiting the results as appropriate). Extending this to tagging should be pretty simple.

This design should be fine for large numbers of users and messages as long as you index the right columns. If you got massive then you could also run the users tables and messages tables to different servers or have read only replicates. I wouldn't even worry about that for the moment - you'd need to be huge.

When implementing Collabinate (http://www.collabinate.com), a service-based engine for microblogging and shared activity streams, I used a graph database. The fact that people create posts and follow other people lends itself to a graph structure. With the right relationships and algorithms, this can be a very efficient and performant solution.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top