A question about database structures and lots of functions à la Reddit

https://stackoverflow.com/questions/8499577

15-03-2021

Question

I have a question about how a bookmarking site like Reddit would manage all the 'votes' a user has logged.

For example, when I (User_ID_292929) vote up a post (Post_ID_282828), logically somewhere in a database it says that User_ID_292929 has voted up Post_ID_282828.

But how would that be structured in the DB? Would the table that handles user profiles have a field full of comma-separated values that gets exploded and checked to see whether the posts on the page being loaded have been voted up?

I'm not looking for a long answer, but rather an example program or documentation on a similar structure.

Thanks

Solution

Assuming a user can vote on a particular post only once, you could create a new table (let's call it users_vote_posts) with two columns (user_id and post_id). Make user_id and post_id together a composite primary key, so the same pair cannot be stored twice.

Using your example, let's say a user (User_ID_292929) votes up a post (Post_ID_282828). The table would look like this:



    +---------+---------+
    | user_id | post_id |
    +---------+---------+
    |  292929 |  282828 |
    +---------+---------+
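A minimal sketch of that table, using Python's built-in sqlite3 module with an in-memory database. The table and column names come from the answer above; the helper cast_vote is a hypothetical name introduced here for illustration. The composite primary key is what makes the second, duplicate vote fail.

```python
import sqlite3

# In-memory database, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE users_vote_posts (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, post_id)  -- one row per (user, post) pair
    )
    """
)


def cast_vote(conn, user_id, post_id):
    """Hypothetical helper: record a vote, return False if it already exists."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO users_vote_posts (user_id, post_id) VALUES (?, ?)",
                (user_id, post_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejected a duplicate (user_id, post_id).
        return False


first_vote = cast_vote(conn, 292929, 282828)
repeat_vote = cast_vote(conn, 292929, 282828)
vote_rows = conn.execute("SELECT user_id, post_id FROM users_vote_posts").fetchall()
```

Here first_vote succeeds and repeat_vote is rejected by the database itself, so the application does not need its own duplicate check.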

If there is more than one type of vote (for example, either up OR down), you could add another column that defines the type of vote (let's call it vote_type).

Now the table would look like this:



    +---------+---------+-----------+
    | user_id | post_id | vote_type |
    +---------+---------+-----------+
    |  292929 |  282828 | up        |
    +---------+---------+-----------+
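The three-column version could be sketched like this, again with sqlite3. The CHECK constraint, the helper names set_vote and post_score, and the idea that a user may switch a vote from up to down are assumptions added for illustration; the answer only specifies the vote_type column.

```python
import sqlite3

typed_conn = sqlite3.connect(":memory:")
typed_conn.execute(
    """
    CREATE TABLE users_vote_posts (
        user_id   INTEGER NOT NULL,
        post_id   INTEGER NOT NULL,
        vote_type TEXT    NOT NULL CHECK (vote_type IN ('up', 'down')),
        PRIMARY KEY (user_id, post_id)
    )
    """
)


def set_vote(conn, user_id, post_id, vote_type):
    """Hypothetical helper: insert the vote, or replace the user's earlier one."""
    with conn:
        conn.execute(
            "INSERT OR REPLACE INTO users_vote_posts (user_id, post_id, vote_type) "
            "VALUES (?, ?, ?)",
            (user_id, post_id, vote_type),
        )


def post_score(conn, post_id):
    """Hypothetical helper: up votes minus down votes for one post."""
    row = conn.execute(
        "SELECT COALESCE(SUM(CASE vote_type WHEN 'up' THEN 1 ELSE -1 END), 0) "
        "FROM users_vote_posts WHERE post_id = ?",
        (post_id,),
    ).fetchone()
    return row[0]


set_vote(typed_conn, 292929, 282828, "up")
set_vote(typed_conn, 111111, 282828, "up")
set_vote(typed_conn, 111111, 282828, "down")  # second user changes their mind
score = post_score(typed_conn, 282828)
row_count = typed_conn.execute("SELECT COUNT(*) FROM users_vote_posts").fetchone()[0]
```

Because the primary key is still (user_id, post_id), changing a vote replaces the existing row rather than adding a second one.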

Other tips

The simplest way is to have a table with one column to track the user who voted, and another column with the id of the thing they voted on. You may also need a third column specifying the type of thing they voted on, if the ids aren't unique across all types.

This is what is called a multivalued attribute: one user can have many votes. When this happens, you store the values in a separate table. Here that table has userid and postid, and the two together form its primary key, because each (userid, postid) pair is unique. The database therefore cannot hold duplicate votes. If you need more information about a post or a user, you can always join to the corresponding table in the query.
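As a rough sketch of the join mentioned above, the query below lists posts together with a flag saying whether a given user has voted on each one. The posts table, its title column, the sample rows, and the function name posts_with_vote_flag are all hypothetical.

```python
import sqlite3

join_conn = sqlite3.connect(":memory:")
join_conn.executescript(
    """
    CREATE TABLE posts (post_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE users_vote_posts (
        user_id INTEGER NOT NULL,
        post_id INTEGER NOT NULL REFERENCES posts(post_id),
        PRIMARY KEY (user_id, post_id)
    );
    INSERT INTO posts VALUES (282828, 'First post');
    INSERT INTO posts VALUES (282829, 'Second post');
    INSERT INTO users_vote_posts VALUES (292929, 282828);
    """
)


def posts_with_vote_flag(conn, user_id):
    """Return (post_id, title, voted) for every post, for one user."""
    rows = conn.execute(
        """
        SELECT p.post_id, p.title, v.user_id IS NOT NULL AS voted
        FROM posts AS p
        LEFT JOIN users_vote_posts AS v
               ON v.post_id = p.post_id AND v.user_id = ?
        ORDER BY p.post_id
        """,
        (user_id,),
    ).fetchall()
    # SQLite returns 1/0 for the boolean expression; convert to bool.
    return [(post_id, title, bool(voted)) for post_id, title, voted in rows]


page = posts_with_vote_flag(join_conn, 292929)
```

The LEFT JOIN keeps posts the user has not voted on, so one query is enough to render a whole page of posts with their vote state.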
Also, because the table is small, you can cache it for faster access. Sites like Reddit use caching and clustering extensively.

I tackled this problem for a website I was making.

Like Reddit, a user could be logged in and see 20+ stories on the home page.

JOINing a Vote table against the User and Story tables to find out whether the currently logged-in user had voted on each story would not be terribly efficient.

I took a hybrid approach: 1) making a 'Vote' table (id, userid, storyid), and 2) adding a 'Voted_Cache' column to the 'Story' table, holding a comma-separated list (CSV) of the IDs of users who have voted on the story.

Now when I load 20 articles on the home page, I can check to see if the current userid exists in the story.Voted_Cache column, instead of needing to do a JOIN to the Vote table.

The 'Vote' table is the authoritative record of which stories have been voted on, and the Voted_Cache column can be rebuilt from this table if necessary.
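One way this hybrid could look, as a sketch rather than the site's actual code. The table and column names (Vote, Story, Voted_Cache, userid, storyid) come from the description above; the title column, the sample data, and the helpers rebuild_voted_cache and has_voted are assumptions. The check splits the list on commas instead of doing a substring search, since a plain substring match would wrongly find user 29 inside "292929".

```python
import sqlite3

cache_conn = sqlite3.connect(":memory:")
cache_conn.executescript(
    """
    CREATE TABLE Story (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        Voted_Cache TEXT NOT NULL DEFAULT ''
    );
    CREATE TABLE Vote (
        id INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL,
        storyid INTEGER NOT NULL REFERENCES Story(id)
    );
    INSERT INTO Story (id, title) VALUES (10, 'Story with votes');
    INSERT INTO Story (id, title) VALUES (11, 'Story without votes');
    INSERT INTO Vote (userid, storyid) VALUES (292929, 10);
    INSERT INTO Vote (userid, storyid) VALUES (31, 10);
    """
)


def rebuild_voted_cache(conn):
    """Regenerate every Story.Voted_Cache from the authoritative Vote table."""
    with conn:
        conn.execute(
            """
            UPDATE Story
            SET Voted_Cache = COALESCE(
                (SELECT group_concat(userid) FROM Vote WHERE Vote.storyid = Story.id),
                ''
            )
            """
        )


def has_voted(voted_cache, userid):
    """Check the cached CSV for an exact user id (not a substring match)."""
    if not voted_cache:
        return False
    return str(userid) in voted_cache.split(",")


rebuild_voted_cache(cache_conn)
caches = dict(cache_conn.execute("SELECT id, Voted_Cache FROM Story"))
```

Loading the home page then needs only the Story rows; has_voted(caches[10], 292929) answers the question per story without touching the Vote table.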

A typical design pattern for this type of problem is to create an association table for the users who vote. The association table could look as simple as

assoc_user_vote - table name

id - primary key

userid

voteid

Each record in the assoc_user_vote table has a unique id (probably auto-incremented or seeded) and contains a user id and a vote id. The userid and voteid columns are foreign keys that reference the primary keys of their respective tables.

This pattern supports many votes by a specific user and follows data normalization best practices. http://en.wikipedia.org/wiki/Database_normalization
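A sketch of the association table in sqlite3, under stated assumptions: the parent tables are given the hypothetical names users and votes, and link is a hypothetical helper. SQLite only enforces foreign keys when PRAGMA foreign_keys is switched on for the connection.

```python
import sqlite3

assoc_conn = sqlite3.connect(":memory:")
assoc_conn.executescript(
    """
    PRAGMA foreign_keys = ON;  -- SQLite does not enforce foreign keys by default
    CREATE TABLE users (userid INTEGER PRIMARY KEY);
    CREATE TABLE votes (voteid INTEGER PRIMARY KEY);
    CREATE TABLE assoc_user_vote (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,
        userid INTEGER NOT NULL REFERENCES users(userid),
        voteid INTEGER NOT NULL REFERENCES votes(voteid)
    );
    INSERT INTO users VALUES (292929);
    INSERT INTO votes VALUES (1);
    INSERT INTO votes VALUES (2);
    """
)


def link(conn, userid, voteid):
    """Hypothetical helper: associate a user with a vote, return the new row id."""
    try:
        with conn:
            cur = conn.execute(
                "INSERT INTO assoc_user_vote (userid, voteid) VALUES (?, ?)",
                (userid, voteid),
            )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # The user or the vote does not exist in its parent table.
        return None


link_a = link(assoc_conn, 292929, 1)
link_b = link(assoc_conn, 292929, 2)
link_bad = link(assoc_conn, 555555, 1)  # unknown user, rejected by the foreign key
user_vote_count = assoc_conn.execute(
    "SELECT COUNT(*) FROM assoc_user_vote WHERE userid = ?", (292929,)
).fetchone()[0]
```

Note that a surrogate id on its own does not stop the same (userid, voteid) pair from being inserted twice; if that matters, a UNIQUE constraint on the pair would be needed in addition.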

License: CC-BY-SA with attribution

Not affiliated with StackOverflow