Question

If it matters as of now I'm using MySQL/MyISAM but I'm open to using PostgreSQL. I'm also open to using memcached.

Consider a table that's used to store forum threads:

id forum_name post_date
1  Hey!       2009-01-01 12:00:00
  • What's the best practice of storing thread-related entities such as votes, views, and counters?

Should I..

Create a separate table such as

id thread_id views
1  1         532

Or keep it as a column in the initial threads table?

id forum_name post_date              views
1  Hey!       2009-01-01 12:00:00    532

Another related tidbit, what's the practical solution for displaying and incrementing page views? I read this thread and it seems like I could just cache a value for a certain time, I wasn't exactly clear on the incrementing part - perhaps something like storing the values in flat files somewhere, then periodically with cronjobs update the database forum views every hour or so?

EDIT: To clarify, voting IS like SO with one vote per thread and there can be reversals. Nevermind what I meant about counters.

Was it helpful?

Solution

Voting

I suggest defining two tables in addition to the thread table - VOTE_CODES and THREAD_VOTES. At a glance, it might appear over normalized but the format will allow you to change vote value without needing major DML changes.

VOTE_CODES table

  • vote_code, primary key, IE: up, down
  • vote_cast_value --value attributed to up/down vote
  • vote_caster_value --optional, if you want to keep SO's style of the negative vote impacting the caster.

THREAD_VOTES table

  • thread_id
  • user_id
  • vote_code

All of the columns in THREAD_VOTES are the primary key - that will ensure that there can only be as many rows for a given user and thread as there are vote codes. Assuming only two codes, this would support the ability to reverse a vote because there could only be two records - one with either code.

Views

I would suggest storing the:

  • thread id
  • ip address
  • user_agent --rough browser capture
  • timestamp

All of the above are the primary key. Your table will populate fast, but it will give you the ability to create a computed column in a view for more accurate reporting.

OTHER TIPS

flat files are obviously a bad idea, because you'd need to implement locking (the db already does that, and there's fewer bugs in that code).

relational database design is more of an art rather than a science: you can have

CREATE TABLE threads (
  tid THREADID
, title THREADTITLE
, views COUNTER
, PRIMARY KEY (tid)
);

and it'll be no more and no less "correct" than

CREATE TABLE threads (
  tid THREADID
, title THREADTITLE
, PRIMARY KEY (tid)
);

CREATE TABLE views (
  tid THREADID
, views COUNTER
, PRIMARY KEY (tid)
, FOREIGN KEY (tid)
  REFERENCES threads
);

so it's really up to you.

i'd say: go with the simplest thing first, make it more complicated if you find it's necessary (e. g. for performance reasons). IOW: put the views COUNTER attribute in threads. if it turns out the trafic is hurting performance (too many updates on the threads.views attribute mean the dbms must shuffle around otherwise immutable data in the other attributes), you can always split the table up in two, and replace it with a view that joins them. voila, immutable (or rarely changing) data separated from the volatile data, interface remains the same.

of course, go with PostgreSQL. the above shown code is valid in that dbms, just add these:

CREATE DOMAIN threadid
AS INT NOT NULL;

CREATE DOMAIN threadtitle
AS TEXT NOT NULL
CHECK (LENGTH(VALUE) > 0);

CREATE DOMAIN counter
AS INT NOT NULL
CHECK (VALUE > 0);

edit to refute the comment by OMG Ponies: of course it's safe.

UPDATE threads SET
  views = views + 1
WHERE tid = X

either succeeds or bails out.

edit 2 to add consideration for the voting aspect

let's say the spec is: a user may vote a thread up (+1) or down (-1), the sum of his or her votes on a given thread may not exceed |1|, and the history is irrelevant. iow a user may vote a thread up, then down to reset their vote to "no vote", then down again to "voted down", etc.

CREATE DOMAIN vote
AS INT NOT NULL
CHECK (VALUE BETWEEN -1 AND 1);

CREATE TABLE votes (
  tid THREADID
, uid USERID
, vote VOTE
, PRIMARY KEY (tid, uid)
);

in MySQL, you could

INSERT INTO votes (
  tid
, uid
, vote
) VALUES (
  X
, Y
, Z -- +1 or -1
)
ON DUPLICATE KEY UPDATE
vote = vote + Z

alas, PostgreSQL doesn't (yet) have such functionality built in, so you'd need to use the idiomatic user-level implementation

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top