MySQL query optimization: how to optimize voting calculations?

Question

This is a fairly subjective question, because the answer depends heavily on your exact requirements and on performance testing against your own data, which nobody here can do. But I can answer your questions and suggest some generic solutions that might work for you:

Did I make the database design wrong? I mean, could it be better?

No. This is the ideal design for OLTP.

Did I make the query wrong?

No (although the ORDER BY clauses in the subqueries are redundant). The performance of your query depends heavily on the indexes on the Vote table, since the main columns queried are in this part:

SELECT  V.TrackHash, SUM(V.Vote) AS VotesTotal
FROM    Vote V
WHERE   V.CreatedAt > NOW() - INTERVAL 1 MONTH AND V.Vote = 'up'
GROUP BY V.TrackHash

I would suggest 2 indexes: one on TrackHash and one on (CreatedAt, Vote, Type). The second may perform better as 3 separate single-column indexes, so it is worth testing both ways. 200k rows is not much data, so with the right indexes it shouldn't take long to query the votes from the last month.
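As a rough sketch of those two indexes (the index names are made up for illustration, and the column order in the composite index is only one option to test), something like this could be a starting point. EXPLAIN then shows which index MySQL actually chooses for the aggregate:

```sql
-- Index on the grouping column (index name is an assumption)
CREATE INDEX IX_Vote_TrackHash ON Vote (TrackHash);

-- Composite index on the filtered columns (index name is an assumption)
CREATE INDEX IX_Vote_CreatedAt_Vote_Type ON Vote (CreatedAt, Vote, Type);

-- Alternative worth testing: three single-column indexes instead
-- CREATE INDEX IX_Vote_CreatedAt ON Vote (CreatedAt);
-- CREATE INDEX IX_Vote_Vote      ON Vote (Vote);
-- CREATE INDEX IX_Vote_Type      ON Vote (Type);

-- Check which index the optimizer picks for the core aggregate
EXPLAIN
SELECT  V.TrackHash, SUM(V.Vote) AS VotesTotal
FROM    Vote V
WHERE   V.CreatedAt > NOW() - INTERVAL 1 MONTH AND V.Vote = 'up'
GROUP BY V.TrackHash;
```

Run the EXPLAIN against realistic data volumes, since the optimizer's choice can change with table size and data distribution.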

Anything else I could improve?

This is very much a balancing act; the best way to proceed really depends on your exact requirements. There are 3 main ways you could approach the problem.

1. Your current approach (query vote table each time)

As mentioned before, I think this approach should be scalable for your application. The advantage is that it does not require any maintenance, and all data sent to the application is up to date and accurate. The disadvantage is performance: inserts might take a bit longer (due to updating the indexes), and so might selects. This would be my preferred approach.

2. OLAP approach

This would involve maintaining a summary table such as:

CREATE TABLE VoteArchive
(       TrackHash           CHAR(40) NOT NULL,
        CreatedDate         DATE NOT NULL,
        AppMadeUpVotes      INT NOT NULL,
        AppMadeDownVotes    INT NOT NULL,
        ImportedUpVotes     INT NOT NULL,
        ImportedDownVotes   INT NOT NULL,
        MergedUpVotes       INT NOT NULL,
        MergedDownVotes     INT NOT NULL,
    PRIMARY KEY (CreatedDate, TrackHash)
);

This can be populated nightly by running a simple query

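INSERT INTO VoteArchive
        (TrackHash, CreatedDate,
         AppMadeUpVotes, AppMadeDownVotes,
         ImportedUpVotes, ImportedDownVotes,
         MergedUpVotes, MergedDownVotes)
SELECT  TrackHash,
        DATE(CreatedAt),
        COUNT(CASE WHEN Vote = 'Up' AND Type = 0 THEN 1 END),
        COUNT(CASE WHEN Vote = 'Down' AND Type = 0 THEN 1 END),
        COUNT(CASE WHEN Vote = 'Up' AND Type = 1 THEN 1 END),
        COUNT(CASE WHEN Vote = 'Down' AND Type = 1 THEN 1 END),
        COUNT(CASE WHEN Vote = 'Up' AND Type = 2 THEN 1 END),
        COUNT(CASE WHEN Vote = 'Down' AND Type = 2 THEN 1 END)
FROM    Vote
-- Archive the previous full day; assumes the job runs shortly after midnight
WHERE   CreatedAt >= CURRENT_DATE - INTERVAL 1 DAY
AND     CreatedAt < CURRENT_DATE
GROUP BY TrackHash, DATE(CreatedAt);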

You can then use this table in place of your live data. It has the advantage of the date being part of the clustered index, so any query limited by date should be very fast. The disadvantage is that the statistics are only accurate up to the last time the table was populated, although you will get much faster queries in return. It is also additional work to maintain the table. However, this would be my second choice if I could not query live data.
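For illustration, a query for the last month's up votes per track against the summary table might look like the sketch below. Adding the three vote types together is an assumption about what you want to report; drop or separate columns as needed:

```sql
-- Up votes per track over the last month, read from the summary table
SELECT  TrackHash,
        SUM(AppMadeUpVotes + ImportedUpVotes + MergedUpVotes) AS UpVotesTotal
FROM    VoteArchive
WHERE   CreatedDate > CURRENT_DATE - INTERVAL 1 MONTH
GROUP BY TrackHash;
```

Because CreatedDate is the leading column of the primary key, the date filter can be satisfied with a range scan on the clustered index.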

3. Update statistics during voting

I am including this for completeness but would implore you not to use this method. You could achieve it either in your application layer or via a trigger. Although it does allow you to query up-to-date data without touching the "production" table, it is open to errors, and I have never come across anyone who truly advocates this approach. For every vote you need to run insert/update logic, which turns a very fast insert into a longer process, and depending on how you do the maintenance there is a chance (albeit a very small one) of concurrency issues.
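To make the extra per-vote work concrete, here is a minimal sketch of what the application-layer variant could look like. The TrackVoteTotal table, the column list of Vote, and the hash value are all assumptions made up for this example:

```sql
-- Hypothetical running-totals table (not part of the original schema)
CREATE TABLE TrackVoteTotal
(       TrackHash   CHAR(40) NOT NULL,
        UpVotes     INT NOT NULL DEFAULT 0,
        DownVotes   INT NOT NULL DEFAULT 0,
    PRIMARY KEY (TrackHash)
);

-- Every vote now needs two statements instead of one
START TRANSACTION;

-- Column list of Vote is assumed; the hash is a placeholder value
INSERT INTO Vote (TrackHash, Vote, Type, CreatedAt)
VALUES ('da39a3ee5e6b4b0d3255bfef95601890afd80709', 'up', 0, NOW());

-- Create the totals row on first vote, otherwise increment it
INSERT INTO TrackVoteTotal (TrackHash, UpVotes, DownVotes)
VALUES ('da39a3ee5e6b4b0d3255bfef95601890afd80709', 1, 0)
ON DUPLICATE KEY UPDATE UpVotes = UpVotes + 1;

COMMIT;
```

Even in this small form, the two tables can drift apart if any code path writes to Vote without also updating the totals.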

4. A combination of the above

You could always have 2 tables in the same format as your vote table, plus one summary table as set out in solution 2. One vote table stores only today's votes, the other stores historic votes, and the summary table is still maintained nightly. You can then combine today's data with the summary table to get up-to-date results without querying a lot of data. Again, this is additional maintenance, and there is more potential for things to go wrong.
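A sketch of how the combined read might look, assuming a hypothetical VoteToday table with the same columns as Vote and a VoteArchive table that is complete up to yesterday:

```sql
-- Up votes per track: archived days plus today's live votes
SELECT  TrackHash, SUM(UpVotes) AS UpVotesTotal
FROM    (
        -- Pre-aggregated history from the summary table
        SELECT  TrackHash,
                AppMadeUpVotes + ImportedUpVotes + MergedUpVotes AS UpVotes
        FROM    VoteArchive
        WHERE   CreatedDate > CURRENT_DATE - INTERVAL 1 MONTH
        UNION ALL
        -- One row per live vote cast today (VoteToday is an assumed name)
        SELECT  TrackHash, 1
        FROM    VoteToday
        WHERE   Vote = 'up'
        ) AS Combined
GROUP BY TrackHash;
```

UNION ALL is used rather than UNION so that identical rows from the two sources are not collapsed before summing.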
