Domanda

I need to keep a number of statistics about the videos being watched, and one of them is what parts of the video are being watched most. The design I came up with is to split the video into 256 intervals and keep the floating-point number of views for each of them. I receive the data as a number of intervals the user watched continuously. The problem is how to store them. There are two solutions I see.

Row per every video segment

Let's have a database table like this:

CREATE TABLE `video_heatmap` (
 `id` int(11) NOT NULL AUTO_INCREMENT,
 `video_id` int(11) NOT NULL,
 `position` tinyint(3) unsigned NOT NULL,
 `views` float NOT NULL,
 PRIMARY KEY (`id`),
 UNIQUE KEY `idx_lookup` (`video_id`,`position`)
) ENGINE=MyISAM

Then, whenever we have to process a number of views, make sure there are the respective database rows and add appropriate values to the views column. I found out it's a lot faster if the existence of rows is taken care of first (SELECT COUNT(*) of rows for a given video and INSERT IGNORE if they are lacking), and then a number of update queries is used like this:

UPDATE video_heatmap
SET views = views + ?
WHERE video_id = ? AND position >= ? AND position < ?

This seems, however, a little bloated. The other solution I came up with is

Row per video, update in transactions

A table will look (sort of) like this:

CREATE TABLE video (
 id INT NOT NULL AUTO_INCREMENT,
 heatmap BINARY (4 * 256) NOT NULL,
 ...
) ENGINE=InnoDB

Then, upon every time a view needs to be stored, it will be done in a transaction with consistent snapshot, in a sequence like this:

  1. If the video doesn't exist in the database, it is created.
  2. A row is retrieved, heatmap, an array of floats stored in the binary form, is converted into a form more friendly for processing (in PHP).
  3. Values in the array are increased appropriately and the array is converted back.
  4. Row is changed via UPDATE query.

So far the advantages can be summed up like this:

First approach

  • Stores data as floats, not as some magical binary array.
  • Doesn't require transaction support, so doesn't require InnoDB, and we're using MyISAM for everything at the moment, so there won't be any need to mix storage engines. (only applies in my specific situation)
  • Doesn't require a transaction WITH CONSISTENT SNAPSHOT. I don't know what are the performance penalties of those.
  • I already implemented it and it works. (only applies in my specific situation)

Second approach

  • Is using a lot less storage space (the first approach is storing video ID 256 times and stores position for every segment of the video, not to mention primary key).
  • Should scale better, because of InnoDB's per-row locking as opposed to MyISAM's table locking.
  • Might generally work faster because there are a lot less requests being made.
  • Easier to implement in code (although the other one is already implemented).

So, what should I do? If it wasn't for the rest of our system using MyISAM consistently, I'd go with the second approach, but currently I'm leaning to the first one. But maybe there are some reasons to favour one approach or another?

È stato utile?

Soluzione

Second approach looks tempting at first sight, but it makes queries like "how many views for segment x of video y" unable to use an index on video.heatmap. Not sure if this is a real-life concern for you though. Also, you would have to parse back and forth the entire array every time you need data for one segment only.

But first and foremost, your second solution is hackish (but interesting nonetheless). I wouldn't recommend denormalising your database until you face an acutal performance issue.

Also, try populating the video_headmap table in advance with wiews = 0 as soon as a video is inserted (a trigger can help).

If space is really a concern, remove your surrogate key video_headmap.id and instead make (video_id, position) the primary key (then get rid of the superfluous UNIQUE constraint). But this shouldn't come into the equation. 256 x 12 bytes per video (rough row length with 3 numeric columns, okay add some for the index) is only an extra 3kb per video!

Finally, nothing prevents you from switching your current table to InnoDB and leverage its row-level locking capability.

Please note I fail to undestand why views cannot be an UNSIGNED INT. I would recommend changing this type.

Autorizzato sotto: CC-BY-SA insieme a attribuzione
Non affiliato a StackOverflow
scroll top