Question

I have a site where the users can view quite a large number of posts. Every time this is done I run a query similar to UPDATE table SET views=views+1 WHERE id = ?. However, there are a number of disadvantages to this approach:

  • There is no way of tracking when the pageviews occur - they are simply incremented.
  • Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.

Therefore I consider employing an approach where I create a table, say:
object_views { object_id, year, month, day, views }, so that each object has one row per day in this table. I would then periodically update the views column in the objects table so that I wouldn't have to do expensive joins all the time.
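A minimal sketch of this per-day tally plus periodic rollup, using SQLite through Python so it is self-contained (the table and column names follow the question; in MySQL the upsert would be INSERT ... ON DUPLICATE KEY UPDATE rather than SQLite's ON CONFLICT clause):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE objects (id INTEGER PRIMARY KEY, views INTEGER DEFAULT 0);
    CREATE TABLE object_views (
        object_id INTEGER, year INTEGER, month INTEGER, day INTEGER,
        views INTEGER DEFAULT 0,
        PRIMARY KEY (object_id, year, month, day)
    );
    INSERT INTO objects (id) VALUES (1);
""")

def record_view(object_id, year, month, day):
    # One row per object per day; bump the counter if the row already exists.
    conn.execute("""
        INSERT INTO object_views (object_id, year, month, day, views)
        VALUES (?, ?, ?, ?, 1)
        ON CONFLICT (object_id, year, month, day)
        DO UPDATE SET views = views + 1
    """, (object_id, year, month, day))

def roll_up():
    # Periodically refresh the denormalized total so reads need no join.
    conn.execute("""
        UPDATE objects SET views = (
            SELECT COALESCE(SUM(views), 0)
            FROM object_views WHERE object_id = objects.id
        )
    """)

for _ in range(3):
    record_view(1, 2009, 11, 25)
record_view(1, 2009, 11, 26)
roll_up()
total = conn.execute("SELECT views FROM objects WHERE id = 1").fetchone()[0]
print(total)  # 4
```

The rollup can run from cron, so page requests only ever touch the small per-day rows.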

This is the simplest solution I can think of, and it seems that it is also the one with the least performance impact. Do you agree?

(The site is built on PHP 5.2, symfony 1.4 and Doctrine 1.2, in case you wonder.)

Edit:
The purpose is not web analytics - I know how to do that, and that is already in place. There are two purposes:

  • Allow the user to see how many times a given object has been shown, for example today or yesterday.
  • Allow the moderators of the site to see simple view statistics without going into Google Analytics, Omniture or whatever solution. Furthermore, the results in the backend must be realtime, a feature which GA cannot offer at this time. I do not wish to use the Analytics API to retrieve the usage data (it is not realtime, and GA requires JavaScript).

Solution

Quote: Updating the table that often will, as far as I understand it, clear the MySQL cache of the row, thus making the next SELECT of that row slower.

There is much more to it than that — this approach is a database killer. I suggest you make a table like this: object_views { object_id, timestamp }. That way you can aggregate on object_id (with the COUNT() function), so every time someone views the page you simply INSERT a record into the table. Once in a while you must clean out the old records. The UPDATE statement is evil :) On most platforms it will basically mark the row as deleted and insert a new one, thus fragmenting the table. Not to mention locking issues.
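A sketch of that insert-per-view design, again illustrated with SQLite through Python for self-containment (in MySQL, viewed_at would typically be a DATETIME with an index on object_id):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE object_views (
        object_id INTEGER NOT NULL,
        viewed_at TEXT NOT NULL   -- one timestamped row per page view
    )
""")

# Each page view is a cheap append-only INSERT; no row is ever updated.
conn.executemany("INSERT INTO object_views VALUES (?, ?)", [
    (1, "2009-11-25 10:00:00"),
    (1, "2009-11-25 18:30:00"),
    (2, "2009-11-26 09:15:00"),
    (1, "2009-11-26 11:45:00"),
])

# Aggregate on object_id whenever totals are needed.
counts = dict(conn.execute(
    "SELECT object_id, COUNT(*) FROM object_views GROUP BY object_id"
))
print(counts)  # {1: 3, 2: 1}

# Purging old records periodically keeps the table from growing unbounded.
conn.execute("DELETE FROM object_views WHERE viewed_at < '2009-11-26'")
```

Because the timestamp is stored per view, the per-day and per-month breakdowns the question asks for fall out of the same table with a GROUP BY on the date.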

Hope that helps

OTHER TIPS

Along the same lines as Rage's answer, you simply are not going to get the same results doing it yourself when there are a million third-party log tools out there. If you are tracking on a daily basis, a basic program such as WebTrends is perfectly capable of tracking the hits, especially if your URL contains the IDs of the items you want to track. I can't stress this enough: it's all about the URL when it comes to these tools (WordPress, for example, allows many different URL constructs).

Now, if you are looking into "impression" tracking, then it's another ball game, because you are probably tracking each object, the page, the user, and possibly a weighted value based upon location on the page. If this is the case, you can keep your performance up by hosting the tracking on another server where you can fire and forget. In the past I handled this using SQL updates against the ID and a string version of the date... that way, when the date changes from 20091125 to 20091126, it's a simple query without the overhead of, say, a DATEDIFF function.

First, just a quick remark: why not aggregate the year, month and day into a single DATETIME column? That would make more sense in my mind.

Also, I am not really sure what your exact reason for doing this is; if it's for marketing/web-stats purposes, you would be better off using a tool made for that purpose.

Now, there are two big families of tools capable of giving you an idea of your website's access statistics: log-based ones (AWStats is probably the most popular) and JavaScript/1-pixel-image-based ones (Google Analytics would be the most popular).

If you prefer to build your own stats database, you can probably manage to build a log parser easily using PHP. If you find parsing Apache logs (or IIS logs) too much of a burden, you could make your application output custom logs formatted in a simpler way.

One other possible solution is to use memcached: the daemon provides counters that you can increment. You can log views there and have a script collect the results every day.

If you're going to do that, why not just log each access? MySQL can buffer inserts into such tables quite well, so there shouldn't be a notable slowdown from the insert. You can always run SHOW PROFILES to see what the performance penalty actually is.

On the datetime issue, you can always use GROUP BY MONTH(accessed_at), YEAR(accessed_at), or WHERE MONTH(accessed_at) = 11 AND YEAR(accessed_at) = 2009.
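For example, a runnable illustration of that grouping (shown with SQLite through Python, where strftime() stands in for MySQL's MONTH() and YEAR(); in MySQL the quoted expressions work verbatim):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE object_views (object_id INTEGER, accessed_at TEXT)")
conn.executemany("INSERT INTO object_views VALUES (?, ?)", [
    (1, "2009-11-25 10:00:00"),
    (1, "2009-11-30 12:00:00"),
    (1, "2009-12-01 09:00:00"),
])

# Per-month totals: strftime('%Y-%m', ...) plays the role of
# GROUP BY MONTH(accessed_at), YEAR(accessed_at) in MySQL.
monthly = conn.execute("""
    SELECT strftime('%Y-%m', accessed_at) AS ym, COUNT(*)
    FROM object_views
    GROUP BY ym
    ORDER BY ym
""").fetchall()
print(monthly)  # [('2009-11', 2), ('2009-12', 1)]
```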

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow