Question

I have a news site which receives around 58,000 hits a day across 36,000 articles. Of these 36,000 unique stories, 30,000 get only one hit (the majority from search engine crawlers) and only 250 stories get over 20 impressions. It is a waste of memory to cache anything but these 250 articles.

Currently I am using the MySQL query cache and xcache for data caching. The table is updated every 5-10 minutes, so the query cache alone is not very useful. How can I detect the frequently visited pages alone and cache their data?


Solution

I think you can have two options to start with:

  1. Don't cache anything by default.

    Using an Observer/Observable pattern, you can trigger an event when an article's view count reaches a threshold, and start caching the page from then on.

  2. Cache every article at creation.

In both cases, you can use a cron job to purge articles which don't reach your defined threshold.
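The first option can be sketched without a full Observer implementation: count views per article and begin caching once a threshold is crossed. This is a minimal in-process sketch; the threshold of 20 views and the 10-minute TTL are assumptions taken from the question, and `render_article` stands in for the real database query and template rendering.

```python
import time

VIEW_THRESHOLD = 20   # assumed: the ~250 hot stories get over 20 impressions
CACHE_TTL = 600       # assumed: 10-minute TTL, matching the table's update interval

view_counts = {}      # article_id -> view count
cache = {}            # article_id -> (rendered_page, cached_at)

def render_article(article_id):
    # stand-in for the expensive DB query + template rendering
    return f"<html>article {article_id}</html>"

def get_article(article_id):
    view_counts[article_id] = view_counts.get(article_id, 0) + 1

    entry = cache.get(article_id)
    if entry and time.time() - entry[1] < CACHE_TTL:
        return entry[0]  # cache hit: skip the database entirely

    page = render_article(article_id)
    if view_counts[article_id] >= VIEW_THRESHOLD:
        # article has proven itself popular: start caching it
        cache[article_id] = (page, time.time())
    return page
```

In production the two dicts would live in xcache, memcached, or Redis rather than in a single process, but the decision logic is the same.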

In any case, you'll probably need a heuristic to decide early enough that an article will need caching, and as with any heuristic, you'll get false positives and false negatives.

How well this works will depend on how your content is read: if the articles are real-time news, it will probably be efficient, since popular stories quickly generate high traffic.

The main problem with these methods is that you'll need to store extra information, such as each article's last-access datetime and current page-view count, which could result in extra queries.
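Those extra queries can be kept off the hot path by buffering hit counts in memory and writing them to the database in batches. A minimal sketch, assuming a flush interval of one minute and a hypothetical `flushed` list standing in for the database write:

```python
import time
from collections import Counter

FLUSH_INTERVAL = 60       # assumed: write counts to the DB at most once a minute

pending_hits = Counter()  # article_id -> hits since last flush
flushed = []              # stand-in for the database: (article_id, hits, timestamp)
last_flush = time.time()

def record_hit(article_id, now=None):
    """Count a hit in memory; touch the database only in batches."""
    global last_flush
    now = now if now is not None else time.time()
    pending_hits[article_id] += 1
    if now - last_flush >= FLUSH_INTERVAL:
        flush(now)

def flush(now):
    """Persist all buffered counts in one pass, then reset the buffer."""
    global last_flush
    for article_id, hits in pending_hits.items():
        # in production this would be a single batched write, e.g.
        # UPDATE articles SET views = views + %s, last_access = %s WHERE id = %s
        flushed.append((article_id, hits, now))
    pending_hits.clear()
    last_flush = now
```

This trades a small amount of accuracy (counts lost on a crash between flushes) for one write per article per minute instead of one per hit, which is usually acceptable for a popularity heuristic.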

OTHER TIPS

You can cache only new articles (say, the ones added in the last few hours). I'd suggest having a look at memcached and Redis: they are both simple, and at the same time powerful, caching engines.
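The "cache only new articles" policy amounts to a single check on the publish timestamp. A sketch of that check, assuming a 6-hour recency window (a made-up value, tune it to your traffic) and a plain dict in place of memcached or Redis:

```python
import time

RECENT_WINDOW = 6 * 3600   # assumed: only articles published in the last 6 hours get cached
cache = {}                 # article_id -> rendered page; memcached/Redis in production

def maybe_cache(article_id, published_at, page, now=None):
    """Cache the page only if the article is recent enough to be worth it."""
    now = now if now is not None else time.time()
    if now - published_at < RECENT_WINDOW:
        # with redis-py this could be: r.set(key, page, ex=RECENT_WINDOW)
        cache[article_id] = page
        return True
    return False
```

With Redis or memcached, setting the entry with an expiry equal to the window means old articles evict themselves, so no separate purge cron is needed for this variant.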

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow