Question

I'm trying to build an aggregated news/blog/forum website from multiple sources.

Because most queries are likely to cover contiguous time ranges of the written_time column, I'm thinking about taking advantage of a clustered index ordered by written_time.

But because written_time is not unique, I'm thinking about making the primary key unique by adding ids, like:

(written_time, site_id, article_id)

I think it would require somewhat more space, but that seems much better than having secondary indexes. Is it a good approach to build the clustered index like this if I want to benefit from queries whose results are close together in written_time?

Here are some use case scenarios:

  • The website's main page shows recent aggregated articles.

    e.g. SELECT .. FROM .. WHERE written_time >= datetime_1weeksago

  • Users can see articles of every board for specific time periods.

    e.g. SELECT .. FROM .. WHERE written_time >= datetime1 AND written_time < datetime2

  • Users can see articles which contain a specific keyword within a specific time chunk (e.g. 201207). Users can narrow the search down to selected sites. Search traffic volume is not high. I'm going to use a full-text engine, and frequent search results are cached by keyword*time_chunk.

    e.g. SELECT .. FROM .. WHERE written_time >= '2012-07-01' AND written_time < '2012-08-01' + keyword search using full-text engine

    e.g. SELECT .. FROM .. WHERE written_time >= '2012-07-01' AND written_time < '2012-08-01' AND site_id IN (1,3,5,7,9) + keyword search using full-text engine

  • A background crawler fetches a large number of articles in two ways and appends in two directions (this is why I want to make the clustered index on written_time):

    1. periodically crawls and updates recent articles (appends entries with newer written_time)

    2. crawls and archives old articles (appends entries with older written_time)

  • There is a huge amount of articles from a number of highly active news/blog/forum sites.

Solution

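For space and time reasons, it's best to use a single AUTO_INCREMENT-based primary key for InnoDB tables, because InnoDB stores the PRIMARY KEY values in every secondary index. A wide composite key such as (written_time, site_id, article_id) would therefore be repeated in each of those indexes.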
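
A minimal sketch of what this layout might look like, assuming MySQL with InnoDB. The table name `articles`, the `id` and `title` columns, the index names, and all column types are assumptions made for illustration; only written_time, site_id, and article_id come from the question.

```sql
-- Hypothetical schema: names and types other than written_time,
-- site_id, and article_id are assumptions, not taken from the question.
CREATE TABLE articles (
    id           BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    site_id      INT UNSIGNED    NOT NULL,
    article_id   BIGINT UNSIGNED NOT NULL,
    written_time DATETIME        NOT NULL,
    title        VARCHAR(255)    NOT NULL,
    PRIMARY KEY (id),                                   -- narrow clustered key
    UNIQUE KEY uq_site_article (site_id, article_id),   -- keeps each source article unique
    KEY idx_written_time (written_time)                 -- supports the time-range queries
) ENGINE=InnoDB;

-- One of the time-range queries from the question, written against this schema.
SELECT id, site_id, article_id, written_time, title
FROM articles
WHERE written_time >= '2012-07-01'
  AND written_time <  '2012-08-01'
  AND site_id IN (1, 3, 5, 7, 9);
```

With this layout each secondary index entry carries only the narrow `id` value. The trade-off is that a range scan on `idx_written_time` has to look up each matching row through the primary key, so rows with nearby written_time values are not necessarily stored next to each other.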

Licensed under: CC-BY-SA with attribution