Question

I am using a fact table with the following structure in SQL Server 2012:

CREATE TABLE [dbo].[factTable] (
    [Id]            BIGINT      IDENTITY (1, 1) NOT NULL,
    [Date]          DATE        NOT NULL,
    [MinuteNumber]  SMALLINT    NOT NULL,
    [CityId]        INT         NOT NULL, /* Foreign key to dimCity */
    [Value]         DECIMAL(12, 4)  NULL
)

I have a clustered index on the Date column with a fill factor of 100. The data inserted into this table is almost always in the ascending order of Date and MinuteNumber.

  1. I want to know whether the Id column is necessary in this scenario. Does it have any performance implications, or can I safely eliminate it?

  2. I also want to know whether a clustered index on the Date column alone is sufficient (there will be many records with the same date, and even with the same date and minute number), or whether it is better to have a clustered index combining multiple columns. What are the performance and storage implications of either approach?

I am new to this and any help will be highly appreciated.


Solution 2

In your case, I'd probably create a nonclustered primary key on the identity column, to allow for easier FK relationship management and for performance.

The clustered key would be on the date column, to allow for faster range queries. The date column also fulfills the three basic requirements for a clustered index: it's narrow (a DATE takes 3 bytes, which keeps the nonclustered indexes small), it's stable (changing a clustered key value means updating every nonclustered index row that refers to it, so this is to be avoided) and it's increasing (to avoid bad page splits, the ones not at the end of the table).

WRT the non-unique clustered index: SQL Server will add a hidden 4-byte uniquifier to rows whose key value is a duplicate, so the index does not have to be declared unique.
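As a rough sketch of the layout described above, assuming the table does not yet have a clustered index or a primary key (the index and constraint names are made up for illustration, and the date range is arbitrary):

```sql
-- Hypothetical names; assumes no clustered index or primary key exists yet.

-- Non-unique clustered index on Date; SQL Server adds the uniquifier itself.
CREATE CLUSTERED INDEX CIX_factTable_Date
    ON dbo.factTable ([Date])
    WITH (FILLFACTOR = 100);

-- Nonclustered primary key on the identity column, usable as an FK target.
ALTER TABLE dbo.factTable
    ADD CONSTRAINT PK_factTable_Id PRIMARY KEY NONCLUSTERED (Id);

-- A date-range query of the kind the clustered index is meant to serve.
SELECT [Date], MinuteNumber, CityId, [Value]
FROM dbo.factTable
WHERE [Date] >= '20140101'
  AND [Date] <  '20140201';
```

Whether the optimizer actually uses a clustered index seek for the range query is worth confirming in the execution plan against your own data.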

OTHER TIPS

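A clustered index does not have to be declared unique, but SQL Server needs every clustered key to be unique internally, so it adds a hidden uniquifier to rows with duplicate key values. If you do decide to go with DATE, you may therefore prefer to add another column or columns which together would always be unique. A clustered index also determines the order in which the data is stored, so the key should be one that is ever-increasing. Your DATE column seems to satisfy that, so you got that part right.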
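One possible way to get a unique clustered key that still leads with the date is to append further columns. The sketch below assumes that any existing clustered index has been dropped first, and it uses Id as the tie-breaker because the question says that Date and MinuteNumber can repeat. The index name is hypothetical.

```sql
-- Hypothetical name; assumes the table currently has no clustered index.
-- Id is appended only to guarantee uniqueness of the key.
CREATE UNIQUE CLUSTERED INDEX CIX_factTable_Date_Minute_Id
    ON dbo.factTable ([Date], MinuteNumber, Id)
    WITH (FILLFACTOR = 100);
```

The trade-off is key width: this key is 3 + 2 + 8 = 13 bytes, and that key is carried by every nonclustered index on the table.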

However, it would be good to know how much data your table is going to hold and how many nonclustered indexes you plan to use. Since every nonclustered index leaf record carries the clustered index key as its pointer back to the row, you generally don't want your clustered key to be any wider than it has to be.
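To see how much space each index actually takes, a query along these lines against the catalog views can help. It is only a sketch; the table name is taken from the question, and sizes are computed from 8 KB pages.

```sql
-- Approximate size and row count per index on dbo.factTable.
SELECT i.name                                AS index_name,
       i.type_desc                           AS index_type,
       SUM(ps.used_page_count) * 8 / 1024.0  AS used_mb,   -- pages are 8 KB
       SUM(ps.row_count)                     AS row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.factTable')
GROUP BY i.name, i.type_desc;
```

Comparing the output before and after changing the clustered key gives a concrete measure of the storage impact on the nonclustered indexes.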

Basically, the advantages of a simple auto-incrementing integer as the key column for a clustered index are that it's compact storage-wise, it always increases, and it works well with other objects and use cases.

marc_s, a user here, posted a link to another site (link); I think you should definitely check it out.

But to summarize: the clear majority of the time, the safe bet is to keep this simple and just put the clustered index on your basic int / bigint identity column, then use nonclustered indexes to optimize searches on specific columns in the table. This is more than good enough most of the time. There's no need to complicate things chasing a 5% improvement on queries that already run more than fast enough. So the question is: is there any reason to expect that a standard solution would not work in your case? For example, a huge amount of data (bigint-scale row counts, several billion rows or more), other performance concerns (complex conditional joins to other tables in the same database), or anything like that?
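A minimal sketch of that standard layout, as an alternative to clustering on the date, might look like the following. It assumes the table has no clustered index or primary key yet; the names are hypothetical, and the INCLUDE list is just one guess at what the queries need.

```sql
-- Hypothetical names; assumes no clustered index or primary key exists yet.

-- Clustered primary key on the identity column.
ALTER TABLE dbo.factTable
    ADD CONSTRAINT PK_factTable PRIMARY KEY CLUSTERED (Id);

-- Nonclustered index to support searches by date and minute.
-- INCLUDE makes it covering for queries that read only these columns.
CREATE NONCLUSTERED INDEX IX_factTable_Date_Minute
    ON dbo.factTable ([Date], MinuteNumber)
    INCLUDE (CityId, [Value]);
```

Note that a covering nonclustered index like this stores a second copy of most of the row, so on a large fact table the extra storage should be weighed against the query benefit.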

Licensed under: CC-BY-SA with attribution