Usefulness of a covering index on a fact table

https://stackoverflow.com/questions/11056330

14-06-2021

Question

Consider a fact table of the form:

CREATE TABLE Fact1
(
    Dim1 int NOT NULL,
    Dim2 int NOT NULL,
    Dim3 int NOT NULL,
    Data1 int NOT NULL,
    Data2 int NOT NULL
    -- ...
);

Fact1 has a single-column index on each of the dimension columns. Dim1 is assumed to be the time dimension, with a granularity of hours, so queries typically select a range of hours (e.g. between 2 PM and 6 PM on March 12, 2011). Would it be useful to include Dim2 and Dim3 as covering columns in the index on Dim1? Or, likewise, in any of the other dimension indexes?

More generally, would it ever be useful to include the other dimension tables' FK columns as covering columns in the index for a given dimension?

Note: For the fact table, we are assuming there is no need to uniquely identify a given fact. Hence, the lack of a primary key or surrogate key. The uniqueness is guaranteed by (Dim1, Dim2, Dim3) always being a unique tuple.
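A minimal sketch of the setup described in the question, using Python's built-in sqlite3 module so it can run anywhere. The SQLite dialect, the index names, and enforcing the (Dim1, Dim2, Dim3) uniqueness with a UNIQUE constraint are assumptions; the question only states that the tuple is always unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fact1 (
    Dim1  INTEGER NOT NULL,  -- time dimension key
    Dim2  INTEGER NOT NULL,
    Dim3  INTEGER NOT NULL,
    Data1 INTEGER NOT NULL,
    Data2 INTEGER NOT NULL,
    -- Assumption: enforce the uniqueness the question relies on.
    UNIQUE (Dim1, Dim2, Dim3)
);
-- One single-column index per dimension, as described in the question
-- (index names are hypothetical).
CREATE INDEX ix_fact1_dim1 ON Fact1 (Dim1);
CREATE INDEX ix_fact1_dim2 ON Fact1 (Dim2);
CREATE INDEX ix_fact1_dim3 ON Fact1 (Dim3);
""")

conn.execute("INSERT INTO Fact1 VALUES (1, 10, 100, 5, 7)")
try:
    # Same (Dim1, Dim2, Dim3) tuple with different facts: should be rejected.
    conn.execute("INSERT INTO Fact1 VALUES (1, 10, 100, 6, 8)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT count(*) FROM Fact1").fetchone()[0]
conn.close()
print(duplicate_rejected, row_count)
```

In SQLite, a UNIQUE constraint is implemented with an automatic index on its columns, so enforcing the uniqueness this way also creates a composite index over all three dimension keys.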

Solution

I'm going to try to answer the more general question - "Would it ever be useful to include the other dimension table FK columns as a covering column on an index for a given dimension?"

Yes. If a significant share of your queries do things such as COUNT(*) over the dimension columns, where a covering index lets the DBMS scan a smaller data set than the table, then adding those other dimension columns to the index may be valuable.

SELECT Dim1, Dim2, COUNT(*)
  FROM Fact1
 GROUP BY Dim1, Dim2;

With an index on only Dim1 or only Dim2, the DBMS has to do a full table scan (FTS) to compute this count. That may be perfectly fine; full scans are not always bad. However, if you want to speed up these sorts of queries (say the fact table is very wide), then a B-tree index on (Dim1, Dim2) lets the DBMS count from the index instead of reading the table. Note that it will still do a full scan of the index, which may be only marginally faster than a full table scan.
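The following sketch checks this with SQLite's EXPLAIN QUERY PLAN, via Python's sqlite3 module. SQLite stands in for whatever DBMS the question targets, and the index names (ix_dim1, ix_dim2, ix_dim1_dim2) are hypothetical. With only single-column indexes, no index holds both grouped columns; with a composite index on (Dim1, Dim2), the planner should report a scan of a covering index instead of the table.

```python
import sqlite3

# Simplified stand-in for the question's Fact1 table (SQLite dialect;
# the elided extra fact columns are left out).
SCHEMA = """
CREATE TABLE Fact1 (
    Dim1 INTEGER NOT NULL, Dim2 INTEGER NOT NULL, Dim3 INTEGER NOT NULL,
    Data1 INTEGER NOT NULL, Data2 INTEGER NOT NULL
);
"""

COUNT_QUERY = "SELECT Dim1, Dim2, count(*) FROM Fact1 GROUP BY Dim1, Dim2"


def query_plan(index_ddl, sql):
    """Build a throwaway in-memory database with the given indexes and
    return the 'detail' text of each EXPLAIN QUERY PLAN row for sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        for ddl in index_ddl:
            conn.execute(ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return [row[-1] for row in rows]  # detail is the last column
    finally:
        conn.close()


# Single-column indexes only, as described in the question.
single_column_plan = query_plan(
    ["CREATE INDEX ix_dim1 ON Fact1 (Dim1)",
     "CREATE INDEX ix_dim2 ON Fact1 (Dim2)"],
    COUNT_QUERY,
)

# A composite index on (Dim1, Dim2) holds every column the query reads.
composite_plan = query_plan(
    ["CREATE INDEX ix_dim1_dim2 ON Fact1 (Dim1, Dim2)"],
    COUNT_QUERY,
)

for label, plan in (("single-column", single_column_plan),
                    ("composite", composite_plan)):
    print(label, plan)
```

The exact plan text varies between SQLite versions, and other engines report plans differently, so treat the output as illustrative rather than as a benchmark.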

In general, I doubt you would see much of a performance gain, since you are still scanning every row of the index; unless the index is significantly smaller than the table, the improvement will probably be modest.

Since it's a fact table, covering indexes on the dimension columns help only with queries that read nothing but the dimension columns themselves. Any query that uses the facts will still require an index scan followed by a lookup into the table for the actual data.
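A sketch of the same check for a query that also reads a fact column, again using SQLite through Python's sqlite3 module (index names are hypothetical). An index on (Dim1, Dim2) alone does not cover SUM(Data1), because Data1 must still be fetched from the table. SQLite has no INCLUDE clause, so the variant below appends Data1 to the index key; engines such as SQL Server can instead carry it as a non-key INCLUDE column.

```python
import sqlite3

# Simplified stand-in for the question's Fact1 table (SQLite dialect).
SCHEMA = """
CREATE TABLE Fact1 (
    Dim1 INTEGER NOT NULL, Dim2 INTEGER NOT NULL, Dim3 INTEGER NOT NULL,
    Data1 INTEGER NOT NULL, Data2 INTEGER NOT NULL
);
"""

SUM_QUERY = "SELECT Dim1, Dim2, SUM(Data1) FROM Fact1 GROUP BY Dim1, Dim2"


def query_plan(index_ddl, sql):
    """Build a throwaway in-memory database with the given indexes and
    return the 'detail' text of each EXPLAIN QUERY PLAN row for sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        for ddl in index_ddl:
            conn.execute(ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
        return [row[-1] for row in rows]  # detail is the last column
    finally:
        conn.close()


# Index on the dimension keys only: Data1 still has to come from the table.
dims_only_plan = query_plan(
    ["CREATE INDEX ix_dims ON Fact1 (Dim1, Dim2)"], SUM_QUERY)

# Hypothetical variant that appends the fact column to the index key,
# so the index alone can answer the query.
with_fact_plan = query_plan(
    ["CREATE INDEX ix_dims_data1 ON Fact1 (Dim1, Dim2, Data1)"], SUM_QUERY)

print("dimensions only:", dims_only_plan)
print("with Data1:", with_fact_plan)
```

Widening the index this way trades extra storage and slower loads for the covered query, which matters on a large fact table.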

I would probably just build B-tree indexes on the dimension columns for queries that filter or join on those keys, and then add further indexes as needed once the system has been running for a while and the common queries have been identified.

The other case I can think of where a "covering" index like this may speed up queries is when queries focus on a specific dimension combination and you only want those specific rows.

SELECT Dim1, Dim2, Data1, Data2
  FROM Fact1 
 WHERE Dim1 = @A and Dim2 = @B;

You may see a very slight performance gain with a B-tree index on (Dim1, Dim2) rather than on Dim1 alone, since the DBMS can then search the index on both predicates in the WHERE clause before fetching the fact data from the table.
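A sketch comparing the plans for the query above with an index on Dim1 alone versus a composite index on (Dim1, Dim2), using SQLite via Python's sqlite3 module. The @A/@B parameters become ? placeholders here, and the index names are hypothetical. In both cases Data1 and Data2 are still read from the table; the composite index only lets the index search use both predicates.

```python
import sqlite3

# Simplified stand-in for the question's Fact1 table (SQLite dialect).
SCHEMA = """
CREATE TABLE Fact1 (
    Dim1 INTEGER NOT NULL, Dim2 INTEGER NOT NULL, Dim3 INTEGER NOT NULL,
    Data1 INTEGER NOT NULL, Data2 INTEGER NOT NULL
);
"""

POINT_QUERY = ("SELECT Dim1, Dim2, Data1, Data2 FROM Fact1 "
               "WHERE Dim1 = ? AND Dim2 = ?")


def query_plan(index_ddl, sql, params=()):
    """Build a throwaway in-memory database with the given indexes and
    return the 'detail' text of each EXPLAIN QUERY PLAN row for sql."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        for ddl in index_ddl:
            conn.execute(ddl)
        rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
        return [row[-1] for row in rows]  # detail is the last column
    finally:
        conn.close()


# Index on Dim1 only: the search uses Dim1, and Dim2 is checked per row.
dim1_only_plan = query_plan(
    ["CREATE INDEX ix_dim1 ON Fact1 (Dim1)"], POINT_QUERY, (1, 2))

# Composite index: the search uses both equality predicates.
dim1_dim2_plan = query_plan(
    ["CREATE INDEX ix_dim1_dim2 ON Fact1 (Dim1, Dim2)"], POINT_QUERY, (1, 2))

print("Dim1 only:", dim1_only_plan)
print("(Dim1, Dim2):", dim1_dim2_plan)
```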

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow