Question

I've been told, and have read everywhere (though no one dared to explain why), that when composing an index on multiple columns I should put the most selective column first, for performance reasons. Why is that? Is it a myth?


Solution

I should put the most selective column first

According to Tom, column selectivity has no performance impact for queries that use all the columns in the index (it does affect Oracle's ability to compress the index).
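As a rough illustration of that claim, the sketch below uses Python's built-in sqlite3 module as a stand-in (it is not Oracle, and the table and column names are made up for the example). It builds the same two-column index in both column orders and asks the planner how it would run an equality query on both columns. It only shows the access path; it does not measure timing.

```python
import sqlite3

def seek_plan(index_columns):
    """Return SQLite's plan for an equality lookup on both indexed columns."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: low_card has few distinct values, high_card has many.
    conn.execute("CREATE TABLE t (low_card INTEGER, high_card INTEGER, payload TEXT)")
    conn.execute(f"CREATE INDEX idx ON t ({index_columns})")
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT payload FROM t WHERE low_card = 1 AND high_card = 42"
    ).fetchall()
    conn.close()
    return rows[0][3]  # the human-readable detail column

plan_selective_first = seek_plan("high_card, low_card")
plan_selective_last = seek_plan("low_card, high_card")
print(plan_selective_first)
print(plan_selective_last)
```

In both cases the planner reports an index search constrained on both columns, so the lookup goes straight to the matching entries whichever column leads.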

it is not the first thing, it is not the most important thing. sure, it is something to consider but it is relatively far down there in the grand scheme of things.

In certain strange, very peculiar and abnormal cases (like the above with really utterly skewed data), the selectivity could easily matter HOWEVER, they are

a) pretty rare
b) truly dependent on the values used at runtime, as all skewed queries are

so in general, look at the questions you have, try to minimize the indexes you need based on that.

The number of distinct values in a column in a concatenated index is not relevant when considering the position in the index.

However, these considerations should come second when deciding on index column order. More important is ensuring that the index can be useful to many queries, so the column order has to reflect the use of those columns (or the lack thereof) in the WHERE clauses of your queries (for the reason illustrated by AndreKR).

HOW YOU USE the index -- that is what is relevant when deciding.

All other things being equal, I would still put the most selective column first. It just feels right...

Update: Another quote from Tom (thanks to milan for finding it).

In Oracle 5 (yes, version 5!), there was an argument for placing the most selective columns first in an index.

Since then, it is not true that putting the most discriminating entries first in the index will make the index smaller or more efficient. It seems like it will, but it will not.

With index key compression, there is a compelling argument to go the other way since it can make the index smaller. However, it should be driven by how you use the index, as previously stated.

OTHER TIPS

You can omit columns from right to left when using an index, i.e. when you have an index on (col_a, col_b) you can use it for WHERE col_a = x but you cannot use it for WHERE col_b = x.
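A minimal sketch of this rule, using Python's built-in sqlite3 module as a stand-in database (not Oracle; the table t, the payload column and the index name idx_ab are assumptions made for the example). It asks the planner how it would run each of the two queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col_a INTEGER, col_b INTEGER, payload TEXT)")
conn.execute("CREATE INDEX idx_ab ON t (col_a, col_b)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(i % 10, i, f"row {i}") for i in range(1000)],
)

def plan(sql):
    """Return the planner's description of how it would run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

# Leading column in the WHERE clause: the index can be used.
plan_a = plan("SELECT payload FROM t WHERE col_a = 3")
# Only the trailing column: the planner falls back to scanning the table.
plan_b = plan("SELECT payload FROM t WHERE col_b = 3")
print(plan_a)
print(plan_b)
```

The first plan mentions idx_ab and the second does not. Other databases may behave differently here; Oracle, for instance, can sometimes fall back to a skip-scan on the trailing column.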

Imagine a telephone book that is sorted by first name and then by last name.

At least in Europe and the US, first names have a much lower selectivity than last names, so looking up the first name wouldn't narrow the result set much: there would still be many pages to check for the correct last name.
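The phone book analogy can be sketched in plain Python, with a sorted list of (first name, last name) tuples standing in for the index. The names and the helper functions by_first and by_last are invented for the example. A lookup by first name can binary-search the sorted list, while a lookup by last name alone has to read every entry.

```python
from bisect import bisect_left, bisect_right

people = [
    ("Mary", "Smith"), ("John", "Smith"), ("John", "Doe"),
    ("Alice", "Jones"), ("Bob", "Smith"),
]
# The "index": entries sorted by first name, then by last name.
book = sorted(people)

MAX_CHAR = chr(0x10FFFF)  # sentinel that sorts after any ordinary name

def by_first(book, first):
    """Binary search: jump straight to the block of entries with this first name."""
    lo = bisect_left(book, (first,))
    hi = bisect_right(book, (first, MAX_CHAR))
    return book[lo:hi]

def by_last(book, last):
    """No shortcut: the sort order does not help, so every entry is checked."""
    return [entry for entry in book if entry[1] == last]

print(by_first(book, "John"))
print(by_last(book, "Smith"))
```

Both functions return correct results; the difference is that by_first touches only a handful of entries, whereas by_last walks the whole book.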

The ordering of the columns in the index should be determined by your queries and not by any selectivity considerations. If you have an index on (a,b,c), and most of your single-column queries are against column c, followed by a, then put them in the order c,a,b in the index definition for the best efficiency. Oracle prefers to use the leading edge of the index for the query, but it can use other columns in the index through a less efficient access path known as skip-scan.

The more selective your index, the faster the lookup.

Simply imagine a phone book: you can usually find someone quickly by last name. But if many people share the same last name, you will spend more time checking the first name of each one to find the right person.

So you should put the most selective columns first to avoid this problem as much as possible.

Additionally, you should then make sure that your queries actually use these selectivity criteria.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow