How important is indexing and clustered indexing to database performance?

https://stackoverflow.com/questions/403393

03-07-2019
|

Question

There have been several questions recently about database indexing and clustered indexing and it has been kind of new to me until the last couple weeks. I was wondering how important it is and what kind of performance gains can be expected from creating them.

Edit: What is usually the best type of fields to look at when putting in a clustered index when you are first starting out?

Solution

Very very^A(G,G) important. In my opinion, wise indexing is the absolute most important thing in DB performance optimization.

This is not an easy topic to cover in a single answer. Good indexing requires knowledge of queries going to happen on the database, making a large number of trade-offs and understanding the implication of a specific index in the specific DB engine. But it's very important nevertheless.

EDIT: Basically, clustered indexes usually should have short lengths. They should be created on queries which reflect a range. They should not have duplicate entries. But these guidelines are very general and by no means the right thing. The right thing is to analyze the queries that are gonna be executed. Carefully benchmarking and analyzing execution plans and understanding what is the best way to do it. This requires years of experience and knowledge and by no means it's something to explain in a single paragraph. It's the primary thing that makes DB experts expert (It's not the only thing, but it's primitive to other important things, such as concurrency issues, availability, ...)!

OTHER TIPS

Indexing: extremely important. Having the wrong indexes makes queries harder, sometimes to the point they can't be completed in a sensible time.

Indexes also impact insert performance and disc usage (negatively), so keeping lots of superfluous indexes around on large tables is a bad idea too.

Clustering is something worth thinking about, I think it's really dependent on the behaviour of the specific database. If you can cluster your data correctly, you can dramatically reduce the amount of IOPs required to satisfy requests for rows not in memory.

Without proper indexes, you force the RDBMS to do table scans to query for anything. Terribly inefficient.

I'd also infer that you don't have primary keys, which is a cardinal sin in relational design.

Indexing is very important when the table contains many rows.
With a few rws, performance is better without indexes.
With larger tables indexes are very important to get good performance.
It is not easy to defined them. Clustered means that the data are stored in the clustered index order.
To get good hints of indexes you could use Toad

Indexing is vitally important.

The right index for a query can improve performance so dramatically it can seem like witchcraft.

As the other answers have said, indexing is crucial.

As you might infer from other answers, clustered indexing is much less crucial.

Decent indexing gives you first order performance gains - orders of magnitude are common.

Clustered indexing is a second order or incremental performance gain - usually giving small (<100%) percentages of performance increase.

(We also get into questions of 'what is a 100% performance gain'; I'm interpreting the percentage as ((oldtime - newtime)/newtime) * 100, so if the old time is 10 seconds and the new time is 5 seconds, the performance increase is 100%.)

Different DBMS have different interpretations of what a clustered index means. Beware. In particular, some DBMS cluster the data once and thereafter, the clustering decays over time until the data is reclustered. Others take a more active view of clustering, I believe.

The clustered index is ususally but not always your primary key. One way of looking at a clustered index is to think of the data being physically ordered based on the values of the clustered index.

This may very well not be the case in reality however refrencing clustered indexes ususally gets you the following performance bonuses anyway:

All columns of the table are accessable for free when resolved from a clustered index hit as if they were contained within a covering index. (A query resolvable using just the index data without having to refrence the data pages of the table itself)
Update operations can be made directly against a clustered index without intermediate processing. If you are doing a lot of updates against a table you ususally want to be refrencing the clustered columns.
Depending on implementation there may be a sequential access benefit where data stored on disk is retreived quicker with fewer expensive disk seek operations.
Depending on implementation there may be free index benefit where a physical index is not necessary as data access can be resolved via simple guessing game algorithms.

Don't count on #3 and especially #4. #1 and #2 are ususally safe bets on most RDBMS platforms.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow