Question

I use indexes like most developers do (mostly on... well, the index!), but I'm sure there are a lot of subtle ways to optimize a database using indexes. I'm not sure whether this is specific to any particular DBMS implementation.

My question is: what are good examples of how to use indexes (beyond the basic, obvious cases), and how does a DBMS optimize its database when you specify an index on a table?

Solution

Think of an index as a "table of contents": an ordered list of pointers to positions in a file (offsets). Say you have millions of records stored in a table. Rather than searching the whole table for rows that match your criteria, it is much faster to consult an ordered list for matches and then collect the pointers to the matching rows. A perfect example of an index is a table's primary key field, most typically its "id" field. If you want the row with id 11234566, it is much faster to ask the index for a pointer to the data than to scan the data source for that row.
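The "ordered list of pointers" idea can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the `rows` list, the `lookup_by_scan` and `lookup_by_index` helpers, and the use of list positions as "offsets" are all made up for illustration, and a real DBMS typically uses a B-tree or similar structure rather than a flat sorted list.

```python
import bisect

# Hypothetical "table": rows stored in arbitrary (insertion) order.
rows = [
    {"id": 30, "name": "carol"},
    {"id": 10, "name": "alice"},
    {"id": 20, "name": "bob"},
]

# The "index": a sorted list of (key, position) pairs, where the position
# plays the role of a file offset into the table.
index = sorted((row["id"], pos) for pos, row in enumerate(rows))
keys = [key for key, _ in index]


def lookup_by_scan(row_id):
    """Full table scan: inspect rows one by one until one matches."""
    for row in rows:
        if row["id"] == row_id:
            return row
    return None


def lookup_by_index(row_id):
    """Binary-search the sorted keys, then follow the pointer to the row."""
    i = bisect.bisect_left(keys, row_id)
    if i < len(keys) and keys[i] == row_id:
        return rows[index[i][1]]
    return None


print(lookup_by_index(20))
```

Both helpers return the same row, but the scan touches up to every row while the indexed lookup needs only a logarithmic number of key comparisons, which is where the saving comes from on large tables.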

Here's a not so obvious use of indexing:

CREATE TABLE activity_log (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    activity_type_id SMALLINT UNSIGNED NOT NULL,
    datetime_created DATETIME,
    KEY(activity_type_id),
    PRIMARY KEY(id)
);
-- Maps each log row to a row in the datetime dimension table
CREATE TABLE activity_log_to_date_key (
    activity_log_id INT UNSIGNED NOT NULL,
    date_created_key INT UNSIGNED NOT NULL REFERENCES dim_datetime(id),
    UNIQUE KEY(activity_log_id),
    KEY(date_created_key)
);
-- Datetime dimension table with an index on date_hour
CREATE TABLE dim_datetime (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    date_hour DATETIME NOT NULL,
    PRIMARY KEY(id),
    KEY(date_hour)
);

Your operation creates its log record as usual, but also creates a reference to an indexed datetime row (dim_datetime.date_hour) that is faster to search and sort than the log table itself. You then join back to the log table on its own primary key. If you need me to expand on this, let me know. I hope this makes sense.

Sample query:

SELECT a.activity_log_id, al.activity_type_id, al.datetime_created
FROM activity_log_to_date_key a 
INNER JOIN dim_datetime d ON (d.id = a.date_created_key)
LEFT JOIN activity_log al ON (al.id = a.activity_log_id)
WHERE d.date_hour BETWEEN '2009-01-01 00:00:00' AND '2009-06-01 12:00:00';

OTHER TIPS

One point that a lot of people seem to miss is that a DBMS will often use (or can only use) one index per table reference in a query, and even when it can and does use multiple indexes, a combined index, if present, would probably be faster.

For instance, if you search a large table for rows WHERE AnIntegerColumn = 42 AND AnOtherInt = 69, the fastest route to those rows is an index on the two columns AnIntegerColumn and AnOtherInt. If you only have an index on each column individually but no combined index, the DB will either search one index and then filter the results with the second clause, or scan both and marry the results up afterwards.
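To see this in practice, here is a minimal sketch that uses SQLite (through Python's built-in sqlite3 module) as a stand-in DBMS. The table name big_table, the index names and the sample data are assumptions made up for this example, and other engines report their plans differently, so treat the output as illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE big_table ("
    "id INTEGER PRIMARY KEY, AnIntegerColumn INTEGER, "
    "AnOtherInt INTEGER, payload TEXT)"
)
# Filler rows, plus one row that matches both predicates.
conn.executemany(
    "INSERT INTO big_table (AnIntegerColumn, AnOtherInt, payload) VALUES (?, ?, ?)",
    [(i % 100, i % 70, "filler") for i in range(1000)] + [(42, 69, "target")],
)

# One single-column index per column, no combined index yet.
conn.execute("CREATE INDEX idx_first ON big_table (AnIntegerColumn)")
conn.execute("CREATE INDEX idx_second ON big_table (AnOtherInt)")

QUERY = "SELECT payload FROM big_table WHERE AnIntegerColumn = 42 AND AnOtherInt = 69"


def plan(sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN step."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(step[3] for step in steps)


plan_single = plan(QUERY)      # planner has to pick one single-column index
conn.execute("CREATE INDEX idx_combined ON big_table (AnIntegerColumn, AnOtherInt)")
plan_combined = plan(QUERY)    # planner can now match both columns in one index

print(plan_single)
print(plan_combined)
print(conn.execute(QUERY).fetchall())
```

In SQLite, the first plan names one of the single-column indexes (the other condition is checked row by row), while the second plan names idx_combined and matches on both columns at once.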

Another common, simple operation that can be improved with composite indexes is WHERE SomeColumn = <SomeValue> ORDER BY SomeOtherColumn. If there is a composite index on SomeColumn and SomeOtherColumn (in that order), the filtering and ordering operations can be performed at the same time in some circumstances.
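A hedged sketch of the filter-plus-sort case, again using SQLite via Python's sqlite3 module as a stand-in. The table t, the index names and the data are hypothetical. SQLite reports an explicit sort step as "USE TEMP B-TREE FOR ORDER BY"; other engines word this differently (MySQL, for example, mentions a filesort).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, SomeColumn INTEGER, SomeOtherColumn INTEGER)"
)
# SomeOtherColumn runs opposite to insertion order, so a sort is really needed.
conn.executemany(
    "INSERT INTO t (SomeColumn, SomeOtherColumn) VALUES (?, ?)",
    [(i % 10, 1000 - i) for i in range(1000)],
)

QUERY = "SELECT SomeOtherColumn FROM t WHERE SomeColumn = 3 ORDER BY SomeOtherColumn"


def plan(sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN step."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(step[3] for step in steps)


# Index on the filter column only: rows are found quickly, then sorted separately.
conn.execute("CREATE INDEX idx_filter_only ON t (SomeColumn)")
plan_filter_only = plan(QUERY)

# Composite index (filter column first, sort column second): rows for
# SomeColumn = 3 are already stored in SomeOtherColumn order inside the index.
conn.execute("DROP INDEX idx_filter_only")
conn.execute("CREATE INDEX idx_filter_then_sort ON t (SomeColumn, SomeOtherColumn)")
plan_composite = plan(QUERY)

values = [row[0] for row in conn.execute(QUERY)]

print(plan_filter_only)
print(plan_composite)
```

With only idx_filter_only, the plan contains a separate sort step; with idx_filter_then_sort, that step disappears because the index walk already yields the rows in the requested order. Reversing the column order in the composite index would lose this benefit for this query.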

Adding too many indexes can be a bad optimisation, of course: the extra space used to store the indexes (and the I/O load to maintain them if your DB sees many write operations) may be a worse problem than slightly less optimal read queries, so don't overdo it.

David and Randy have this covered. I just wanted to add that the EXPLAIN command can be a huge help in figuring out when you will get a big saving out of creating an index, and in working out which indexes are needed. It displays the steps the database will take to run your query, so you can see which parts are likely to be the most expensive (a full table scan, for example).
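As a rough illustration of reading a plan before and after adding an index, here is a sketch using SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module. SQLite is only a stand-in here: MySQL and PostgreSQL use plain EXPLAIN and format their output differently. The simplified activity_log table, the index name idx_activity_type and the data are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE activity_log ("
    "id INTEGER PRIMARY KEY, activity_type_id INTEGER NOT NULL, datetime_created TEXT)"
)
conn.executemany(
    "INSERT INTO activity_log (activity_type_id, datetime_created) VALUES (?, ?)",
    [(i % 20, "2009-01-01 00:00:00") for i in range(500)],
)

QUERY = "SELECT id, datetime_created FROM activity_log WHERE activity_type_id = 7"


def plan(sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN step."""
    steps = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(step[3] for step in steps)


plan_before = plan(QUERY)   # no usable index: the whole table is scanned
conn.execute("CREATE INDEX idx_activity_type ON activity_log (activity_type_id)")
plan_after = plan(QUERY)    # the new index is used to find matching rows

print("before:", plan_before)
print("after: ", plan_after)
```

In SQLite, a step that starts with SCAN reads every row of the table, while SEARCH ... USING INDEX means the index narrows the rows down first. A SCAN on a large table in a frequently run query is a common hint that an index might help.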

Something I haven't seen mentioned here yet is that when you have more than one disk, you probably want to put your indexes on a different disk from the one that holds the data. This can speed some operations up. I think this deserves a question in its own right, though.

Licensed under: CC-BY-SA with attribution