Question

I am unsure how best to index a table in MySQL and need help choosing the right kind of index. Currently I use a unique key on this table, but I do not know whether that is the best approach, and in some situations I cannot use it because of MySQL limitations.

The table consists of a primary key and n-columns, in this scenario to keep it simple n=4. So the table looks like this: pk, col1, col2, col3, col4

The values in col1 to col-n are VARCHARs, typically between 1 and 4 characters long. The primary key is a concatenation of the col values, so typical rows could look like the following:

A:B:C:D, A, B, C, D
A:B:C:E, A, B, C, E
A:B:F:F, A, B, F, F

Where the first element is the primary key, and subsequent elements are col1, col2, etc.

The table needs to be optimised for queries, not inserts. The queries that I wish to perform will have a WHERE clause where we know some of the values in columns 1-4. So for example I might want to find all rows where the second column is 'B' or 'C'. Once I have the primary key I use this to JOIN another table.

I was creating a unique key on col1-4 (as they are unique). The problem is, as soon as n becomes large (>16), I can no longer create a unique key index (MySQL is limited to 16 columns for unique key constraints). This is not a problem as the primary key ensures uniqueness. However, I am unsure of two things:

a) Is the unique key a good index to use in order to optimise the speed of the queries?

b) When I cannot use a unique key, what index should I use?

I have the following options, and I’m not sure which (if any) is the best:

a) Create a single index on (col1, col2, col3, col4)

b) Create an index per column (col1), (col2)…(col-n)

c) Create an index per col, with the pk included (pk, col1), (pk, col2), (pk, col-n)

Any help you can provide is greatly appreciated.

Thanks

Phil

Solution

An index on (col1, col2, col3, col4) can only be used for lookups if the WHERE clause contains a condition on a leftmost prefix of its columns: col1, or col1 and col2, and so on. If the query has no condition on col1, MySQL cannot seek into the index (see Multiple-Column Indexes in the MySQL manual). If you have such queries, define additional indexes, for example (col2, col3, col4), (col3, col4) and (col4).
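The leftmost-prefix rule can be checked with EXPLAIN. The following is a minimal sketch rather than a definitive schema: the table name mytable, the index name idx_c1234 and the column sizes are assumptions made for illustration.

```sql
-- Hypothetical table matching the question's layout (names and sizes assumed).
CREATE TABLE mytable (
    pk   VARCHAR(32) NOT NULL PRIMARY KEY,
    col1 VARCHAR(4)  NOT NULL,
    col2 VARCHAR(4)  NOT NULL,
    col3 VARCHAR(4)  NOT NULL,
    col4 VARCHAR(4)  NOT NULL,
    INDEX idx_c1234 (col1, col2, col3, col4)
);

-- The conditions cover the leftmost prefix (col1, col2),
-- so the optimizer can seek into idx_c1234.
EXPLAIN SELECT pk FROM mytable WHERE col1 = 'A' AND col2 = 'B';

-- col1 is not constrained, so no leftmost prefix is available
-- and an index seek on idx_c1234 is not possible.
EXPLAIN SELECT pk FROM mytable WHERE col2 = 'B';
```

Comparing the key and type columns of the two EXPLAIN results shows the difference. For the second query the optimizer may still choose to read the whole index instead of the table, but that is a full scan, not a lookup.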

On the other hand, separate indices on (col1), (col2), (col3) and (col4) are also a good choice. In that case, there is no need to include the primary key in the indices. I'd prefer this solution over the one mentioned above.
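As a rough sketch, the per-column variant could be declared as follows. It assumes a table mytable with the columns from the question, and the index names are made up. It also assumes InnoDB, where every secondary index entry already carries the primary key value, which is why adding pk to each index gains nothing.

```sql
-- One single-column index per searchable column (index names are hypothetical).
CREATE INDEX idx_col1 ON mytable (col1);
CREATE INDEX idx_col2 ON mytable (col2);
CREATE INDEX idx_col3 ON mytable (col3);
CREATE INDEX idx_col4 ON mytable (col4);

-- A query that knows only col3 can now use idx_col3 directly.
SELECT pk FROM mytable WHERE col3 = 'C';
```

This layout also scales to n > 16 columns, since each index contains only one column.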

I find your choice of primary key strange. If (col1, col2, col3, col4) is unique, use that as a primary key. If you do not want a primary key on four columns (most people don't), the next choice is often a surrogate key (i.e. an auto_increment column in MySQL). In that case, a unique key on (col1, col2, col3, col4) enforces data integrity.
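A surrogate-key design along those lines might look like the sketch below. The table name, key names and column sizes are assumptions, not taken from the question.

```sql
-- Hypothetical variant with an auto_increment surrogate key.
CREATE TABLE mytable_surrogate (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT,
    col1 VARCHAR(4)   NOT NULL,
    col2 VARCHAR(4)   NOT NULL,
    col3 VARCHAR(4)   NOT NULL,
    col4 VARCHAR(4)   NOT NULL,
    PRIMARY KEY (id),
    -- Enforces that each combination of column values occurs only once.
    UNIQUE KEY uq_cols (col1, col2, col3, col4)
);
```

Other tables would then reference the compact id instead of the concatenated string. Note that the unique key is itself a multi-column index, so it is subject to the same 16-column limit described in the question and only works while n stays at or below 16.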

OTHER TIPS

MySQL can combine several indexes on a single table by merging their results on the primary key (the index merge access method). This works most directly when you search for exact key values rather than ranges.

So if you create separate indexes on col1 to colN, you may run this query:

SELECT  *
FROM    mytable
WHERE   col2 = 'B'
        OR
        col3 = 'C'

which can result in the indexes on col2 and col3 being merged. In the EXPLAIN output you will see the access type index_merge, with Using union(...) naming the two indexes in the Extra column.
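To check whether the optimizer actually picks this plan, and to use the result for the join mentioned in the question, something like the following could be run. It is only a sketch: other_table and its column mytable_pk are hypothetical names, and separate indexes on col2 and col3 are assumed to exist.

```sql
-- Inspect the plan; look for type index_merge in the output.
EXPLAIN
SELECT pk
FROM   mytable
WHERE  col2 = 'B'
       OR
       col3 = 'C';

-- Use the matching primary keys to join a second (hypothetical) table.
SELECT o.*
FROM   mytable AS m
JOIN   other_table AS o ON o.mytable_pk = m.pk
WHERE  m.col2 = 'B'
       OR
       m.col3 = 'C';
```

Whether index merge is chosen depends on the optimizer's cost estimates, so the plan is worth re-checking against realistic data volumes.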

To ensure uniqueness, it's enough to declare your first column the PRIMARY KEY, so as long as you maintain your data consistency (PK value is indeed the col* values concatenated and separated), your data uniqueness will be policed by the PK.

Licensed under: CC-BY-SA with attribution