Question

Suppose that a large MySQL table t (1,000,000 rows, for example) has a TINYINT column c with fewer than 10 distinct values.

Suppose also that most (say 99% or even 99.9%) of these values are equal to zero.

Will adding an index on this column speed up queries like the following?

SELECT * FROM t WHERE c > 1

Solution

The answer is "It depends; you didn't provide enough information."

Put yourself in the shoes of the optimizer. You see a query that does a SELECT *. C is most likely not the only column in the table; let's say you have columns A and B as well. That means an index seek on C requires a lookup for each matching row to bring back A and B from the table. Now it boils down to cost. If you have 1M rows, and IF the optimizer can tell that your query will return 1% of them (10,000 rows), then the cost of using the index is performing the index seek, reading 10K index entries, and then performing 10K lookups to bring back columns A and B.
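One way to see which side of this trade-off the optimizer actually picks is to create the index and ask for the plan. This is a minimal sketch, assuming the table and column names from the question; the index name idx_c is made up for illustration.

```sql
-- Hypothetical index name; t and c come from the question.
CREATE INDEX idx_c ON t (c);

-- Ask the optimizer for its plan without running the query.
EXPLAIN SELECT * FROM t WHERE c > 1;
```

In the EXPLAIN output, type = range with key = idx_c means the index was chosen, while type = ALL means a full table scan. The rows column shows how many rows the optimizer expects to examine.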

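Unfortunately, before version 8.0 MySQL did not maintain histograms like some other engines, only a coarse per-index cardinality figure (what other engines call the density vector). So with fewer than 10 distinct values it may estimate that ~10% of the rows will be returned, and with that estimate the cost of the index plan will look much higher.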
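To compare the optimizer's guess with reality, you can look at the real value distribution and at the stored index statistics. MySQL 8.0 and later can also build a histogram on a column. The sketch below assumes the names from the question, and the bucket count is an arbitrary choice. Note that when a usable index exists on the column, the optimizer generally prefers index-based row estimates over the histogram, so the histogram matters mostly while c has no index.

```sql
-- Actual distribution of values in c.
SELECT c, COUNT(*) AS n FROM t GROUP BY c ORDER BY c;

-- Per-index cardinality estimates kept by the storage engine.
SHOW INDEX FROM t;

-- MySQL 8.0+ only: build a histogram on c (16 buckets is an arbitrary choice).
ANALYZE TABLE t UPDATE HISTOGRAM ON c WITH 16 BUCKETS;

-- Inspect the stored histogram as JSON.
SELECT HISTOGRAM
FROM information_schema.COLUMN_STATISTICS
WHERE TABLE_NAME = 't' AND COLUMN_NAME = 'c';
```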

The alternative is to scan all 1M rows and filter 'on the fly' without using the index. Which is cheaper? I don't know; that depends on the size of the table. If A and B are both BOOLEAN columns, which take up very little space, the optimizer might decide that it is cheaper to scan the table. If A and B are huge BLOBs, the estimated cost of the scan will most likely go up, which makes the index plan look better by comparison.
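Since the scan cost depends on how wide the rows are, it can help to check the table's approximate size first. A small sketch, assuming the table t lives in the currently selected schema:

```sql
-- Approximate row count, average row width, and on-disk sizes for table t.
-- For InnoDB these figures are estimates, not exact counts.
SELECT TABLE_ROWS, AVG_ROW_LENGTH, DATA_LENGTH, INDEX_LENGTH
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = DATABASE() AND TABLE_NAME = 't';
```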

BTW, if instead of using * (which I assume you only gave as an example) you list only the minimal set of columns that you need, let's say A and C, then a composite index on (C, A) will almost always be the cheapest choice, because it covers the query and saves you the lookups.
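A covering index along those lines could look like the sketch below. The column a stands for the example column A above, and the index name idx_c_a is an assumption.

```sql
-- Hypothetical index name; a is the example column A from the answer.
CREATE INDEX idx_c_a ON t (c, a);

-- Only indexed columns are selected, so the index alone can answer the query.
EXPLAIN SELECT a, c FROM t WHERE c > 1;
```

If the index covers the query, the Extra column of the EXPLAIN output contains "Using index", meaning no lookups into the table rows are needed.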

HTH

OTHER TIPS

In short, the answer is yes in your case. However:

It would be a good idea to take the general answer above into consideration, regarding histograms and statistics, and to write the SELECT with FORCE INDEX (index_name) on that particular table.
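For example, with a hypothetical index named idx_c on t (c), the hint goes right after the table name:

```sql
-- Assumes an index named idx_c already exists on t (c); the name is hypothetical.
SELECT * FROM t FORCE INDEX (idx_c) WHERE c > 1;
```

Timing this against the same query with IGNORE INDEX (idx_c) is a simple way to check whether forcing the index actually helps on your data.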

Also, consider that you could use partitions and sub-partitions (InnoDB engine); that way the query may touch fewer rows and need fewer index lookups.
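A rough sketch of the partitioning idea follows, using a made-up table t_part with placeholder columns id, a, and b. It separates the zero rows from the rest so that a query for c > 1 can skip the large partition.

```sql
-- Hypothetical schema: id, a, and b are placeholders.
-- MySQL requires every unique key (including the primary key) to contain
-- the partitioning column, hence PRIMARY KEY (id, c).
CREATE TABLE t_part (
    id INT NOT NULL,
    c  TINYINT NOT NULL,
    a  INT,
    b  INT,
    PRIMARY KEY (id, c)
) ENGINE = InnoDB
PARTITION BY RANGE (c) (
    PARTITION p_zero    VALUES LESS THAN (1),       -- c < 1, the bulk of the rows
    PARTITION p_nonzero VALUES LESS THAN MAXVALUE   -- c >= 1, the rare values
);

-- The partitions column of EXPLAIN shows which partitions would be read.
EXPLAIN SELECT * FROM t_part WHERE c > 1;
```

With this layout the plan should list only p_nonzero, so the zero rows are never read. Whether the extra primary-key column and the partition maintenance are worth it depends on the rest of the workload.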

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange