Explaining slow queries

Question 1

The columns in your index are in the wrong order for this query, and that's why you see Using where in the query plan.

To invoke a well-worn illustration, let's consider the telephone directory.

In directory terms, your query is WHERE last_name < 'smith' AND first_name = 'john', against an index on (last_name, first_name).

The fact that the first names are sorted within each sorted group of last names is of no real value, because we still have to consider all of the people in a large portion of the directory (everyone before Smith) and evaluate their first names individually within each distinct last name. That's why your row estimate is so large.

If both expressions were equality comparisons, the server could indeed go directly to the 8 rows. If the leftmost column in the index were subject to the equality comparison and the second column were the "less than" comparison, the server could again go directly to the rows in question, because they would all be adjacent in the index.
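To make the directory illustration concrete, here is a minimal sketch in Python that treats a sorted list of tuples as a stand-in for a two-column index. The names in it are invented for illustration, and a real B-tree index is of course more involved than a sorted list; the point is only how many entries each column order forces us to examine.

```python
from bisect import bisect_left

# Made-up directory entries: (last_name, first_name).
people = [
    ("adams", "john"), ("adams", "mary"), ("baker", "john"),
    ("baker", "zoe"), ("jones", "alan"), ("jones", "john"),
    ("smith", "john"), ("young", "john"),
]

# Index on (last_name, first_name): the range on last_name is all that
# narrows the scan, so every entry before "smith" must be examined and
# its first name checked one by one.
by_last = sorted(people)
end = bisect_left(by_last, ("smith",))
examined_last_first = by_last[:end]
found_last_first = [p for p in examined_last_first if p[1] == "john"]

# Index on (first_name, last_name): jump to the "john" group, then read
# forward until last_name reaches "smith". The matches are adjacent.
by_first = sorted((first, last) for last, first in people)
lo = bisect_left(by_first, ("john",))
hi = bisect_left(by_first, ("john", "smith"))
examined_first_last = by_first[lo:hi]

print(len(examined_last_first), "examined,", len(found_last_first), "kept")
print(len(examined_first_last), "examined, all of them kept")
```

With the (last_name, first_name) order, 6 entries are examined to keep 3. With the order swapped, only the 3 matching entries are touched.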

An index with the two columns in the opposite order will most likely give very different performance.

Generally, Using where shown alongside a key chosen from possible_keys means that the index is helping somewhat, but the server still has to evaluate the rows the index finds and eliminate more of them using the expressions in the WHERE clause.

The faster response on identical queries is probably the query cache in action. The faster response on similar queries possibly means your innodb_buffer_pool_size is too small for your workload: without an optimal index, the query needs many random reads, so a lot of pages have to be loaded into the pool from disk on first execution.

Question 2

For your existing index on (GWAS_T2D_PVALUE, MOST_DEL_SCORE, _13k_T2D_EA_MAF), I would consider swapping the first two columns to give (MOST_DEL_SCORE, GWAS_T2D_PVALUE, _13k_T2D_EA_MAF), and here is why.

Think of the index like this. The existing index leads with GWAS_T2D_PVALUE, so you have a file cabinet with all of those values sorted in order. Then, within EACH group of entries sharing the same PVALUE, it puts all the MOST_DEL_SCORE values in order, and finally all the _13k values sorted within that. So, in order to process your query, you need to pull out all the files with PVALUE < .05 (or whatever). Then you have to manually run through each of those files, looking for the ones that have your specific MOST_DEL_SCORE value of 1, and pull those out.

Now, try the alternate index. You still have a file cabinet, but each file is for a specific MOST_DEL_SCORE. So, if you have 20 scores, you have 20 files to look at. Since you are always looking for the ONE INSTANCE "MOST_DEL_SCORE = 1", you have one file and you are almost done. Your next criterion is GWAS_T2D_PVALUE < .05. Since PVALUE is the secondary sort in the index, these values are already sorted and ready to go. So the engine can quickly start at the first record, read up to .05, and stop. It doesn't have to keep going through all the other combinations that the first index offers.
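The difference between the two cabinets can be sketched at a slightly larger scale. This is a rough Python simulation, not MySQL itself: the rows are randomly generated, the 20 possible scores and the 10,000 row count are assumptions made up for the example, and the third column (_13k_T2D_EA_MAF) is left out to keep it short.

```python
import random
from bisect import bisect_left

rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up rows: (MOST_DEL_SCORE, GWAS_T2D_PVALUE), with 20 possible scores.
rows = [(rng.randint(1, 20), rng.random()) for _ in range(10_000)]
wanted = [r for r in rows if r[0] == 1 and r[1] < 0.05]

# Existing order (PVALUE, SCORE): pull every file with PVALUE < 0.05,
# then check each one by hand for SCORE = 1.
index_a = sorted((p, s) for s, p in rows)
pulled_a = index_a[:bisect_left(index_a, (0.05,))]
kept_a = [(s, p) for p, s in pulled_a if s == 1]

# Alternate order (SCORE, PVALUE): open the SCORE = 1 file and read
# from its first record up to 0.05, then stop.
index_b = sorted(rows)
lo = bisect_left(index_b, (1,))
hi = bisect_left(index_b, (1, 0.05))
pulled_b = index_b[lo:hi]

print("rows wanted:", len(wanted))
print("examined with (PVALUE, SCORE):", len(pulled_a))
print("examined with (SCORE, PVALUE):", len(pulled_b))
```

Both orders return the same rows. The alternate order examines exactly the rows wanted, while the existing order examines every row under the PVALUE cutoff regardless of its score.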

Just a suggestion, but I have seen queries improve when the column order of the index matches the criteria, starting with the most specific (the equality test) and working out to the more general (the range tests) in the subsequent columns.