Question

I am building a star schema to act as the backend for an analytics app I am building. My query generator is building queries using a regular star-join pattern. A sample query is below, whereby a fact table is joined to two dimension tables and the dimension tables are filtered by constant values chosen by the end user.

I am using MySQL 5.5 and all tables are MyISAM.

In this problem, I am simply trying to pull the first N rows (in this case, the first 1 row)

EXPLAIN
SELECT fact_table.*
FROM
    fact_table
INNER JOIN
    dim1 ON (fact_table.dim1_key = dim1.pkey)  
INNER JOIN
    dim2 ON (fact_table.dim2_key = dim2.pkey)           
WHERE
    dim1.constant_value = 123
    AND dim2.constant_value = 456
ORDER BY
    measure1 ASC LIMIT 1

The explain output follows. Both the dimension keys resolve to constant values since there is a unique key applied to their value.

*************************** 1. row ***************************
       id: 1
select_type: SIMPLE
    table: dim1
     type: const
possible_keys: PRIMARY,dim1_uk
      key: dim1_uk
  key_len: 8
      ref: const
     rows: 1
    Extra: Using filesort
*************************** 2. row ***************************
       id: 1
select_type: SIMPLE
    table: dim2
     type: const
possible_keys: PRIMARY,dim2_uk
      key: dim2_uk
  key_len: 8
      ref: const
     rows: 1
    Extra: 
*************************** 3. row ***************************
       id: 1
select_type: SIMPLE
    table: fact_table
     type: ref
possible_keys: my_idx
      key: my_idx
  key_len: 16
      ref: const,const
     rows: 50010
    Extra: Using where

And here is the index on the fact table:

show indexes from fact_table

*************************** 10. row ***************************
    Table: fact_table
 Non_unique: 1
 Key_name: my_idx
 Seq_in_index: 1
 Column_name: dim1_key
Collation: A
 Cardinality: 24
 Sub_part: NULL
   Packed: NULL
     Null: 
 Index_type: BTREE
  Comment: 
Index_comment: 
*************************** 11. row ***************************
    Table: fact_table
Non_unique: 1
 Key_name: my_idx
Seq_in_index: 2
Column_name: dim2_key
Collation: A
Cardinality: 70
 Sub_part: NULL
   Packed: NULL
     Null: 
Index_type: BTREE
  Comment: 
Index_comment: 
*************************** 12. row ***************************
    Table: fact_table
Non_unique: 1
 Key_name: my_idx
Seq_in_index: 3
 Column_name: measure1
Collation: A
Cardinality: 5643
 Sub_part: NULL
   Packed: NULL
     Null: 
Index_type: BTREE
  Comment: 
Index_comment: 

When profiling this query, I see the query spends the majority of its time performing a filesort operation "sorting result". My question is, even when using the correct index, why can't this query simply pull out the first value without doing a sort? The my_idx is already sorted on the right column and the two columns appearing first in the index resolve as constants, as shown in the plan.

If I rewrite the query, as follows, I am able to get the plan I want, with no file sorting.

SELECT fact_table.*
FROM
    fact_table
WHERE
    dim1_key = (select pkey from dim1 where constant_value = 123)
    AND dim2_key = (select pkey from dim2 where constant_value = 456)
ORDER BY
    measure1 ASC LIMIT 1

It would be expensive to change the tool generating these SQL commands so I would like to avoid this filesort even when the query is written in the original format.

My question is, why is MySQL opting to do a filesort even when the first keys on the index are constants (via an INNER JOIN) and the index is sorted in the right order? Is there a way around this?

Was it helpful?

Solution

My question is, why is MySQL opting to do a filesort even when the first keys on the index are constants (via an INNER JOIN) and the index is sorted in the right order? Is there a way around this?

Because the order of the resultset depends on the index used for reading the first table in the JOIN, but, as you see in EXPLAIN, the JOIN actually starts from dim1 table.

It might seem strange, but to implicitly force MySQL start from fact_table you will need to change the indexes in the dimension tables to (pkey, constantvalue) instead of (constantvalue), otherwise MySQL optimizer will start with a table for which the condition constantvalue=some_value returns minimum rows. The problem is that you might need those indexes for other queries.

Instead, you may try to add STRAIGHT_JOIN option to the SELECT and explicitly force the order.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top