Why is this MySQL query taking so long?
Question
I'm looking to do a relatively simple filter on one table called prices.
prices is a very large table (~2G records). Sample data, query and indexes included below. I realize the table being queried is very large, but for a query this simple I would have expected better performance (currently ~5 mins and still running). I did notice InnoDB Buffer Usage appears to be at 100% per MySQL Workbench, with ~6K InnoDB reads per second. I'd appreciate guidance on what adjustments to make to correct this.
prices
dataDate ticker optionSymbol expDate type price strike last bid ask volume OI
2002-02-08 AAPL AAQ020216C00005000 2002-02-16 call 24.03 5 0 18.8 19.1 0 0
2002-02-08 AAPL AAQ020216P00005000 2002-02-16 put 24.03 5 0 0 0.05 0 0
2002-02-08 AAPL AAQ020216C00007500 2002-02-16 call 24.03 7.5 0 16.3 16.6 0 0
2002-02-08 AAPL AAQ020216P00007500 2002-02-16 put 24.03 7.5 0 0 0.05 0 0
2002-02-08 AAPL AAQ020216C00010000 2002-02-16 call 24.03 10 12.2 13.9 14.2 0 1
2002-02-08 AAPL AAQ020216P00010000 2002-02-16 put 24.03 10 0 0 0.05 0 0
2002-02-08 AAPL AAQ020216C00012500 2002-02-16 call 24.03 12.5 13.5 11.4 11.7 0 8
2002-02-08 AAPL AAQ020216P00012500 2002-02-16 put 24.03 12.5 0.05 0 0.05 0 50
2002-02-08 AAPL AAQ020216C00015000 2002-02-16 call 24.03 15 7.1 8.9 9.1 0 10
2002-02-08 AAPL AAQ020216P00015000 2002-02-16 put 24.03 15 0.1 0 0.05 0 30
2002-02-08 AAPL AAQ020216C00017500 2002-02-16 call 24.03 17.5 5.5 6.4 6.7 0 371
2002-02-08 AAPL AAQ020216P00017500 2002-02-16 put 24.03 17.5 0.05 0 0.05 0 147
2002-02-08 AAPL AAQ020216C00020000 2002-02-16 call 24.03 20 3.9 3.9 4.1 7 1064
2002-02-08 AAPL AAQ020216P00020000 2002-02-16 put 24.03 20 0.1 0 0.1 5 1448
2002-02-08 AAPL AAQ020216C00022500 2002-02-16 call 24.03 22.5 1.7 1.7 1.75 1551 7069
2002-02-08 AAPL AAQ020216P00022500 2002-02-16 put 24.03 22.5 0.2 0.15 0.25 136 3234
2002-02-08 AAPL AAQ020216C00025000 2002-02-16 call 24.03 25 0.3 0.1 0.35 105 4237
2002-02-08 AAPL AAQ020216P00025000 2002-02-16 put 24.03 25 1.25 1.2 1.35 629 589
2002-02-08 AAPL AAQ020216C00027500 2002-02-16 call 24.03 27.5 0.05 0 0.1 0 1097
Query
select *
from op.prices op
where ticker = 'AAPL'
and '2020-04-30' between date_add(expDate, INTERVAL 3 MONTH) and expDate
and '2020-04-30' = date_add(op.dataDate, INTERVAL 14 DAY);
Solution
Instead of
'2020-04-30' = date_add(op.dataDate, INTERVAL 14 DAY);
Use
op.dataDate = date_sub('2020-04-30', INTERVAL 14 DAY);
Your first statement will be interpreted as "add 14 days to every dataDate and return the rows where the result is 2020-04-30." This requires a full scan of the table.
The second statement evaluates to "return records where dataDate is 2020-04-16." This allows the engine to perform a seek on your index that begins with dataDate.
Do whatever weird stuff you want to do to expDate, since that won't factor much into how the query engine optimizes the query.
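To see the difference for yourself, you can compare the EXPLAIN output of the two predicates. This is only a sketch: it assumes an index whose leading column is dataDate exists (the index definitions are not shown here), and the exact plan depends on your MySQL version and indexes. EXPLAIN only shows the plan, it does not run the query.

```sql
-- Predicate wraps the column in a function: the optimizer generally
-- cannot use an index on dataDate for this comparison.
EXPLAIN
SELECT *
FROM op.prices op
WHERE '2020-04-30' = DATE_ADD(op.dataDate, INTERVAL 14 DAY);

-- Column is left bare and the constant side is computed once
-- (2020-04-30 minus 14 days is 2020-04-16), so an index on dataDate
-- can be used for the lookup.
EXPLAIN
SELECT *
FROM op.prices op
WHERE op.dataDate = DATE_SUB('2020-04-30', INTERVAL 14 DAY);
```

For the first statement you would typically see type = ALL with no key chosen; for the second, a ref or range access with the index named in the key column.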
Other tips
The formulation is the main problem. After rewriting the query, the index you have can be used for the entire WHERE clause.
SELECT *
FROM op.prices
WHERE ticker = 'AAPL'
AND expDate >= '2020-04-30'
AND expDate < '2020-04-30' + INTERVAL 3 MONTH
AND dataDate = '2020-04-30' - INTERVAL 14 DAY
See "Sargable" on Wikipedia. Phrased differently: "Don't hide a column in a function call -- it may not be usable by an index."
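The index definitions are not shown here, so as an assumption: if no existing index covers these three columns in a suitable order, a composite index along the following lines would fit the rewritten query. The index name is made up for illustration. The usual rule of thumb is to put the columns tested with = first and the range column last.

```sql
-- Hypothetical index: equality columns (ticker, dataDate) first,
-- then the range column (expDate) last.
ALTER TABLE op.prices
  ADD INDEX idx_ticker_datadate_expdate (ticker, dataDate, expDate);
```

On a table of this size, building an index will take a long time and extra disk space, so check your existing indexes with SHOW CREATE TABLE first.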
Shrinking the file size will help some:
- Consider switching from 8-byte DOUBLE (about 16 significant digits) to 4-byte FLOAT (about 7 significant digits) for the metrics.
- A 1-byte ENUM('put','call') would save 3 bytes per row.
- Normalize the Option (a long string).
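As a rough sketch of the first two shrinking suggestions: the current column types are not shown, so this assumes the metric columns are DOUBLE and type is a short string column. Adjust names and NULL / NOT NULL to match the real definition.

```sql
-- Assumption: price, strike, last, bid, ask are currently DOUBLE
-- and type is a short string. Column names are taken from the sample data.
ALTER TABLE op.prices
  MODIFY COLUMN `type`   ENUM('put','call') NOT NULL,
  MODIFY COLUMN `price`  FLOAT,
  MODIFY COLUMN `strike` FLOAT,
  MODIFY COLUMN `last`   FLOAT,
  MODIFY COLUMN `bid`    FLOAT,
  MODIFY COLUMN `ask`    FLOAT;
```

Changing column types rebuilds the table, which is slow at this size, so try it on a copy first. Also keep in mind that FLOAT keeps only about 7 significant digits.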
There may be more tips. To see how important shrinking is, answer these: How much RAM do you have? What is the value of innodb_buffer_pool_size? How big (GB) is the table?
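Two of those questions can be answered from inside MySQL. A minimal sketch, assuming the schema is named op as in the query above (the amount of RAM has to come from the operating system):

```sql
-- Buffer pool size in GB.
SELECT @@innodb_buffer_pool_size / 1024 / 1024 / 1024 AS buffer_pool_gb;

-- Approximate data and index size of the table in GB.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024 / 1024, 1) AS data_gb,
       ROUND(index_length / 1024 / 1024 / 1024, 1) AS index_gb
FROM information_schema.tables
WHERE table_schema = 'op'
  AND table_name = 'prices';
```

For InnoDB the sizes reported by information_schema are estimates, but they are good enough to tell whether the table fits in the buffer pool.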
You really should have a PRIMARY KEY.
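Which columns make up the key is your call. As an assumption based on the sample data, there looks to be one row per option symbol per day, so (dataDate, optionSymbol) might be a candidate. A sketch of checking that and then adding the key:

```sql
-- Assumption: one row per option contract per day, so the pair is unique.
-- Verify first; any duplicates would make the ALTER fail.
SELECT dataDate, optionSymbol, COUNT(*) AS n
FROM op.prices
GROUP BY dataDate, optionSymbol
HAVING n > 1
LIMIT 10;

-- Only if the check above returns no rows.
ALTER TABLE op.prices
  ADD PRIMARY KEY (dataDate, optionSymbol);
```

Both statements are expensive on ~2G rows. In InnoDB the primary key also determines the physical row order, so adding it rebuilds the whole table.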