I'm looking to do a relatively simple filter on one table called prices.

prices is a very large table (~2G records). Sample data, query and indexes are included below. I realize the table being queried is very large, but for a query this simple I would have expected better performance (currently ~5 mins and still running). I did notice that InnoDB Buffer Usage appears to be at 100% per MySQL Workbench, with ~6K InnoDB reads per second. I'd appreciate guidance on what adjustments to make to correct this.

prices

dataDate     ticker  optionSymbol    expDate     type    price   strike  last    bid     ask     volume  OI
2002-02-08   AAPL    AAQ020216C00005000 2002-02-16   call   24.03   5   0   18.8    19.1    0   0
2002-02-08   AAPL    AAQ020216P00005000 2002-02-16   put    24.03   5   0   0   0.05    0   0
2002-02-08   AAPL    AAQ020216C00007500 2002-02-16   call   24.03   7.5 0   16.3    16.6    0   0
2002-02-08   AAPL    AAQ020216P00007500 2002-02-16   put    24.03   7.5 0   0   0.05    0   0
2002-02-08   AAPL    AAQ020216C00010000 2002-02-16   call   24.03   10  12.2    13.9    14.2    0   1
2002-02-08   AAPL    AAQ020216P00010000 2002-02-16   put    24.03   10  0   0   0.05    0   0
2002-02-08   AAPL    AAQ020216C00012500 2002-02-16   call   24.03   12.5    13.5    11.4    11.7    0   8
2002-02-08   AAPL    AAQ020216P00012500 2002-02-16   put    24.03   12.5    0.05    0   0.05    0   50
2002-02-08   AAPL    AAQ020216C00015000 2002-02-16   call   24.03   15  7.1 8.9 9.1 0   10
2002-02-08   AAPL    AAQ020216P00015000 2002-02-16   put    24.03   15  0.1 0   0.05    0   30
2002-02-08   AAPL    AAQ020216C00017500 2002-02-16   call   24.03   17.5    5.5 6.4 6.7 0   371
2002-02-08   AAPL    AAQ020216P00017500 2002-02-16   put    24.03   17.5    0.05    0   0.05    0   147
2002-02-08   AAPL    AAQ020216C00020000 2002-02-16   call   24.03   20  3.9 3.9 4.1 7   1064
2002-02-08   AAPL    AAQ020216P00020000 2002-02-16   put    24.03   20  0.1 0   0.1 5   1448
2002-02-08   AAPL    AAQ020216C00022500 2002-02-16   call   24.03   22.5    1.7 1.7 1.75    1551    7069
2002-02-08   AAPL    AAQ020216P00022500 2002-02-16   put    24.03   22.5    0.2 0.15    0.25    136 3234
2002-02-08   AAPL    AAQ020216C00025000 2002-02-16   call   24.03   25  0.3 0.1 0.35    105 4237
2002-02-08   AAPL    AAQ020216P00025000 2002-02-16   put    24.03   25  1.25    1.2 1.35    629 589
2002-02-08   AAPL    AAQ020216C00027500 2002-02-16   call   24.03   27.5    0.05    0   0.1 0   1097

Query

select *
from op.prices op
where ticker = 'AAPL'
and '2020-04-30' between date_add(expDate, INTERVAL 3 MONTH) and expDate
and '2020-04-30' = date_add(op.dataDate, INTERVAL 14 DAY);

Indexes (screenshot not preserved)

Solution

Instead of

'2020-04-30' = date_add(op.dataDate, INTERVAL 14 DAY);

Use

op.dataDate = date_sub('2020-04-30', INTERVAL 14 DAY);

Your first statement will be interpreted as "add 14 days to all dataDate and return when that is 2020-04-30." This will require a full scan of the table.

The second statement will evaluate to: "return records where the dataDate is 2020-04-16." This allows the engine to perform a seek on your index that begins with dataDate.

Do whatever weird stuff you want to do to expDate since that won't factor much into how the query engine will optimize.
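
One way to check the difference yourself is to run EXPLAIN on both forms. This is only a sketch: I'm assuming one of your indexes starts with dataDate, and the exact plan output will depend on your indexes and MySQL version.

```sql
-- Column wrapped in a function: the optimizer cannot seek on dataDate.
EXPLAIN
SELECT *
FROM op.prices op
WHERE '2020-04-30' = DATE_ADD(op.dataDate, INTERVAL 14 DAY);

-- Bare column compared to a constant expression: eligible for an index lookup.
EXPLAIN
SELECT *
FROM op.prices op
WHERE op.dataDate = DATE_SUB('2020-04-30', INTERVAL 14 DAY);
```

For the first statement I would expect the plan to show a full scan (type = ALL), and for the second an index lookup with the dataDate index listed under key.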

Other tips

The formulation is the main problem. After rewriting the query, the index you have can be used for the entire WHERE clause.

SELECT  *
    FROM  op.prices
    WHERE  ticker = 'AAPL'
      AND expDate >= '2020-04-30'
      AND expDate  < '2020-04-30' + INTERVAL 3 MONTH
      AND dataDate = '2020-04-30' - INTERVAL 14 DAY

See Sargable in Wikipedia. Phrased differently: "Don't hide a column in a function call -- it may not be usable by an index."
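
If it turned out that no suitable index existed after all, a composite index along these lines would match the rewritten WHERE clause. The index name is made up for illustration, and building it on ~2G rows will take a long time, so treat this as a sketch rather than a recommendation to run as-is.

```sql
-- Hypothetical index: the equality columns (ticker, dataDate) come first,
-- the range column (expDate) comes last.
ALTER TABLE op.prices
    ADD INDEX idx_ticker_dataDate_expDate (ticker, dataDate, expDate);
```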

Shrinking the file size will help some:

  • Consider switching from 8-byte DOUBLE (about 16 significant digits) to 4-byte FLOAT (about 7 significant digits) for the metrics.
  • A 1-byte ENUM('put','call') would save 3 bytes per row.
  • Consider normalizing optionSymbol (a long string).
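
The first two suggestions could look roughly like this. The current column types are not shown in the question, so the types here are assumptions, and an ALTER like this rebuilds the whole table, which is slow at this size.

```sql
-- Sketch only: the existing column definitions are assumed, not known.
-- `type` and `last` are backquoted to avoid any clash with keywords.
ALTER TABLE op.prices
    MODIFY `type` ENUM('put','call'),  -- 1 byte per row
    MODIFY price  FLOAT,               -- 4 bytes, about 7 significant digits
    MODIFY strike FLOAT,
    MODIFY `last` FLOAT,
    MODIFY bid    FLOAT,
    MODIFY ask    FLOAT;
```

Check first that 7 significant digits is enough for every value you store, since the conversion to FLOAT is lossy.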

There may be more tips. To see how important shrinking is, answer these: How much RAM do you have? What is the value of innodb_buffer_pool_size? How big (GB) is the table?
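
The last two questions can be answered from within MySQL. A minimal sketch, assuming the schema is named op as in the query above:

```sql
-- Buffer pool size in GB.
SELECT @@innodb_buffer_pool_size / POWER(1024, 3) AS buffer_pool_gb;

-- Approximate data and index size of the table in GB
-- (for InnoDB these information_schema figures are estimates).
SELECT TABLE_NAME,
       DATA_LENGTH  / POWER(1024, 3) AS data_gb,
       INDEX_LENGTH / POWER(1024, 3) AS index_gb
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'op'
  AND TABLE_NAME   = 'prices';
```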

You really should have a PRIMARY KEY.
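
Two possible ways to add one, both sketches. The column name id is hypothetical, and the natural key assumes that (dataDate, optionSymbol) is unique and NOT NULL, which the sample data suggests but does not prove. Either statement rebuilds the table.

```sql
-- Option A: surrogate key.
ALTER TABLE op.prices
    ADD COLUMN id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY;

-- Option B: natural key (only if the pair really is unique).
-- ALTER TABLE op.prices ADD PRIMARY KEY (dataDate, optionSymbol);
```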

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange