Question

Consider the following query:

SELECT * FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1;

The Stamp column in the Transactions table is a TIMESTAMP and there is an index on it. How could I change this query so it avoids full table scans? (that is, using Stamp outside of the day() function)

Thanks!


Solution

This is how I would do it:

Add some extra columns: YEAR, MONTH, DAY, or even HOUR and MINUTE, depending on the traffic you expect. Then build a trigger to populate the extra columns, perhaps subtracting the 3-hour interval in advance. Finally, build an index on the extra columns.
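A minimal sketch of that approach in MySQL, covering only the day of the month. The column name StampDay, the index name idx_stamp_day and the trigger names are made up for illustration, and the sketch assumes Stamp is always supplied explicitly when a row is inserted:

```sql
-- Hypothetical helper column holding the day of month of (Stamp - 3 hours).
ALTER TABLE Transactions ADD COLUMN StampDay TINYINT UNSIGNED NULL;

-- Backfill the rows that already exist.
UPDATE Transactions SET StampDay = DAY(Stamp - INTERVAL 3 HOUR);

-- Keep the helper column in sync on every write.
CREATE TRIGGER Transactions_bi BEFORE INSERT ON Transactions
FOR EACH ROW SET NEW.StampDay = DAY(NEW.Stamp - INTERVAL 3 HOUR);

CREATE TRIGGER Transactions_bu BEFORE UPDATE ON Transactions
FOR EACH ROW SET NEW.StampDay = DAY(NEW.Stamp - INTERVAL 3 HOUR);

-- Index the helper column so the filter no longer wraps Stamp in a function.
ALTER TABLE Transactions ADD INDEX idx_stamp_day (StampDay);

SELECT * FROM Transactions WHERE StampDay = 1;
```

A day-of-month filter matches roughly one row in thirty, so the optimizer may still prefer a full scan on some data sets; EXPLAIN will show which plan it picked.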

OTHER TIPS

If the goal is just to avoid full table scans and you have a PRIMARY KEY (say, named PK) on Transactions, consider adding a covering index:

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Then

SELECT * FROM Transactions WHERE PK IN (SELECT PK FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1
 )

This query should read the cover_1 index instead of scanning the full table (however, the optimizer may still decide to use a full scan if the number of rows in the table is small or for some other statistical reason :) )

A better way may be to use a temporary table instead of the subquery.
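A rough sketch of the temporary-table variant, assuming MySQL and the same placeholder key name PK; the table name tmp_pk is made up for illustration:

```sql
-- Collect the matching keys first; with the covering index this step
-- can be answered from the index alone.
CREATE TEMPORARY TABLE tmp_pk AS
SELECT PK
FROM Transactions
WHERE DAY(Stamp - INTERVAL 3 HOUR) = 1;

-- Then fetch the full rows by key.
SELECT t.*
FROM Transactions t
JOIN tmp_pk USING (PK);

DROP TEMPORARY TABLE tmp_pk;
```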

You can often rewrite the function so you have something that looks like WHERE Stamp=XXXX, where XXXX is some expression. You could create a series of BETWEEN conditions, one for each month: WHERE Stamp BETWEEN timestamp('2010-01-01 00:00:00') AND timestamp('2010-01-01 23:59:59') OR Stamp BETWEEN ..., but I'm not certain this would use the index in this case. I'd build a column that holds the day of the month, as @petr suggests.

Calculate your desired Stamp value separately before you run your main query, i.e.

Step 1 - calculate the desired Stamp value

Step 2 - run a query where Stamp > (calculated value)

Because there's no calculation in step 2, you should be able to use your index.
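For a single day, the two steps could look something like this in MySQL. The user variables @from and @to and the example date are placeholders, and the upper bound is an addition so that only one day is returned:

```sql
-- Step 1: compute the boundaries once, outside the main query.
-- Rows whose (Stamp - 3 hours) falls on 2010-06-01 have a Stamp
-- between 03:00 that day and 03:00 the next day.
SET @from = DATE_ADD('2010-06-01', INTERVAL 3 HOUR);
SET @to   = DATE_ADD(@from, INTERVAL 1 DAY);

-- Step 2: compare the bare column against the precomputed values.
SELECT *
FROM Transactions
WHERE Stamp >= @from
  AND Stamp <  @to;
```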

If I understand it correctly, you basically want to return all rows where the stamp falls on the first of each month (after subtracting the 3 hours)? If (and this is a big if) you have a fixed window of, say, the latest few months, you could just enumerate one range test per month. But still, I'm not sure indexed access will be faster anyway.

select *
  from transactions
 where stamp between timestamp '2010-06-01 03:00:00' and timestamp '2010-06-02 02:59:59'
    or stamp between timestamp '2010-07-01 03:00:00' and timestamp '2010-07-02 02:59:59'
    or stamp between timestamp '2010-08-01 03:00:00' and timestamp '2010-08-02 02:59:59'
    or stamp between timestamp '2010-09-01 03:00:00' and timestamp '2010-09-02 02:59:59'
    or stamp between timestamp '2010-10-01 03:00:00' and timestamp '2010-10-02 02:59:59'
    or stamp between timestamp '2010-11-01 03:00:00' and timestamp '2010-11-02 02:59:59'
    or stamp between timestamp '2010-12-01 03:00:00' and timestamp '2010-12-02 02:59:59';

NB! I'm not sure how the millisecond part of the timestamp works. You may need to pad it accordingly.
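One way to sidestep the fractional-seconds question is to use half-open ranges instead of BETWEEN, so the upper bound is an exclusive 03:00:00 rather than an inclusive 02:59:59. A sketch for the first two months of the list above:

```sql
select *
  from transactions
 where (stamp >= '2010-06-01 03:00:00' and stamp < '2010-06-02 03:00:00')
    or (stamp >= '2010-07-01 03:00:00' and stamp < '2010-07-02 03:00:00');
```

This way any value between 02:59:59 and 03:00:00 is still included, whatever precision the column stores.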

Reworking petr's answer a bit to avoid the IN clause, and to make it work for both MyISAM and InnoDB.

For MyISAM

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Or, for InnoDB, where the PK is implicitly included in every index,

ALTER TABLE Transactions ADD INDEX Stamp (Stamp)

Then

SELECT *
FROM Transactions INNER JOIN
  (
  SELECT PK
  FROM Transactions
  WHERE DAYOFMONTH(Stamp - interval 3 hour) = 1
  ) a ON Transactions.PK=a.PK

The subquery will have an index-only execution, and the outer query will pull from the table only the rows whose PK came through in a.
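To check whether the subquery really is answered from the index, you can run it through EXPLAIN. This is just a quick check, assuming one of the indexes above has been created:

```sql
EXPLAIN
SELECT PK
FROM Transactions
WHERE DAYOFMONTH(Stamp - INTERVAL 3 HOUR) = 1;
```

If the Extra column shows "Using index", the rows are being read from the index alone. Expect the type column to say index (a full index scan) rather than range or ref, since the function call on Stamp still prevents a seek.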

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow