Question

I have a SQLite3 DB which is about 3GB in size, I am trying to query it in order to bring up the latest order for a specific product between 2 dates.

Here is the query used to create the table:

CREATE TABLE "ProductOrders" (
    "ID" INTEGER NOT NULL  UNIQUE,
    "ProductID" INTEGER NOT NULL,
    "AdditionalInfo" TEXT,
    "OrderDateTime" DOUBLE NOT NULL,
    PRIMARY KEY ("ProductID", "OrderDateTime")
)

I created indexes on the ProductID and OrderDateTime in order to ensure no duplicates are in the table.

The query that I am currently using to do this is as follows:

Select ProductID, AdditionalInfo, OrderDateTime
  From ProductOrders a
  Where a.OrderDateTime = ( Select max(OrderDateTime)
                              From ProductOrders b
                              Where a.ProductID = b.ProductID
                                AND b.OrderDateTime < 40544.5
                                AND b.OrderDateTime > 40539.5
                          )

That query works fine and does exactly what I want, however it seems to be scanning the whole table for the 'a' part of the query. My trace is as follows:

0|0|0|SCAN TABLE ProductOrders AS a
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE ProductOrders AS b USING COVERING INDEX sqlite_autoindex_ProductOrders_2 (ProductID=? AND OrderDateTime>? AND OrderDateTime<?)

It is taking at least 2 minutes to run which is impossible to work with. I notice that in the structure the ProductID is showing as Primary Key 1 and the OrderDateTime is showing as Primary Key 2, could this be a reason?

No correct solution

OTHER TIPS

I expect your query would perform better if you rewrote it as a JOIN:

SELECT  a.ProductID, a.AdditionalInfo, a.OrderDateTime
FROM    ProductOrders a
        INNER JOIN
        (   SELECT  b.ProductID, MAX(b.OrderDateTime) AS OrderDateTime
            FROM    ProductOrders b
            WHERE   b.OrderDateTime < 40544.5
            AND     b.OrderDateTime > 40539.5
            GROUP BY b.ProductID
        ) b
            ON a.ProductID = b.ProductID
            AND a.OrderDateTime = b.OrderDateTime;

With a correlated subquery the subquery is reevaluated for every record in ProductOrders (this is why you are seeing a full table scan), whereas with the JOIN version the subquery is evaluated only once, and the result is reused as necessary.

In this case, because your outer query is very wide (the whole table) and your subquery is much narrower (filtered by date), the join method will be better, if your outer query was very narrow (i.e only expected to return a few rows), then the correlated subquery version would be better as it not beneficial to perform the subquery for all values when only a few results are needed.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top