Question

*As a first note, I only have read access to my server. Just FYI, as it seems to come up a lot...

Server: DB2 (6.1) for i (IBM)

I have a query I'm running on a table that has 19 million rows in it (I don't design them, I just query them). I've been limiting my return data to 10 rows (*) until I get this query sorted out, so that return times are a bit more reasonable.

The basic design is that I need to get data about categories of products we sell on a week-by-week basis, using columns WEEK_ID and CATEGORY. Here's example code (with some important bits ####'d out):

SELECT WEEK_ID, CATEGORY
FROM DWQ####.SLSCATW
INNER JOIN DW####.CATEGORY
    ON DWQ####.SLSCATW.CATEGORY_NUMBER = DW####.CATEGORY.CATEGORY_NUMBER
WHERE WEEK_ID BETWEEN 200952 AND 201230 -- format is year/week
GROUP BY WEEK_ID, CATEGORY

If I comment out that last line, I can get back 100 rows in 254 ms. If I put that line back in, the query takes longer than I've had the patience to wait for :-). (Longest I've waited is 10 minutes.)

This question has two parts. The first part is quite rudimentary: Is this normal? There are 50 categories (roughly) and 140 weeks (or so) that I'm trying to condense down to. I realize that's a lot of info to condense off of 19 million rows, but I was hoping that limiting my query to 10 returned rows would keep the runtime down?

And, if I'm not just a complete n00b, and this in fact should not take several minutes, what exactly is wrong with my SQL?

I've Googled WHERE statement optimization and can't seem to find anything. All links and explanations are more than welcome.

Apologies for such a newbie post... we all have to start somewhere, right?

(*) Using SQLExplorer, my IDE, an Eclipse implementation of Squirrel SQL.


Solution

I'm not sure how the server handles GROUP BY when there are no aggregate functions in the query. Based on your answers in the comments, I'd just try adding some:

SELECT
    ...,
    SUM(SalesCost) as SalesCost,
    SUM(SalesDollars) as SalesDollars
FROM
    ...

Leave the rest of the query as is.
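
Combined with your original query, that would look something like this (I'm assuming SalesCost and SalesDollars live on SLSCATW, per your comments; swap in whatever you actually need to aggregate):

SELECT
    WEEK_ID,
    CATEGORY,
    SUM(SalesCost) AS SalesCost,
    SUM(SalesDollars) AS SalesDollars
FROM DWQ####.SLSCATW
INNER JOIN DW####.CATEGORY
    ON DWQ####.SLSCATW.CATEGORY_NUMBER = DW####.CATEGORY.CATEGORY_NUMBER
WHERE WEEK_ID BETWEEN 200952 AND 201230 -- format is year/week
GROUP BY WEEK_ID, CATEGORY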

If that doesn't solve the problem, you might have missing indexes. I would try to find out whether there's an index where WEEK_ID is the only column, or where it is the first column. You could also check whether the same table has another temporal column (e.g. TransactionDate or something similar) that is already indexed. If so, you could use that in the WHERE clause instead.
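
If you can read the QSYS2 catalog views (this is DB2 for i, so they should be available even with read-only access), something along these lines lists every index on the sales table together with its key columns, so you can see whether WEEK_ID leads any of them. The view and column names here are from memory, so double-check them against the catalog documentation for your release:

SELECT i.INDEX_NAME, k.COLUMN_NAME, k.ORDINAL_POSITION
FROM QSYS2.SYSINDEXES i
INNER JOIN QSYS2.SYSKEYS k
    ON k.INDEX_SCHEMA = i.INDEX_SCHEMA
   AND k.INDEX_NAME = i.INDEX_NAME
WHERE i.TABLE_SCHEMA = 'DWQ####'   -- schema of SLSCATW
  AND i.TABLE_NAME = 'SLSCATW'
ORDER BY i.INDEX_NAME, k.ORDINAL_POSITION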

Without correct indexes, the database server is forced to do a complete table scan, and that could explain your performance issues. 19 million rows takes a not insignificant amount of time to read from disk.

Also check that the data type of WEEK_ID is int or similar, just to avoid unnecessary casting in your query.
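
The catalog can tell you that too; a quick check (again assuming the QSYS2 views are readable on your system) would be something like:

SELECT COLUMN_NAME, DATA_TYPE, LENGTH
FROM QSYS2.SYSCOLUMNS
WHERE TABLE_SCHEMA = 'DWQ####'
  AND TABLE_NAME = 'SLSCATW'
  AND COLUMN_NAME = 'WEEK_ID'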

To avoid a table scan on the Category table, you need to make sure that Category_Number is indexed as well. (It probably already is, since I assume it is a key to that table.)

OTHER TIPS

Indexes on WEEK_ID and CATEGORY (and possibly CATEGORY_NUMBER) are the only way to make this really fast, so you need to convince the DBA to introduce them.
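
For reference, the kind of index meant here would look roughly like this (the index names are placeholders; with read-only access you can't run this yourself, but it gives the DBA something concrete to react to):

CREATE INDEX DWQ####.SLSCATW_WEEK_CAT_IX
    ON DWQ####.SLSCATW (WEEK_ID, CATEGORY_NUMBER);

CREATE INDEX DW####.CATEGORY_NUM_IX
    ON DW####.CATEGORY (CATEGORY_NUMBER);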

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow