Question

Using SQL Server 2008.

(Sorry if this turns out to be an article but I'm trying to give as much info as possible.)

I have multiple locations, each containing multiple departments, each containing multiple items, which can have zero to many scans. Each scan relates to a specific operation, which may or may not have a cutoff time. Each item also belongs to a specific package, which belongs to a specific job, which belongs to a specific project, which belongs to a specific client. Each job contains one or more packages, which contain one or more items.

                                        +=============+     +=============+
                                        |   Projects  | --> |   Clients   |
                                        +=============+     +=============+
                                              ^
                                              |
+=============+                         +=============+
|  Locations  |                         |     Jobs    |
+=============+                         +=============+
      ^                                       ^
      |                                       |
+=============+     +=============+     +=============+
| Departments | <-- |    Items    | --> |   Packages  |
+=============+     +=============+     +=============+
                          ^
                          |
                    +=============+     +=============+
                    |    Scans    | --> | Operations  |
                    +=============+     +=============+
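
To put that in concrete terms, a rough DDL sketch of the hierarchy looks something like this (only the key columns referenced by the view further down are shown; the data types and constraints are assumptions):

-- Sketch only: key columns of the hierarchy; types and constraints are assumed
CREATE TABLE [Clients]     (ID int PRIMARY KEY);
CREATE TABLE [Projects]    (ID int PRIMARY KEY, ClientID int REFERENCES [Clients](ID));
CREATE TABLE [Jobs]        (ID int PRIMARY KEY, ProjectID int REFERENCES [Projects](ID));
CREATE TABLE [Packages]    (ID int PRIMARY KEY, JobID int REFERENCES [Jobs](ID));
CREATE TABLE [Locations]   (ID int PRIMARY KEY);
CREATE TABLE [Departments] (ID int PRIMARY KEY, LocationID int REFERENCES [Locations](ID));
CREATE TABLE [Operations]  (ID int PRIMARY KEY, Code varchar(20), Cutoff datetime NULL);
CREATE TABLE [Items]       (ID int PRIMARY KEY, PackageID int REFERENCES [Packages](ID),
                            DepartmentID int REFERENCES [Departments](ID));
CREATE TABLE [Scans]       (ID int PRIMARY KEY, ItemID int REFERENCES [Items](ID),
                            OperationID int REFERENCES [Operations](ID), [DateTime] datetime);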

There are roughly 24,000,000 records in the items table and roughly 48,000,000 records in the scans table. New items are sporadically bulk inserted into the database throughout the day, usually in the tens of thousands at a pop. New scans are bulk inserted every hour, anywhere from a few hundred to a few hundred thousand per batch.

These tables are heavily queried, sliced and diced every which way. I was writing very specific stored procs but it turned into a maintenance nightmare as I was on the verge of a hundred stored procs with no end in sight (e.g. something akin to ScansGetDistinctCountByProjectIDByDepartmentIDGroupedByLocationID, ScansGetDistinctCountByPackageIDByDepartmentIDGroupedByLocationID, etc.). As luck would have it, the requirements change (what feels like) almost daily, and every time I have to change/add/delete a column, well...I end up at the bar.

So I created an indexed view and a handful of generic stored procs with parameters to determine filtering and grouping. Unfortunately, performance went down the toilet. I guess the first question is, since select performance is paramount, should I just stick with the specific approach and fight through changes to the underlying tables? Or, can something be done to speed up the indexed view/generic query approach? On top of easing the maintenance nightmare, I was actually hoping that the indexed view would improve performance as well.

Here is the code to generate the view:

CREATE VIEW [ItemScans] WITH SCHEMABINDING AS

SELECT
    p.ClientID          
    , p.ID        AS [ProjectID]            
    , j.ID        AS [JobID]
    , pkg.ID      AS [PackageID]
    , i.ID        AS [ItemID]       
    , s.ID        AS [ScanID]
    , s.DateTime
    , o.Code
    , o.Cutoff
    , d.ID        AS [DepartmentID]
    , d.LocationID
    -- other columns
FROM
    [Projects] AS p
    INNER JOIN [Jobs] AS j
        ON p.ID = j.ProjectID
    INNER JOIN [Packages] AS pkg
        ON j.ID = pkg.JobID
    INNER JOIN [Items] AS i
        ON pkg.ID = i.PackageID
    INNER JOIN [Scans] AS s
        ON i.ID = s.ItemID
    INNER JOIN [Operations] AS o
        ON s.OperationID = o.ID
    INNER JOIN [Departments] AS d
        ON i.DepartmentID = d.ID;   

and the clustered index:

CREATE UNIQUE CLUSTERED INDEX [IDX_ItemScans] ON [ItemScans]
(
    [PackageID] ASC,
    [ItemID] ASC,
    [ScanID] ASC
)

Here's one of the generic stored procs. It gets a count of items that have been scanned and have a cutoff:

CREATE PROCEDURE [ItemsGetFinalizedCount] 
    @FilterBy       int = NULL
    , @ID           int = NULL
    , @FilterBy2    int = NULL 
    , @ID2          sql_variant = NULL  
    , @GroupBy      int = NULL        
WITH RECOMPILE
AS
BEGIN

    SELECT
        CASE @GroupBy           
            WHEN 1 THEN
                CONVERT(sql_variant, LocationID)
            WHEN 2 THEN
                CONVERT(sql_variant, DepartmentID)
            -- other cases
       END AS [ID]
       , COUNT(DISTINCT ItemID) AS [COUNT]
    FROM
        [ItemScans] WITH (NOEXPAND)
    WHERE       
        (@ID IS NULL OR
        @ID = CASE @FilterBy            
            WHEN 1 THEN         
                ClientID
            WHEN 2 THEN
                ProjectID
            -- other cases
        END) 
        AND (@ID2 IS NULL OR
        @ID2 = CASE @FilterBy2          
            WHEN 1 THEN         
                CONVERT(sql_variant, ClientID)
            WHEN 2 THEN
                CONVERT(sql_variant, ProjectID)
            -- other cases
        END)
        AND Cutoff IS NOT NULL
    GROUP BY
        CASE @GroupBy           
            WHEN 1 THEN
                CONVERT(sql_variant, LocationID) 
            WHEN 2 THEN
                CONVERT(sql_variant, DepartmentID)
            -- other cases
        END
END

The first time I ran the query and looked at the actual execution plan, I created the missing index that it suggested:

CREATE NONCLUSTERED INDEX [IX_ItemScans_Counts] ON [ItemScans]
(
    [Cutoff] ASC
)
INCLUDE ([ClientID],[ProjectID],[JobID],[ItemID],[SegmentID],[DepartmentID],[LocationID]) 

Creating the index took the execution time down to about five seconds, but that is still unacceptable (the "specific" version of the query runs sub-second). I've tried adding different columns to the index instead of just including them, with no gain in performance (it doesn't really help that I have no idea what I am doing at this point).
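
For example, one of the variants I tried looked roughly like this, promoting the common filter/group columns into the index key instead of the INCLUDE list (the index name and column choice here are just illustrative):

-- Illustrative variant only: filter/group columns as key columns
-- rather than INCLUDEd columns (this gave me no measurable gain).
CREATE NONCLUSTERED INDEX [IX_ItemScans_Counts2] ON [ItemScans]
(
    [Cutoff] ASC,
    [LocationID] ASC,
    [DepartmentID] ASC
)
INCLUDE ([ClientID],[ProjectID],[JobID],[ItemID])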

Here is the query plan:

[query plan screenshot]

And here are the details for that first index seek (it appears to return all of the rows in the view where Cutoff IS NOT NULL):

[index seek details screenshot]


Solution

A generic proc may not be a bad idea in this case, but you don't have to put all those cases into the final query as you are currently doing. I would try building your "specific queries" using dynamic SQL in your generic proc, much the same way Gail Shaw builds a "catch-all" query here:

SQL in the Wild - Catch-all queries

This way, you're able to cache query plans and utilize indexes as shown in the blog post, and you should be able to get that same sub-second performance that you're after.
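
Something along these lines for your ItemsGetFinalizedCount proc (a minimal sketch: the proc name, the column mapping shown, and dropping the second filter pair are all just for illustration, and it assumes @GroupBy is always supplied):

-- Catch-all pattern sketch: build only the predicates you actually need and
-- run the statement through sp_executesql, so each distinct statement shape
-- gets its own cached, parameterized plan.
CREATE PROCEDURE [ItemsGetFinalizedCountDynamic]
    @FilterBy   int = NULL
    , @ID       int = NULL
    , @GroupBy  int
AS
BEGIN
    DECLARE @sql nvarchar(max), @filterCol sysname, @groupCol sysname;

    -- Map the numeric codes to column names (a whitelist, so concatenating
    -- them poses no injection risk).
    SET @filterCol = CASE @FilterBy WHEN 1 THEN 'ClientID'   WHEN 2 THEN 'ProjectID'    END;
    SET @groupCol  = CASE @GroupBy  WHEN 1 THEN 'LocationID' WHEN 2 THEN 'DepartmentID' END;

    SET @sql = N'SELECT ' + QUOTENAME(@groupCol) + N' AS [ID], '
             + N'COUNT(DISTINCT ItemID) AS [COUNT] '
             + N'FROM [ItemScans] WITH (NOEXPAND) '
             + N'WHERE Cutoff IS NOT NULL'
             + CASE WHEN @filterCol IS NOT NULL
                    THEN N' AND ' + QUOTENAME(@filterCol) + N' = @ID'
                    ELSE N'' END
             + N' GROUP BY ' + QUOTENAME(@groupCol) + N';';

    EXEC sp_executesql @sql, N'@ID int', @ID = @ID;
END

Each combination of supplied parameters produces different statement text, so SQL Server caches a separate plan for each shape and can pick the most selective index for that shape instead of compromising across all of them.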
