Question

There is something I do not get about SQL Server and Indexes. I was working on a table last night that contains 100M rows. I created the following index:

CREATE NONCLUSTERED INDEX [x_acct_x_date_x_type] ON [mail_master] 
(
    [letter_acct] ASC,
    [letter_date] ASC,
    [letter_type] ASC
)

I do not normally create indexes with 3 columns involved. My select statement out of this table for production took 6 seconds, with a WHERE clause that utilizes each of those 3 fields. I showed my code and index to a co-worker who is a bit old school for advice on optimization, and he recommended dropping letter_type. We then ran the same code that took 6 seconds, with the index replaced by one on just two fields, and it now takes 0 seconds.

I asked him why, and he couldn't really give me an answer other than that the data at rest with my index is larger than with the modified index. He is absolutely right about that, but I really don't see why it would take 0 seconds now.

Can anyone tell me why this is happening? Thank you in advance.

Here is the CREATE TABLE statement:

CREATE TABLE [mail_master](
    [client_acct] [varchar](4) NULL,
    [letter_acct] [varchar](11) NULL,
    [letter_date] [datetime] NULL,
    [letter_type] [varchar](25) NULL,
    [letter_balance] [money] NULL,
    [special] [varchar](35) NULL,
    [call] [datetime] NULL,
    [mail_return] [varchar](1) NULL,
    [payment_date] [datetime] NULL,
    [post_date] [datetime] NULL,
    [promise] [datetime] NULL,
    [age] [int] NULL
) ON [PRIMARY]

Here is the T-SQL code in question:

DECLARE @ClientTable AS TABLE (
    client_acct VARCHAR(4),
    client_name VARCHAR(40),
    grade VARCHAR(2),
    acct_type VARCHAR(20)
    )

INSERT INTO @ClientTable (
    client_acct,
    client_name,
    grade,
    acct_type
    )
SELECT client_acct_info_t.client_acct,
    client_name,
    grade,
    acct_type
FROM client_acct_info_t,
    client_master_t
WHERE client_master_t.client_acct = client_acct_info_t.client_acct
    AND acct_status = 'A'

SELECT mail_master.client_acct AS 'Client #',
    client_name AS 'Client Name',
    COUNT(*),
    SUM(total_payments) AS 'Total Payments',
    SUM(sum_payments) AS 'Total Payment Dollars'
FROM mail_master,
    @ClientTable AS ClientTable
WHERE mail_master.client_acct = ClientTable.client_acct
    AND letter_date >= '03/01/2014'
    AND letter_date <= '03/25/2014'
    AND letter_type = 'PRECOLLECT'
    AND letter_balance >= 100
    AND letter_balance <= 1000
GROUP BY mail_master.client_acct,
    client_name

Solution

The key to using a multi-column index is for the query to be what is called sargable, which comes from "Search ARGument ABLE". Multi-column indexes are sorted primarily by the first column, with ties being sorted by the second column, and so on.

In logical order, a three column index would be sorted like this:

 first   second   third
 1       1        1
 1       1        2
 1       1        3
 1       2        1
 1       5        2
 2       1        5
 2       2        1

So, in order to seek to a specific part of the index, the query has to have a value for the first column, and to use the second column in the index, it has to have an exact value for the first column.* If a column has an inequality or range filter, then it can use the index for that column, but not for any columns after that.
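
To make the rule above concrete, here is a rough sketch of three WHERE clauses against the original index on (letter_acct, letter_date, letter_type). The literal values are made up for illustration, and what the optimizer actually does depends on statistics, so treat the comments as the general expectation rather than a guarantee.

```sql
-- Assumes the original index on (letter_acct, letter_date, letter_type).
-- The account number and dates below are hypothetical sample values.

-- Equality on every key column: a seek can use all three columns.
SELECT COUNT(*)
FROM mail_master
WHERE letter_acct = '12345678901'
  AND letter_date = '20140301'
  AND letter_type = 'PRECOLLECT';

-- Equality on letter_acct, range on letter_date: a seek can use the
-- first two columns; letter_type is then checked row by row within
-- the date range instead of being part of the seek.
SELECT COUNT(*)
FROM mail_master
WHERE letter_acct = '12345678901'
  AND letter_date >= '20140301'
  AND letter_date <= '20140325'
  AND letter_type = 'PRECOLLECT';

-- No predicate on the leading column letter_acct: there is nothing to
-- seek on, so the index can only be scanned.
SELECT COUNT(*)
FROM mail_master
WHERE letter_date >= '20140301'
  AND letter_date <= '20140325'
  AND letter_type = 'PRECOLLECT';
```

The third form has the same shape as the query in the question, which filters on letter_date and letter_type but never on letter_acct.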

Your query never filters on letter_acct, which is the leading column of the index. So, from looking at the query, we can tell that if the index was used, it was a full scan, meaning it wasn't really used as an index. You can view the execution plan and look for seek vs scan to tell for sure. Subsequent runs would be faster because the data is cached in memory, so it doesn't have to be read from disk again.
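
One way to check both points (seek vs scan, and disk vs cache) is to turn on I/O and timing statistics around the query. This is only a sketch: the SELECT is a simplified stand-in for the query in the question, and the cache-clearing commands are commented out because they affect the whole instance and should only ever be run on a test server.

```sql
-- Optional, TEST SERVERS ONLY: flush clean pages from the buffer pool
-- so the next run has to read from disk again.
-- CHECKPOINT;
-- DBCC DROPCLEANBUFFERS;

SET STATISTICS IO ON;    -- reports logical and physical reads per table
SET STATISTICS TIME ON;  -- reports CPU time and elapsed time

-- Simplified stand-in for the query in the question.
SELECT client_acct,
    COUNT(*) AS letter_count
FROM mail_master
WHERE letter_date >= '20140301'
    AND letter_date <= '20140325'
    AND letter_type = 'PRECOLLECT'
GROUP BY client_acct;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

If the second run shows roughly the same logical reads but far fewer physical reads, the speedup came from the data already being in memory rather than from the index change. In Management Studio, enabling "Include Actual Execution Plan" before running the query shows whether the operator on mail_master is an Index Seek or an Index Scan.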

Looking at your query, you have client_acct and letter_type as exact comparisons, so I would use those as the first two columns, with the more selective one first, which I'd think is client_acct. For the third column, I would guess that letter_date is more selective than letter_balance, so I suggest that. I would also INCLUDE the letter_balance column in the index so that it can filter out rows that don't fit, even though it can't seek on that column. There are multiple ways that SQL Server can execute the query, so this isn't necessarily the best possible index, but I'd expect it to be reasonably good.
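
Written out, the suggestion above would look something like the following. The index name is made up here, and as already said this is a reasonable starting point rather than a proven best choice, so compare execution plans before and after.

```sql
-- Hypothetical index name; key column order follows the reasoning above:
-- the two equality columns first, then the range column.
CREATE NONCLUSTERED INDEX [x_client_x_type_x_date] ON [mail_master]
(
    [client_acct] ASC,
    [letter_type] ASC,
    [letter_date] ASC
)
-- letter_balance is stored at the leaf level only, so it can be
-- filtered on without a lookup into the base table.
INCLUDE ([letter_balance]);
```

Keeping letter_balance as an included column rather than a fourth key column keeps the key narrower, and since letter_date is already a range filter, a seek could not use a key column placed after it anyway.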

It isn't clear where total_payments and sum_payments are coming from, but I'm going to assume that they come from the client side. In that case the index is covering, which means that the query can get all of the information that it needs from the index and never needs to look back at the main table.

*True for SQL Server. Some RDBMSs can make use of an index even if an earlier column isn't exact, but it is still best to be exact if possible.

Licensed under: CC-BY-SA with attribution