Question

I'm creating a filtered index whose WHERE filter includes the complete query criteria. With such an index, it seems that a key column would be unnecessary, though SQL Server requires me to add one. For example, consider the table:

CREATE TABLE Invoice
(
    Id INT NOT NULL IDENTITY PRIMARY KEY,
    Data VARCHAR(MAX) NOT NULL,
    IsProcessed BIT NOT NULL DEFAULT 0,
    IsInvalidated BIT NOT NULL DEFAULT 0
)

Queries on the table look for new invoices to process, i.e.:

SELECT *
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0

So, I can tune for these queries with a filtered index:

CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (IsProcessed)
WHERE (IsProcessed = 0 AND IsInvalidated = 0)
GO

My question: What should the key column(s) for IX_Invoice_IsProcessed_IsInvalidated be? Presumably the key column isn't being used. My intuition leads me to pick a column that is small and will keep the index structure relatively flat. Should I pick the table primary key (Id)? One of the filter columns, or both of them?

Solution

Because the table has a clustered index (the primary key on Id), it doesn't really matter what you put in the key columns of the filtered index: every nonclustered index row already carries the clustering key, so Id is there free of charge. The only thing you can do is add the remaining columns to the INCLUDE section of the index, so that the data is available at the leaf level and the query avoids key lookups into the table. Or, if the queue is huge, some other column might be more useful in the key section.

Now, if that table had no clustered index (that is, if it were a heap), you would have to specify, as key or included columns, all the columns that you need for joining or other purposes. Otherwise, RID lookups on the heap would occur, because the leaf level of the index would hold only row identifiers pointing to rows in the data pages.

OTHER TIPS

What percentage of the table does this filtered index cover? If it's small, you may want to include every column in the index so that it can satisfy the "SELECT *" without hitting the table. If it covers a large portion of the table, though, that would not be optimal, and I'd recommend keying on the clustered index or primary key instead. I'd have to research which of the two is optimal, but if they're the same column you should be set.
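
One way to answer the percentage question is to count the rows that match the filter. This is only a minimal sketch against the Invoice table from the question; the column aliases are illustrative.

```sql
-- Share of Invoice rows that the filtered index would contain.
SELECT
    COUNT(*) AS TotalRows,
    SUM(CASE WHEN IsProcessed = 0 AND IsInvalidated = 0 THEN 1 ELSE 0 END) AS PendingRows,
    100.0 * SUM(CASE WHEN IsProcessed = 0 AND IsInvalidated = 0 THEN 1 ELSE 0 END)
          / NULLIF(COUNT(*), 0) AS PendingPercent  -- NULLIF avoids a divide-by-zero on an empty table
FROM Invoice;
```

Note that this query scans the whole table, so on a large table it is better run off-peak.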

I suggest you declare it as follows:

CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (Id)
INCLUDE (Data)
WHERE (IsProcessed = 0 AND IsInvalidated = 0)

The INCLUDE clause means that the values of the Data column are stored as part of the index.

If you didn't have an INCLUDE clause then the query plan for

SELECT Id, Data
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0

would involve a two-step process:

  • use the index to find the list of primary key values that match the criteria
  • fetch the rows from the table that match those primary keys

If, on the other hand, the index includes the [Data] column, then it will properly cover the query, as there will be no need to look up the data using the primary keys.

You don't get something for nothing, though.

The downside is that you will be storing the varchar(MAX) data twice for these records, so more data will be written to the database and more storage will be used. This isn't much of a problem if you're only talking about a small section of the data.
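
To check which of the two plans you actually get, you can compare the reads reported with and without the INCLUDE clause. The following is a rough sketch using the table and index names from above. The index hint is there only to force the comparison and is probably not something to leave in production code.

```sql
-- Report logical reads for the statements that follow.
SET STATISTICS IO ON;

SELECT Id, Data
FROM Invoice WITH (INDEX (IX_Invoice_IsProcessed_IsInvalidated))
WHERE IsProcessed = 0 AND IsInvalidated = 0;

SET STATISTICS IO OFF;
```

If the actual execution plan shows a Key Lookup operator next to the index access, the index is not covering the query.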
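
To see how much extra storage the index really costs, you could compare page counts per index. This is a sketch under the assumption that you have permission to read sys.dm_db_partition_stats; the aliases are illustrative.

```sql
-- Rows and space used by each index on Invoice (the clustered index included).
SELECT
    i.name                      AS IndexName,
    SUM(ps.row_count)           AS IndexRows,
    SUM(ps.used_page_count) * 8 AS UsedKB   -- pages are 8 KB each
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'Invoice')
GROUP BY i.name;
```

Running it before and after adding the INCLUDE clause gives a direct measure of the duplicated Data storage.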

As always the more time and effort you put into putting things away carefully the faster and easier it is to get them back.

Licensed under: CC-BY-SA with attribution