What key columns to use on filtered index with covering WHERE clause?

Question 1

Because you have a clustered index on that table, it doesn't really matter, as far as Id is concerned, what you put in the key columns of that index: the clustering key (Id) is stored in every row of every nonclustered index, so Id is there free of charge. What you can do is include every other column the query needs in the INCLUDE section of the index, so the data is on hand at the leaf level and key lookups to the table are avoided. Or, if the queue is huge, then perhaps some other column would be useful in the key section.
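As an illustration, here is a sketch of a filtered index keyed on some other column. CreatedAt is a hypothetical datetime column, used only as an example of a key that might help on a huge queue; Invoice, Id, Data, IsProcessed and IsInvalidated come from the question, and Id is assumed to be the clustered primary key. Id is not declared in the index, yet the query is still covered because Id travels with every index row as the clustering key.

```sql
-- Sketch: CreatedAt is a hypothetical column, not part of the original table.
-- Id is assumed to be the clustered primary key, so SQL Server stores it in
-- every row of this nonclustered index without it being listed here.
CREATE INDEX IX_Invoice_Pending_CreatedAt
ON Invoice (CreatedAt)
INCLUDE (Data)  -- varchar(MAX) can't be a key column, but it can be included
WHERE (IsProcessed = 0 AND IsInvalidated = 0);

-- Id comes from the clustering key and Data from INCLUDE. The filter columns
-- match the index filter and aren't returned, so they shouldn't need to be
-- stored in the index for it to cover this query.
SELECT Id, Data
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0
ORDER BY CreatedAt;
```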

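Now, if that table didn't have a clustered index (i.e. it were a heap, for example because the primary key was nonclustered or missing), then you would have to include, or specify as key columns, all the columns that you need for joining or other purposes. Otherwise, RID lookups against the heap would occur, because the leaf level of a nonclustered index on a heap holds row identifiers (RIDs) pointing to the data pages rather than a clustering key.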
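Which of the two cases applies can be checked in the catalog views: in sys.indexes a heap appears as index_id 0 with type_desc HEAP, and a clustered index as index_id 1 with type_desc CLUSTERED. This sketch assumes the table is named Invoice, as in the question, and resolves in your default schema.

```sql
-- index_id 0 = heap (nonclustered indexes point to RIDs),
-- index_id 1 = clustered index (nonclustered indexes carry the clustering key).
SELECT i.index_id,
       i.name,            -- NULL for a heap
       i.type_desc,
       i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('Invoice')
  AND i.index_id IN (0, 1);
```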

Question 2

What percentage of the table does this filtered index cover? If it's small, you may want to make the index cover every column of the table (key plus INCLUDE columns) so that "SELECT *" can be answered from the index without hitting the table. If it's a large portion of the table, though, this would not be optimal, because you'd be storing most of the table twice; in that case I'd recommend relying on the clustered index (the primary key) instead. I'd have to research more, because I forget right now which of the two is optimal, but if they turn out the same you should be set.
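A quick way to get that percentage is to count the matching rows directly. This sketch uses the filter from the question and reads the whole table, so it may be slow on a very large one.

```sql
-- Share of Invoice rows that would land in the filtered index.
SELECT COUNT(*) AS total_rows,
       SUM(CASE WHEN IsProcessed = 0 AND IsInvalidated = 0 THEN 1 ELSE 0 END) AS filtered_rows,
       100.0 * SUM(CASE WHEN IsProcessed = 0 AND IsInvalidated = 0 THEN 1 ELSE 0 END)
             / NULLIF(COUNT(*), 0) AS filtered_pct  -- NULLIF avoids divide-by-zero on an empty table
FROM Invoice;
```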

Question 3

I suggest you declare it as follows:

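CREATE INDEX IX_Invoice_IsProcessed_IsInvalidated
ON Invoice (Id)                                  -- Id as the index key
INCLUDE (Data)                                   -- copy Data to the leaf level so the query is covered
WHERE (IsProcessed = 0 AND IsInvalidated = 0);   -- only rows not yet processed or invalidated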

The INCLUDE clause means that the values of the Data column will be stored at the leaf level of the index.
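To confirm how the index was actually stored, the catalog views list its key and included columns along with its filter. This sketch assumes the index name from the statement above and that Invoice resolves in your default schema; you should see Id as key column 1 and Data flagged as an included column.

```sql
-- Key columns have key_ordinal > 0; included columns have key_ordinal = 0
-- and is_included_column = 1.
SELECT i.name AS index_name,
       c.name AS column_name,
       ic.key_ordinal,
       ic.is_included_column,
       i.filter_definition
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID('Invoice')
  AND i.name = 'IX_Invoice_IsProcessed_IsInvalidated'
ORDER BY ic.is_included_column, ic.key_ordinal;
```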

If you didn't have an INCLUDE clause, then the query plan for

SELECT Id, Data
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0

would involve a two-step process:

1. use the index to find the list of primary key values that match the criteria
2. get the Data values from the table for those primary keys (a key lookup)
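The two-step plan can be reproduced with a variant of the index that has no INCLUDE clause. This is a sketch: the index name IX_Invoice_Pending_NoInclude is hypothetical, and whether the optimizer picks this index or simply scans the clustered index depends on how many rows match. When it does pick it, the actual execution plan should show a scan of the filtered index followed by a Key Lookup (Clustered) to fetch Data, and SET STATISTICS IO reports the extra logical reads.

```sql
-- Hypothetical variant without INCLUDE: the leaf level holds only Id.
CREATE INDEX IX_Invoice_Pending_NoInclude
ON Invoice (Id)
WHERE (IsProcessed = 0 AND IsInvalidated = 0);

-- Report logical reads per table in the Messages tab.
SET STATISTICS IO ON;

-- If the optimizer uses the filtered index: step 1 reads matching Id values
-- from it, step 2 fetches Data from the clustered index via a Key Lookup.
SELECT Id, Data
FROM Invoice
WHERE IsProcessed = 0 AND IsInvalidated = 0;

SET STATISTICS IO OFF;

-- Clean up the experiment.
DROP INDEX IX_Invoice_Pending_NoInclude ON Invoice;
```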

If, on the other hand, the index includes the [Data] column, then it will properly cover the query, as there will be no need to look up the data using the primary keys.

You don't get something for nothing, though.

The downside is that you will be storing the varchar(MAX) data twice for these records, so more data will need to be written to the database and more storage will be used. This isn't much of a problem if you're only talking about a small section of the data, since a filtered index stores only the rows that match its WHERE clause.
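To see what that duplication actually costs, the partition stats DMV reports pages and rows per index. This sketch assumes the table name Invoice from the question; it returns one row per index (per partition, if the table is partitioned), and pages are 8 KB each.

```sql
-- Pages and rows used by each index on Invoice. lob_used_page_count counts
-- pages holding varchar(MAX) values stored off-row, including the copies an
-- index keeps for an INCLUDEd MAX column.
SELECT i.name AS index_name,
       i.filter_definition,
       ps.row_count,
       ps.in_row_used_page_count,
       ps.lob_used_page_count,
       ps.used_page_count * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID('Invoice')
ORDER BY i.index_id;
```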

As always, the more time and effort you put into storing things carefully, the faster and easier it is to get them back.