Question

It's my understanding that nulls are not indexable in DB2, so assuming we have a huge table (Sales) with a date column (sold_on) which is normally a date, but is occasionally (10% of the time) null.

Furthermore, let's assume that it's a legacy application that we can't change, so those nulls are staying there and mean something (let's say sales that were returned).

We can make the following query fast by putting an index on the sold_on and total columns

Select * from Sales 
where 
Sales.sold_on between date1 and date2
and Sales.total = 9.99

But an index won't make this query any faster:

Select * from Sales 
where 
Sales.sold_on is null
and Sales.total = 9.99

Because the indexing is done on the value.

Can I index nulls? Maybe by changing the index type? Indexing the indicator column?

Was it helpful?

Solution

I'm no DB2 expert, but if 10% of your values are null, I don't think an index on that column alone will ever help your query. 10% is too many to bother using an index for -- it'll just do a table scan. If you were talking about 2-3%, I think it would actually use your index.

Think about how many records are on a page/block -- say 20. The reason to use an index is to avoid fetching pages you don't need. The odds that a given page will contain 0 records that are null is (90%)^20, or 12%. Those aren't good odds -- you're going to need 88% of your pages to be fetched anyway, using the index isn't very helpful.

If, however, your select clause only included a few columns (and not *) -- say just salesid, you could probably get it to use an index on (sold_on,salesid), as the read of the data page wouldn't be needed -- all the data would be in the index.

OTHER TIPS

From where did you get the impression that DB2 doesn't index NULLs? I can't find anything in documentation or articles supporting the claim. And I just performed a query in a large table using a IS NULL restriction involving an indexed column containing a small fraction of NULLs; in this case, DB2 certainly used the index (verified by an EXPLAIN, and by observing that the database responded instantly instead of spending time to perform a table scan).

So: I claim that DB2 has no problem with NULLs in non-primary key indexes.

But as others have written: Your data may be composed in a way where DB2 thinks that using an index will not be quicker. Or the database's statistics aren't up-to-date for the involved table(s).

The rule of thumb is that an index is useful for values up on to 15% of the records. ... so an index might be useful here.

If DB2 won't index nulls, then I would suggest adding a boolean field, IsSold, and set it to true whenever the sold_on date gets set (this could be done in a trigger).

That's not the nicest solution, but it might be what you need.

Troels is correct; even rows with a SOLD_ON value of NULL will benefit from an index on that column. If you're doing ranged searches on SOLD_ON, you may benefit even more by creating a clustered index that begins with SOLD_ON. In this particular example, it may not require much additional overhead to maintain the clustering order based on SOLD_ON, since newer rows added will most likely have a newer SOLD_ON date.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top