How to reduce fragmentation of nonclustered index on date partitioned log table

https://dba.stackexchange.com/questions/282576

13-03-2021

Question

I'm looking for some advice before I follow my gut on this and rebuild the nonclustered index to lead with our partition key. This is an insert-only table whose rows are never updated or deleted; we keep a sliding window by truncating/merging periods on the left and adding new periods on the right. The nonclustered index is constantly at 99% fragmentation because of the makeup of the data we are inserting (many unique AccountIds and InventoryIds). I'm wondering what the best way would be to keep a nonclustered index for looking by accountId without all the fragmentation, or whether I shouldn't worry about the fragmentation at all.

I'm aware that our table doesn't have an explicit unique index; we rely on the [UNIQUIFIER] that is added to ChangedAt automatically.

Usage pattern: Table is always queried with a date filter (ChangedAt)

Partition function:

CREATE PARTITION FUNCTION pf_Weekly_QuantityHistory (datetime2(2)) AS RANGE RIGHT FOR VALUES ( '01 Jul 2019','08 Jul 2019','15 Jul 2019',etc )

CREATE PARTITION SCHEME [ps_Weekly_QuantityHistory]  AS PARTITION [pf_Weekly_QuantityHistory] 
ALL TO ( [ExampleFG] );

CREATE TABLE [dbo].[QuantityHistory](
    [AccountId] [int] NOT NULL,
    [InventoryID] [int] NOT NULL,
    [QuantityBefore] [int] NULL,
    [QuantityAfter] [int] NULL,
    [ChangedAt] [datetime2](2) NOT NULL DEFAULT (getutcdate())
) ON [ps_Weekly_QuantityHistory](ChangedAt)

CREATE CLUSTERED INDEX [CX_QuantityHistory] ON [dbo].[QuantityHistory]
(
    [ChangedAt] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE)

CREATE NONCLUSTERED INDEX [IX_AccountId] ON [dbo].[QuantityHistory]
(
    [AccountId] ASC,
    [InventoryId] ASC,
    [ChangedAt] ASC
) INCLUDE(QuantityBefore,QuantityAfter) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION=PAGE)

Solution

"I'm wondering what the best way would be to keep a nonclustered index for looking by accountId without all the fragmentation"

You've got the partition key in the nonclustered index and created it without an ON clause, so by default it is on the table's partition scheme. That means only the head partition (the newest one) is going to get inserts and additional fragmentation.

So you can rebuild the older partitions rarely and they won't get fragmented, and rebuild the index for just the head partition on a more frequent basis.
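As a minimal sketch of that approach, the following checks fragmentation per partition and then rebuilds only the head partition. It assumes the head partition is the one that contains the current UTC time, which matches the getutcdate() default on ChangedAt; the variable name @head is only illustrative.

```sql
-- Fragmentation of IX_AccountId, one row per partition
SELECT ips.partition_number,
       ips.avg_fragmentation_in_percent,
       ips.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.QuantityHistory'), NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE i.name = N'IX_AccountId'
ORDER BY ips.partition_number;

-- Assumption: the partition holding "now" is the only one receiving inserts
DECLARE @head int = $PARTITION.pf_Weekly_QuantityHistory(SYSUTCDATETIME());

-- Rebuild just that partition; older partitions are left untouched
ALTER INDEX [IX_AccountId] ON [dbo].[QuantityHistory]
REBUILD PARTITION = @head;
```

Once a week has closed, a single rebuild of its partition should be enough, since no further inserts arrive there.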

But for an index with insertions spread across the sort order you'll always generate fragmentation. And fragmentation doesn't always matter. It really depends on the storage design and the workload.

Also you're keeping two complete copies of this table, one sorted by ChangedAt and one sorted by (AccountId,InventoryId,ChangedAt). You can probably just make the (AccountId,InventoryId,ChangedAt) index the clustered index, and only store the table one time.
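A hedged sketch of that change, reusing the index and partition scheme names from the question. Dropping the nonclustered index first avoids rebuilding it when the clustering key changes. The options shown are assumptions, and on a large table this is a long, offline operation as written.

```sql
-- The nonclustered copy becomes redundant once the clustered index has the same key
DROP INDEX [IX_AccountId] ON [dbo].[QuantityHistory];

-- Re-key the existing clustered index in place, still partitioned by ChangedAt
CREATE CLUSTERED INDEX [CX_QuantityHistory] ON [dbo].[QuantityHistory]
(
    [AccountId] ASC,
    [InventoryId] ASC,
    [ChangedAt] ASC
)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE)
ON [ps_Weekly_QuantityHistory]([ChangedAt]);
```

The trade-off is that a query filtering only on ChangedAt can still eliminate partitions, but inside each qualifying partition it has to scan rather than seek, because rows there are ordered by AccountId first.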

Other suggestions

The problem here is that your secondary nonclustered index does not follow the order of the partition key, so each new record generates fragmentation because that index follows a different order. New records don't generate fragmentation in the clustered index, which is very good. You can't prevent fragmentation in indexes ordered by AccountId, like the secondary one.

If needed, you can reorder the secondary index to lead with the partition key, so that it follows the partitioning and you can use partition switching for periodic cleanup and per-partition defragmentation. This makes queries that look up records by AccountId slower, but that is usually acceptable for a log table. It depends on your goal.
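One way to read this suggestion is to recreate the secondary index with ChangedAt as the leading key. This is a sketch under that assumption, keeping the names from the question; the options are illustrative.

```sql
-- Same index name and included columns, but keyed by insertion order first
CREATE NONCLUSTERED INDEX [IX_AccountId] ON [dbo].[QuantityHistory]
(
    [ChangedAt] ASC,
    [AccountId] ASC,
    [InventoryId] ASC
)
INCLUDE (QuantityBefore, QuantityAfter)
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE)
ON [ps_Weekly_QuantityHistory]([ChangedAt]);
```

With this key order, new rows append at the end of the head partition, but the index sorts much like the clustered index, so it may add little for lookups by AccountId.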

If your storage is SSD-based, you might as well skip defragmentation and only update statistics on the secondary index.
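For reference, a statistics-only refresh on that index could look like the following. Using the default sampling is an assumption; WITH FULLSCAN is an option if sampled statistics turn out to be inaccurate.

```sql
-- Refresh only the statistics that belong to the secondary index
UPDATE STATISTICS [dbo].[QuantityHistory] [IX_AccountId];
```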

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange