Question

I have a clustered columned stored index on table used for logging - only inserts (but not bulk inserts). The current table stats are:

  • 3541 millions rows
  • 6.6 GB reserved space

I see this morning the following operation via sp_whoisactive:

ALTER INDEX [...] ON [...].[...] 
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE_ARCHIVE);

I use the following query to check how many rows per row_group_id we have:

SELECT
    tables.name AS table_name,
    indexes.name AS index_name,
    partitions.partition_number,
    dm_db_column_store_row_group_physical_stats.row_group_id,
    dm_db_column_store_row_group_physical_stats.total_rows,
    dm_db_column_store_row_group_physical_stats.deleted_rows,
    dm_db_column_store_row_group_physical_stats.state_desc,
    dm_db_column_store_row_group_physical_stats.trim_reason_desc
FROM sys.dm_db_column_store_row_group_physical_stats
INNER JOIN sys.indexes
ON indexes.index_id = 
    dm_db_column_store_row_group_physical_stats.index_id
AND indexes.object_id = 
    dm_db_column_store_row_group_physical_stats.object_id
INNER JOIN sys.tables
ON tables.object_id = indexes.object_id
INNER JOIN sys.partitions
ON partitions.partition_number = 
    dm_db_column_store_row_group_physical_stats.partition_number
AND partitions.index_id = indexes.index_id
AND partitions.object_id = tables.object_id

and we 3383 row groups with 1048576 rows and few at end like this:

enter image description here

The issue is that we are using standard edition (on premise) and the rebuild operation is not performed online and cause a lot of blocking.

I have never seen such issue before. A few weeks ago we have upgraded from SQL Server 2016 SP1 to SQL Server 2019.

My questions are:

  • if only inserts are applied, should be the operation reorganize and be faster
  • if not, if we apply partitioning, for example on year basis as the table is used for logging, will the automation process only rebuild the data for the last partition

enter image description here

Was it helpful?

Solution

I have a clustered columned stored index on table used for logging - only inserts

if only inserts are applied, should be the operation reorganize and be faster

You should not even bother. Reorganize for a columnstore does:

Physically removes rows from a rowgroup when 10% or more of the rows have been logically deleted. The deleted bytes are reclaimed on the physical media. For example, if a compressed row group of 1 million rows has 100K rows deleted, SQL Server will remove the deleted rows and recompress the rowgroup with 900k rows. It saves on the storage by removing deleted rows.

Combines one or more compressed rowgroups to increase rows per rowgroup up to the maximum of 1,048,576 rows. For example, if you bulk import 5 batches of 102,400 rows you will get 5 compressed rowgroups. If you run REORGANIZE, these rowgroups will get merged into 1 compressed rowgroup of size 512,000 rows. This assumes there were no dictionary size or memory limitations.

For rowgroups in which 10% or more of the rows have been logically deleted, the Database Engine tries to combine this rowgroup with one or more rowgroups. For example, rowgroup 1 is compressed with 500,000 rows and rowgroup 21 is compressed with the maximum of 1,048,576 rows. Rowgroup 21 has 60% of the rows deleted which leaves 409,830 rows. The Database Engine favors combining these two rowgroups to compress a new rowgroup that has 909,830 rows.

Considerations specific to reorganizing a columnstore index

So all it will do is combine the open row groups into a new compressed rowgroup. Then the next time you insert anything you'll get new open rowgroups. So there's no real benefit to REORGANIZE in your scenario.

As J.D. advises, you might partition this table so if you want to apply archival compression only to older partitions. But your compression is already quite good.

OTHER TIPS

Reorganizing your indexes will always be faster than a full rebuild, and typically the difference in performance gains from one vs the other is more bang for your buck with reorganizing.

If you use Partitioning, then you can specify which partitions you want to rebuild and/or reorganize. Partitioning is a good solution to break up large tables / indexes to improve performance with maintenance tasks on your data. So yes you can choose to only rebuild / reorganize the last partition, or schedule whichever partitions you want to maintain at any interval.

I'm not sure if when you said you have "3541 millions rows" you meant 3.5 billion rows, because 6.6 GB reserved space is quite surprisingly small for that many rows, but in my opinion once you start surpassing about half a billion rows in a single table is when Partitioning may be a good choice to look into implementing.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top