Question

Clustered column store indexed tables in general are useful for large tables. Ideally with million of rows. And also useful with queries, which selects only the subset of available columns in such tables.

What happens if we break these two "rules"/best practices?

  1. Like having a clustered column store indexed table which will only store few thousand, or hundreds of thousands of rows max.
  2. And running queries against those clustered column store table where all the columns are needed.

My tests don't reveal any performance degradation comparing to row stored clustered index table. Which is great in our case.

Is there any "long term" effects breaking these two rules? Or any hidden pitfalls which haven't showed up just yet?

Context why is it needed: I designed a database model which will be used for many instances of different vendor databases. The schema remains the same in every database, but different vendors have different amount of data. Hence few small vendors may end up with small amount of data (<1 000 000) in their tables. I can't allow myself to keep up two different database for row-store and column-store model.

Was it helpful?

Solution

To @YunusUYANIK point for the potential downfalls of designing your schema catered to just one side, why not create both rowstore and columnstore indexes on your table catered appropriately to both scenarios? Sure you may end up indexing the same fields both ways, but the main drawback there would just be the increased use of storage space which generally is much less of a concern when planning for performance.

It will depend on your schema and the amount of data in your tables for each vendor, so you'll have to test to ensure your design of the indexes are being used in the appropriate queries for the different amounts of data based on your vendor predicates. In the worst case, you might have to use index hints sometimes too, but I feel that if you design both types of indexes correctly, that is not very likely.

OTHER TIPS

Columnstore index has a big advantage on compress data size. The general aim of the Columnstore index is quickly read a bunch of data due to its compression.

enter image description here

CCI is Columnstore Clustered Index, Clustered is Clustered Index

Columnstore Index compresses data size from 4MB to 2MB.

We can look at the performance in two tables and three parts.

The first one is the minimal SELECT operation:

SELECT * FROM Users_CCI WITH(INDEX=CCI_Users) WHERE Id=12333

enter image description here

enter image description here

SELECT * FROM Users_Clustered WHERE Id=12333

enter image description here

enter image description here

Result: There is a Columnstore Scan here and wrong estimated. Further, logical read difference. You can say it doesn't matter for you, but, If you use a minimal SELECT query, probably use thousands of times. And it will affect total performance.

The second one is the minimal UPDATE operation:

UPDATE Users_CCI SET Age=10 WHERE  Id=2

enter image description here

UPDATE Users_Clustered SET Age=10 WHERE  Id=2

enter image description here

Result: There are read, CPU, and time differences as we can see.

The third one is the REBUILD operation:

USE [StackOverflow2013]
GO
ALTER INDEX [CCI_Users] ON [dbo].[Users_CCI] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = COLUMNSTORE)
GO
USE [StackOverflow2013]
GO
ALTER INDEX [PK_Users_Clustered_Id] ON [dbo].[Users_Clustered] REBUILD PARTITION = ALL WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
GO

UPDATE Users_CCI SET Age=10 
UPDATE Users_Clustered SET Age=10 

If I rebuild and update all rows for index fragmentation, I will see a more fragmented Clustered Columnstore Index than the Clustered Index. And I didn't show, but the Clustered Columnstore Index Rebuild process produces transaction log more than Clustered Index.

enter image description here

enter image description here

Like the document says

  • More than 10% of the operations on the table are updates and deletes. Large numbers of updates and deletes cause fragmentation.The fragmentation affects compression rates and query performance until you run an operation called reorganize that forces all data into the columnstore and removes fragmentation. For more information, see Minimizing index fragmentation in columnstore index.

If you have a small table you don't need to Columnstore index.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top