How many partitions should I make for my clustered columnstore index tables? Should I partition the rowstore tables also?

https://dba.stackexchange.com/questions/220363

13-01-2021
|

Question

I have a data warehouse comprised of four clustered columnstore index tables (CCI) and nine rowstore tables. These tables are used only for analytics and the CCI data is inserted from staging tables every 15 minutes. I am looking to optimize query performance by adding partitions and sorting.

All queries of this data are predicated on an integer field with about 350 distinct values.The leftmost CCI has 100M records and 125 columns. There are three child CCIs that have that same integer field. CCI 2 has 15M records and 150 columns, CCI 3 and 4 both have about 30M records and 25 columns each.

Of these 350 distinct integers the distribution of record count in the leftmost table is as follows:

5% Greater than 1M
46% Greater than 100K
83% Greater than 10K

Additionally, there are nine other rowstore tables that also join to the CCIs. These have trickle inserts, are children of the CCIs, and they all contain the same integer field. These rowstores have similar or smaller record volumes, < 10 columns each, two contain LOBS, and two undergo mass-updates frequently (these updates are also predicated on the ID field).

How many partitions should I make?

Should I partition the rowstore tables also?

Are there important considerations I am overlooking?

Note regarding the "sorting" I mentioned earlier:

A date field in the leftmost CCI is often a secondary predicate in these queries, therefore I am looking into re-sorting that CCI by date every four weeks or so as maintenance. I will achieve this sort by dropping the CCI, adding a clustered rowstore index on the date, dropping that index, and then re-adding the CCI with MAXDOP=1. I am also looking at sorting the child CCIs by the join key to their parent.

Solution 2

Update having taken partitioning all the way to production:

Deciding the right partitioning for a clustered columnstore index (CCI) is a very bespoke process. If the wrong partitions are chosen, performance and compression will be worse than in a non-partitioned scheme.

Because I was partitioning four CCIs, I chose the CCI with the least records and divided its record count by 1,048,576 (the ideal CCI rowgroup size). I used that quotient as my proposed number of partitions. Then I ran record count queries based on that scheme to return the actual row counts per partition. This step was to make sure there was reasonably even distribution of records among the partitions. There was. Lucky me.

An obstacle appeared: this production analysis process helped me arrive at the correct number of partitions, but only for production. My lower environments are much smaller than production. The chosen level of partitioning sliced the data so fine that I had no full rowgroups at all. The lower databases got bigger and query times stayed the same. IO did go down dramatically and I had to point to that repeatedly as the gains of this initiative were questioned. It was hard to prove that partitioning was really going to help until I got to production.

The Outcome: Partitioning has been a big success in production. IO is way down and my query times have been reduced by 70% or more. I also have more options for managing these tables in small chunks.

Some notes: pick the correct field to partition on. If you have queries that have to navigate a lot of partitions you may find degraded performance. Also, I have left room for growth, adding partitions and ranges to my partition function for data that is not there now but will be one day.

Original answer from only local testing:

Since asking this question I have been doing more research and a POC locally. It was suggested I share this POC in an answer.

In my POC I chose to use a partition function of:

CREATE PARTITION FUNCTION [MyIntPF](int) 
AS RANGE LEFT 
FOR VALUES (
    N'50'
    , N'100'
    , N'150'
    , N'200'
    , N'250'
    , N'300'
    , N'350'
    , N'400'
    , N'450'
    , N'500'
);

CREATE PARTITION SCHEME [MyIntPS] 
AS PARTITION [MyIntPF] 
TO (
    [MyInt050fg]
    , [MyInt100fg]
    , [MyInt150fg]
    , [MyInt200fg]
    , [MyInt250fg]
    , [MyInt300fg]
    , [MyInt350fg]
    , [MyInt400fg]
    , [MyInt450fg]
    , [MyInt500fg]
    , [MyInt000fg]
);

This function assigns 50 MyInts to each partition with room for a little growth.

Remember that I have roughly 350 distinct MyInts across the 170M records in the PROD CCIs. David Browne suggested a minimum record size of 1M in a partition, which make sense in order to optimize a CCI compressed segment. I am erring larger for two reasons. The first reason is to avoid creating a 100 partition POC monster. The second is that I presume that 1M applies to each table in the partition. I am partitioning four columnstores, the smallest of which has 25M records. If I broke it into 100 pieces it would never achieve a full segment.

In my local development DB I have 2.2M records in the leftmost CCI, and even less than that in the child CCIs. This presents a problem for creating a realistic replication of PROD. I really should prioritize a little extra time to make a big local DB for this, but in the meantime, here are the before/after IO results of the local partition. I have queried for an aggregate from my leftmost CCI predicated on MyInt = a single value.

Not Partitioned

Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 1548, 
lob physical reads 0, lob read-ahead reads 44.
Segment reads 4, segment skipped 0.

Partitioned

Scan count 1, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 268, 
lob physical reads 0, lob read-ahead reads 0.
Segment reads 1, segment skipped 0.

As expected, SQL Server was able to skip all but one of my partitions in a query with a MyInt equality predicate.

I am continuing to work on this and should have time to update here as things progress.

OTHER TIPS

Benefits of partitioning a CCI:

Query performance can be improved because a minimum level of rowgroup elimination is guaranteed, despite how the data gets loaded or modified. Most of the generic SQL Server partitioning guidance out there doesn't take this into account.
Improved flexibility with maintenance operations in that you can do rebuilds at the partition level or do reorgs at the partition level (after partition switching out). You can also send different partitions to different filegroups, but I need to caution you that doing so will almost never improve performance. Filegroups are a maintenance feature. Increasing the file count can improve performance sometimes. Depending on your storage setup you almost certainly want the data relevant to your queries to be spread over multiple files to improve I/O.
Partition elimination covers more scenarios than rowgroup elimination on the same column. For example, a filter of WHERE ID < 0 OR ID > 10 will not quality for rowgroup elimination but will qualify for partition elimination.
Looping by partition can be helpful when performing maintenance operations that require all rows to be changed. For example, suppose you're adding a new column to a table that can be derived from existing columns in that table. Partitioning allows you to efficiently split that work into batches if desired.

Downsides of partitioning a CCI:

Without maintenance the number of rows in delta rowgroups can dramatically increase. Consider an unpartitioned CCI that is loaded with parallel inserts at MAXDOP 8. At most you'll have 4194304 rows in the delta store. If the table is changed to have 50 partitions it's now possible to have 209715200 rows in the delta store.
Query plans for inserts and deletes into the columnstore may contain a sort operator as a child of the DML operator. If this sort cannot get enough memory you can end up with extreme performance degradations. I recommend only modifying one partition at a time if using parallel insert.
If you choose your partition function unwisely you could end up with partitions that are too small. Many people will point you to the 1048576 row limit for a rowgroup as the ideal size, but personally I consider the benefits of getting there to be overblown. You probably do want to avoid many tiny partitions if you can help it though.
If you have too many partitions in your table or your database then bad things might happen. Unfortunately, this isn't very well defined and it's hard to find a credible source for what "too many partitions" actually means. I've heard of and seen issues with query compilation times. There was a recent answer here about DBCC CHECKTABLE as well.

Applying the above to your scenario: with the row counts that you have you shouldn't run into any of the really bad cases. For query performance, some folks need really fast query execution times and they need to skip as many rowgroups as possible. Others just need a minimum level of rowgroup elimination because most of the work done in the query is outside of the columnstore scans. That makes it difficult for someone on the outside to give you a recommendation for the number of partitions. For the 100 million table, anything from 4-100 could be reasonable.

You could try testing some of your queries with different numbers of rows in the partitions to see how performance changes. That can be simulated by creating copies of the tables or by creating a partition function on one table with deliberate skewness and changing what ID you filter by. If you take what results in good enough query performance and verify that you won't have any issues with loading data then you should be good.

The rowstores aren't relevant to the question, or rather, they're a totally different question. Partitioning is not the right tool to improve performance on rowstore query. I've seen performance gains on systems just by partitioning columnstore tables and leaving the rowstore tables alone.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange