Difference between Clustered Columnstore Index Table in Azure SQL and Table in Azure Data Warehouse

https://dba.stackexchange.com/questions/218701

11-01-2021

Question

What is the essential difference between a Clustered Columnstore Index table in Regular Azure SQL, and a table in Azure Data Warehouse?

They both have columnar storage, no foreign keys, no primary keys, etc. Structurally they seem to be the same. This is what our team is trying to understand. So what would be the difference?

Solution

(IMHO) Azure SQL Data Warehouse is a Massively Parallel Processing (MPP) engine with a shared-nothing architecture, designed to handle hundreds of terabytes of data. It does this by dividing your data (either by a distribution key you specify or by a round-robin algorithm) across 60 distributions, which are split across n nodes depending on the performance tier you are at. It's really a big data platform and I wouldn't bother using it if you have less than, say, 1-10TB, unless your long-term data growth will exceed 4TB. A recent update (known as Gen 2) adds new service tier levels, increased concurrency and improved caching, amongst other things. I believe they've just announced a cheaper tier at Ignite; details to follow. SQL DW has a reduced T-SQL surface area and is kind of expensive to keep running 24x7, so you might make use of its pause feature.
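
To make the distribution idea a bit more concrete, here is a rough Python sketch of how rows might be spread over the 60 distributions, either by hashing a distribution key or by round robin. It is only an illustration: the MD5 hash is a stand-in (the hash function the service actually uses is not shown here), the round-robin rule is simplified, and the function names and the `customer_id` sample data are hypothetical.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # fixed number of distributions mentioned in the answer


def hash_distribution(key) -> int:
    """Map a distribution-key value to one of the 60 distributions.

    Assumption: MD5 is only a deterministic stand-in for the real hash.
    The point is that equal keys always land in the same distribution.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def round_robin_distribution(row_number: int) -> int:
    """Assign rows to distributions in turn, ignoring the row contents."""
    return row_number % NUM_DISTRIBUTIONS


def distributions_per_node(node_count: int) -> float:
    """How many of the 60 distributions each compute node would serve."""
    return NUM_DISTRIBUTIONS / node_count


if __name__ == "__main__":
    # Hypothetical sample: 100,000 rows referencing 1,000 customer ids.
    customer_ids = [i % 1000 for i in range(100_000)]

    hashed = Counter(hash_distribution(c) for c in customer_ids)
    rotated = Counter(round_robin_distribution(n) for n in range(len(customer_ids)))

    print("hash: smallest/largest distribution:", min(hashed.values()), max(hashed.values()))
    print("round robin: smallest/largest distribution:", min(rotated.values()), max(rotated.values()))
    print("distributions per node with 6 nodes:", distributions_per_node(6))
```

Round robin spreads rows evenly but puts related rows in different distributions, while hashing keeps all rows for one key together at the cost of possible skew if a few key values dominate.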

Azure SQL DB is certainly a powerful database with tier-based service levels, but it's capped at 4TB, making it a different proposition to SQL DW. It's targeted at smaller OLTP or BI databases or warehouses that won't exceed 4TB. Microsoft have just announced Azure SQL DB Hyperscale in public preview, allowing up to 100TB.

Other than the distribution element, I'm not aware of any columnstore-specific differences between the two products. Bear in mind the massive compression you get with columnstore when estimating the size of your data.
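
As a back-of-the-envelope illustration of that sizing point, the Python sketch below compares a compressed size estimate against the 4TB cap mentioned above. The compression ratios are purely assumed inputs, not measured figures; real ratios depend heavily on your data, so test on a representative sample. The function names are hypothetical.

```python
SQL_DB_CAP_TB = 4.0  # Azure SQL DB size cap quoted in the answer


def estimated_columnstore_size_tb(raw_size_tb: float, compression_ratio: float) -> float:
    """Estimate on-disk size after columnstore compression.

    compression_ratio is an assumption you supply (e.g. 10 means 10:1).
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    return raw_size_tb / compression_ratio


def fits_in_sql_db(raw_size_tb: float, compression_ratio: float,
                   cap_tb: float = SQL_DB_CAP_TB) -> bool:
    """True if the compressed estimate stays within the cap."""
    return estimated_columnstore_size_tb(raw_size_tb, compression_ratio) <= cap_tb


if __name__ == "__main__":
    # Hypothetical scenarios: (raw size in TB, assumed compression ratio).
    for raw_tb, ratio in [(20, 10), (30, 5)]:
        size = estimated_columnstore_size_tb(raw_tb, ratio)
        print(f"{raw_tb}TB raw at {ratio}:1 is about {size}TB; fits under cap: {fits_in_sql_db(raw_tb, ratio)}")
```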

HTH

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange