Clustered Index Misunderstanding

https://dba.stackexchange.com/questions/136147

01-10-2020

Question

I've read in a couple of articles that a clustered index is the same as the table, or that if you read using the clustered index, you get the whole record. Some say that a clustered index stores the whole records in the deepest leaf level of the index.

This is foggy to me: does a clustered index hold a copy of the data? Surely not, but I don't know why people say that.

I have read a couple of articles, but it is still not clear to me. Is a clustered index something virtual? Does it use a B-tree structure (like a nonclustered index)?

Is there any schematic or model that describes how SQL Server stores and manages a clustered index?

Solution

some say, cluster index stores the whole records in deepest leaf of index.

Yes this is correct. The leaf pages of the clustered index contain the actual table rows. This is the actual data. There is no other primary copy of it held elsewhere.

The leaf pages also allocate a few bytes to hold the address (file and page) of the next and previous pages in index key order to form a doubly linked list. Following the index in key order may or may not be the same as following it in physical order depending on the level of fragmentation.
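To make the linked-list idea concrete, here is a minimal Python sketch under simplifying assumptions. `LeafPage`, `scan_in_key_order`, `scan_in_reverse_key_order` and `out_of_order_hops` are hypothetical names. A single integer `page_id` stands in for the (file, page) address, and consecutive IDs are treated as physically adjacent. Page 4 plays the role of a page allocated later by a page split, so a walk in key order jumps around physically.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class LeafPage:
    page_id: int                     # stand-in for the (file, page) address; consecutive IDs = physically adjacent
    keys: List[int]                  # keys of the rows on this page, in key order
    prev_page: Optional[int] = None  # previous page in key order
    next_page: Optional[int] = None  # next page in key order


def scan_in_key_order(pages: Dict[int, LeafPage], first_page: int) -> List[int]:
    """Follow next_page pointers from the first leaf page and return the page IDs visited."""
    visited: List[int] = []
    current: Optional[int] = first_page
    while current is not None:
        visited.append(current)
        current = pages[current].next_page
    return visited


def scan_in_reverse_key_order(pages: Dict[int, LeafPage], last_page: int) -> List[int]:
    """Follow prev_page pointers backwards, which the doubly linked list makes possible."""
    visited: List[int] = []
    current: Optional[int] = last_page
    while current is not None:
        visited.append(current)
        current = pages[current].prev_page
    return visited


def out_of_order_hops(visited: List[int]) -> int:
    """Count hops in a key-order walk that do not land on the physically next page."""
    return sum(1 for a, b in zip(visited, visited[1:]) if b != a + 1)


# Page 1 originally held keys 10, 15 and 20; a split moved 15 and 20 to page 4,
# which was allocated physically after page 3.
pages = {
    1: LeafPage(1, [10], prev_page=None, next_page=4),
    2: LeafPage(2, [30, 40], prev_page=4, next_page=3),
    3: LeafPage(3, [50, 60], prev_page=2, next_page=None),
    4: LeafPage(4, [15, 20], prev_page=1, next_page=2),
}

order = scan_in_key_order(pages, first_page=1)
print(order)                                          # [1, 4, 2, 3]
print([k for p in order for k in pages[p].keys])      # [10, 15, 20, 30, 40, 50, 60]
print(scan_in_reverse_key_order(pages, last_page=3))  # [3, 2, 4, 1]
print(out_of_order_hops(order))                       # 2
```

Counting hops that do not land on the physically next page is roughly the idea behind the logical fragmentation figure SQL Server reports through `sys.dm_db_index_physical_stats`, although the real calculation is not reproduced here.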

The rows on the individual leaf pages may not be written physically exactly in index key order but the slot array on the page, with pointers to the rows it contains, is ordered by key order.
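A toy model of the slot array may help. It assumes a hypothetical `Page` class in which row bodies are simply appended in arrival order and slot entries are list positions rather than real byte offsets. Only the small slot entry is kept in key order, yet reading through the slots returns the rows sorted.

```python
import bisect
from typing import List, Tuple


class Page:
    """Toy data page: row bodies stored in arrival order, slot array kept in key order."""

    def __init__(self) -> None:
        self.rows: List[Tuple[int, str]] = []  # row bodies as (key, payload), in insertion order
        self.slots: List[int] = []             # slot array: positions in self.rows, sorted by key

    def insert(self, key: int, payload: str) -> None:
        # The row body goes into free space (here it is simply appended at the end) ...
        self.rows.append((key, payload))
        position_of_row = len(self.rows) - 1
        # ... and only the slot entry is placed at its key-order position.
        slot_keys = [self.rows[s][0] for s in self.slots]
        self.slots.insert(bisect.bisect_right(slot_keys, key), position_of_row)

    def rows_in_key_order(self) -> List[Tuple[int, str]]:
        """Read rows through the slot array, which yields them in key order."""
        return [self.rows[s] for s in self.slots]


page = Page()
for key, name in [(30, "Lopez"), (10, "Adams"), (20, "Gonzales")]:
    page.insert(key, name)

print(page.rows)                 # [(30, 'Lopez'), (10, 'Adams'), (20, 'Gonzales')]
print(page.slots)                # [1, 2, 0]
print(page.rows_in_key_order())  # [(10, 'Adams'), (20, 'Gonzales'), (30, 'Lopez')]
```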

Does it use B-Tree structure (like non-clustered index)?

Unless the table fits on a single page, there will be one or more levels of the B-tree above the leaf. These higher levels contain index key values and pointers to pages in the level below, which allow key values to be looked up efficiently. Nonclustered indexes use exactly the same principle.
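The descent through the non-leaf levels can be sketched as follows. This is a simplified model, not SQL Server's on-disk format: a hypothetical `IndexPage` holds the lowest key reachable through each child plus a pointer to that child, a hypothetical `LeafPage` holds the full rows (as the leaf level of a clustered index does), and `seek` is an illustrative helper that also counts how many pages it reads.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional, Tuple, Union


@dataclass
class LeafPage:
    rows: List[Tuple[int, str]]  # full rows sorted by key: in a clustered index the leaf level is the data


@dataclass
class IndexPage:
    first_keys: List[int]                         # lowest key reachable through each child, ascending
    children: List[Union["IndexPage", LeafPage]]  # pointers to the pages one level below


def seek(root: Union[IndexPage, LeafPage], key: int) -> Tuple[Optional[Tuple[int, str]], int]:
    """Return (row or None, number of pages read) for a key lookup starting at the root."""
    page = root
    pages_read = 1
    while isinstance(page, IndexPage):
        # Take the last child whose first key is <= the search key (child 0 if the key is smaller than all).
        child = max(bisect.bisect_right(page.first_keys, key) - 1, 0)
        page = page.children[child]
        pages_read += 1
    keys = [k for k, _ in page.rows]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return page.rows[i], pages_read
    return None, pages_read


leaves = [
    LeafPage([(1, "Adams"), (5, "Baker")]),
    LeafPage([(10, "Gonzales"), (15, "Hill")]),
    LeafPage([(20, "Lopez"), (25, "Moore")]),
]
root = IndexPage(first_keys=[1, 10, 20], children=leaves)

print(seek(root, 15))  # ((15, 'Hill'), 2)
print(seek(root, 7))   # (None, 2)
```

The number of pages read grows with the depth of the tree rather than with the number of rows, which is why the upper levels make lookups efficient.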

The leaf level pages of a clustered index are categorised as data pages and non-leaf pages as index pages. But the data pages are still part of the index.


Also see my answer to a related question on Stack Overflow:

What do Clustered and Non clustered index actually mean?

OTHER TIPS

This can be hard for people to wrap their heads around. Let's clear up a couple of points about MS SQL Server.

MS SQL Server has 2 types of tables: heaps and clustered indexes. A heap has no order whatsoever. Imagine taking a thousand names, throwing them in the air, and then scrambling them around. That's essentially a heap. The storage engine finds rows in a heap through RIDs (row identifiers made up of file number, page number and slot number), among other mechanisms, rather than through a key order.
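A minimal sketch of how a heap is addressed, assuming a hypothetical `Heap` class with a tiny two-row page capacity. Rows land on the first page with free space and are located by a RID of (file, page, slot); because nothing is ordered, searching for a value means reading every page. Real SQL Server tracks free space and page allocation with its own structures, which are not modelled here.

```python
from typing import Dict, List, NamedTuple


class RID(NamedTuple):
    file_id: int
    page_id: int
    slot: int


class Heap:
    def __init__(self, rows_per_page: int = 2) -> None:
        self.rows_per_page = rows_per_page  # hypothetical tiny capacity to keep the example short
        self.pages: Dict[int, List[str]] = {}
        self.file_id = 1

    def insert(self, value: str) -> RID:
        # Use the first page with free space; no ordering is maintained.
        for page_id, rows in self.pages.items():
            if len(rows) < self.rows_per_page:
                rows.append(value)
                return RID(self.file_id, page_id, len(rows) - 1)
        page_id = len(self.pages) + 1
        self.pages[page_id] = [value]
        return RID(self.file_id, page_id, 0)

    def fetch(self, rid: RID) -> str:
        # A RID points straight at one row: no search needed.
        return self.pages[rid.page_id][rid.slot]

    def scan_for(self, value: str) -> List[RID]:
        # Without an ordering key, finding a value means reading every page.
        return [
            RID(self.file_id, page_id, slot)
            for page_id, rows in self.pages.items()
            for slot, v in enumerate(rows)
            if v == value
        ]


heap = Heap()
rids = [heap.insert(name) for name in ["Lopez", "Adams", "Gonzales"]]
print(rids)                    # [RID(file_id=1, page_id=1, slot=0), RID(file_id=1, page_id=1, slot=1), RID(file_id=1, page_id=2, slot=0)]
print(heap.fetch(rids[2]))     # Gonzales
print(heap.scan_for("Adams"))  # [RID(file_id=1, page_id=1, slot=1)]
```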

A clustered index is a table that is ordered by its clustering key. Let's say your clustering key (the column the table is ordered by) is last_name. In this example, the rows will be kept in sequential order by last name.

This can be good or bad. Suppose the key is the last name and a record arrives with the last name 'Gonzales'. SQL Server has to find the page at the leaf (data) level of the clustered index where 'Gonzales' belongs. If there's room, it writes the row there. If there isn't, it performs a page split: it allocates a new page, moves roughly half of the rows to it, links the new page into the chain, and then writes the row. As you can see, it's very important to pick a clustering key that isn't going to cause fragmentation like this. A key that keeps growing sequentially, such as a datetime, is a good pick; if you can't find such a column, a surrogate ID (for example, an IDENTITY column) is also a good one.

You can also set a fill factor on the index, which leaves a percentage of each leaf page free when the index is created or rebuilt. That free space absorbs later inserts without page splits, but the same amount of data then occupies more pages, so more pages have to be read to return it.
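The page-split and fill-factor trade-off can be illustrated with a toy simulation. It is not SQL Server's actual algorithm: `CAPACITY`, `build`, `insert_key` and `count_splits` are hypothetical, a page holds only four rows, a full page splits by moving its upper half to a new page, an insert past the last key of the last page simply starts a fresh page (loosely modelling why ever-increasing keys avoid mid-index splits), and the fill factor is applied only when the pages are first built.

```python
import bisect
import math
from typing import List

CAPACITY = 4  # hypothetical rows per leaf page; real 8 KB pages usually hold far more


def build(keys: List[int], fill_factor: int) -> List[List[int]]:
    """Build leaf pages from keys, filling each page only to fill_factor percent."""
    per_page = max(1, math.floor(CAPACITY * fill_factor / 100))
    ordered = sorted(keys)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]


def insert_key(pages: List[List[int]], key: int) -> bool:
    """Insert key into the page that covers it; return True if rows had to move (a page split)."""
    firsts = [p[0] for p in pages]
    i = max(bisect.bisect_right(firsts, key) - 1, 0)
    page = pages[i]
    if len(page) < CAPACITY:
        bisect.insort(page, key)  # free space on the page: no split
        return False
    if i == len(pages) - 1 and key > page[-1]:
        pages.append([key])       # past the end of the last page: start a fresh page, no rows move
        return False
    bisect.insort(page, key)      # full page in the middle: split it
    half = len(page) // 2
    pages.insert(i + 1, page[half:])  # upper half moves to a new page placed right after it
    del page[half:]
    return True


def count_splits(initial: List[int], new_keys: List[int], fill_factor: int) -> int:
    """Build pages from a non-empty initial key set, insert new_keys, and count page splits."""
    pages = build(initial, fill_factor)
    return sum(insert_key(pages, k) for k in new_keys)


existing = list(range(0, 200, 10))  # 20 existing keys: 0, 10, ..., 190

print(count_splits(list(range(20)), list(range(20, 40)), 100))  # 0: ever-increasing keys
print(count_splits(existing, [5, 15, 25], 100))                 # 2: keys land mid-index
print(count_splits(existing, [5, 15, 25], 75))                  # 1: free space absorbs some inserts
print(len(build(existing, 100)), len(build(existing, 75)))      # 5 7: but more pages to read
```

With a 75 percent fill factor the same 20 rows need 7 pages instead of 5, which is the extra reading cost described above.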

SQL Server is optimized to work with clustered keys in almost every scenario.

Licensed under: CC-BY-SA with attribution

Not affiliated with dba.stackexchange