Question

I have a many-to-many link table, CategoryProduct, with 2 columns; it will hold millions of records:

CREATE TABLE [dbo].[CategoryProduct](
    [Category_ID] [int] NOT NULL,
    [Product_ID] [int] NOT NULL,
    CONSTRAINT [PK_dbo.CategoryProduct] PRIMARY KEY CLUSTERED
    (
        [Category_ID] ASC,
        [Product_ID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY];

Based on the clustered index, I expected the physical records to be stored in the following order:

CategoryID    ProductID
1             2
1             3
2             1
2             3

However, a SELECT returns:

CategoryID    ProductID
2             1
1             2
1             3
2             3

Why is the data grouped by ProductID? Does this reflect the actual physical order of the data? How can I store the data grouped by CategoryID, so that a query like the one below is optimized into a contiguous read once a matching CategoryID is found?

SELECT Product_ID FROM dbo.CategoryProduct WHERE Category_ID = @CategoryID  -- @CategoryID: the category value being looked up

Solution

When SQL Server reads data with a table scan or a clustered index scan (if your table has a clustered index), it may follow the chain of leaf pages, depending on search arguments, locking hints and other factors, or it may follow the Index Allocation Map (IAM), whose order in most cases differs from the key order because of page splits that have occurred.

Using a clustered index does not guarantee speed: SQL Server evaluates different ways to retrieve the data for each request, even for simple ones (the SQL Server query optimizer is a very complex system).

Nor is a clustered index a way to get data in a specific order: the only way to guarantee a specific order is to specify an ORDER BY clause in your query (under the ANSI SQL standard, the result of a query without ORDER BY has no defined order).
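For example, to get the rows back grouped by category, request that order explicitly. A minimal sketch against the table from the question; `@CategoryID` and its value 1 are hypothetical placeholders for the category being looked up:

```sql
-- Whole table, explicitly ordered by the clustered key columns.
SELECT Category_ID, Product_ID
FROM dbo.CategoryProduct
ORDER BY Category_ID, Product_ID;

-- Single category (the query from the question).
DECLARE @CategoryID int = 1;  -- hypothetical example value
SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = @CategoryID
ORDER BY Product_ID;          -- still required if the caller relies on the order
```

When the ORDER BY matches the clustered key, the optimizer can usually read the index in key order and avoid a separate Sort operator, so asking for the order is typically cheap here; the execution plan shows whether a Sort was added.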

If you want to improve performance, study the execution plan of your query. There are several ways to get it; the simplest is to click the "Include Actual Execution Plan" button on the SQL Server Management Studio toolbar (or press Ctrl+M) before running the query.
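The actual plan and I/O figures can also be requested from T-SQL instead of the toolbar. A sketch, using a hypothetical `@CategoryID` value:

```sql
DECLARE @CategoryID int = 1;  -- hypothetical example value

SET STATISTICS IO ON;   -- report logical/physical reads per table in the Messages tab
SET STATISTICS XML ON;  -- return the actual execution plan as an extra XML result set

SELECT Product_ID
FROM dbo.CategoryProduct
WHERE Category_ID = @CategoryID;

SET STATISTICS XML OFF;
SET STATISTICS IO OFF;
```

For this query you would hope to see a Clustered Index Seek on `PK_dbo.CategoryProduct` with a seek predicate on `Category_ID`; a scan, or a different index, means the optimizer chose another access path.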

Follow-up: with a clustered index, data is physically stored in the order of the clustered key definition, until the index becomes fragmented. The ONLY way to get data in a specific order from a SELECT is to add an ORDER BY clause to it; creating indexes does not do that.
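How far the page order has drifted from the key order can be measured with the `sys.dm_db_index_physical_stats` dynamic management function. A minimal sketch, run in the database that contains the table from the question:

```sql
-- 'LIMITED' is the cheapest scan mode: it reads only the pages above the leaf level.
SELECT i.name AS index_name,
       ps.index_type_desc,
       ps.avg_fragmentation_in_percent,  -- logical fragmentation: % of out-of-order pages
       ps.page_count
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID(N'dbo.CategoryProduct'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id;
```

A high `avg_fragmentation_in_percent` means page splits have left leaf pages out of key order. Rebuilding the index (`ALTER INDEX ... REBUILD`) puts the pages back in order, but even then a SELECT without ORDER BY carries no ordering guarantee.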

OTHER TIPS

You should not rely on the clustered key for the ordering of the data. The data is stored on disk in clustered key order, but that does not mean it is guaranteed to be returned in any particular order. If you need ordered data, use an ORDER BY clause.

Your query will be fine in terms of its use of the index. In any case, the order of the returned rows is not the way to verify that: execute your query, check the execution plan and confirm that the index is actually used (for this query you would expect a Clustered Index Seek on the primary key).
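If the plan shows a query on this table reading an index other than `PK_dbo.CategoryProduct`, it helps to know which indexes exist. For instance, a nonclustered index on `Product_ID` alone would also cover both columns, because every nonclustered index on a clustered table carries the clustering key, and a scan of it would typically return rows in (`Product_ID`, `Category_ID`) order, which matches the output shown in the question. Whether such an index exists here is an assumption to check, not something the question states. A sketch that lists the table's indexes and their columns:

```sql
-- List every index on dbo.CategoryProduct with its key and included columns.
SELECT i.name AS index_name,
       i.type_desc,             -- CLUSTERED / NONCLUSTERED
       ic.key_ordinal,          -- position in the key (0 for included columns)
       c.name AS column_name,
       ic.is_included_column
FROM sys.indexes AS i
JOIN sys.index_columns AS ic
  ON ic.object_id = i.object_id
 AND ic.index_id  = i.index_id
JOIN sys.columns AS c
  ON c.object_id = ic.object_id
 AND c.column_id = ic.column_id
WHERE i.object_id = OBJECT_ID(N'dbo.CategoryProduct')
ORDER BY i.index_id, ic.is_included_column, ic.key_ordinal;
```

Note that the clustering key a nonclustered index carries implicitly is not listed in `sys.index_columns`; only the columns declared in the index definition appear.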

Licensed under: CC-BY-SA with attribution