Does adding a primary key cause restructuring of the underlying data?

Question 1

When you execute

ALTER TABLE someTable ADD CONSTRAINT PK_someTable PRIMARY KEY (id);

then, if someTable has no clustered index, the PK is created as a clustered PK by default. Otherwise, if a clustered index already exists before you execute ALTER ... ADD ... PRIMARY KEY (id), the PK is created as a non-clustered PK.
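If you would rather not depend on this default, the index type can be stated explicitly in the constraint definition. A minimal sketch, assuming a dbo.MyTable like the one in the tests below that does not have a primary key yet; choosing NONCLUSTERED here is only for illustration:

```sql
-- Sketch: state the index type explicitly instead of relying on the default.
-- Assumes dbo.MyTable (id INT NOT NULL, ...) exists and has no primary key yet.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (id);

-- Alternative: request a clustered PK explicitly.
-- This fails if the table already has a clustered index.
-- ALTER TABLE dbo.MyTable
-- ADD CONSTRAINT PK_MyTable PRIMARY KEY CLUSTERED (id);
```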

-- Test #1

BEGIN TRAN;
CREATE TABLE dbo.MyTable
(
    id INT NOT NULL,
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);
SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
NULL 0           HEAP
*/
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY (id);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name        index_id    type_desc
----------- ----------- ---------
PK_MyTable  1           CLUSTERED
*/
ROLLBACK;

-- Test #2

BEGIN TRAN;
CREATE TABLE dbo.MyTable
(
    id INT NOT NULL,
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);
SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
NULL 0           HEAP
*/
CREATE CLUSTERED INDEX ix1
ON dbo.MyTable(Col1);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
ix1  1           CLUSTERED
*/

ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY (id);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name       index_id    type_desc
---------- ----------- ------------
ix1        1           CLUSTERED
PK_MyTable 2           NONCLUSTERED
*/
ROLLBACK;

Question 2

In SQL Server, a primary key defaults to clustered if no clustered index exists. A clustered index really means that the "index" is not kept in a separate storage area (as a non-clustered index is), but that the index data is "interspersed" with the corresponding regular table data: the leaf level of the index holds the table rows themselves, in key order. If you think about this, you will realize that there can be only one clustered index per table.

The real advantage of a clustered index is that the data is near the index data, so you can grab both while the drive head is "in the area". A clustered index is noticeably faster than a non-clustered index when the data you are processing exhibits locality of reference, that is, when rows with nearly the same key value tend to be read at the same time.

For example, if your primary key is SSN, you do not get a large advantage when the data you are processing is randomly ordered with respect to SSN, though you still get some advantage from the nearness of the data. But if you can presort the input by SSN, a clustered key is a large advantage.
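To make the locality argument concrete, here is a rough sketch of a range read on a clustered key. The table dbo.SsnDemo, its columns, and the generated values are hypothetical and exist only for this illustration. SET STATISTICS IO reports the page reads, so you can compare them against a lookup pattern of your own:

```sql
-- Sketch only: dbo.SsnDemo and its columns are hypothetical names.
CREATE TABLE dbo.SsnDemo
(
    ssn     CHAR(9)     NOT NULL,
    payload VARCHAR(50) NOT NULL,
    CONSTRAINT PK_SsnDemo PRIMARY KEY CLUSTERED (ssn)
);

-- Fill the table with made-up, zero-padded key values.
INSERT INTO dbo.SsnDemo (ssn, payload)
SELECT TOP (1000)
       RIGHT('000000000'
             + CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS VARCHAR(9)), 9),
       'x'
FROM   sys.all_objects;

SET STATISTICS IO ON;

-- Range on the clustered key: the matching rows are stored together,
-- on consecutive leaf pages in key order.
SELECT ssn, payload
FROM   dbo.SsnDemo
WHERE  ssn BETWEEN '000000100' AND '000000200';

SET STATISTICS IO OFF;

DROP TABLE dbo.SsnDemo;
```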

So yes, creating a clustered index (including a clustered primary key) does reorder the data so that it is commingled with the clustered index.
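One way to observe this yourself is to compare the physical structures before and after adding the key. The following is a sketch under assumptions: dbo.HeapDemo is a hypothetical table name, the filler rows are arbitrary, and you need sufficient permission to query sys.dm_db_index_physical_stats.

```sql
-- Sketch only: dbo.HeapDemo is a hypothetical table name.
CREATE TABLE dbo.HeapDemo
(
    id   INT         NOT NULL,
    Col1 INT         NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);

-- Arbitrary filler rows with unique id values.
INSERT INTO dbo.HeapDemo (id, Col1, Col2)
SELECT TOP (1000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 0, 'x'
FROM   sys.all_objects;

-- Before: the rows are stored in a heap (index_id 0).
SELECT index_id, index_type_desc, index_level, page_count, record_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID(N'dbo.HeapDemo'), NULL, NULL, 'DETAILED');

ALTER TABLE dbo.HeapDemo
ADD CONSTRAINT PK_HeapDemo PRIMARY KEY (id);

-- After: the heap is gone; the rows are in the leaf level
-- (index_level 0) of the clustered index (index_id 1).
SELECT index_id, index_type_desc, index_level, page_count, record_count
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID(N'dbo.HeapDemo'), NULL, NULL, 'DETAILED');

DROP TABLE dbo.HeapDemo;
```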

Question 3

Thanks for a nice demonstration of the subject!

The conclusions above are not wrong, but they show the structure of the index, not of the table. I think the following SQL will show information for the actual table:

select 
    o.name, 
    o.object_id, 
    case 
      when p.index_id = 0 then 'Heap'
      when p.index_id = 1 then 'Clustered Index/b-tree'
      when p.index_id > 1 then 'Non-clustered Index/b-tree'
    end as 'Type'
from sys.objects o
inner join sys.partitions p on p.object_id = o.object_id
where o.name = 'MyTable';

You will see that MyTable is clustered:

name    object_id   Type
------- ----------- ----------------------
MyTable 1237579447  Clustered Index/b-tree
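Because sys.partitions only carries the index_id, a variant that also joins sys.indexes can show the index name and type_desc next to each storage structure. This is a sketch, assuming the dbo.MyTable from the tests above still exists; those tests end in ROLLBACK, so it would have to run before that statement:

```sql
-- Sketch: one row per index and partition of dbo.MyTable.
SELECT  o.name AS table_name,
        i.name AS index_name,
        i.type_desc,
        p.index_id,
        p.partition_number,
        p.rows
FROM    sys.objects o
INNER JOIN sys.indexes i
        ON i.object_id = o.object_id
INNER JOIN sys.partitions p
        ON p.object_id = i.object_id
       AND p.index_id  = i.index_id
WHERE   o.object_id = OBJECT_ID(N'dbo.MyTable');
```

For a table like the one in Test #2, this should list both the clustered index and the non-clustered primary key as separate rows.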