Question

I'm importing a fairly hefty amount of data into a SQL Server database. The source data originates from PgSql (including table defs), which I throw through some fairly simple regex to translate to TSql. This creates tables with no primary key.

As far as I understand, lack of a primary key/clustering index means that the data is stored in a heap.

Once the import is complete, I add PKs as follows:

ALTER TABLE someTable ADD CONSTRAINT PK_someTable PRIMARY KEY (id);

(note the lack of CLUSTERED keyword). What's going on now? Still a heap? What's the effect on lookup by primary key? Is this really any different to adding a standard index?

Now, say instead I add PKs as follows:

ALTER TABLE someTable ADD CONSTRAINT PK_someTable PRIMARY KEY CLUSTERED (id);

I assume this now completely restructures the table into a row based structure with more efficient lookup by PK but less desirable insertion characteristics.

Are my assumptions correct?

If my import inserts data in PK order, is there any benefit to omitting the PK in the first place?


Solution

When you execute

ALTER TABLE someTable ADD CONSTRAINT PK_someTable PRIMARY KEY (id);

and someTable has no clustered index, the primary key is created as a clustered index. If a clustered index already exists when you run ALTER TABLE ... ADD ... PRIMARY KEY (id), the primary key is created as a nonclustered index instead.

-- Test #1

BEGIN TRAN;
CREATE TABLE dbo.MyTable
(
    id INT NOT NULL,
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);
SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
NULL 0           HEAP
*/
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY (id);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name        index_id    type_desc
----------- ----------- ---------
PK_MyTable  1           CLUSTERED
*/
ROLLBACK;

-- Test #2

BEGIN TRAN;
CREATE TABLE dbo.MyTable
(
    id INT NOT NULL,
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);
SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
NULL 0           HEAP
*/
CREATE CLUSTERED INDEX ix1
ON dbo.MyTable(Col1);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name index_id    type_desc
---- ----------- ---------
ix1  1           CLUSTERED
*/

ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY (id);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
/*
name       index_id    type_desc
---------- ----------- ------------
ix1        1           CLUSTERED
PK_MyTable 2           NONCLUSTERED
*/
ROLLBACK;
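
If the goal is to keep the table as a heap while still enforcing the key, the NONCLUSTERED keyword can be stated explicitly. The following is a minimal sketch that reuses the dbo.MyTable definition from the tests above. It is not part of the original tests. The final query should list a HEAP row (index_id 0) alongside PK_MyTable as NONCLUSTERED.

```sql
-- Test #3 (sketch): explicit NONCLUSTERED primary key on a heap
BEGIN TRAN;
CREATE TABLE dbo.MyTable
(
    id INT NOT NULL,
    Col1 INT NOT NULL,
    Col2 VARCHAR(50) NOT NULL
);

-- With NONCLUSTERED spelled out, the table stays a heap and the
-- primary key is enforced by a separate unique nonclustered index.
ALTER TABLE dbo.MyTable
ADD CONSTRAINT PK_MyTable PRIMARY KEY NONCLUSTERED (id);

SELECT  i.name, i.index_id, i.type_desc
FROM    sys.indexes i
WHERE   i.object_id = OBJECT_ID(N'dbo.MyTable');
ROLLBACK;
```

In this case a lookup by id goes through the nonclustered index first and then follows a row identifier into the heap, which is the same access pattern as any other unique nonclustered index.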

Other tips

In SQL Server, a primary key defaults to clustered if no clustered index exists. A clustered index really means that the "index" is not kept in a separate structure (as a nonclustered index is); instead, the leaf level of the index is the table data itself, stored in key order. If you think about this, you will realize that there can be only one clustered index per table.

The real advantage of a clustered index is that the row data sits right next to the index key, so you can grab both while the drive head is "in the area". A clustered index is noticeably faster than a nonclustered index when the data you are processing exhibits locality of reference, that is, when rows with nearby key values tend to be read at the same time.

For example, if your primary key is SSN, you do not get a large advantage when you process data in an order that is random with respect to SSN, though you still gain something because each row is stored together with its key. But if you can presort the input by SSN, a clustered key is a large advantage.

So yes, creating a clustered index does reorder the data: the rows are stored in clustered-key order as part of the index itself.

Thanks for a nice demonstration of the subject!

The conclusions above are not wrong, but the demonstration shows the structure of the index, not of the table. I think the following SQL shows information for the actual table:

select 
    o.name, 
    o.object_id, 
    case 
      when p.index_id = 0 then 'Heap'
      when p.index_id = 1 then 'Clustered Index/b-tree'
      when p.index_id > 1 then 'Non-clustered Index/b-tree'
    end as 'Type'
from sys.objects o
inner join sys.partitions p on p.object_id = o.object_id
where o.name = 'MyTable';

You will see that MyTable is clustered:

name    object_id   Type
------- ----------- ----------------------
MyTable 1237579447  Clustered Index/b-tree
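
Note that sys.partitions holds one row per partition of every index, so the query above returns extra rows once the table has nonclustered indexes. As a hedged alternative, the sketch below restricts the check to the base storage of the table. It assumes an ordinary disk-based rowstore table named dbo.MyTable, as in the tests earlier on this page.

```sql
-- index_id 0 = heap, index_id 1 = clustered index.
-- An ordinary disk-based table has exactly one of the two,
-- so this returns a single row describing how the table itself is stored.
SELECT  o.name, i.index_id, i.type_desc
FROM    sys.objects o
INNER JOIN sys.indexes i ON i.object_id = o.object_id
WHERE   o.object_id = OBJECT_ID(N'dbo.MyTable')
AND     i.index_id IN (0, 1);
```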
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow