Sequential Guid and fragmentation

https://stackoverflow.com/questions/3450072

27-09-2019
|

Question

I'm trying to understand how sequential guid performs better than a regular guid.

Is it because with regular guid, the index use the last byte of the guid to sort? Since it's random it will cause alot of fragmentation and page splits since it will often move data to another page to insert new data?

Sequential guid sine it is sequential it will cause alot less page splits and fragmentation?

Is my understanding correct?

If anyone can shed more lights on the subject, I'll appreciated very much.

Thank you

EDIT:

Sequential guid = NEWSEQUENTIALID(),

Regular guid = NEWID()

Solution

You've pretty much said it all in your question.

With a sequential GUID / primary key new rows will be added together at the end of the table, which makes things nice an easy for SQL server. In comparison a random primary key means that new records could be inserted anywhere in the table - the chance of the last page for the table being in the cache is fairly likely (if that's where all of the reads are going), however the chance of a random page in the middle of the table being in the cache is fairly low, meaning additional IO is required.

On top of that, when inserting rows into the middle of the table there is the chance that there isn't enough room to insert the extra row. If this is the case then SQL server needs to perform additional expensive IO operations in order to create room for the record - the only way to avoid this is to have gaps scattered amongst the data to allow for extra records to be inserted (known as a Fill factor), which in itself causes performance issues because the data is spread over more pages and so more IO is required to access the entire table.

OTHER TIPS

I defer to Kimberly L. Tripp's wisdom on this topic:

But, a GUID that is not sequential - like one that has it's values generated in the client (using .NET) OR generated by the newid() function (in SQL Server) can be a horribly bad choice - primarily because of the fragmentation that it creates in the base table but also because of its size. It's unnecessarily wide (it's 4 times wider than an int-based identity - which can give you 2 billion (really, 4 billion) unique rows). And, if you need more than 2 billion you can always go with a bigint (8-byte int) and get 263-1 rows.

To visualize the whole picture util named ostress might be used. E.g. you can create two tables: one with normal GUID as PK, another with sequential GUID:

-- normal one
CREATE TABLE dbo.YourTable(
   [id] [uniqueidentifier] NOT NULL,
   CONSTRAINT [PK_YourTable] PRIMARY KEY NONCLUSTERED (id)
);
-- sequential one
CREATE TABLE dbo.YourTableSeq(
   [id] [uniqueidentifier] NOT NULL CONSTRAINT [df_yourtable_id]  DEFAULT (newsequentialid()),
   CONSTRAINT [PK_YourTableSeq] PRIMARY KEY NONCLUSTERED (id)
);

Then with a given util you run a numbero of inserts with selection of statistics about index fragmentation:

ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTable VALUES (NEWID()); SELECT count(*) AS Cnt FROM dbo.YourTable; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTable';" -oE:\incoming\TMP\ -n1 -r10000

ostress -Slocalhost -E -dYourDB -Q"INSERT INTO dbo.YourTableSeq DEFAULT VALUES; SELECT count(*) AS Cnt FROM dbo.YourTableSeq; SELECT AVG_FRAGMENTATION_IN_PERCENT AS AvgPageFragmentation, PAGE_COUNT AS PageCounts FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, N'LIMITED') DPS INNER JOIN sysindexes SI ON DPS.OBJECT_ID = SI.ID AND DPS.INDEX_ID = SI.INDID WHERE SI.NAME = 'PK_YourTableSeq';" -oE:\incoming\TMP\ -n1 -r10000

Then in file E:\incoming\TMP\query.out you will find your statistics. My results are:

"Normal" GUID:
Records    AvgPageFragmentation     PageCounts           
---------------------------------------------- 
1000       87.5                     8                    
2000       93.75                    16                   
3000       96.15384615384616        26                   
4000       96.875                   32                   
5000       96.969696969696969       33                   
10000      98.571428571428584       70                   


Sequential GUID:
Records    AvgPageFragmentation     PageCounts           
---------------------------------------------- 
1000       83.333333333333343       6                    
2000       63.636363636363633       11                   
3000       41.17647058823529        17                   
4000       31.818181818181817       22                   
5000       25.0                     28                   
10000      12.727272727272727       55

As you can see with sequentially generated GUID being inserted, index is much less fragmented as the insert operation leads to new page allocation rarer.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow