Question

Possible Duplicate:
How do you like your primary keys?

I'm aware of the benefits of using a GUID, as well as the benefits of using and INT as a PK in a database. Considering that a GUID is in essence a 128 bit INT and a normal INT is 32 bit, the INT is a space saver (though this point is generally moot in most modern systems).

In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

Was it helpful?

Solution

Kimberley Tripp (SQLSkills.com) has an article on using GUID's as primary keys. She advices against it because of the unnecessary overhead.

OTHER TIPS

Apart from being a poor choice when you need to synchronize several database instances, INT's have one drawback I haven't seen mentioned: inserts always occur at one end of the index tree. This increases lock contention when you have a table with a lot of movement (since the same index pages have to be modified by concurrent inserts, whereas GUID's will be inserted all over the index). The index may also have to be rebalanced more often if a B* tree or similar data structure is used.

Of course, int's are easier on the eye when doing manual queries and report construction, and space consumption may add up through FK usages.

I'd be interested to see any measurements of how well e.g. SQL Server actually handles insert-heavy tables with IDENTITY PK's.

To answer your question: In the end, in what circumstances would you see yourself using an INT as a PK versus a GUID?

I would use a GUID if my system would have an online/offline version that inside the offline version you can save data and that data is transferred back to the server one day during a synch. That way, you are sure that you won't have the same key twice inside your database.

the INT is a space saver (though this point is generally moot in most modern systems).

Not so. It may seem so at first glance, but note that the primary key of each table will be repeated multiple times throughout the database in indexes and as foreign key in other tables. And it will be involved in nearly any query containing its table - and very intensively when it's a foreign key used for a join.

Furthermore, remember that modern CPUs are very, very fast, but RAM speeds have not kept up. Cache behaviour becomes therefore increasingly important. And the best way to get good cache behaviour is to have smaller data sets. So the seemingly irrelevant difference between 4 and 16 bytes may well result in a noticeable difference in speed. Not necessarily always - but it's something to consider.

We have Guids in our very complex enterprise software everywhere. Works smoothly.

I believe Guids are semantically more suitable to serve as identifiers. There is also no point in unnecessarily worrying about performance until you are faced with that problem. Beware premature optimization.

There is also an advantage with database migration of any sort. With Guids you will have no collisions. If you attempt to merge several DBs where ints are used for identity, you will have to replace their values. If these old values were used in urls, they will now be different following SEO hit.

When comparing values such as Primary to Foreign key relationship, the INT will be faster. If the tables are indexed properly and the tables are small, you might not see much of a slow down, but you'd have to try it to be sure. INTs are also easier to read, and communicate with other people. It's a lot simpler to say, "Can you look at record 1234?" instead of "Can you look at record 031E9502-E283-4F87-9049-CE0E5C76B658?"

Some OSes don't generate GUIDs anymore based on unique hardware features (CPUID,MAC) because it made tracing users to easy (privacy concerns). This means the GUID uniqueness is often no longer as universal as many people think.

If you use some auto-id function of your database, the database could in theory make absolutely sure that there is no duplication.

If the data lives in a single database (as most data for the applications that we write in general does), then I use an IDENTITY. It's easy, intended to be used that way, doesn't fragment the clustered index and is more than enough. You'll run out of room at 2 billion some records (~ 4 billion if you use negative values), but you'd be toast anyway if you had that many records in one table, and then you have a data warehousing problem.

If the data lives in multiple, independent databases or interfaces with a third-party service, then I'll use the GUID that was likely already generated. A good example would be a UserProfiles table in the database that maps users in Active Directory to their user profiles in the application via their objectGUID that Active Directory assigned to them.

If you are planning on merging database at some stage, ie for a multi-site replication type setup, Guid's will save a lot of pain. But other than that I find Int's easier.

I always think PK's should be numeric where possble. Dont forget having GUIDs as a PK will probably mean that they are also used in other tables as foriegn keys, so paging and index etc will be greater.

I think the database also matters. From a MySQL perspective - generally, the smaller the datatype the faster the performance.

It seems to hold true for int vs GUID too - http://kccoder.com/mysql/uuid-vs-int-insert-performance/

I would use GUID as PK only if this key bounds to similar value. For example, user id (users in WinNT are describes with GUIDs), or user group id. Another one example. If you develop distributed system for documents management and different parts of system in different places all over the world can create some documents. In such case I would use GUID, because it guaranties that 2 documents created in different parts of distributed system wouldn't have same Id.

An INT is certainly much easier to read when debugging, and much smaller.

I would, however, use a GUID or similar as a license key for a product. You know it's going to be unique, and you know that it's not going to be sequential.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top