Question

I would like to build a distributed system. I need to store data in databases, and it would be helpful to use a UUID or GUID as the primary key on some tables. I assume there are drawbacks to this design, since UUIDs/GUIDs are quite large and almost random. The alternative is to use an auto-incremented INT or LONG.

What are the drawbacks with using UUID or GUID as a primary key for my tables?

I will probably use Derby/JavaDB (on the clients) and PostgreSQL (on the server) as DBMS.


Solution

It depends on your generation function and the size of the final tables.

GUIDs are intended to be globally unique identifiers. As discussed in the PostgreSQL 8.3 documentation, no single method for generating these identifiers is universally appropriate, but PostgreSQL does ship several useful candidates in its contrib module uuid-ossp, which implements the standard generation algorithms.
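To make "generation function" concrete, here is a short sketch using Python's standard `uuid` module. It is only an illustration of the standard UUID versions, not what PostgreSQL or Derby run internally. It shows that every version is 128 bits (16 bytes), that name-based version 5 UUIDs are deterministic, and that version 4 UUIDs are random.

```python
import uuid

v1 = uuid.uuid1()                                   # timestamp + node id (MAC-based by default)
v4 = uuid.uuid4()                                   # 122 random bits plus fixed version/variant bits
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")  # SHA-1 of namespace + name

for u in (v1, v4, v5):
    print(u.version, u, len(u.bytes))   # every version occupies 16 bytes

# Name-based UUIDs repeat for the same input; random ones do not.
assert uuid.uuid5(uuid.NAMESPACE_DNS, "example.com") == v5
assert uuid.uuid4() != v4
```

Which version fits depends on whether you need determinism (v5), embedded time (v1), or no coordination at all (v4).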

Given the scope of your problem and the need for offline writes, you've quite neatly ruled out anything but a GUID, so no other scheme offers compensating advantages.
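A minimal sketch of why offline writes favor GUIDs, assuming each client keys its rows with random version 4 UUIDs. The function `create_offline_records` is hypothetical. Two clients create rows while disconnected and the server merges them without rewriting any key. With per-client auto-increment counters, both clients would have handed out 1, 2, 3, ... and collided.

```python
import uuid


def create_offline_records(client_name, count):
    """Hypothetical helper: a client creating rows while disconnected, keyed by UUIDv4."""
    return {uuid.uuid4(): f"{client_name} record {i}" for i in range(count)}


# Two clients work offline; neither can ask the server for the next id.
client_a = create_offline_records("client-a", 1_000)
client_b = create_offline_records("client-b", 1_000)

# On reconnect the server simply unions the rows; keys never need rewriting.
server = {}
for batch in (client_a, client_b):
    for key, row in batch.items():
        if key in server:
            raise ValueError(f"duplicate key {key}")  # astronomically unlikely with v4
        server[key] = row

print(len(server))  # 2000
```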

From a functional standpoint, key length (16 bytes for a UUID versus 4 bytes for INT or 8 for BIGINT) is usually not an issue on a modern system, although its cost grows with the number of reads and the size of the table. As an alternative, offline clients could batch new records without a primary key and simply insert them when they reconnect. Because PostgreSQL offers the SERIAL data type, clients never need to determine the ID themselves; the server assigns it on each plain insert.
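The batching alternative can be sketched as follows. `Server` and `OfflineClient` are hypothetical stand-ins, and an `itertools.count` plays the role of the sequence behind a SERIAL column. Clients queue rows with no key, and the server assigns IDs in the order the rows arrive.

```python
import itertools


class Server:
    """Hypothetical stand-in for a PostgreSQL table whose id column is SERIAL."""

    def __init__(self):
        self._ids = itertools.count(1)  # plays the role of the SERIAL column's sequence
        self.rows = {}

    def insert(self, payload):
        row_id = next(self._ids)
        self.rows[row_id] = payload
        return row_id


class OfflineClient:
    """Hypothetical client: queues new rows locally, with no primary key, until it reconnects."""

    def __init__(self):
        self.outbox = []

    def save(self, payload):
        self.outbox.append(payload)

    def sync(self, server):
        ids = [server.insert(p) for p in self.outbox]  # the server assigns every id
        self.outbox.clear()
        return ids


server = Server()
a, b = OfflineClient(), OfflineClient()
a.save("order from A")
b.save("order from B")
a.save("second order from A")
print(b.sync(server))  # [1]
print(a.sync(server))  # [2, 3]
```

The trade-off is that a client does not know a row's ID until it syncs, so rows that reference each other while offline need some other temporary way to link.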

OTHER TIPS

One more piece of advice: never use GUIDs as part of a clustered index. GUIDs are not sequential, so if they are part of a clustered index, each new record lands at an essentially random position, and the database has to locate the right page (often splitting a full one) to make room. With an auto-increment int (bigint) key, the new row always goes onto the last page.
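The effect described above can be seen in a deliberately simplified model. Leaf pages are sorted lists with an assumed capacity of 100 keys. A full page in the middle of the key range is split in half, while an ascending key on a full last page just starts a new page. `simulate_clustered_inserts` and `PAGE_CAPACITY` are hypothetical, and real engines differ in page size, split strategy and fill factor.

```python
import bisect
import uuid

PAGE_CAPACITY = 100  # assumed keys per leaf page; purely illustrative


def simulate_clustered_inserts(keys, capacity=PAGE_CAPACITY):
    """Hypothetical toy clustered index.

    Returns (page_count, page_splits, inserts_not_on_last_page).
    """
    pages = [[]]      # leaf pages, each a sorted list of keys
    separators = []   # separators[i] is the smallest key on pages[i + 1]
    splits = 0
    not_on_last_page = 0

    for key in keys:
        # Route the key to the only page whose key range can hold it.
        idx = bisect.bisect_right(separators, key)
        page = pages[idx]
        if idx != len(pages) - 1:
            not_on_last_page += 1

        if len(page) < capacity:
            bisect.insort(page, key)
        elif idx == len(pages) - 1 and key > page[-1]:
            # Ascending key on a full last page: start a new page, no split.
            pages.append([key])
            separators.append(key)
        else:
            # Full page that must absorb a key inside its range: split in half.
            splits += 1
            half = len(page) // 2
            right = page[half:]
            del page[half:]
            pages.insert(idx + 1, right)
            separators.insert(idx, right[0])
            bisect.insort(right if key >= right[0] else page, key)

    return len(pages), splits, not_on_last_page


N = 10_000
sequential = simulate_clustered_inserts(range(1, N + 1))
random_uuids = simulate_clustered_inserts(uuid.uuid4().int for _ in range(N))
print("auto-increment:", sequential)   # (100, 0, 0)
print("random UUIDv4: ", random_uuids)
```

Sequential keys touch only the last page and fill every page completely. Random UUIDs hit pages all over the range, cause splits, and leave pages partly empty, so the same rows need more pages.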

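Now, looking at some specific database implementations:
1. MySQL (InnoDB): the primary key is always the clustered index, with no option to change this, so the recommendation is not to use GUIDs here at all.
2. MS SQL Server: you can declare the GUID primary key as NONCLUSTERED and use another column, for example an auto-increment int, as the clustered index.
3. PostgreSQL: tables are not kept in index order (the CLUSTER command reorders a table only once, on demand), so a GUID primary key is an ordinary B-tree index; random inserts still scatter across that index's pages, but they do not move the table rows.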
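The layout recommended above (GUID as a non-clustered primary key, auto-increment int as the clustered key) can be sketched as a hypothetical in-memory `ToyTable`. Rows are stored physically in auto-increment order, so every insert is an append. A dict stands in for the secondary index that maps each UUID to its row. This is an analogy, not how any particular engine stores data.

```python
import itertools
import uuid


class ToyTable:
    """Hypothetical table: clustered on an auto-increment id, UUID as a secondary key."""

    def __init__(self):
        self._next_id = itertools.count(1)  # stands in for AUTO_INCREMENT / IDENTITY
        self.rows = []                      # physical order == clustered id order
        self.by_uuid = {}                   # secondary (non-clustered) index: UUID -> id

    def insert(self, payload, key=None):
        key = key if key is not None else uuid.uuid4()
        row_id = next(self._next_id)
        self.rows.append((row_id, key, payload))  # always appended at the end
        self.by_uuid[key] = row_id
        return key

    def get(self, key):
        row_id = self.by_uuid[key]     # one lookup in the secondary index...
        return self.rows[row_id - 1]   # ...then one in the clustered storage


table = ToyTable()
first = table.insert("first row")
second = table.insert("second row")
print(table.get(second)[0])  # 2
```

Lookups by UUID cost an extra index hop, but inserts never disturb the physical order of existing rows.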

It depends.

Seriously, with all you've given so far, this is about as far as you can go.

Why would it be helpful to use UUIDs? Why won't you use INTs? Why can't you just index on UUIDs later? Do you understand what it means to have a sorted list keyed on UUID and to insert a random (non-sequential) UUID after a few million rows?
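A small sketch of that last question, assuming a flat sorted array of keys. It counts how many existing keys sit after the insertion point, which is how many would have to move. The table size of 100,000 and the helper `rows_after_insert_point` are illustrative assumptions.

```python
import bisect
import statistics
import uuid

EXISTING = 100_000  # illustrative table size; the question is about millions
existing = sorted(uuid.uuid4().int for _ in range(EXISTING))


def rows_after_insert_point(sorted_keys, new_key):
    """Hypothetical helper: keys after new_key, i.e. keys a flat sorted array must shift."""
    return len(sorted_keys) - bisect.bisect_right(sorted_keys, new_key)


# An auto-increment key is always larger than everything already stored.
seq_shift = rows_after_insert_point(list(range(EXISTING)), EXISTING)
# A random UUID can land anywhere, on average in the middle.
rand_shifts = [rows_after_insert_point(existing, uuid.uuid4().int) for _ in range(1_000)]
avg_shift = statistics.mean(rand_shifts)

print(seq_shift)  # 0
print(avg_shift)  # roughly EXISTING / 2
```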

What platform will this run on? How many disks? How many users? How many records?

Licensed under: CC-BY-SA with attribution