Question

I'm writing a simple content management system. I need to store SHA1 hash values that are computed externally as the primary key for my biggest table.

I can obviously use a sequence as a primary key and index the SHA1 hex-string for look-up... However, I'm looking for a more elegant solution, where I will simply use the 20-byte SHA1 computed values as the given key to the rows I am about to insert/delete/update in the database table. Is there an efficient storage type that I can use to store and later on use the SHA1 keys as primary keys?

I will obviously need postgres to support using 20-byte values as keys to get this done.

Anyone with any ideas?

Solution

Be careful with what this can do to your B-tree index. Because SHA1 values are effectively random rather than sequential, each insert lands at an arbitrary position in the B-tree instead of at the end, so writes can become slow on a large table.
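To make the "jumping around" concrete, here is a minimal Python sketch. It assumes hypothetical content items named `document-0`, `document-1`, and so on, hashes them in creation order, and counts how often a new key sorts after the previous one. It illustrates key ordering only; it does not measure PostgreSQL itself.

```python
import hashlib


def sha1_key(data: bytes) -> bytes:
    """Return the raw 20-byte SHA1 digest of data."""
    return hashlib.sha1(data).digest()


# Hypothetical content items, hashed in the order they are created.
keys = [sha1_key(f"document-{i}".encode()) for i in range(1000)]

# An "append-like" insert is one whose key sorts after the previous key.
# A sequence would give 999 of 999; hash keys behave like random values.
ascending = sum(1 for prev, cur in zip(keys, keys[1:]) if cur > prev)
print(f"{ascending} of {len(keys) - 1} consecutive keys arrive in ascending order")
```

With a sequence, every new key would sort after the previous one; with hash keys, roughly half do, so consecutive inserts tend to touch unrelated parts of the index.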

If a sequence won't work, I would usually recommend some sort of sequential GUID/UUID (SQL Server's NEWSEQUENTIALID() is one example).

If you still want to make the SHA1 your primary key after knowing this, you can store it in the standard hex format that SHA1 is usually shown in (40 hex characters, which makes it easy to type). I wouldn't recommend a binary format, as you won't be able to type it when debugging.
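As a rough sketch of the hex approach in Python: the helper name `normalize_hex_key` is made up for illustration. It lowercases and validates a key before it is used, since text columns compare case-sensitively and `ABC...` and `abc...` would otherwise count as different keys.

```python
import hashlib
import string

HEX_DIGITS = set(string.hexdigits)


def normalize_hex_key(text: str) -> str:
    """Validate a SHA1 hex string and return it in lowercase (hypothetical helper)."""
    text = text.strip()
    if len(text) != 40 or not set(text) <= HEX_DIGITS:
        raise ValueError("expected 40 hexadecimal characters")
    return text.lower()


digest = hashlib.sha1(b"example content").digest()  # 20 raw bytes
hex_key = digest.hex()                              # 40 lowercase hex characters

# The hex form round-trips back to the original 20 bytes.
assert bytes.fromhex(hex_key) == digest
print(len(digest), len(hex_key), normalize_hex_key(hex_key.upper()) == hex_key)
```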

OTHER TIPS

Particularly if you will pass binary parameters to the database (through libpq, for example), use bytea. If you want to do lots of manipulation through simple text queries, convert to hex and store it in a text or varchar column.
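If you go with bytea but still want to type keys into ad-hoc queries, one option is PostgreSQL's hex input format for bytea (`\x` followed by hex digits). The Python sketch below only builds and parses that text form. The table name `content` and the column name `sha1_key` are assumptions for illustration, and application code should pass the raw bytes as a bound parameter rather than formatting SQL strings.

```python
import hashlib


def to_bytea_hex_literal(digest: bytes) -> str:
    """Render a 20-byte digest in bytea hex input format: a backslash, 'x', then hex."""
    if len(digest) != 20:
        raise ValueError("expected a 20-byte SHA1 digest")
    return "\\x" + digest.hex()


def from_bytea_hex_literal(text: str) -> bytes:
    """Parse the hex text form of a bytea value back into raw bytes."""
    if not text.startswith("\\x"):
        raise ValueError("expected a value starting with \\x")
    return bytes.fromhex(text[2:])


digest = hashlib.sha1(b"example content").digest()
literal = to_bytea_hex_literal(digest)

# Hypothetical table and column names; for debugging by hand only.
query = f"SELECT * FROM content WHERE sha1_key = '{literal}'::bytea;"
print(query)
```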

PostgreSQL will have no problems in general with 20-byte keys, other than that the performance overhead is greater than with a sequence.

You could either convert to hex or base64 and use a varchar column, or try just storing it in a bytea-typed column. I'd try making tables with a bunch of random values in both formats and see how they perform.
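As a starting point for that kind of comparison, this Python sketch generates random 20-byte values and shows the payload length of each candidate format. It measures only the encoded length, not PostgreSQL's per-value storage overhead or index behaviour, which you would still need to test in the database.

```python
import base64
import os


def encodings(raw: bytes) -> dict:
    """Return the same key in the three storage formats under discussion."""
    return {
        "bytea (raw)": raw,
        "hex": raw.hex(),
        "base64": base64.b64encode(raw).decode("ascii"),
    }


# Random stand-ins for SHA1 digests; real digests have the same length.
sample = [os.urandom(20) for _ in range(5)]

for raw in sample:
    sizes = {name: len(value) for name, value in encodings(raw).items()}
    print(sizes)
```

Each random value can then be loaded into one test table per format to compare insert and look-up times.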

See the PostgreSQL docs on bytea for info on that type.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow