Question

I am designing a system within which I will need to be able to search a database using a file or string SHA3-256 hash.

For example the hash of test is

36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80
  1. Which DATATYPE should I set the column that stores this value? I've heard about using BINARY instead of VARCHAR.
  2. How should I index this column so that searches by hash are fastest?
  3. Which database management system (like PostgreSQL, DB2, SQL Server), or document software (like MongoDB) should I use that most efficiently allows me to store and retrieve records by hash? I only have experience with relational database management systems like MySQL, but maybe a document store like MongoDB would be the better way to go.
Was it helpful?

Solution

SHA3-256 is 256bits. That's 32 bytes.

  • Don't store it as TEXT any form, or it'll be 256 bytes (8x the size).
  • I would store it as inline binary if your database supports it because I doubt anything it's going to have a data type for a 32 byte data type (which nothing uses except AVX2).

On indexing, any HASH index would work if your database provides it.

From PostgreSQL,

CREATE DOMAIN sha3_256 AS bytea;

CREATE TABLE datastore (
  id     int  PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
  myhash sha3_256
);

CREATE INDEX ON datastore USING HASH (myhash);

INSERT INTO datastore (id,myhash) VALUES
  (1, decode('36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80', 'hex'));

As a special optimization you could store it as two UUIDs. Though I probably wouldn't bother.

Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange
scroll top