Definition of indexing, datatype configuration, and DBMS for SHA3-256 hashes
-
22-01-2021 - |
Question
I am designing a system within which I will need to be able to search a database using a file or string SHA3-256 hash.
For example the hash of test is
36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80
- Which DATATYPE should I set the column that stores this value? I've heard about using BINARY instead of VARCHAR.
- How should I index this column so that searches by hash are fastest?
- Which database management system (like PostgreSQL, DB2, SQL Server), or document software (like MongoDB) should I use that most efficiently allows me to store and retrieve records by hash? I only have experience with relational database management systems like MySQL, but maybe a document store like MongoDB would be the better way to go.
Solution
SHA3-256 is 256bits. That's 32 bytes.
- Don't store it as
TEXT
any form, or it'll be 256 bytes (8x the size). - I would store it as inline binary if your database supports it because I doubt anything it's going to have a data type for a 32 byte data type (which nothing uses except AVX2).
On indexing, any HASH index would work if your database provides it.
From PostgreSQL,
CREATE DOMAIN sha3_256 AS bytea;
CREATE TABLE datastore (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
myhash sha3_256
);
CREATE INDEX ON datastore USING HASH (myhash);
INSERT INTO datastore (id,myhash) VALUES
(1, decode('36f028580bb02cc8272a9a020f4200e346e276ae664e45ee80745574e2f5ab80', 'hex'));
As a special optimization you could store it as two UUIDs
. Though I probably wouldn't bother.
Licensed under: CC-BY-SA with attribution
Not affiliated with dba.stackexchange