How should column-level PostgresSQL encryption implemented in SQLAlchemy using pgcrypt? [closed]

Question

There are a bunch of stages to this kind of decision making, it's not just "shove a plugin into the stack and that encryption thing is taken care of"

First, you really need to classify each column for its attractiveness to attackers & what searches/queries need to use it, whether it's a join column / index candidate, etc. Some data needs much stronger protection than other data.

Consider who you're trying to protect against:

Casual attacker (e.g SQL injection holes used for remote table copies)
Stolen database backup (tip: Encrypt these too)
Stolen/leaked log files, possibly including queries and parameters
Attacker with direct non-superuser SQL level access
Attacker with direct superuser SQL-level access
Attacker who gains direct access to the "postgres" OS user, so they can modify configuration, copy/edit logs, install malicious extensions, alter function definitions, etc
Attacker who gains root on the DB server

Of course, there's also the app server, upstream compromise of trusted sources for programming languages and toolkits, etc. Eventually you reach a point where you have to say "I can't realistically defend against this". You can't protect against somebody coming in, saying "I'm from the Government and I'll do x/y/z to you unless you allow me to install a rootkit on this customer's server". The point is that you've got to decide what you do have to protect against, and make your security decisions based on that.

A good compromise can be to do as much of the crypto as possible in the app, so PostgreSQL never sees the encryption/decryption keys. Use one-way hashing whenever possible, rather than using reversible encryption, and when you hash, properly salt your hashes.

That means pgcrypto doesn't actually do you much good, because you're never sending plaintext to the server, and you're not sending key material to the server either.

It also means that two people with the same plaintext for column SecretValue have totally different values for SecretValueSalt, SecretValueHashedBytes in the database. So you can't join on it, use it in a WHERE clause usefully, index it usefully, etc.

For that reason, you'll often compromise with security. You might do an unsalted hash of part of the datum, so you get a partial match, then fetch all the results to your application and filter them on the application side where you have the full information required. So your storage for SecretValue now looks like SecretValueFirst10DigitsUnsaltedHash, SecretValueHashSalt, SecretValueHashBytes. But with better column names.

If in doubt, just don't send plaintext of anything sensitive to the database. That means pgcrypto isn't much use to you, and you'll be doing mostly application-side crypto. The #1 reason for that is that if you send plaintext (or worse, key material) to the DB, it might get exposed in log files, pg_stat_activity, etc.

You'll pretty much always want to store encrypted data in bytea columns. If you really insist you can hex- or base64 encode it and shove it in a text column, but developers and DBAs who have to use your system later will cry.