Question

Imagine a value, say '1234'. I want to map that value to another value, say 'abcd'. The constraints:

  1. The length of the target value is equal to the length of the start value
  2. The mapping should be unique. E.g. 1234 should only map to abcd and vice versa
  3. The mapping process should be (very) difficult to guess. E.g. multiplying by 2 does not count
  4. The mapping should be reversible
  5. The start value is an integer
  6. The target value can be of any type

This should be a basic algorithm. Eventually I'll write it in Ruby, but that is of no concern here.

I was thinking along the following lines:

SECRET = 1234
def to(int)
  SECRET + int * 2
end

def fro(int)
  (int - SECRET)  / 2
end

Obviously this violates constraints 1 and 3.
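
To make both violations concrete, here is a small check against the `to` method above. The sample inputs and the recovery steps are illustrative assumptions, not part of the original scheme:

```ruby
SECRET = 1234

def to(int)
  SECRET + int * 2
end

# Constraint 1: the output does not always have the same length as the input.
short = to(1234) # 3702, still four digits
long  = to(9999) # 21232, five digits

# Constraint 3: two known input/output pairs are enough to recover the scheme,
# because the mapping is a straight line (output = SECRET + multiplier * input).
x1, y1 = 1234, to(1234)
x2, y2 = 5000, to(5000)
multiplier       = (y2 - y1) / (x2 - x1)
recovered_secret = y1 - multiplier * x1
```

Anyone who sees a couple of original values next to their mapped values can therefore undo every other mapping.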

The eventual goal is to anonymize records in my database. I might be overthinking this.

Solution

First off, I rather think your objectives are too ambitious: why constraint 6?

Second, what you need is technically a bijection from the domain of integers.

Third, your constraint 3 goes against Kerckhoffs's principle. You'd be better off with a well-known algorithm governed by a secret key, where the secret key is hard to derive even if you know the results for a large set of integers.

Fourth, what are you anonymizing against? If you are dealing with personal information, how will you protect against statistical analysis revealing that Xyzzy is actually John Doe, based on the relations to other data? There's some research on countering such attack vectors (search for e.g. 'k-anonymity').

Fifth, use existing cryptographic primitives rather than trying to invent your own. Encryption algorithms exist (e.g. AES in cipher-block-chaining mode) that are well-tested -- AES is well supported by all modern platforms, presumably Ruby as well. However, encryption still doesn't give records anonymity in any strong sense.
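
As a rough sketch of what that could look like in Ruby, the following uses the standard library's OpenSSL binding with AES-256 in CBC mode. The helper names `encrypt_id` and `decrypt_id` and the in-process key generation are assumptions made for the example; in practice the key would be stored somewhere other than the database it protects.

```ruby
require 'openssl'

# Demo key only: a real key would be loaded from secure storage, not generated per run.
KEY = OpenSSL::Random.random_bytes(32) # 256-bit key

# Hypothetical helper: encrypts an integer and returns IV + ciphertext as a binary string.
def encrypt_id(int, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  cipher.key = key
  iv = cipher.random_iv # generates a fresh IV and sets it on the cipher
  iv + cipher.update(int.to_s) + cipher.final
end

# Hypothetical helper: splits off the 16-byte IV and decrypts the rest back to an integer.
def decrypt_id(blob, key)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv = blob.byteslice(0, 16)
  ciphertext = blob.byteslice(16, blob.bytesize - 16)
  (cipher.update(ciphertext) + cipher.final).to_i
end

blob = encrypt_id(1234, KEY)
decrypt_id(blob, KEY) # gives back 1234
```

Note that the output is longer than the input, and because of the random IV the same integer encrypts to a different blob each time, so this sketch is reversible and key-governed but does not meet constraints 1 and 2 as stated.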

OTHER TIPS

It might be worth giving a little more detail on what you're trying to achieve. Presumably you're worried about some evil person getting hold of your data, but isn't it equally possible that this evil person will also have access to the code that accesses your database? What's to stop them learning the algorithm by inspecting your code?

If you truly want to anonymize the data then that's generally a one-way thing (names are removed, credit card values are removed etc). If you're trying to encrypt the contents of the database then many database engines provide well-tested mechanisms to do this. For example:

Best practices for dealing with encrypted data in MSSQL

database encryption

It's always better to use a product's encryption mechanism than roll your own.
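
For the one-way case mentioned above, where sensitive values are simply removed, a minimal sketch might look like this. The record layout and the field names are hypothetical and chosen only for illustration:

```ruby
# Hypothetical list of fields that must not survive anonymization.
SENSITIVE_FIELDS = %i[name credit_card].freeze

# Returns a copy of the record without the sensitive fields.
# Nothing is stored that would allow the removed values to be restored.
def anonymize(record)
  record.reject { |field, _value| SENSITIVE_FIELDS.include?(field) }
end

record = { id: 42, name: 'John Doe', credit_card: '4111111111111111', city: 'Utrecht' }
anonymized = anonymize(record) # only :id and :city remain
```

Dropping fields this way is irreversible by design, although the fields that remain may still allow re-identification when combined with other data.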

Licensed under: CC-BY-SA with attribution