Question

In one of my school works I am required to use a Deterministic algorithm (http://en.wikipedia.org/wiki/Deterministic_encryption) to cipher several fields.

In this specific case I have to cipher a table with booleans. This would be fine, except that using a deterministic algorithm to do so is pretty much useless.

Why is that so? (you may ask)

It happens that when I cipher (per example) the value "true", I always get the cipher text of "AB1" and when I cipher the value false, I always get the cipher text of "SQ2". So instead of having a table filled of values "true" and "false" I have a table filled with values "AB1" and "SQ2". Any attacker will understand immediately that my table stores booleans and it won't take him long to figure that AB1=true and SQ2=false.

This is what I want to prevent in my homework. To avoid this problem I tried using numbers with certain properties. Per example, a value of "true" is replace by a prime number, while a value of "false" is replaced by a non-prime number. Thus, my table will be filled with many different prime numbers and non prime numbers.

This would be an acceptable solution if not for one little thing: the number of primes we can calculate is "limited" (it takes REALLY LONG to calculate big primes). In an interval of 10000000 numbers only 664579 are prime (only 6.64579% ).

So I considered using odd numbers instead of prime numbers, but I am not sure about the quality of odd numbers. I think that an attacker will be able to retrieve the property of "oddness" from the ciphers and thus make an attack.

Is my assumption about oddness correct? Are there any other solutions? Do you guys have any ideas?

I would really appreciate any help or ideas, thx in advance Pedro.

Was it helpful?

Solution 4

Well, I don't know if this will help you people, but my final conclusion is a mix of what I had with some of your suggestions.

I solved the problem by making the following assignment:

true = random int PRIME number

false = random int number (not prime)

And now the obvious question comes:

Q: If I am selecting random numbers, how do I guarantee that they are not repeated without saving each generated number?

A: I convert the random number to bytes, and mix it with an ID of the same table. This way i doesn't matter if I generated a random repeated number because when I mixed it with the ID (that is unique) I messed up the output.

Calculating primes however can be quite heavy, so I will post here a small part of my report that has other suggestions:

Although we selected the property of being prime as an equivalent to the boolean of true, we could have also used other properties, such as:
  <li>
    <ul>
        Representing a true boolean by a random odd number and 
        a false boolean by a random even value
    </ul>
    <ul>
        Representing a true boolean by a sequence of numbers 
        that belongs to PI and a false boolean by a random 
        number sequence that doesn't belong to PI
    </ul>
  </li>

This has been quite an interesting and unexpected challenge to me. It never occurred to me that something so simple could require such ingenuity.

I hope my solution helps you all, and if you love cryptography, give the project a look :P

OTHER TIPS

According to the papers, CryptDB works by adjusting the encryption-level depending on which queries are executed.

When no queries are executed, all values are stored using what they call Random encryption. This is basically AES in CBC mode with a random IV. Since the IV is different for all fields, all fields will have different encryptions.

When they need to perform equality check on a column (for JOINs, GROUP BYs etc.) they downgrade the encryption to Deterministic encryption. The primary requirement for this level of encryption is that a given plaintext is always encrypted to the same ciphertext. So for this level a boolean column will only have two possible values for the ciphertext. Yes, this leaks information to an attacker that can observe the table, but there is no way around it. Storing primes/non-primes instead will not work: the database will then be unable to perform the required operation.

Sorry, but you will need to find another idea for your homework.

You could take groups of booleans and treat them as binary numbers

The most significant misunderstanding here is that you don't encrypt each value on its own. What you should be doing is packing all values into some kind of data structure (e.g. an array of bool) and then encrypting the data structure as a whole (using CBC mode, if you are not doing that already).

The data structure chosen (i.e. how you choose to pack the table into a single binary entity) does not matter as far as encryption is concerned.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top