문제

I recently posted this question about codes for a gift-card-like voucher that users can redeem online. I wanted to find the best tradeoff between large keyspace, low guessability, and human readability. Now that I'm into implementation I realize I've got another problem altogether, more of an algorithmic challenge.

Let's assume I adopt some code format - say 10 characters from A to Z for simplicity, and I start generating vouchers. What is the correct algorithm to do this?!

My first approach is to number all possible codes from 0 to 308,915,776, then start generating random numbers in that range. This obviously has a big problem though - I have to check my random number against all previously generated voucher codes and if it collides with an existing one I'll have to discard the code and try another. As the system accumulates more data it will slow down. At the extreme when there is only one code left it will be nearly impossible for the system to guess it correctly.

I could pre-generate all codes and shuffle them, then consume them in order. But this means I have to store many codes, and in fact my keyspace is bigger than the one i described, so we're talking about a very large amount of data. So that's also not too desirable.

So this leaves me with using the codes sequentially. I do not want guessable voucher codes though. The user who buys voucher "AAAAAAAAAY" should not have a good chance of getting another valid code if they type in "AAAAAAAAAZ".

I can shuffle my alphabet and my positions so that instead of

'ABCDEFGHIJKLMNOPQRSTUVWXYZ' i use

'LYFZTGKBNDRAPWEOXQHVJSUMIC'

and so that instead of positions

9 8 7 6 5 4 3 2 1 0 the positions are

1 8 0 7 5 4 3 9 2 6

Using this logic, given the code

LNWHDTECMA

the next code would be

LNEHDTECMA

This is definitely way less guessable. But they're still only one character off from each other, and given just two of these vouchers you would know which position is incrementing, and you would have a 90% chance of getting the next code in 24 guesses or less.

My "escape hatch" is to ditch all this and go with GUIDs. They have more characters than I wanted my users to have to type in, and contain similar characters like I/1 and O/0, but they magically make all of the above headaches go away. Still, I'm having fun thinking about this, maybe you are too. I'd love to hear some alternate suggestions. What have you got?

Thanks!

도움이 되었습니까?

해결책

The likelihood of two randomly generated code colliding is basically the same as a user guessing a valid code - and you cannot prevent users from guessing. So you must have a key space so much larger than the number of actually used codes that random collisions are extremely unlikely as well (though, thanks to the birthday paradox, probably not unlikely enough to ignore them completely, at least if you want your codes to be reasonably short), and checking against existing codes and re-generating in case of a collision is a perfectly viable strategy.

다른 팁

Use an N-bit serial number R, combined with an M-bit hash H of the concatenated pair (R, S) where S is some secret "salt" S which you do NOT publish. Then encode the pair (R,H) alphanumerically in any reversible way you like. If you like algorithms like MD5* or SHA, but the bit count is too high, then just take the M least significant bits of a standard hash algorithm.

You can verify easily: decode the alphanumeric encoding so you can see R and H. Then compute H' = hash(R+S) and verify that H = H'.

edit: R can be an incrementing serial number or random or whatever, just make sure you use each value not more than once.

*before someone says "MD5 is broken", let me remind you that the known weaknesses for MD5 are collision attacks, and not preimage attacks. Also, by using an unpublished, secret salt value, you deny an attacker the ability to test your security mechanism, unless he/she can guess the salt value. If you feel paranoid, pick two salt values Sprefix and Ssuffix, and calculate the hash of the concatenated triple (Sprefix,R,Ssuffix).

Some random number generators have an interesting property: Used right they do not generate duplicate numbers in a long time. They produce something called a full cycle. Use one of the algorithms described there, seed it, and you will have many unique numbers,

Add a smart way to map digits to characters and you got your codes.

I would say to use a "perfect hash" - http://en.wikipedia.org/wiki/Perfect_hash_function combined with a 4-digit random number...

So just increment your voucher code each time, then hash it, add a 4 digit random number and I would also add a check digit to the end (as Alix Axel suggested).

This would be very secure with no clashes - for example if someone worked out your hashing algorithm, they would also have to guess the 4-digit code at the end...

Programming Pearls has several examples of algorithms to generate sets of random numbers, you should read it if you're interested in this kind of problem.

The book shows that if you generate m random numbers with value less than n, the simple approach of generating numbers and throwing out duplicates will generate no more than 2m random numbers if m < n / 2. Here it is, in C++:

void gensets(int m, int n)
{
    set<int> S;
    set<int>::iterator i;
    while (S.size() < m) {
        int t = bigrand() % n;
        S.insert(t);
    }
    for (i = S.begin(); i != S.end(); ++i)
        cout << *i << "\n";
}

Obviously, if you're worried about people guessing values, you will want m to be much less than n / 2.

There's even a set-based algorithm to generate m random numbers less than n with each value being equally likely, no duplicates, and a guarantee not to generate more than m random numbers:

void genfloyd(int m, int n)
{
    set<int> S;
    set<int>::iterator i;
    for (int j = n-m; j < n; j++) {
        int t = bigrand() % (j+1);
        if (S.find(t) == S.end())
            S.insert(t); // t not in S
        else
            S.insert(j); // t in S
    }
    for (i = S.begin(); i != S.end(); ++i)
        cout << *i << "\n";
}

The order of the numbers isn't random, though, so this is probably not a good choice for you.

I answered the other question too :P

The best way is to generate one alphanumeric character at a time, randomly, until you have 8 of them. This will then be your voucher.

Ideally the best way would be to choose a sequence long enough so that you can safely assume if there will be any duplicates. Do note that, perhaps counter-intuitively, this happens more often than you think because of the Birthday problem.

For example, with 8 characters you have 1785793904896 possible combinations, but if you generate only 1,573,415 vouchers you will have a 50% chance to have a duplicate.

So, it all depends on how many you want to generate, and the maximum length of the code you're comfortable with. If you are generating many and you want to keep it short, you should save the ones you previously generated, and check against the database for duplicates.

This is a summary of the best bits of all the other answers. :)

You need to generate gift card numbers that are:

  • unique
  • unguessable

Random numbers are unguessable but not necessarily unique. The numbers produced by various algorithms are unique but guessable (the algorithm can be reverse-engineered). I don't know of a single algorithm that gives both properties, and because of the need to defy reverse engineering, it falls in the domain of cryptography. Non-experts, of course, shouldn't try to design cryptosystems.

Fortunately you don't have to get both properties from the same algorithm. Your gift card codes can consist of two parts: a part that is unique (generated using a linear congruential generator, perhaps, or modulo arithmetic, or even just an integer that you increment each time) and a part that is unguessable (just random numbers).

I read the whole comment and I found out something many people in other to protect use very clever and sophisticated means. the chances of getting a guess on my algorithm is 1/2600000 all you have to do is to change the salt prefix salt suffix after each generation

  • I chose a salt prefix of 4 numbers
  • and suffix of 4 numbers
  • then the main code is 9 numbers interchangeable
  • then using this format sprefix +random_numbers+ssuffix
  • I'll hash the format storing it into database immediately
  • the query can help to remove similar codes
  • and the suffix and prefix should be changed once you very printed 9! (362880) times.

I think the best way to go is that suggested by Andreas. But my answer is about an interesting related discussion.

You want to generate a sequence of numbers that together form a permutation of S = {1, ..., MAX}. One way to do this is to take the elements of a cyclic group over S. For example, the numbers R = {x modulo p, x^2 modulo p, x^3 modulo p, ..., x^(p-1) modulo p} form a cyclic group over {1, ..., p-1}, provided p is a prime and x is coprime to p. So if you choose MAX as a prime number you do use this sequence.

You want a "tough-to-crack" sequence. A generator for the sufficiently-tough-to-crack sequence is called a pseudorandom generator (ofcourse you probably don't need that tough-to-crack). An example is the last digit of elements in R above, provided p is kept secret (am I correct?). But the answer by Andreas already uses a source of (pseudo-) random numbers, so cannot be called a pseudorandom generator.

If you are interested in pseudorandom generators, they are discussed in detail in volume 2 of Knuth's well-known book.

Based on Jason Orendoff's answer, I put together an algorithm to generate gift card codes. Basically, it has two 40-bit numbers: one of them to assure it is unique and another to assure it is hard to guess.

  • the 40-bit random number part is enough for 1 in 2^40 chances of guessing;
  • the 40-bit sequential number part is enough for 34.8 years of uniqueness (assuming we generate one gift card per ms.)

The total 80-bit sequence is then converted to a 16-character string using Base32.

import java.security.SecureRandom;
import java.util.Random;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.commons.codec.binary.Base32;

public class GiftCardUtil {

    private AtomicLong sequence;
    private Random random;

    public GiftCardUtil() {
        // 1325383200000L == 1 Jan 2012
        sequence = new AtomicLong(System.currentTimeMillis() - 1325383200000L);
        random = new SecureRandom();
    }

    public String generateCode() {
        System.out.println(sequence.get());
        byte[] id = new byte[10];
        longTo5ByteArray(sequence.incrementAndGet(), id);
        byte[] rnd = new byte[5];
        random.nextBytes(rnd);
        System.arraycopy(rnd, 0, id, 5, 5);
        return new Base32().encodeAsString(id);
    }

    private void longTo5ByteArray(long l, byte[] b) {
        b[0] = (byte) (l >>> 32);
        b[1] = (byte) (l >>> 24);
        b[2] = (byte) (l >>> 16);
        b[3] = (byte) (l >>> 8);
        b[4] = (byte) (l >>> 0);
    }
}

What may work effectively is simply using the time of creation to your advantage. Say, last two digits of the year, the two digit month, the two digit day, the two digit hour, two digit minutes, two digit seconds, then carry the seconds out to, say, the microsecond. If further obfuscation is desired, have them prescrambled (e.g. MYmdshhdMmYs instead of YYMMddhmmss). Then change the base (to pentadecimal, perhaps) to turn away any guessing attempts further. This caries two major benefits: 1-Using the date, including the year, will destroy any duplication as the same time will not pass twice. Only after a hundred years is there any risk. The only concern is potentially having two created at the same microsecond, for which it would be a simple task to disallow the creation of more than one at a time. A millisecond delay would fix the problem.

2-Guessing will be very difficult. Not only is figuring out what base and order the numbers (and letters!) are in going to be a daunting task, but going out to the microsecond makes sequence largely irrelevant. Not mention how hard it would be for a customer to figure how what microsecond they bought at and how their clock matches up with yours.

The objection may be "Wait! That is 17 digits (YYMMDDhhmmss.sssss) but brought out to a larger base afterwards would diminish it. Going to base 36, using 10 numbers and 26 letters, means a 11 digit code would cover every possibility. If uppercase and lowercase are not interchangeable, the data can be compressed to the goal of 10 digits with zero problems.

Here is a though:

  • ID = each voucher has an unique (auto-incremented?) ID
  • CHECKSUM = apply N iterations of the Verhoeff or Luhn algorithm on the ID
  • VOUCHER = base convert the generated CHECKSUM from base 10 to base 36

See also this related SO question: Ideas to create a small (<10 digits), not (very) secure “hash”.


One simple way to make this method more secure is to use a non auto-incremented ID value, one option might be to use ID as the last 6 or 7 digits of a UNIX timestamp and calculate the checksum.

I second the use of a cryptographic hash—taking bits from MD5 is very simple. To make things readable, I hit on the following idea: take a list of words, and use bits of the key to index a list of words. My word list is about 100 000 words, so about 16 bits per word, which for four words gives a 64-bit keyspace. The results are usually quite readable.

For example, the cryptographic signature of the preceding paragraph is

kamikaze's freshet mansion's expectorating

(My word list is tilted toward a larger keyspace; if you want shorter phrases, you have fewer words.)

If you have an MD5 library handy, this strategy is very easy to implement—I do it in about 40 lines of Lua.

라이센스 : CC-BY-SA ~와 함께 속성
제휴하지 않습니다 StackOverflow
scroll top