質問

Possible Duplicate:
Mapping two integers to one, in a unique and deterministic way

I'm trying to create unique identificator for pair of two integers (Ruby) :

f(i1,i2) = f(i2, i1) = some_unique_value

So, i1+i2, i1*i2, i1^i2 -not unique as well as (i1>i2) ? "i1" + "i2" : "i2" + "i1".

I think following solution will be ok:

(i1>i2) ? "i1" + "_" + "i2" : "i2" + "_" + "i1"

but:

  1. I have to save result in DB and index it. So I prefer it to be an integer and as small as it possible.
  2. Is Zlib.crc32(f(i1,i2)) can guaranty uniqueness?

Thanks.

UPD:

Actually, I'm not sure the result MUST be integer. Maybe I can convert it to decimal: (i1>i2) ? i1.i2 : i2.i1

?

役に立ちましたか?

解決

What you're looking for is called a Pairing function.

The following illustration from the German wikipedia page clearly shows how it works:

http://upload.wikimedia.org/wikipedia/commons/thumb/4/41/Pairing-function.svg/350px-Pairing-function.svg.png

Implemented in Ruby:

def cantor_pairing(n, m)
    (n + m) * (n + m + 1) / 2 + m
end

(0..5).map do |n|
  (0..5).map do |m|
    cantor_pairing(n, m)
  end
end
=> [[ 0,  2,  5,  9, 14, 20],
    [ 1,  4,  8, 13, 19, 26],
    [ 3,  7, 12, 18, 25, 33],
    [ 6, 11, 17, 24, 32, 41],
    [10, 16, 23, 31, 40, 50],
    [15, 22, 30, 39, 49, 60]]

Note that you will need to store the result of this pairing in a datatype with as many bits as both your input numbers put together. (If both input numbers are 32-bit, you will need a 64-bit datatype to be able to store all possible combinations, obviously.)

他のヒント

No, Zlib.crc32(f(i1,i2)) is not unique for all integer values of i1 and i2.

If i1 and i2 are also 32bit numbers then there are many more combinations of them than can be stored in a 32bit number, which is returned by CRC32.

CRC32 is not unique, and wouldn't be good to use as a key. Assuming you know the maximum value of your integers i1 and i2:

unique_id = (max_i2+1)*i1 + i2

If your integers can be negative, or will never be below a certain positive integer, you'll need the max and min values:

(max_i2-min_i2+1) * (i1-min_i1) + (i2-min_i2)

This will give you the absolute smallest number possible to identify both integers.

Well, no 4-byte hash will be unique when its input is an arbitrary binary string of more than 4 bytes. Your strings are from a highly restricted symbol set, so collisions will be fewer, but "no, not unique".

There are two ways to use a smaller integer than the possible range of values for both of your integers:

  1. Have a system that works despite occasional collisions
  2. Check for collisions and use some sort of rehash

The obvious way to solve your problem with a 1:1 mapping requires that you know the maximum value of one of the integers. Just multiply one by the maximum value and add the other, or determine a power of two ceiling, shift one value accordingly, then OR in the other. Either way, every bit is reserved for one or the other of the integers. This may or may not meet your "as small as possible" requirement.

Your ###_### string is unique per pair; if you could just store that as a string you win.

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top