Question

Given a vector of numbers: V=(v1, v2, ..., vn). These n numbers don't need to be distinct or sorted.

Suppose that we have a few vectors V1, V2, ..., Vm. Is possible to use a number (integer or float number) to uniquely denote each vector, such that for any Vi not equal Vj, the corresponding numbers f(Vi) and f(Vj) are not equal either.

A simple solution is to have one number in the range from 0 to m-1 as an ID to represent a vector, however we assume that this kind of solution cannot work in the case that each vector is stored in a few distributed machines. That is, the portions of vectors in two machines might overlap, and the algorithm doesn't know the distribution of vectors globally.

Was it helpful?

Solution 2

I'm assuming the inputs are in principle unbounded and so is the output number, as it's trivial otherwise. A simple way is just concatenations the representations of n and v1, v2, .. vn in some base b. Represent them in k-bit digits, then annotate each k-bit digit with a continuation bit (0 if the next k-bit group starts a new number, 1 if it belongs to the same number). This isn't of much use for anything except equality tests, but you did not mention anything else.

If you also care about preserving locality (i.e. nearby points p, q frequently have nearby values f(p), f(q)), some space-filling curves can be used for this purpose. The Hilbert curve is a bit complicated to generalize to higher dimensions, and the calculation is nontrivial. The Z-order curve isn't as good at preserving locality, but it's almost trivial to implement for any number of dimensions -- just interleave the bits of the binary representation.

OTHER TIPS

Of course if you have n numbers you can't compress them to one number of the same length without losing information (e.g. if you calculate some kind of hash from the vector, there will be hash collisions).

If you have unlimited space (like a BigInteger in Java), you can encode the vectors. Assuming that the vector length is fixed, you can simply use some "interlocking" pattern:

vector = [12345,4711,42]

1  2  3  4  5
 0  4  7  1  1
  0  0  0  4  2
100240370414512 <-- your unique number

It shouldn't be too hard to encode the vector size as well, so this would work for vectors of different sizes as well (e.g. you use the length in octal and an 8 as "prefix").

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top