Calculate the Hilbert value of a point for use in a Hilbert R-Tree?

https://stackoverflow.com/questions/106237

01-07-2019
|

Question

I have an application where a Hilbert R-Tree (wikipedia) (citeseer) would seem to be an appropriate data structure. Specifically, it requires reasonably fast spatial queries over a data set that will experience a lot of updates.

However, as far as I can see, none of the descriptions of the algorithms for this data structure even mention how to actually calculate the requisite Hilbert Value; which is the distance along a Hilbert Curve to the point.

So any suggestions for how to go about calculating this?

Solution

Fun question!

I did a bit of googling, and the good news is, I've found an implementation of Hilbert Value.

The potentially bad news is, it's in Haskell...

http://www.serpentine.com/blog/2007/01/11/two-dimensional-spatial-hashing-with-space-filling-curves/

It also proposes a Lebesgue distance metric you might be able to compute more easily.

OTHER TIPS

Below is my java code adapted from C code in the paper "Encoding and decoding the Hilbert order" by Xian Lu and Gunther Schrack, published in Software: Practice and Experience Vol. 26 pp 1335-46 (1996).

Hope this helps. Improvements welcome !

Michael

/**
 * Find the Hilbert order (=vertex index) for the given grid cell 
 * coordinates.
 * @param x cell column (from 0)
 * @param y cell row (from 0)
 * @param r resolution of Hilbert curve (grid will have Math.pow(2,r) 
 * rows and cols)
 * @return Hilbert order 
 */
public static int encode(int x, int y, int r) {

    int mask = (1 << r) - 1;
    int hodd = 0;
    int heven = x ^ y;
    int notx = ~x & mask;
    int noty = ~y & mask;
    int temp = notx ^ y;

    int v0 = 0, v1 = 0;
    for (int k = 1; k < r; k++) {
        v1 = ((v1 & heven) | ((v0 ^ noty) & temp)) >> 1;
        v0 = ((v0 & (v1 ^ notx)) | (~v0 & (v1 ^ noty))) >> 1;
    }
    hodd = (~v0 & (v1 ^ x)) | (v0 & (v1 ^ noty));

    return interleaveBits(hodd, heven);
}

/**
 * Interleave the bits from two input integer values
 * @param odd integer holding bit values for odd bit positions
 * @param even integer holding bit values for even bit positions
 * @return the integer that results from interleaving the input bits
 *
 * @todo: I'm sure there's a more elegant way of doing this !
 */
private static int interleaveBits(int odd, int even) {
    int val = 0;
    // Replaced this line with the improved code provided by Tuska
    // int n = Math.max(Integer.highestOneBit(odd), Integer.highestOneBit(even));
    int max = Math.max(odd, even);
    int n = 0;
    while (max > 0) {
        n++;
        max >>= 1;
    }

    for (int i = 0; i < n; i++) {
        int bitMask = 1 << i;
        int a = (even & bitMask) > 0 ? (1 << (2*i)) : 0;
        int b = (odd & bitMask) > 0 ? (1 << (2*i+1)) : 0;
        val += a + b;
    }

    return val;
}

See uzaygezen.

The code and java code above are fine for 2D data points. But for higher dimensions you may need to look at Jonathan Lawder's paper: J.K.Lawder. Calculation of Mappings Between One and n-dimensional Values Using the Hilbert Space-filling Curve.

I figured out a slightly more efficient way to interleave bits. It can be found at the Stanford Graphics Website. I included a version that I created that can interleave two 32 bit integers into one 64 bit long.

public static long spreadBits32(int y) {
    long[] B = new long[] {
        0x5555555555555555L, 
        0x3333333333333333L,
        0x0f0f0f0f0f0f0f0fL,
        0x00ff00ff00ff00ffL,
        0x0000ffff0000ffffL,
        0x00000000ffffffffL
    };

    int[] S = new int[] { 1, 2, 4, 8, 16, 32 };
    long x = y;

    x = (x | (x << S[5])) & B[5];
    x = (x | (x << S[4])) & B[4];
    x = (x | (x << S[3])) & B[3];
    x = (x | (x << S[2])) & B[2];
    x = (x | (x << S[1])) & B[1];
    x = (x | (x << S[0])) & B[0];
    return x;
}

public static long interleave64(int x, int y) {
    return spreadBits32(x) | (spreadBits32(y) << 1);
}

Obviously, the B and S local variables should be class constants but it was left this way for simplicity.

Michael,

thanks for your Java code! I tested it and it seems to work fine, but I noticed that the bit-interleaving function overflows at recursion level 7 (at least in my tests, but I used long values), because the "n"-value is calculated using highestOneBit()-function, which returns the value and not the position of the highest one bit; so the loop does unnecessarily many interleavings.

I just changed it to the following snippet, and after that it worked fine.

  int max = Math.max(odd, even);
  int n = 0;
  while (max > 0) {
    n++;
    max >>= 1;
  }

If you need a spatial index with fast delete/insert capabilities, have a look at the PH-tree. It partly based on quadtrees but faster and more space efficient. Internally it uses a Z-curve which has slightly worse spatial properties than an H-curve but is much easier to calculate.

Paper: http://www.globis.ethz.ch/script/publication/download?docid=699

Java implementation: http://globis.ethz.ch/files/2014/11/ph-tree-2014-11-10.zip

Another option is the X-tree, which is also available here: https://code.google.com/p/xxl/

Suggestion: A good simple efficient data structure for spatial queries is a multidimensional binary tree.

In a traditional binary tree, there is one "discriminant"; the value that's used to determine whether you take the left branch or the right branch. This can be considered to be the one-dimensional case.

In a multidimensional binary tree, you have multiple discriminants; consecutive levels use different discriminants. For example, for two dimensional spacial data, you could use the X and Y coordinates as discriminants. Consecutive levels would use X, Y, X, Y...

For spatial queries (for example finding all nodes within a rectangle) you do a depth-first search of the tree starting at the root, and you use the discriminant at each level to avoid searching down branches that contain no nodes in the given rectangle.

This allows you to potentially cut the search space in half at each level, making it very efficient for finding small regions in a massive data set. (BTW, this data structure is also useful for partial-match queries, i.e. queries that omit one or more discriminants. You just search down both branches at levels with an omitted discriminant.)

A good paper on this data structure: http://portal.acm.org/citation.cfm?id=361007 This article has good diagrams and algorithm descriptions: http://en.wikipedia.org/wiki/Kd-tree

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow