implementing a patricia trie in java

https://stackoverflow.com/questions/13111532

14-07-2021
|

Question

I'm trying to rewrite a c++ patricia trie in java. The c++ code is from here

I'm a bit stuck.

So here's my understanding:

#define ZEROTAB_SIZE 256
head->key = (char*)calloc(ZEROTAB_SIZE, 1);

we create an array of 256 bits for the key, so we can have a string with a maximum length of 32 characters and every character is represented with 8 bits. Can i implement this with a char array in java?

template <class T>
int PatriciaTrie<T>::bit_get(PatriciaTrieKey bit_stream, int n) {
  if (n < 0) return 2; // "pseudo-bit" with a value of 2.
  int k = (n & 0x7);
  return ( (*(bit_stream + (n >> 3))) >> k) & 0x1;
}

k gets the last 7 bits of n, we move to the n/8 character of the string (not exactly n/8 since shifting to the right would remove anything lower than 8 to zero) then we shift the value of bit_stream[n>>3] by k and then we get last bit. if i use arrays in java could i rewrite this as

return (bit_stream[n>>3] >> k) & 0x1;

template <class T>
int PatriciaTrie<T>::bit_first_different(PatriciaTrieKey k1, PatriciaTrieKey k2) {
    if (!k1 || !k2)
        return 0; // First bit is different!
    int n = 0;
    int d = 0;
    while ( (k1[n] == k2[n]) &&
            (k1[n] != 0) &&
            (k2[n] != 0) )
        n++;
    while (bit_get(&k1[n], d) == bit_get(&k2[n], d))
        d++;
    return ((n << 3) + d);
}

now this is where it gets confusing, the first part until the second while loop looks clear enough, loop and check how many bits are equal and non zero, but the i'm not sure what the second loop is doing, we take the address of the two keys and check the first bits if they're equal and if they are we check again until we find unequal bits?

Mainly i'm not sure how the address of the key is used here, but i might be confused on bit shifting in bit_get class too.

I want to do a comparison between there trie in c++ and java for my java class and i want to keep the implementations as similar as possible.

Solution

I'm not familiar with this data structure, but there are some problems with your understanding of this code.

First, calloc allocates 256 bytes, not bits. new byte[256] Would be comparable in java.

Second, n & 0x7 gets three bits of n, not seven. A clearer way to write this would be n/8 and n%8 instead of n>>3 and n & 7, but the bitwise operations might be slightly faster if your compiler is stupid.

You are correct that (bit_stream[n>>3]>>k) & 1 is the same.

Now, the first loop in bit_first_different loops over bytes, not bits. The check for 0 is to prevent running off the end of the keys. Once that loop terminates, n refers to the first differing byte. The second loop is then looking for which bit is different.

Note that if the two keys are not different, then the second loop may run off the end of the keys, potentially causing a segmentation fault.

Now, the & is taking the address of k1[n] because the bit_get function is expecting a pointer to a character...this passes in the nth element of the bit stream. After the loop, d is the offset of the first different bit of k[n].

Finally the code combines n (which byte?) With d (which bit in that byte?) to give the bit. Again I would advocate 8*n+d for clarity, but that's a matter of taste.

OTHER TIPS

Can i implement this with a char array in java?

My java is a bit rusty but I believe char is signed in java which means that >> won't do what you expect it to. That's because shifting a signed number will not shift the sign bit so what you really want is the >>> operator or just use the byte type which is unsigned. I have a feeling that this is all kinds of wrong so please double-check.

return (bit_stream[n>>3] >> k) & 0x1;

In C or C++, *(array + k) is just another way to write array[k] so your translation looks correct. As for the interpretation, bit_stream[n>>3] essentially fetches the byte in which the desired bit is located. >> k Moves the desired bit in the least-significant bit position. Finally, we remove all the bits we're not interested in by masking them out with & 0x1. This leaves us with a value of either 0 or 1 depending on whether the bit was set or not.

What the final function does is compare 2 bit strings and returns the bit position where the 2 strings first differ. The first loop is essentially an optimized version of the second loop where instead of doing a bit by bit comparaison, it checks whole bytes instead.

In other words, it first loops over every bytes and find the first 2 that differ. It then takes those 2 differing bytes and loops over them until it finds the first 2 bit that differ. Note that the bit_get function is never going to receive an n greater 7 in this scenario because we know there's a difference somewhere in the byte. The final bit position is then constructed from the the result of both loops like so: (number_of_equal_bytes * 8) + number_of_equal_bits).

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow