Question

In the Kademlia protocol node IDs are 160 bit numbers. Nodes are stored in buckets, bucket 0 stores all the nodes which have the same ID as this node except for the very last bit, bucket 1 stores all the nodes which have the same ID as this node except for the last 2 bits, and so on for all 160 buckets.

What's the fastest way to find which bucket I should put a new node into?

I have my buckets simply stored in an array, and need a method like so:

Bucket[] buckets; //array with 160 items

public Bucket GetBucket(Int160 myId, Int160 otherId)
{
    //some stuff goes here
}

The obvious approach is to work down from the most significant bit, comparing bit by bit until I find a difference, I'm hoping there is a better approach based around clever bit twiddling.

Practical note: My Int160 is stored in a byte array with 20 items, solutions which work well with that kind of structure will be preferred.

Was it helpful?

Solution

Would you be willing to consider an array of 5 32-bit integers? (or 3 64-bit integers)? Working with whole words may give you better performance than working with bytes, but the method should work in any case.

XOR the corresponding words of the two node IDs, starting with the most significant. If the XOR result is zero, move on to the next most significant word.

Otherwise, find the most significant bit that is set in this XOR result using the constant time method from Hacker's Delight.. This algorithm results in 32 (64) if the most significant bit is set, and 1 if the least significant bit is set, and so on. This index, combined with the index of the current word, will will tell you which bit is different.

OTHER TIPS

For starters you could compare byte-by-byte (or word-by-word), and when you find a difference search within that byte (or word) for the first bit of difference.

It seems vaguely implausible to me that adding a node to an array of buckets will be so fast that it matters whether you do clever bit-twiddling to find the first bit of difference within a byte (or word), or just churn in a loop up to CHAR_BIT (or something). Possible, though.

Also, if IDs are essentially random with uniform distribution, then you will find a difference in the first 8 bits about 255/256 of the time. If all you care about is average-case behaviour, not worst-case, then just do the stupid thing: it's very unlikely that your loop will run for long.

For reference, though, the first bit of difference between numbers x and y is the first bit set in x ^ y. If you were programming in GNU C, __builtin_clz might be your friend. Or possibly __builtin_ctz, I'm kind of sleepy...

Your code looks like Java, though, so I guess the bitfoo you're looking for is integer log.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top