I lack to see why you would want to use a random number generator here... however, I can help you to speed things up.
A bloom filter is basically a bitvector, where you set bits. If you want to figure out if an item exists, the bloom filter will give you a true if the item possibly exists and a false if the item for sure doesn't exist.
(I'm doing this in a simple text editor so there might be some bugs in the code)
I'm going to assume your hash space here can use 32-bit integer calculations; if you have a very large bloom table, you probably want to use a 64-bit integer.
The easiest (and probably the fastest) implementation of a bloom filter is thus:
byte[] bloomFilter = new byte[MyBloomFilterSize];
foreach (var item in myItems)
{
int hash = Hash(item) & 0x7FFFFFFF;
int bit = 1 << (hash & 7); // you have 8 bits
int index = (hash >> 3) % MyBloomFilterSize;
bloomFilter[hash % MyBloomFilterSize] |= bit;
}
You can experiment with changing the byte[]
to a uint[]
or a ulong[]
; I'm not sure if that makes a difference.
If you then want to check if an item might exist, you calculate the same index and bit, and get the result.
public bool PossiblyExists(MyItem item)
{
int hash = Hash(item) & 0x7FFFFFFF;
int bit = 1 << (hash & 7); // you have 8 bits
int index = (hash >> 3) % MyBloomFilterSize;
return (bloomFilter[hash % MyBloomFilterSize] & bit) != 0;
}
The only thing that remains here is the speed at which you can calculate the hash. If you're using an integer, I'd simply multiply it with a large prime number; if you're using a SHA256 fixed-length byte[] (which you seem to be doing), you need to make it an integer (or long).
I'm using a little trick here with Buffer.BlockCopy to convert the types. Just for safety I prefer using a few bytes from the data, but since SHA256 should already be random, a simple BitConverter.ToInt32(data, [0..28])
should also do the trick.
public int CalculateHash(byte[] data)
{
// Data = >128 bits = >16 bytes -- which is the same as >4 integers
int[] tmp = new int[4];
Buffer.BlockCopy(data, 0, tmp, 0, data.Length);
return tmp[0] ^ tmp[1] ^ tmp[2] ^ tmp[3];
}
That should do it.