Question

I'm currently messing about with implementing interesting data structures in Ruby and have run into a problem with testing methods that don't have predictable output. I'm currently working on a Bloom Filter, whose implementation I've included below for completeness:

require "zlib"

class BloomFilter
  def initialize(size=100, hash_count=3)
    raise(ArgumentError, "negative or zero buffer size") if size <= 0
    raise(ArgumentError, "negative or zero hash count") if hash_count <= 0

    @size = size
    @hash_count = hash_count
    @buffer = Array.new(size, false)
  end

  def insert(element)
    hash(element).each { |i| @buffer[i] = true }
  end

  def maybe_include?(element)
    hash(element).map { |i| @buffer[i] }.inject(:&)
  end

  private

  def hash(element)
    hashes = []

    1.upto(@hash_count) do |i|
      hashes << Zlib.crc32(element, i)
    end

    hashes.map { |h| h % @size }
  end
end

One of the problems with a Bloom Filter is that it has the possibility of returning false positives by falsely returning true for the inclusion of elements that have never been inserted into the filter.

Sometimes the filter behaves in a way that is easily testable:

b = BloomFilter.new(50, 5)

b.insert("hello")
puts b.maybe_include?("hello") # => true
puts b.maybe_include?("goodbye") # => false

However it sometimes bucks the trend and behaves in an unpredictable way. (I've reduced the size of the buffer here to find a conflict quickly.)

b = BloomFilter.new(5, 4)

b.insert("testing")
puts b.maybe_include?("testing") # => true
puts b.maybe_include?("not present") # => false
puts b.maybe_include?("false positive") # => true (oops)

So all of a sudden we have the string "false positive" providing a... false positive. My question is how can we test this?

  • If we choose values that just happen to work with our tests then I feel like the tests become far too fragile. For example, if we change the hashing function then we may still have a perfectly correct Bloom Filter that starts to fail some tests because of the values we chose to test the original implementation.

  • My second thought was to test that the filter behaves in an expected way by checking that we get roughly the expected number of false positives from it while varying the number of hash functions and the size of the internal buffer. While this approach may test the overall rough correctness of the filter, I worry that it will not catch bugs that cause it to report incorrect values for individual cases (such as false negatives).
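For reference, the "roughly the expected number" in the second approach can be pinned down: for a filter with m bits, k hash functions, and n inserted elements, the standard approximation of the false-positive rate is p ≈ (1 − e^(−kn/m))^k. A quick helper to compute it (`expected_fp_rate` is a hypothetical name, not part of the class):

```ruby
# Standard approximation of a Bloom filter's false-positive rate:
#   p ≈ (1 - e^(-k*n/m))^k
# for m bits, k hash functions, and n inserted elements.
def expected_fp_rate(m, k, n)
  (1 - Math.exp(-k.to_f * n / m))**k
end

expected_fp_rate(50, 5, 1)  # ≈ 0.0000078 -- collisions should be rare
expected_fp_rate(5, 4, 1)   # ≈ 0.092     -- the cramped 5-bit filter above
```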

Am I being too pessimistic about the effectiveness of the two testing approaches above, or am I missing a way to test classes such as the Bloom Filter whose output is unpredictable?


Solution

You're right that choosing values that just happen to work is a bad idea. However, your second idea is not so bad.

You should always be able to test that the values that should be in the Bloom filter are there. You could randomly generate a number of strings that were never inserted and check that no more than a threshold fraction of them come back as false positives. That way, if you change the hash function, your unit tests will still pass as long as the filter keeps an acceptable false-positive ratio.
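A sketch of what that could look like (the sample sizes and the 0.15 threshold are my own assumptions, chosen loosely above the theoretical rate for these parameters; the question's BloomFilter is condensed into the snippet so it runs standalone):

```ruby
require "zlib"
require "securerandom"

# Condensed copy of the question's BloomFilter so this sketch runs standalone.
class BloomFilter
  def initialize(size = 100, hash_count = 3)
    @size = size
    @hash_count = hash_count
    @buffer = Array.new(size, false)
  end

  def insert(element)
    hash(element).each { |i| @buffer[i] = true }
  end

  def maybe_include?(element)
    hash(element).map { |i| @buffer[i] }.inject(:&)
  end

  private

  def hash(element)
    (1..@hash_count).map { |seed| Zlib.crc32(element, seed) % @size }
  end
end

filter = BloomFilter.new(1000, 3)
inserted = Array.new(100) { SecureRandom.hex(8) }   # 16-char strings
inserted.each { |s| filter.insert(s) }

# Everything inserted must be reported present, whatever the hash function.
raise "false negative!" unless inserted.all? { |s| filter.maybe_include?(s) }

# Probe strings that were never inserted (a different length, so they cannot
# collide with the inserted set) and bound the observed false-positive rate.
absent = Array.new(1_000) { SecureRandom.hex(16) }  # 32-char strings
rate = absent.count { |s| filter.maybe_include?(s) } / 1_000.0
raise "false-positive rate #{rate} too high" unless rate <= 0.15
```

The threshold should be derived from the approximation (1 − e^(−kn/m))^k for the chosen size and hash count, with generous slack so that an implementation change that keeps the filter correct doesn't flip the test.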

Other tips

Testing is about confirming your expectations. If you can't reason out for yourself what the Bloom filter will return (given the fragility you mentioned), you can't hold that expectation in the first place. (I swear I wasn't trying to make a pun :P)

My first gut feeling would be to confirm the false-positive percentage on N generated inputs for each of the interesting hashing algorithms. This automates as much assurance as you would get from doing these tests manually.

To achieve this, I would recommend factoring the test code enough that you can express a case as simply as:

<warning> Unverified code </warning>

class BloomFilterTestCase < TestCase
  def self.bloom_incidence(alg, pop, false_positives)
    define_method("test_bloom_incidence_#{alg}_#{pop}_#{false_positives}") do
      # code code code
    end
  end

  bloom_incidence :naive, 50, 0.05
end

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not.

Just from the description of what the Bloom filter does, it should be clear that it makes no sense to test for specific false positives. Which never-inserted elements come back positive is inherently implementation-dependent, so you can't write tests that expect a particular result for them. The only things you can guarantee, and hence test for, are:

  • the function returns a boolean value
  • the function doesn't throw any errors
  • there are no false negatives
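Those three guarantees translate directly into checks over arbitrary probe strings. A sketch (the probe set and parameters are my own assumptions; the question's BloomFilter is condensed into the snippet so the checks run on their own):

```ruby
require "zlib"

# Condensed stand-in for the question's BloomFilter so these invariant
# checks run standalone.
class BloomFilter
  def initialize(size = 100, hash_count = 3)
    @size = size
    @hash_count = hash_count
    @buffer = Array.new(size, false)
  end

  def insert(element)
    hash(element).each { |i| @buffer[i] = true }
  end

  def maybe_include?(element)
    hash(element).map { |i| @buffer[i] }.inject(:&)
  end

  private

  def hash(element)
    (1..@hash_count).map { |seed| Zlib.crc32(element, seed) % @size }
  end
end

filter = BloomFilter.new(20, 3)
words = %w[alpha beta gamma delta]
words.each { |w| filter.insert(w) }

("aa".."az").each do |probe|
  result = filter.maybe_include?(probe)   # 2. must not raise
  # 1. the return value is a boolean
  raise "non-boolean result for #{probe}" unless [true, false].include?(result)
end

# 3. no false negatives: everything inserted is reported as present
raise "false negative" unless words.all? { |w| filter.maybe_include?(w) }
```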
Licensed under: CC-BY-SA with attribution