Question

So let's say I'm writing an algorithm that takes a vector as input. I want to know that I'm writing this algorithm correctly, so of course I write tests to see whether the output equals what I expect on specific inputs. I might even do this for all vector sizes under a certain bound, but I would never enumerate all possible values, just all possible vector sizes.

It occurs to me, however, that it might be possible to populate the values of the vector so that I can still verify every possible vector under a certain size, as long as I make a certain assumption about the implementation of the function under test.

Say I have a function $f : \mathbb{N}^n \to \mathbb{N}$ such that $f(x) = \sum_{i=1}^n c_i\prod_{j=1}^n x_j^{e_{ij}}$ for some $e$ and $c$ with $e_{ij} \in \{0,1\}$ and $c_i \in \{0, 1\}$. I want to check that this equals some known golden function $g$, but I'll have to work out $g$ by hand for specific inputs. I believe there exists an input $r$ such that $f(r)$ uniquely determines $f$. This would make testing whether a particular implementation of $f$ is the one we expect trivially easy.

For $n = 2$ the possible values of $f(3, 7)$ are the following: 0, 1, x = 3, y = 7, xy = 21, 2, 1+x = 4, 1+y = 8, 1+xy = 22, x+x = 6, x+y = 10, x+xy = 24, y+y = 14, y+xy = 28, xy+xy = 42

So if I assume that a particular function I'm testing has the above form, characterized by $c$ and $e$, then I only need one input to fully test an implementation of $f$.

A good heuristic for generating these inputs is to start with the first prime greater than the number of terms, and then try successively higher primes for each subsequent input until the few remaining kinks are worked out. The intuition is that no two products will be equal because the inputs are prime, and that each distinct product will eventually be spaced out so much that the sums will all be distinct as well.
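The heuristic above can be sketched in code. This is a minimal illustration, not a definitive algorithm: the function names are mine, the enumeration assumes exactly the form of $f$ described above (n terms, $c_i, e_{ij} \in \{0,1\}$), and the search only bumps the last coordinate, which happens to suffice for small n.

```python
import math
from itertools import product, count

def all_polynomials(n):
    """Enumerate all f of the assumed form (n terms, c_i and e_ij in {0,1}),
    keyed by a canonical form (the sorted multiset of monomials with c_i = 1)
    so that different (c, e) pairs describing the same polynomial count once."""
    polys = set()
    for c in product((0, 1), repeat=n):
        for e in product(product((0, 1), repeat=n), repeat=n):
            polys.add(tuple(sorted(ei for ci, ei in zip(c, e) if ci)))
    return polys

def evaluate(poly, x):
    """Evaluate a canonical polynomial (a tuple of exponent vectors) at x."""
    return sum(math.prod(xj ** eij for xj, eij in zip(x, ei)) for ei in poly)

def is_distinguishing(x, n):
    """True if distinct polynomials of the assumed form give distinct values at x."""
    values = [evaluate(p, x) for p in all_polynomials(n)]
    return len(set(values)) == len(values)

def primes_from(lo):
    """Naive trial-division prime generator starting at the first prime >= lo."""
    for m in count(max(lo, 2)):
        if all(m % d for d in range(2, int(m ** 0.5) + 1)):
            yield m

def find_prime_input(n):
    """Greedy search following the heuristic: assign successively larger primes
    to the coordinates, bumping the last one while collisions remain.
    (No termination guarantee in general; works for small n.)"""
    gen = primes_from(n + 1)
    x = [next(gen) for _ in range(n)]
    while not is_distinguishing(tuple(x), n):
        x[-1] = next(gen)
    return tuple(x)
```

For $n = 2$ this search starts at $(3, 5)$, notices the collision $1 + 5 = 3 + 3 = 6$, and bumps the second coordinate to land on $(3, 7)$, matching the worked example above.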

Is there an algorithm to generate an input to such a function that makes all possible such functions result in unique values? What if we're willing to be probabilistic in some sense such as "the expected output is probably unique for this input"? Are there other forms of functions for which this question has been studied?


Solution

Multilinear polynomials

If you're willing to use probabilistic methods, I suggest using a randomized algorithm for polynomial identity testing.

You want to test whether $f(x)=g(x)$ holds for all $x$, where $f,g$ are multilinear polynomials. This is an instance of the polynomial identity testing problem, for which effective randomized algorithms exist.

To make this answer self-contained, I'll give a brief overview of a reasonable algorithm. First, randomly pick a large prime $p$. Next, randomly pick $x=(x_1,\dots,x_n)$, with each element $x_i$ chosen uniformly at random modulo $p$. Finally, check whether $f(x)\equiv g(x) \pmod p$. If not, then you know that your implementation is faulty. If so, then your implementation has a good chance of being correct. (In particular, a faulty implementation that computes some other multilinear polynomial has a low probability of going undetected, thanks to the Schwartz–Zippel lemma.) You can repeat this procedure multiple times for stronger guarantees.
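As a sketch of this procedure (the fixed prime and the helper names are my own choices; a real harness would also sample the prime at random, as described above):

```python
import random

# For illustration, a fixed large Mersenne prime; the procedure described
# above would pick a random large prime instead.
P = 2**61 - 1

def identity_test(f, g, n, trials=20, p=P):
    """Randomized polynomial identity test: evaluate both functions at random
    points mod p. Returns False as soon as a disagreement is found; True means
    f and g agreed at every sampled point, so (by Schwartz-Zippel) they are
    probably the same multilinear polynomial."""
    for _ in range(trials):
        x = [random.randrange(p) for _ in range(n)]
        if f(*x) % p != g(*x) % p:
            return False
    return True

# Usage: a golden g, a correct implementation, and a faulty one (one term off).
g = lambda x, y: x * y + x
f_ok = lambda x, y: y * x + x
f_bad = lambda x, y: x * y + y
```

Here `identity_test(f_ok, g, 2)` passes, while `identity_test(f_bad, g, 2)` fails except with vanishingly small probability (each trial catches the mismatch unless the random point happens to satisfy $x \equiv y \pmod p$).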

This procedure assumes you can inspect the code of the algorithm and verify that it computes a multilinear polynomial, which your question seems to propose we can assume. If you have a complete black box and no knowledge of whether it computes a multilinear polynomial, you can't get strong guarantees that $f$ is correct everywhere: for instance, your implementation of $f$ might be correct on all but one input, where it is special-cased to output the wrong answer.

More general cases

You might be interested in the theory of self-testing, pioneered by Blum, Luby, and Rubinfeld and further developed by others. See, e.g., the following seminal paper:

Self-Testing/Correcting with Applications to Numerical Problems. Manuel Blum, Michael Luby, Ronitt Rubinfeld. Journal of Computer and System Sciences, 47(3):549-595, 1993.

They generalize your idea by considering testers that invoke the program-under-test multiple times; by allowing probabilistic tests, where there is a chance we fail to notice problems, but this probability can be controlled or bounded; and by allowing for self-correction, where we use the possibly-faulty program as a subroutine to construct an output that will be correct with high probability. Nonetheless, despite these generalizations, the class of functions that we know how to build efficient self-testers for is fairly limited, and we don't know how to do it for arbitrary functions.
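As a concrete instance of these ideas, here is a sketch of the classic Blum-Luby-Rubinfeld linearity test over $\mathbb{Z}_p$ (the function name and trial count are illustrative): it queries the program-under-test at random points and checks the linearity identity, and a program that passes is, with high probability, close to some linear function.

```python
import random

def blr_linearity_test(f, p, trials=200):
    """Blum-Luby-Rubinfeld linearity test: query f : Z_p -> Z_p at random x, y
    and check f(x + y) == f(x) + f(y) (mod p). A program passing many trials
    is close to a linear function with high probability; its value at any
    point x can then be self-corrected by returning f(x + r) - f(r) for a
    random r, since errors at isolated points are unlikely to be hit twice."""
    for _ in range(trials):
        x, y = random.randrange(p), random.randrange(p)
        if f((x + y) % p) != (f(x) + f(y)) % p:
            return False
    return True
```

For example, with $p = 101$, the linear function $x \mapsto 3x \bmod p$ passes, while $x \mapsto x^2 \bmod p$ fails almost surely, since $(x+y)^2 \ne x^2 + y^2$ unless $2xy \equiv 0 \pmod p$.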

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange