Random generation of C programs with floating-point

https://stackoverflow.com/questions/8459976

13-03-2021
|

Question

Does anyone know a random generator of C programs that include floating-point computations?

I am looking for something that would be a little bit like Csmith, except that Csmith does not generate floating-point expressions, and that it generates tons of other constructs, making it a little difficult to modify. Generating sequential computations would be a good start for my purpose as long as these included some floating-point ones. Conditionals would be even better, but I wouldn't need loops, pointers, or even arrays.

Since so many languages use a C-like syntax, such a generator may not have to be specific to C. Even if it's specific to another C-like language, I might be able to text-process a generated program for that language into a C program.

EDIT: here is a snippet of a Csmith-generated program to clarify what I am looking for.

...
int64_t *l_374 = &g_189;
int32_t l_375 = (-1L);
int i, j, k;
l_375 &= ((g_106 == ((*l_374) = (&g_324[4] == l_373[0][0][5]))) < 0x80C8L);
return (*g_207);
...

I should also clarify that while taking a Csmith program and substituting, say, int64_t with float may give a syntactically correct C program, it will almost certainly not give a defined program. I can test whether a substituted program contains undefined behavior, but this is not cheap, and if I have to reject 99% of substituted programs because they are undefined, the process will be too slow to be useful.

No correct solution

OTHER TIPS

I have started on a small floating-point fuzzer. It does little for now, but you have to start with something.

Here is an example of use for comparing compilers generating SSE2 instructions, that I claim have no excuse for generating differing results:

#include <stdio.h>
double x0 = 35945970.47e-83;
double x1 = (973e-37+(5626073.612783921311173024e-76*231.106261545926274055e1*66390306733994e-1*420514.99786508*654374994.1249111e-35*5201.6039804e56)+(2.93604195+33e-50)+(969222843.32046212043603+1734e01)+(0166605914e8+6701040019050623e-23+32591206968562.6e-11+90771798.753788905)+(328e-49/944642906580982081e7));

int main(){
  x0 = (((x1*534425399171e0)*(x1*x0*x0)*(x1*x0*57063248719.703555336277392e-36*x0*472e57*65189741246535e-1)*x1*(x1/22393742341e70)*(x1+x0+x0+x0))-((843193503867271987e3*61.949746266e23*x1*x1*x0)/(x1/x1)));
  x0 = ((x0+x1+x1+x1+x0)-(x0*506680.0005767722e66*396.650621163*70798334426455964.1*x1*305369e14));
  x1 = 660098705340e-21;
  printf("%a\n", x0);
}

For this program, gcc and clang (which on this platform generate SSE2 instructions) generate executables that compute the same thing:

~/genfloat $ gcc t.c ; ./a.out 
0x1.5c5a77a63c1d6p+430
~/genfloat $ clang t.c ; ./a.out 
0x1.5c5a77a63c1d6p+430

I also intend to test a static analyzer that is supposed to predict all possible results that can be obtained with a program compiled with x87 instructions, spilling some intermediate results to double-precision memory locations in an unpredictable fashion:

~/genfloat $ frama-c -val -float-hex -all-rounding-modes t.c 
...
      x0 ∈ [0x1.5c5a77a63c1cap430 .. 0x1.5c5a77a63c1e8p430]

The above is a strong claim that needs to be tested.

My manydl.c program is doing something very similar (on integers). You might adapt it quite easily to your needs.

I wrote that as a tiny hack to convince some people, notably Jacques Pitrat, that a Linux system can dlopen a very big lot (more than hundred of thousands) of shared objects, that program generate random C code -focused on integers- and compiles and dlopen-s then executes a lot of them. You could adapt it to floating point needs. I designed my manydl.c so that it generates random but terminating C programs, so you could adapt it to float (just choose operations which are terminating and cheap, like I did).

Ask me more at coffee time

^{(since we are close colleagues)}

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow