Question

I'm looking for an efficient way to generate random floating-point numbers on the open-open interval (0,1). I currently have an RNG that generates random integers on the closed interval [0, (2^32)-1]. I've already created a half-open floating-point RNG on the interval [0,1) by multiplying the result from the integer RNG by 1/((2^32)-1), rather than dividing by (2^32)-1, since division is inefficient.

The way I'm currently going about generating numbers on the interval (0,1) is with a conditional statement like the one below:

float open_open_flt = (closed_open_flt == 0.0f) ? FLT_MIN : closed_open_flt;

Unfortunately, this is rather inefficient since it introduces a branch, and I suspect it also introduces some bias.

Can anybody suggest an alternative?


Solution

You are already there.

The smallest distance between two floats your current generator produces is 1/(2^32).

So, your generator is effectively producing [0, 1-1/(2^32)].

1/(2^32) is greater than FLT_MIN.

Thus if you add FLT_MIN to your generator,

float open_open_flt = FLT_MIN + closed_open_flt;

you'll get [FLT_MIN,1-(1/(2^32))+FLT_MIN], which works as a (0,1) generator.
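A self-contained sketch of this shift (my code, not the answer's). Here the [0,1) value is built from the top 24 bits of the integer so every product is exactly representable in float; note that the question's multiply by 1/((2^32)-1), evaluated in float, can round the largest inputs up to 1.0f, which this scaling avoids:

```c
#include <float.h>
#include <stdint.h>

/* Sketch: scale a 32-bit integer into [0, 1), then shift by FLT_MIN
 * to exclude 0.  `u32` is assumed to come from the integer RNG on
 * [0, 2^32 - 1].  Using the top 24 bits keeps every product exact. */
static float open_open_from_u32(uint32_t u32)
{
    float closed_open = (float)(u32 >> 8) * 0x1p-24f;  /* [0, 1 - 2^-24] */
    /* 0 maps to FLT_MIN; for any nonzero multiple of 2^-24 the tiny
     * addend is absorbed by rounding, so the value is unchanged and
     * the result lies in (0, 1). */
    return FLT_MIN + closed_open;
}
```

Because FLT_MIN (2^-126) is far below half an ulp of even the smallest nonzero output (2^-24), the addition only ever changes the exact-zero case, so this is branch-free yet equivalent to the conditional in the question.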

OTHER TIPS

Since the probability of actually observing 0 is very small, and checking whether a number equals 0 is cheaper than an addition or a multiplication, I would regenerate the random number until it is not equal to 0.
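A minimal sketch of this rejection approach (my code; the xorshift32 generator is a stand-in for the asker's own integer RNG):

```c
#include <stdint.h>

/* Stand-in 32-bit generator (xorshift32) -- a placeholder for the
 * asker's real integer RNG. */
static uint32_t rng_state = 0x12345678u;
static uint32_t rng_u32(void)
{
    uint32_t x = rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return rng_state = x;
}

/* Draw again whenever the scaled result is exactly 0, so the output
 * lies in (0, 1).  Only 256 of the 2^32 inputs map to 0, so the loop
 * almost never repeats. */
static float open_open_reject(void)
{
    float f;
    do {
        f = (float)(rng_u32() >> 8) * 0x1p-24f;  /* [0, 1) */
    } while (f == 0.0f);
    return f;
}
```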

Given a sample x selected randomly from [0, 2^32), I propose using:

0x1.fffffep-33 * x + 0x1p-25

Reasoning:

  • These values are such that the highest x produces slightly less than 1-2^-25 before rounding, so it is rounded to the largest float less than 1, which is 1-2^-24. If we made it any larger, some values would round to 1, which we do not want. If we made it smaller, fewer values would round to 1-2^-24, so it would be less represented than we desire (more on this below).
  • The values are such that the lowest x produces 2^-25. This produces some symmetry: The distribution is compelled to stop at the high side 1-2^-25 before rounding, as explained above, so we make it symmetric on the bottom side, stopping at 0+2^-25. To some extent, it is as if we are binning the real number line in bins of width 2^-24 and then removing the bins centered on 0 and 1 (which extend 2^-25 to either side of those numbers).
  • Each bin that we retain contains about the same number of sample values. However, different float values show up in the bins, because the resolution of float varies. It is finer near 0 and coarser near 1. With this arrangement, each bin is about uniformly represented, but the lower bins will have more samples with lower probability each. The overall distribution remains uniform.
  • We could extend the low end so that it is closer to zero. But then, for most d in (0, ½), there would be more samples in (0, d) than in (1-d, 1), so the distribution would be asymmetric.
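The mapping above, transcribed as code (a sketch; evaluating in double before the final conversion to float is my assumption about how the expression is meant to be computed, and the multiplier 0x1.fffffep-33, about 2^-32, follows from the endpoint reasoning in the bullets):

```c
#include <stdint.h>

/* Sketch: map x in [0, 2^32) to (0, 1).  The double intermediate is
 * precise enough that the final conversion to float performs the
 * rounding described above: x = 0 yields 2^-25, and the highest x
 * yields 1 - 2^-24, the largest float below 1. */
static float open_open_map(uint32_t x)
{
    return (float)(0x1.fffffep-33 * (double)x + 0x1p-25);
}
```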

As you can see, the floating-point format forces some irregularities in a distribution from 0 to 1. This issue has been raised in other Stack Overflow questions but never thoroughly discussed, to my knowledge. Whether it suits your purposes to leave these irregularities as described above depends on your application.

Potential variations:

  • Quantize all the samples so they occur at regularly spaced intervals, 2^-24, rather than being finer where the float format is finer.
  • Allow values closer to 1 before rounding but convert them to 1-2^-24 after rounding, and lower the bottom endpoint to match. This reduces the excluded segments around 0 and around 1 at the expense of increasing the number of values clumped into 1-2^-24 because the resolution is not fine enough for more distinction.
  • Switch to double. Then there is a 1-1 map from original x values to floating-point values, and you can likely get as close to 0 and 1 as desired.
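One common sketch of the double variation (my example, not from the answer): center each 32-bit input in its own 2^-32-wide bin. Every intermediate is exact in double, so the map is one-to-one with no rounding at all:

```c
#include <stdint.h>

/* Each x in [0, 2^32) maps to the midpoint of its 2^-32-wide bin,
 * giving a double in (0, 1): the smallest output is 2^-33 and the
 * largest is 1 - 2^-33.  33-bit values are exact in double. */
static double open_open_dbl(uint32_t x)
{
    return ((double)x + 0.5) * 0x1p-32;
}
```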

Also, contrary to ElKamina’s answer, floating-point comparison (even to zero) is not generally faster than addition. Comparison requires branching on the result, which is an issue in many modern CPUs.

I'm looking for an efficient way to generate random floating-point numbers on the open-open interval (0,1). I currently have an RNG that generates random integers on the closed-closed interval of [0, (2^32)-1]. I've already created a half-open floating point RNG on the interval [0,1) by simply multiplying my result from the integer RNG by 1/((2^32)-1)

This means that your generator 'tries' to produce 2^32 different values. The problem is that the float type is 4 bytes long, so it has fewer than 2^32 distinct values overall. To be precise, there are only 2^23 values on the interval [1/2, 1). Depending on what you need, this may or may not be a problem.
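The 2^23 figure can be checked directly (a sketch using the standard nextafterf from math.h):

```c
#include <math.h>
#include <stdint.h>

/* Floats in [1/2, 1) are evenly spaced one ulp = 2^-24 apart, so the
 * interval of width 1/2 holds 0.5 / 2^-24 = 2^23 distinct values. */
static uint64_t floats_in_half_to_one(void)
{
    float ulp = nextafterf(0.5f, 1.0f) - 0.5f;   /* 2^-24 */
    return (uint64_t)(0.5f / ulp);               /* 2^23 = 8388608 */
}
```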

You may want to use a lagged Fibonacci generator (see the Wikipedia article), with the iteration formula given in the Russian Wikipedia article. This already produces numbers in [0,1), given that the initial values belong to that interval, and may be good enough for your purposes.
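A minimal sketch of such a generator (my code; the lag pair (55, 24) is a well-known choice, and the LCG-based seeding is a crude placeholder — a real implementation needs a proper seeding and warm-up procedure):

```c
#include <stdint.h>

/* Subtractive lagged Fibonacci generator, s[n] = s[n-55] - s[n-24] mod 1,
 * operating directly on doubles in [0, 1).  Because all state values are
 * multiples of 2^-32, every subtraction is exact. */
#define LAG_BIG   55
#define LAG_SMALL 24

static double lf_state[LAG_BIG];
static int lf_i = 0;                      /* index of x[n-55] */
static int lf_j = LAG_BIG - LAG_SMALL;    /* index of x[n-24] */

static void lf_seed(uint32_t s)
{
    for (int n = 0; n < LAG_BIG; n++) {
        s = s * 1664525u + 1013904223u;   /* LCG filler for the state */
        lf_state[n] = s * 0x1p-32;        /* multiple of 2^-32 in [0, 1) */
    }
}

static double lf_next(void)
{
    double x = lf_state[lf_i] - lf_state[lf_j];
    if (x < 0.0) x += 1.0;                /* reduce mod 1 */
    lf_state[lf_i] = x;                   /* overwrite the oldest value */
    lf_i = (lf_i + 1) % LAG_BIG;
    lf_j = (lf_j + 1) % LAG_BIG;
    return x;                             /* in [0, 1) */
}
```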

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow