Strange behaviour with Gaussian random distribution

https://stackoverflow.com/questions/21458906

05-10-2022
|

Question

I'm running a bit of code whose purpose is to take a list/array of floats and an associated list/array of the same length acting as an "error" and shuffle the first list around according to a Gaussian distribution.

This is a MWE of the code:

import random
import numpy as np

def random_data(N, a, b):
    # Generate some random data.
    return np.random.uniform(a, b, N).tolist()

# Obtain values for x.
x = random_data(100, 0., 1.)
# Obtain error/sigma values for x.
x_sigma = random_data(100, 0., 0.2)

# Generate new x values shuffling each float around a
# Gaussian distribution with a given sigma.
x_gauss = random.gauss(np.array(x), np.array(x_sigma))

print x-x_gauss

What I find is that the result of doing x-x_gauss is a list of floats that is always either positive or negative. This means the random.gauss call is always assigning either a larger new value for each float in x or a smaller one for all values in x.

I would expect the random.gauss call to shuffle the floats in x around its values both to the right and to the left, since this is a random process.

Why is this not happening? Am I understanding something wrong about the process?

La solution

This is the definition of random.gauss:

def gauss(self, mu, sigma):
    random = self.random
    z = self.gauss_next
    self.gauss_next = None
    if z is None:
        x2pi = random() * TWOPI
        g2rad = _sqrt(-2.0 * _log(1.0 - random()))
        z = _cos(x2pi) * g2rad
        self.gauss_next = _sin(x2pi) * g2rad

    return mu + z*sigma

Notice that is is generating one value for z, and returning mu + z*sigma. Since mu and sigma are numpy arrays, this calculation is being done element-wise. Since sigma is positive, the shift z*sigma is either always positive or negative, depending on the sign of z

If you are using NumPy, unless there is a specific reason to do otherwise, I would use the np.random module to generate these values. It would be quicker than using a Python loop with calls to random.gauss:

import numpy as np

N = 100
x = np.random.uniform(0., 1., size=N)
x_sigma = np.random.uniform(0., 0.2, size=N)

z = np.random.normal(0, 1, size=N)
x_gauss = x + z*x_sigma

print x-x_gauss

Licencié sous: CC-BY-SA avec attribution

Non affilié à StackOverflow