Question

I am looking for an algorithm to generate an array of N random numbers, such that the sum of the N numbers is 1, and all numbers lie within 0 and 1. For example, N=3, the random point (x, y, z) should lie within the triangle:

x + y + z = 1
0 < x < 1
0 < y < 1
0 < z < 1

Ideally I want each point within the area to have equal probability. If it's too hard, I can drop the requirement. Thanks.

Was it helpful?

Solution

Let us first assume that you want to sample within

x + y + z = 1
0 ≤ x ≤ 1
0 ≤ y ≤ 1
0 ≤ z ≤ 1

This doesn't make quite a difference, since the sample point will still lie in your requested area with high probability.

Now you are left with sampling a point from a simplex. In the 3d example you get a 2d simplex (triangle) realized in 3d.

How to pick a point uniformly at random was discussed in this blog post (see the comments).

For your problem it would mean that you take $n-1$ random numbers from the interval $(0,1)$, then you add a $0$ and $1$ to get a list of $n+1$ numbers. You sort the list and then you record the differences between two consecutive elements. This gives you a list of $n$ number that will sum up to $1$. Moreover this sampling is uniform. This idea can be found in Donald B. Rubin, The Bayesian bootstrap Ann. Statist. 9, 1981, 130-134.

For example ($n=4$) you have the three random numbers 0.4 0.2 0.1 then you obtain the sorted sequence 0 0.1 0.2 0.4 1 and this gives the differences 0.1 0.1 0.2 0.6, and by construction these four numbers sum up to 1.

Another approach is the following: first sample from the hypercube (that is you forget about x+y+z=1) and then normalize the sample point. The normalization is a projection from the $d$-hypercube to the $d-1$-simplex. It should be intuitively clear that the points at the center of the simplex have more "pre-image-points" than at the outside. Hence, if you sample uniformly from the hypercube, this wont give you a uniform sampling in the simplex. However, if you sample from the hypercube with an appropriate Exponential Distribution, than this effect cancels out. The Figure gives you an idea how both methods will sample. However, I prefer the "sorting" method due to its simple form. It's also easier to implement.

Example of the 2 sampling methods

OTHER TIPS

This is to add to the existing answers.

Devroye is an excellent reference for questions of this sort. Chap.7 gives the algorithms needed to generate uniform order statistics, which the OP is after.

For generating uniform order statistics, sorting $n$ samples of $[0,1]$ will do. This approach takes $O(n \log n)$ time. A quicker way ( available in the book) involves sampling $n$ random numbers $x_1,\ldots,x_n$ from an $\mathrm{Exp}(1)$ pdf. (These are the spacings of the uniform pdf). Then, return the values $$ (y_i)_{1\leq i\leq n} = \frac{\sum \limits_{1\ldots i} x_j}{\sum \limits_{1\ldots n} x_j} $$ which are automatically sorted, in $O(n)$ time overall. (I'm overlapping with A.Schulz's answer here- just making the computation more explicit).

The same approach can be adapted, via inverse CDF Sampling, to sample any non-uniform pdf over $[0,1]$. There's also a trick that enables you to sample uniformly over a simplex other than the canonical simplex (say $2x+3y+z = 5$).

X[0] = 0
for i = 1 to N-1
    X[i] = uniform(0,1)
X[n] = 1
sort X[0..N]
for i = 1 to N
    Z[i] = X[i] - X[i-1]
return Z[1..N]

Here, uniform(0,1) returns a real number independently and uniformly distributed between 0 and 1.

See this paper: Smith, N. and Tromble, R., Sampling uniformly from the unit simplex.

Licensed under: CC-BY-SA with attribution
Not affiliated with cs.stackexchange
scroll top