Question

Okay, this has bugged me for several years now. If you sucked at statistics and higher math at school, turn away now. Too late.

Okay. Take a deep breath. Here are the rules. Take two thirty-sided dice (yes, they do exist) and roll them simultaneously.

  • Add the two numbers
  • If both dice show <= 5, or both show >= 26, throw again and add the result to what you have
  • If one die is <= 5 and the other is >= 26, throw again and subtract the result from what you have
  • Repeat until at least one die shows a value > 5 and < 26!

If you write some code (see below), roll those dice a few million times and count how often you receive each number as the final result, you get a curve that is pretty flat left of 1, climbs at roughly 45° between 1 and 60, and is flat again above 60. The chance to roll 30.5 or better is greater than 50%, to roll better than 18 is 80%, and to roll better than 0 is 97%.

Now the question: Is it possible to write a program to calculate the exact value f(x), i.e. the probability to roll a certain value?

Background: For our role playing game "Jungle of Stars" we looked for a way to keep random events in check. The rules above guarantee a much more stable outcome for something you try :)

For the geeks around, the code in Python:

import random
import sys

def OW60():
    """Do an open throw with a "60-sided" die (two d30)."""
    val = 0
    sign = 1

    while True:
        r1 = random.randint(1, 30)
        r2 = random.randint(1, 30)

        #print(r1, r2)
        val = val + sign * (r1 + r2)
        islow = 0
        ishigh = 0
        if r1 <= 5:
            islow += 1
        elif r1 >= 26:
            ishigh += 1
        if r2 <= 5:
            islow += 1
        elif r2 >= 26:
            ishigh += 1

        if islow == 2 or ishigh == 2:
            sign = 1        # both low or both high: throw again and add
        elif islow == 1 and ishigh == 1:
            sign = -1       # one low, one high: throw again and subtract
        else:
            break           # at least one die in 6..25: stop

        #print(sign)

    #print(val)
    return val

result = [0] * 2000        # histogram of results, offset by 1000 so negative results fit
N = 100000
for i in range(N):
    r = OW60()
    x = r + 1000
    if x < 0:
        print("Too low:", r)
    if i % 1000 == 0:
        sys.stderr.write('%d\n' % i)
    result[x] += 1

# trim the leading and trailing empty bins
i = 0
while result[i] == 0:
    i += 1

j = len(result) - 1
while result[j] == 0:
    j -= 1

pSum = 0
# Lower Count / Rel. Lower Probability: throws with this result or less (count / fraction)
# Higher Count / Rel. Higher Probability: throws with a higher result (count / fraction)
print("Result;Absolute Count;Probability;Lower Count;Rel. Lower Probability;Higher Count;Rel. Higher Probability;")
while i <= j:
    pSum += result[i]
    print('%d;%d;%.10f;%d;%.10f;%d;%.10f' % (i - 1000, result[i], result[i] / N, pSum, pSum / N, N - pSum, (N - pSum) / N))
    i += 1

Solution

I had to first rewrite your code before I could understand it:

def OW60(sign=1):
    r1 = random.randint (1, 30)
    r2 = random.randint (1, 30)
    val = sign * (r1 + r2)

    islow  = (r1<=5)  + (r2<=5)
    ishigh = (r1>=26) + (r2>=26)

    if islow == 2 or ishigh == 2:
        return val + OW60(1)
    elif islow == 1 and ishigh == 1:
        return val + OW60(-1)
    else:
        return val

You might find this less readable; I don't know. (Do check that it is equivalent to what you had in mind.) Also, regarding the way you use "result" in your code: do you know of Python's dicts?
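
For example, a collections.Counter (a dict specialised for tallying) would replace the fixed-size result list and the +1000 offset; a rough sketch:

from collections import Counter

# Tally OW60() results directly by value: no preallocated list, no offset.
N = 100000
counts = Counter(OW60() for _ in range(N))
for value in sorted(counts):
    print(value, counts[value], counts[value] / N)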

Anyway, matters of programming style aside: Suppose F(x) is the CDF of OW60(1), i.e.

F(x) = the probability that OW60(1) returns a value ≤ x.

Similarly let

G(x) = the probability that OW60(-1) returns a value ≤ x.

Then you can calculate F(x) from the definition, by summing over all 30×30 equally likely outcomes of the first throw. For instance, if the first throw is (2,3), both dice are ≤ 5, so you throw again and add; the final result is ≤ x exactly when everything after the first throw comes to ≤ x-5, so this outcome contributes (1/30)(1/30)·F(x-5) to F(x). Doing that for every pair gives some obscenely long expression like

F(x) = (1/900)(term for (1,1) + term for (1,2) + ... + term for (30,30))

which is a sum over 900 terms, one for each pair (a,b) in [30]×[30]. The pairs (a,b) with both ≤ 5 or both ≥ 26 contribute a term F(x-a-b), the pairs with one ≤ 5 and one ≥ 26 contribute a term G(x-a-b), and the rest contribute 1 if a+b ≤ x and 0 otherwise, because you don't throw again.

Similarly you have

G(x) = (1/900)(term for (1,1) + term for (1,2) + ... + term for (30,30))

where now the first throw is subtracted, so the arguments shift the other way: the pairs with both ≤ 5 or both ≥ 26 contribute F(x+a+b), the pairs with one ≤ 5 and one ≥ 26 contribute G(x+a+b), and the rest contribute 1 if a+b ≥ -x and 0 otherwise.

Of course, you can collect coefficients: in the equation for F(x), the only F terms that occur are F(x-2) to F(x-10) and F(x-52) to F(x-60) (from the pairs with both ≤ 5 or both ≥ 26), and the only G terms are G(x-27) to G(x-35) (from the pairs with one ≤ 5 and the other ≥ 26), so there are only 27 distinct recursive terms rather than 900, plus a known count of the "stop" pairs with a+b ≤ x. The equation for G(x) uses the same 27 offsets with the opposite sign.

These relations are linear in the unknown values of F and G. You can collect F(x) and G(x) for every x in a wide enough window into one big vector and solve the resulting sparse linear system, taking F ≈ 0, G ≈ 0 well below the window and ≈ 1 well above it (a good approximation, because the tails fall off geometrically). Or, even more simply, cut off the recursion depth: every throw forces another one with probability 100/900 = 1/9, so the chance of ever needing more than k throws is (1/9)^k, and working out the distribution with at most k throws gives F(x) and G(x), for any x you want, to arbitrarily close precision. (And your f(x), the probability of throwing exactly x, is just F(x)-F(x-1), so that comes out as well.)
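
For instance, the cut-off idea might look something like this in Python (a rough sketch; the names terminal, same, mixed and pmfs are just made up for illustration, and exact fractions are used so the only error is the dropped tail):

from fractions import Fraction
from collections import defaultdict

# Classify the 900 equally likely first throws (a, b) by the rule that applies,
# keyed by the sum a + b.
terminal = defaultdict(int)   # stop and keep the sum
same = defaultdict(int)       # both <= 5 or both >= 26: throw again and add
mixed = defaultdict(int)      # one <= 5, the other >= 26: throw again and subtract
for a in range(1, 31):
    for b in range(1, 31):
        low, high = (a <= 5) + (b <= 5), (a >= 26) + (b >= 26)
        if low == 2 or high == 2:
            same[a + b] += 1
        elif low == 1 and high == 1:
            mixed[a + b] += 1
        else:
            terminal[a + b] += 1

def pmfs(depth):
    """Probability mass functions of OW60(1) and OW60(-1), exact except that
    sequences needing more than `depth` throws are dropped; the dropped mass
    is (100/900)**depth = (1/9)**depth."""
    p = Fraction(1, 900)
    f, g = defaultdict(Fraction), defaultdict(Fraction)   # depth 0: no mass yet
    for _ in range(depth):
        fnew, gnew = defaultdict(Fraction), defaultdict(Fraction)
        for s, c in terminal.items():      # no further throw
            fnew[s] += c * p
            gnew[-s] += c * p
        for s, c in same.items():          # next throw is added: continue with f
            for v, q in f.items():
                fnew[s + v] += c * p * q
                gnew[-s + v] += c * p * q
        for s, c in mixed.items():         # next throw is subtracted: continue with g
            for v, q in g.items():
                fnew[s + v] += c * p * q
                gnew[-s + v] += c * p * q
        f, g = fnew, gnew
    return f, g

f, g = pmfs(6)                             # dropped mass: (1/9)**6, about 2e-6
print(float(sum(q for v, q in f.items() if v >= 31)))    # for example, P(result >= 31)

Here f maps each possible result to its probability, so f[x] is (up to the dropped tail) the f(x) you asked about, and running sums of f give the cumulative F(x).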

There might be a better way. All said and done, though, why are you doing this? Whatever kind of distribution you want, there are nice and simple probability distributions, with the appropriate parameters, that have good properties (e.g. small variance, one-sided errors, whatever). There is no reason to make up your own ad-hoc procedure to generate random numbers.

OTHER TIPS

I've done some basic statistics on a sample of 20 million throws. Here are the results:

Median: 17 (+18, -?) # This result is meaningless
Arithmetic Mean: 31.0 (±0.1)
Standard Deviation: 21 (+1, -2)
Root Mean Square: 35.4 (±0.7)
Mode: 36 (seemingly accurate)

The errors were determined experimentally. The arithmetic mean and the mode are really accurate, and changing the parameters even quite aggressively doesn't seem to influence them much. I suppose the behaviour of the median has already been explained.
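
Summary statistics like these can be collected with something along these lines (just a sketch that reuses the OW60() function from the question; the sample here is far smaller than 20 million, since pure Python is slow):

import statistics
from collections import Counter

samples = [OW60() for _ in range(200000)]
print("Median:             ", statistics.median(samples))
print("Arithmetic mean:    ", statistics.mean(samples))
print("Standard deviation: ", statistics.pstdev(samples))
print("Root mean square:   ", statistics.mean(x * x for x in samples) ** 0.5)
print("Mode:               ", Counter(samples).most_common(1)[0][0])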

Note: don't take these numbers as a proper mathematical description of the function. Use them to quickly get a picture of what the distribution looks like. For anything else, they aren't accurate enough (even though they might be precise).

Perhaps this is helpful to someone.

Edit 2:

[graph omitted]

Based on just 991 values. I could've crammed more values into it, but they would've distorted the result. This sample happens to be fairly typical.

Edit 1:

Here are the above values for just one sixty-sided die, for comparison:

Median: 30.5
Arithmetic Mean: 30.5
Standard Deviation: 17.3181023
Root Mean Square: 35.0737318611

Note that these values are calculated, not experimental.
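
They can be checked directly, for instance with a tiny sketch like this (assuming a fair die, uniform on 1..60):

vals = range(1, 61)
mean = sum(vals) / 60                       # 30.5
msq = sum(v * v for v in vals) / 60         # E[X^2] = 7381/6
print("mean:", mean)                        # 30.5
print("std: ", (msq - mean * mean) ** 0.5)  # sqrt((60**2 - 1) / 12), about 17.318
print("rms: ", msq ** 0.5)                  # about 35.0737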

Compound unbounded probability is... non-trivial. I was going to tackle the problem the same way as James Curran, but then I saw from your source code that there could be a third set of rolls, and a fourth, and so on. The problem is solvable, but far beyond most die rolling simulators.

Is there any particular reason that you need a random range from -Inf to +Inf with such a complex curve around 1-60? Why is the bell curve of 2D30 not acceptable? If you explain your requirements, it is likely someone could provide a simpler and more bounded algorithm.

Well, let's see. The second throw (which will sometimes be added to or subtracted from the first roll) has a nice, easily predictable bell curve around 31. The first roll, of course, is the problem.

For the first roll, we have 900 possible ordered combinations.

  • 50 combinations (both dice <= 5, or both >= 26) result in adding the second roll.
  • 50 combinations (one die <= 5 and the other >= 26) result in subtracting the second roll.
  • Leaving 800 combinations which match the bell curve of the second roll.

The subtracting set (pre-subtraction) will form a bell curve in the range (27..35). The lower half of the adding set will form a bell curve in the range (2..10), while the upper half will form a bell curve in the range (52..60).
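
Something like the following quick enumeration (just a sketch) confirms the counts and ranges above:

pairs = [(a, b) for a in range(1, 31) for b in range(1, 31)]
add = [a + b for a, b in pairs if (a <= 5 and b <= 5) or (a >= 26 and b >= 26)]
sub = [a + b for a, b in pairs if (a <= 5 and b >= 26) or (a >= 26 and b <= 5)]
print(len(add), len(sub), len(pairs) - len(add) - len(sub))   # 50 50 800
print(min(sub), max(sub))                                     # 27 35
print(min(add), max(add))                                     # 2 60 (two humps: 2..10 and 52..60)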

My probability is a bit rusty, so I can't figure out the exact values for you, but it should be clear that these lead to predictable values.

Licensed under: CC-BY-SA with attribution