Determine Frequency of numbers showing up in dice rolls

https://stackoverflow.com/questions/493239

20-08-2019
|

Question

For a game I'm trying to determine the frequency that a certain # will show up at a given # of dice being rolled. I know... that question seems odd. Let me try to explain it with real numbers.

So, for 1 die, the frequency for each number will be identical. 1-6 will show up equal number of times.

Now for 2 dice, things get different. I imagine 5,6,7 are going to be the most frequently rolled, while numbers at both ends of the spectrum will show up less or not at all (in the case of 1). I'd like to know how to calculate this list and show them in the proper order, from most frequent to less frequent.

Any thoughts?

@duffymo - It would be really nice though to have some sort of an algorithm to come up with it. It seems that the above way is going to require a lot of hand picking and placing of numbers. If my die count is dynamic up to say 10, doing that by hand will be ineffecient and troublesome I think. :)

Solution

Rough draft of a recursive way to do it:

public static IEnumerable<KeyValuePair<int, int>> GetFrequenciesByOutcome(int nDice, int nSides)
{
    int maxOutcome = (nDice * nSides);
    Dictionary<int, int> outcomeCounts = new Dictionary<int, int>();
    for(int i = 0; i <= maxOutcome; i++)
        outcomeCounts[i] = 0;

    foreach(int possibleOutcome in GetAllOutcomes(0, nDice, nSides))
        outcomeCounts[possibleOutcome] = outcomeCounts[possibleOutcome] + 1;

    return outcomeCounts.Where(kvp => kvp.Value > 0);
}

private static IEnumerable<int> GetAllOutcomes(int currentTotal, int nDice, int nSides)
{
    if (nDice == 0) yield return currentTotal;
    else
    {
        for (int i = 1; i <= nSides; i++)
            foreach(int outcome in GetAllOutcomes(currentTotal + i, nDice - 1, nSides))
                yield return outcome;
    }
}

Unless I'm mistaken, this should spit out KeyValuePairs organized like [key, frequency].

EDIT: FYI, after running this, it shows the frequencies for GetFrequenciesByOutcome(2, 6) to be:

2: 1

3: 2

4: 3

5: 4

6: 5

7: 6

8: 5

9: 4

10: 3

11: 2

12: 1

OTHER TIPS

There are 6*6 = 36 combinations for two dice.

2 = 1+1 can only appear once, so its frequency is 1/36. 3 = 1+2 or 2+1, so its frequency is 2/36 = 1/18. 4 = 1+3, 2+2, or 3+1, so its frequency is 3/36 = 1/12.

You can do the rest out to twelve.

Any backgammon player knows these well.

There is no real "algorithm" or simulation necessary - it's a simple calculation based on a formula derived by De Moivre:

http://www.mathpages.com/home/kmath093.htm

And it's not a "bell curve" or normal distribution.

Add up the array of frequency of previous rolls, 'side number' times by shifting it's position, then you will get the array of frequencies each numbers show up.

1, 1, 1, 1, 1, 1  # 6 sides, 1 roll

1, 1, 1, 1, 1, 1
   1, 1, 1, 1, 1, 1
      1, 1, 1, 1, 1, 1
         1, 1, 1, 1, 1, 1
            1, 1, 1, 1, 1, 1
+              1, 1, 1, 1, 1, 1
_______________________________
1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1  # 6 sides, 2 rolls

1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
   1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
      1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
         1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
            1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
+              1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1
______________________________________________
1, 3, 6,10,15,21,25,27,27,25,21,15,10, 6, 3, 1  # 6 sides, 3 rolls

This is much faster than brute force simulation, since simple equation is the best. Here is my python3 implementation.

def dice_frequency(sides:int, rolls:int) -> list:
    if rolls == 1:
        return [1]*sides
    prev = dice_frequency(sides, rolls-1)
    return [sum(prev[i-j] for j in range(sides) if 0 <= i-j < len(prev))
            for i in range(rolls*(sides-1)+1)]

for example,

dice_frequency(6,1) == [1, 1, 1, 1, 1, 1]
dice_frequency(6,2) == [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
dice_frequency(6,3) == [1, 3, 6, 10, 15, 21, 25, 27, 27, 25, 21, 15, 10, 6, 3, 1]

Note that, you should use 'target number - roll count' as index of the list to get frequency of each number. If you want to get probabilities, use 'side number'^'roll count' as a denominator.

sides = 6
rolls = 3
freq = dice_frequency(sides,rolls)
freq_sum = sides**rolls
for target in range(rolls,rolls*sides+1):
    index = target-rolls
    if 0 <= index < len(freq):
        print("%2d : %2d, %f" % (target, freq[index], freq[index]/freq_sum))
    else:
        print("%2d : %2d, %f" % (target, 0, 0.0))

This code yeilds

 3 :  1, 0.004630
 4 :  3, 0.013889
 5 :  6, 0.027778
 6 : 10, 0.046296
 7 : 15, 0.069444
 8 : 21, 0.097222
 9 : 25, 0.115741
10 : 27, 0.125000
11 : 27, 0.125000
12 : 25, 0.115741
13 : 21, 0.097222
14 : 15, 0.069444
15 : 10, 0.046296
16 :  6, 0.027778
17 :  3, 0.013889
18 :  1, 0.004630

There's lots of stuff online about dice probability. Here's one link that helped me out with a Project Euler question:

http://gwydir.demon.co.uk/jo/probability/calcdice.htm

Neat factoid...

Did you know that Pascal's triangle is the probability distribution of the sums of N 2-sided dice?

   1 1    - 1 die, 1 chance at 1, 1 chance at 2
  1 2 1   - 2 dice, 1 chance at 2, 2 chances at 3, 1 chance at 4
 1 3 3 1  - 3 dice, 1 chance at 3, 3 chances at 4, 3 chances at 5, 1 chance at 6 
1 4 6 4 1 - etc.

JavaScript implementation using dynamic function creation:

<script>
var f;
function prob(dice, value)
 {
var f_s = 'f = function(dice, value) {var occur = 0; var a = [];';
for (x = 0; x < dice; x++)
 {
f_s += 'for (a[' + x + '] = 1; a[' + x + '] <= 6; a[' + x + ']++) {';
 }
f_s += 'if (eval(a.join(\'+\')) == value) {occur++;}';
for (x = 0; x < dice; x++)
 {
f_s += '}';
 }
f_s += 'return occur;}';
eval(f_s);
var occ = f(dice, value);
return [occ, occ + '/' + Math.pow(6, dice), occ / Math.pow(6, dice)];
 };

alert(prob(2, 12)); // 2 die, seeking 12
                    // returns array [1, 1/36, 0.027777777777777776]
</script>

EDIT: Rather disappointed nobody pointed this out; had to replace 6 * dice with Math.pow(6, dice). No more mistakes like that...

There seems to be some mystery surrounding exactly "why" this is, and although duffymo has explained part of it, I'm looking at another post that says:

There should be no reason why 5, 6 and 7 should be rolled more [than 2] since the first roll of the die is a independent event from the second roll of the die and both of them have equal probablity of 1-6 of being rolled.

There's a certain appeal to this. But it's incorrect...because the first roll affects the chances. The reasoning can probably most easily be done through an example.

Say I'm trying to figure out if the probability of rolling 2 or 7 is more likely on two dice. If I roll the first die and get a 3, what are my chances now of rolling a total of 7? Obviously, 1 in 6. What are my chances of rolling a total of 2? 0 in 6...because there's nothing I can roll on the second die to have my total be 2.

For this reason, 7 is very (the most) likely to be rolled...because no matter what I roll on the first die, I can still reach the correct total by rolling the right number on the second die. 6 and 8 are equally slightly less likely, 5 and 9 more so, and so on, until we reach 2 and 12, equally unlikely at 1 in 36 chance apiece.

If you plot this (sum vs likelyhood) you'll get a nice bell curve (or, more precisely, a blocky aproximation of one because of the discrete nature of your experiment).

After a lot of searching on the Internet and stackoverflow, I found Dr. Math explains it well in a working function (a link in another answer has an incorrect formula). I converted Dr. Math's formula to C# and my nUnit tests (which had been failing before with other attempts at the code) all passed.

First I had to write a few helper functions:

  public static int Factorial(this int x)
  {
     if (x < 0)
     {
        throw new ArgumentOutOfRangeException("Factorial is undefined on negative numbers");
     }
     return x <= 1 ? 1 : x * (x-1).Factorial();
  }

Because of the way choose works in mathematics, I realized I could cut down on the calculations if I had an overloaded Factorial function with a lower bound. This function can bail out when the lower bound is reached.

  public static int Factorial(this int x, int lower)
  {
     if (x < 0)
     {
        throw new ArgumentOutOfRangeException("Factorial is undefined on negative numbers");
     }
     if ((x <= 1) || (x <= lower))
     {
        return 1;
     }
     else
     {
        return x * (x - 1).Factorial(lower);
     }
  }

  public static int Choose(this int n, int r)
  {
     return (int)((double)(n.Factorial(Math.Max(n - r, r))) / ((Math.Min(n - r, r)).Factorial()));
  }

When those were in place, I was able to write

  public static int WaysToGivenSum(int total, int numDice, int sides)
  {
     int ways = 0;
     int upper = (total - numDice) / sides; //we stop on the largest integer less than or equal to this
     for (int r = 0; r <= upper; r++)
     {
        int posNeg = Convert.ToInt32(Math.Pow(-1.0, r)); //even times through the loop are added while odd times through are subtracted
        int top = total - (r * sides) - 1;
        ways += (posNeg * numDice.Choose(r) * top.Choose(numDice - 1));
     }
     return ways;
  }

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow