Question

I'm trying to generate random masses for hypothetical planets in Haskell. I want to produce these masses by sampling a bi-modal distribution (ideally the superposition of two normal distributions: one corresponding to small planets and one corresponding to gas giants). I've looked at the statistics package, which provides the quantile function, which can turn a uniformly distributed Double into a Double on a number of distributions. But there doesn't seem to be any support for composing distributions.

This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later. Eventually I might replace the normal distribution with real data from sky surveys.

I'm considering implementing rejection sampling myself, which can handle arbitrary distributions fairly simply, but it seems rather inefficient, and it certainly wouldn't be a good idea to implement it if a solution exists already as a library.

Is there a Haskell library that supports sampling from composed or explicitly specified distributions? Or an existing Haskell implementation of rejection sampling? Alternatively, is there an explicit formula for the inverse of the CDF of the sum of two normal distributions?


Solution

In the case of a simple mixture of distributions, you can get an efficient sampler via the 'hack' you first mentioned:

This particular case could be hacked around by picking one distribution or the other to sample before-hand, but I'd like to do it with a single distribution, especially since I might need to tweak the overall distribution later.

This is a standard way of sampling from a mixture, closely related to Gibbs sampling on the model augmented with a component indicator, which is very prevalent in statistics. It's very flexible, and if you know the number of mixture components you're using, it will probably be hard to beat. Choose one individual distribution from the ensemble to sample from, then sample from that conditional distribution. Rinse and repeat.

Here's a simple, unoptimized Haskell implementation for a mixture-of-Gaussians Gibbs sampler. It's pretty basic, but you get the idea:

import System.Random
import Control.Monad.State

type ModeList = [(Double, Double)]                 -- A list of mean/stdev pairs, for each mode.

-- Generate a Gaussian (0, 1) variate via the Box-Muller transform.
boxMuller :: StdGen -> (Double, StdGen)
boxMuller gen
    | u1 > 0    = (sqrt (-2 * log u1) * cos (2 * pi * u2), gen'')
    | otherwise = boxMuller gen''                  -- retry: log 0 is undefined
    where (u1, gen')  = randomR (0, 1) gen
          (u2, gen'') = randomR (0, 1) gen'

sampler :: ModeList -> State StdGen Double
sampler modeInfo = do
    gen <- get
    let n           = length modeInfo
        (z0, g0)    = boxMuller gen
        (c,  g1)    = randomR (0, n - 1) g0        -- Sample from the components.
        (cmu, csig) = modeInfo !! c                
    put g1
    return $ cmu + csig * z0                       -- Sample from the conditional distribution.

Here's an example run: sampling 100 times from a one-dimensional mixture of two Gaussians. The modes are at x = -3 and x = 2.5, and each mixture component has its own variance. You could add as many modes as you want here.

main = do
    let gen      = mkStdGen 42
        modeInfo = [(2.5, 1.0), (-3, 1.5)]
        samples  = (`evalState` gen) . replicateM 100 $ sampler modeInfo
    print samples

Here's a smoothed density plot of those 100 samples (using R and ggplot2):

[plot: smoothed density of the samples, showing a mixture of Gaussians with two modes]

A more general-purpose algorithm would be a rejection or importance sampler, and in the case of more complicated distributions you're probably going to want to hand-roll an appropriate MCMC routine. Here is a good introduction to Monte Carlo and MCMC.
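For completeness, since the question also asked about rejection sampling: here is a minimal, unoptimized sketch. The uniform proposal, the support interval, and the bound `m` (which must satisfy `m >= f x` over the interval) are assumptions for illustration, not anything prescribed by a library:

```haskell
import System.Random (StdGen, mkStdGen, randomR)

-- Unnormalised density of a two-Gaussian mixture, with modes at 2.5
-- and -3 to match the example above (constants chosen for illustration).
mixtureDensity :: Double -> Double
mixtureDensity x = exp (-(x - 2.5)^2 / 2)
                 + exp (-(x + 3)^2 / (2 * 1.5^2))

-- Rejection sampling against a uniform proposal on [lo, hi]:
-- draw a candidate x, accept it with probability f x / m,
-- otherwise try again with the updated generator.
rejectionSample :: (Double -> Double) -> (Double, Double) -> Double
                -> StdGen -> (Double, StdGen)
rejectionSample f (lo, hi) m gen
    | u * m <= f x = (x, gen'')
    | otherwise    = rejectionSample f (lo, hi) m gen''
  where
    (x, gen')  = randomR (lo, hi) gen   -- uniform candidate
    (u, gen'') = randomR (0, 1) gen'    -- acceptance test
```

For example, `fst (rejectionSample mixtureDensity (-8, 8) 1.1 (mkStdGen 42))` draws one sample. The inefficiency the question anticipates is visible here: the acceptance rate is roughly the area under `f` divided by `m * (hi - lo)`, so a wide support or a poor bound wastes most candidates.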

OTHER TIPS

Hmmmm. The best way I'm familiar with is to adapt the MonadRandom package to get a "probability monad", borrowing some tools from http://en.wikipedia.org/wiki/Normal_distribution#Generating_values_from_normal_distribution :

import Control.Monad.Random (MonadRandom, getRandomR)
import System.Random (Random)

getRandomStrictlyBetween :: (Ord a, Random a, MonadRandom m) =>
    (a, a) -> m a
getRandomStrictlyBetween (lo, hi) = do
  x <- getRandomR (lo, hi)
  -- x is uniformly randomly chosen from the *closed* interval
  if lo < x && x < hi then return x else getRandomStrictlyBetween (lo, hi)

normalValue :: MonadRandom m => m Double
normalValue = do
  u <- getRandomStrictlyBetween (0, 1)
  v <- getRandomStrictlyBetween (0, 2 * pi)
  return (sqrt (-2 * log u) * cos v) -- according to Wikipedia

and then you can derive more or less arbitrary distributions; for example, to get the distribution of a random variable that is y with probability p and z with probability (1 - p), you just write

do alpha <- getRandom -- double chosen from [0, 1)
   if alpha < p then y else z

of which bimodal distributions appear to be a special case. To sample from these distributions, just do evalRandIO distribution to sample in the IO monad.
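Putting those pieces together, the bimodal planet-mass distribution from the question might look like the following sketch. The 70/30 weights and the mass parameters (`1 ± 0.5` for small planets, `150 ± 50` for gas giants) are invented for illustration:

```haskell
import Control.Monad.Random (MonadRandom, evalRand, getRandom, getRandomR)
import System.Random (mkStdGen)

-- Standard normal variate via Box-Muller; retry if u == 0 so log is safe.
normal01 :: MonadRandom m => m Double
normal01 = do
  u <- getRandomR (0, 1)
  v <- getRandomR (0, 2 * pi)
  if u > 0 then return (sqrt (-2 * log u) * cos v) else normal01

-- Mixture of two normals: 70% "small planets" around mean 1 (stdev 0.5),
-- 30% "gas giants" around mean 150 (stdev 50).  All numbers are made up.
planetMass :: MonadRandom m => m Double
planetMass = do
  alpha <- getRandom                 -- uniform in [0, 1)
  z     <- normal01
  return $ if alpha < (0.7 :: Double)
             then 1   + 0.5 * z
             else 150 + 50  * z
```

`evalRand planetMass (mkStdGen 7)` gives a reproducible draw, or use `evalRandIO planetMass` to sample in the IO monad as described above. Tweaking the overall distribution later then just means editing the weights and parameters in one place.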

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow