Pergunta

Let me start with an easy example of my problem.

Assume a universe of 10 people, where 1 person owns product A and 2 persons own product B
U=10, A=1, B=2

Now I want to find the chances that:
1) a person owns no products ==> (1 - 1/10) * (1 - 2/10) = 0.72
2) a person owns at least 1 product ==> 1 - ((1 - 1/10) * (1 - 2/10)) = 0.28
3) a person owns 2 products ==> (1/10) * (2/10) = 0.02

However, I would like to have a general algorithm that sorts all these options out if there are n products.

Input is given as follows
U <- 10
products <- c('A','B')
owned_by <- c(1, 2)
df <- data.frame(products, owned_by)

Foi útil?

Solução

I think a closed form solution would involve too many terms and become overly complex. So this problem looks like a perfect candidate for Monte Carlo method.

set.seed(1984)

U <- 10
products <- c('A','B')
owned_by <- c(1,2) 
df <- data.frame(products, owned_by)
p = rep(0, nrow(df)+1)
num.runs = 1000

for(n in 1:num.runs)
{
  x=c()  ## list of people who own a product
  for (i in 1:nrow(df))
    x = c(x, sample(1:U, df$owned_by[i]))

  ## get the number of people who own 0, 1, 2...products
  p[1] = p[1] + (sum(hist(x,breaks=0:U,plot=F)$counts == 0) / U)
  for(i in 1:nrow(df))
    p[i+1] = p[i+1] + (sum(hist(x,breaks=0:U,plot=F)$counts >= i) / U)
}

p = p / num.runs ## average over all runs
p

## 0.7197 0.2803 0.0197
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top