Sample classification by probability

https://stackoverflow.com/questions/19728723

02-07-2022

Question

I have fitted mixture distributions to multi-modal biological measurement data in order to group individuals accordingly (picture a multi-modal histogram of length measurements; assuming each mode represents a different age cohort I can infer numbers at age from the easily measured length data).

The mixture distribution provides posterior probabilities of each individual's membership of each mode, so once the data are binned by length class, one line might look like:

   l.class  freq  age1  age2  age3  age5
         9    41   0.2  0.25   0.3  0.25

Here l.class is the length bin, freq is the number of individuals in that bin, and age1, age2, age3 and age5 are the probabilities of association with a given mixture mode / age group. Because these are probabilities rather than proportions, I want to sample each entry repeatedly in order to estimate numbers at age for each length bin.

I have tried using sample() to do this in R, but I cannot work out how to assign each individual to one of several potential groups according to these probabilities.

Solution

# For row i of data1: draw freq[i] age labels using that row's probabilities
x <- sample(names(data1)[3:ncol(data1)], data1$freq[i], replace = TRUE, prob = unlist(data1[i, 3:ncol(data1)]))

This is the approach I ended up using, where i indexes a row (length bin) of data1. I wanted to sample by probability many times (e.g. 1000), so I ran the call above in a loop and took the mean number of samples assigned to each age class as my estimate.
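The loop described above might look roughly like the sketch below. It is only an illustration: the second row of data1, the seed, and the names n_iter, age_cols and est are assumptions made for the example and are not from the original post.

```r
# Hypothetical data in the layout shown in the question
# (the second row is invented purely for illustration)
data1 <- data.frame(
  l.class = c(9, 10),
  freq    = c(41, 30),
  age1    = c(0.20, 0.10),
  age2    = c(0.25, 0.20),
  age3    = c(0.30, 0.30),
  age5    = c(0.25, 0.40)
)

set.seed(1)                                # reproducible draws
n_iter   <- 1000                           # number of resampling iterations
age_cols <- names(data1)[3:ncol(data1)]    # "age1" "age2" "age3" "age5"

# One row of estimates per length bin, one column per age class
est <- matrix(NA_real_, nrow = nrow(data1), ncol = length(age_cols),
              dimnames = list(as.character(data1$l.class), age_cols))

for (i in seq_len(nrow(data1))) {
  probs <- unlist(data1[i, age_cols])      # plain numeric vector for prob=
  # Each column of 'counts' holds the per-age counts from one iteration
  counts <- replicate(n_iter, {
    x <- sample(age_cols, data1$freq[i], replace = TRUE, prob = probs)
    tabulate(match(x, age_cols), nbins = length(age_cols))
  })
  est[i, ] <- rowMeans(counts)             # mean count per age class
}

est
```

With enough iterations the averages should settle near freq times the probability (for the row in the question, 41 x 0.2 = 8.2 for age1), so the spread of the simulated counts across iterations may be more informative than their mean. Base R's rmultinom() draws such counts directly and could replace the inner sample() call.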

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow