Pergunta

I want to calculate the median of a frequency distribution for a large number of samples. Each of the samples have a number of classes (3 in the example) and their respective frequencies. Each of the classes is associated with a different value

data <- data.frame(sample=c(1,2,3,4,5), 
                   freq_class1=c(1,1,59,10,2), 
                   freq_class2=c(1,0,35,44,22), 
                   freq_class3=c(0,4,1,9,2), 
                   value_class1=c(12,11,14,11,13), 
                   value_class2=c(27,33,34,31,29), 
                   value_class3=c(75,78,88,81,65))

For example the median of sample 1 would be 19.5. I assume that this can be done using quantile() on the frequency distribution of each sample, but all attempts failed.

Do you have any suggestion?

Foi útil?

Solução

This is probably not the most elegant way, but it works: basically, I'm recreating the full data vector from the information contained in the data.frame, then finding the median of that. Writing a function to do it lets me use apply to quickly do it to each row of the data.frame.

find.median <- function(x) {
  full.x <- rep(x[5:7],times=x[2:4])
  return(median(full.x))
}

> apply(data,1,find.median)
[1] 19.5 78.0 14.0 31.0 29.0
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top