Question

I want to calculate the median of a frequency distribution for a large number of samples. Each sample has a number of classes (3 in the example), each with its own frequency, and each class is associated with a different value:

data <- data.frame(sample=c(1,2,3,4,5), 
                   freq_class1=c(1,1,59,10,2), 
                   freq_class2=c(1,0,35,44,22), 
                   freq_class3=c(0,4,1,9,2), 
                   value_class1=c(12,11,14,11,13), 
                   value_class2=c(27,33,34,31,29), 
                   value_class3=c(75,78,88,81,65))

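For example, the median of sample 1 should be 19.5: its observations are one value of 12 and one value of 27 (class 3 has frequency 0), and (12 + 27) / 2 = 19.5. I assumed this could be done by applying quantile() to the frequency distribution of each sample, but all my attempts have failed.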

Do you have any suggestions?


Solution

This is probably not the most elegant way, but it works: I rebuild the full vector of observations from each row of the data.frame, repeating each class value as many times as its frequency, and then take the median of that vector. Wrapping this in a function lets me use apply() to run it on every row of the data.frame.
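For a single sample the idea can be sketched as follows, using sample 1 with its values and frequencies copied by hand from the first row of `data`. It also shows where quantile() fits in: it works on raw observations, so it only gives the expected median once the frequency table has been expanded. The variable names `values`, `freqs` and `obs` are just illustrative.

```r
# Sample 1: class values and their frequencies (first row of `data`)
values <- c(12, 27, 75)
freqs  <- c(1, 1, 0)

# Expand the frequency table back into individual observations: c(12, 27)
obs <- rep(values, times = freqs)

median(obs)                 # 19.5
quantile(obs, probs = 0.5)  # also 19.5 (default type = 7), as a vector named "50%"
```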

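find.median <- function(x) {
  # x is one row of `data`: x[2:4] are the class frequencies,
  # x[5:7] the corresponding class values.
  # Repeat each value as many times as its frequency to rebuild the raw sample.
  full.x <- rep(x[5:7], times = x[2:4])
  return(median(full.x))
}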

> apply(data,1,find.median)
[1] 19.5 78.0 14.0 31.0 29.0
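If the frequencies are large, expanding every row with rep() can use a lot of memory, because the rebuilt vector has sum(freqs) elements. A possible alternative, sketched below, reads the median directly off the cumulative frequencies instead. `median_from_freq()` is a hypothetical helper written for this example, not a base R function; it follows the same convention as median() (the average of the two middle values when the total count is even), so on the example data it should agree with find.median(). It also selects the columns by name rather than by the fixed positions 2:4 and 5:7, assuming the freq_class* and value_class* columns are listed in the same class order.

```r
# Example data from the question
data <- data.frame(sample = c(1, 2, 3, 4, 5),
                   freq_class1 = c(1, 1, 59, 10, 2),
                   freq_class2 = c(1, 0, 35, 44, 22),
                   freq_class3 = c(0, 4, 1, 9, 2),
                   value_class1 = c(12, 11, 14, 11, 13),
                   value_class2 = c(27, 33, 34, 31, 29),
                   value_class3 = c(75, 78, 88, 81, 65))

# Hypothetical helper: median of a frequency table without expanding it.
# Sort the classes by value, accumulate the frequencies, and look up the
# class that contains the middle position(s) of the (virtual) full sample.
median_from_freq <- function(values, freqs) {
  values <- as.numeric(values)   # as.numeric() also drops names
  freqs  <- as.numeric(freqs)
  o      <- order(values)
  values <- values[o]
  cum    <- cumsum(freqs[o])
  n      <- sum(freqs)           # total number of observations
  if (n == 0) return(NA_real_)
  # Value at 1-based position k of the sorted, expanded sample
  at <- function(k) values[which(cum >= k)[1]]
  if (n %% 2 == 1) at((n + 1) / 2) else (at(n / 2) + at(n / 2 + 1)) / 2
}

# Select columns by name (assumes freq_* and value_* share the same class order)
freq_m  <- as.matrix(data[, grep("^freq_class",  names(data))])
value_m <- as.matrix(data[, grep("^value_class", names(data))])

medians <- vapply(seq_len(nrow(data)),
                  function(i) median_from_freq(value_m[i, ], freq_m[i, ]),
                  numeric(1))
medians  # 19.5 78.0 14.0 31.0 29.0
```

Working from matrices built with explicit column selection also sidesteps a quirk of apply(): it converts the whole data.frame to a single matrix, so a non-numeric column (for example, character sample IDs) would turn every value into a string.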
Licensed under: CC-BY-SA with attribution