Question

I want to calculate the median of a frequency distribution for a large number of samples. Each sample has a number of classes (3 in the example) with their respective frequencies, and each class is associated with a different value:

data <- data.frame(sample=c(1,2,3,4,5), 
                   freq_class1=c(1,1,59,10,2), 
                   freq_class2=c(1,0,35,44,22), 
                   freq_class3=c(0,4,1,9,2), 
                   value_class1=c(12,11,14,11,13), 
                   value_class2=c(27,33,34,31,29), 
                   value_class3=c(75,78,88,81,65))

For example, the median of sample 1 would be 19.5. I assume this can be done by using quantile() on the frequency distribution of each sample, but all of my attempts failed.

Do you have any suggestions?


Solution

This is probably not the most elegant way, but it works. The idea is to recreate the full data vector from the information in the data.frame and then take the median of that vector. Wrapping this in a function lets me use apply() to run it on each row of the data.frame.
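As a small illustration of the expansion step, here is a sketch for sample 1 only. The variable names (values, freqs, full, sample1_median) are made up for this example; the numbers are taken from the first row of the data.frame in the question.

```r
# Class values and frequencies of sample 1 (first row of the question's data)
values <- c(12, 27, 75)
freqs  <- c(1, 1, 0)

# Repeat each class value as many times as its frequency;
# a frequency of 0 simply drops that class
full <- rep(values, times = freqs)

# Median of the expanded observations
sample1_median <- median(full)
```

With only two observations left after the expansion, the median is the mean of the two, which matches the 19.5 expected in the question.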

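find.median <- function(x) {
  # x is one row of data: x[2:4] are the class frequencies, x[5:7] the class values
  # repeat each class value as many times as its frequency
  full.x <- rep(x[5:7], times = x[2:4])
  return(median(full.x))
}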

> apply(data,1,find.median)
[1] 19.5 78.0 14.0 31.0 29.0
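
If the frequencies are large, expanding every row with rep() can use a lot of memory. A possible alternative, not part of the original answer, is to locate the middle observation(s) from the cumulative frequencies instead. The sketch below assumes the same column layout as the question; freq_median, value_cols and freq_cols are hypothetical names introduced here.

```r
# Data from the question
data <- data.frame(sample = c(1, 2, 3, 4, 5),
                   freq_class1 = c(1, 1, 59, 10, 2),
                   freq_class2 = c(1, 0, 35, 44, 22),
                   freq_class3 = c(0, 4, 1, 9, 2),
                   value_class1 = c(12, 11, 14, 11, 13),
                   value_class2 = c(27, 33, 34, 31, 29),
                   value_class3 = c(75, 78, 88, 81, 65))

# Hypothetical helper: median of a frequency distribution
# without building the expanded vector
freq_median <- function(values, freqs) {
  ord    <- order(values)            # classes must be sorted by value
  values <- as.numeric(values)[ord]  # as.numeric() also drops names
  freqs  <- as.numeric(freqs)[ord]
  n      <- sum(freqs)
  cum    <- cumsum(freqs)
  # positions of the two middle observations (the same position when n is odd)
  lo_pos <- floor((n + 1) / 2)
  hi_pos <- ceiling((n + 1) / 2)
  # first class whose cumulative frequency reaches each position
  lo <- values[which(cum >= lo_pos)[1]]
  hi <- values[which(cum >= hi_pos)[1]]
  (lo + hi) / 2
}

# Assumed column layout: three value columns and three frequency columns
value_cols <- c("value_class1", "value_class2", "value_class3")
freq_cols  <- c("freq_class1", "freq_class2", "freq_class3")

medians <- sapply(seq_len(nrow(data)), function(i) {
  freq_median(unlist(data[i, value_cols]), unlist(data[i, freq_cols]))
})
```

For the example data this should give the same medians as the apply() call above, since both follow the usual definition of the median (the mean of the two middle observations when the total count is even).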
License: CC-BY-SA with attribution
Not affiliated with StackOverflow