Question

Thanks in advance for the help. I am working with a series of .csv files that contain data in the following format:

ID<-c(1,1,1,1,2,2,3,3,3,4,4,4,4,5,5,6,7,7)
Length<-c(3,3,4,7,6,4,7,8,8,9,3,2,4,3,6,8,5,3)
dummydata<-cbind(ID,Length)

> dummydata
      ID Length
 [1,]  1      3
 [2,]  1      3
 [3,]  1      4
 [4,]  1      7
 [5,]  2      6
 [6,]  2      4
 [7,]  3      7
 [8,]  3      8
 [9,]  3      8
[10,]  4      9
[11,]  4      3
[12,]  4      2
[13,]  4      4
[14,]  5      3
[15,]  5      6
[16,]  6      8
[17,]  7      5
[18,]  7      3

What I need to do is find the median Length of each unique number (1,2,3, etc). I can do this individually by using the following code:

one<-median(dummydata[dummydata[,"ID"]==1,"Length"])
two<-median(dummydata[dummydata[,"ID"]==2,"Length"])
three<-median(dummydata[dummydata[,"ID"]==3,"Length"])

However, in every .csv file, there are thousands of ID's, and creating the above code for each number is not feasible. Is there a way for me to find the median Length of each unique ID number for the entire thousands long data set? Ideally I would be able to create a new column with these medians.

I would appreciate any insight into this issue!

Solution

Have a look at tapply().

For example:

with(as.data.frame(dummydata), tapply(Length,list(ID),median))
#   1   2   3   4   5   6   7 
# 3.5 5.0 8.0 3.5 4.5 8.0 4.0 
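
The question also asks for these medians as a new column alongside the original data. In base R, ave() does exactly that: it applies a function within each group and returns a vector of the same length as the input, so it can be assigned straight back into the data frame. A minimal sketch, using the dummydata from the question:

df <- as.data.frame(dummydata)
# ave() repeats each group's median for every row in that group,
# so the result lines up row-for-row with the original data.
df$MedianLength <- ave(df$Length, df$ID, FUN = median)
# Every row with ID 1 now carries 3.5, every row with ID 2 carries 5.0, and so on.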

OTHER TIPS

A dplyr solution:

library(dplyr)

as.data.frame(dummydata) %>% group_by(ID) %>% summarise(Median = median(Length))
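
Since the question mentions a whole series of .csv files, the same summary can be applied to each file in a loop. A sketch, assuming the files live in a directory called "data" (a hypothetical path) and that each file has ID and Length columns like the dummy data above:

library(dplyr)

# Hypothetical directory; adjust the path to wherever the .csv files live.
files <- list.files("data", pattern = "\\.csv$", full.names = TRUE)

# One per-ID median table per file, returned as a list of data frames.
medians_by_file <- lapply(files, function(f) {
  read.csv(f) %>%
    group_by(ID) %>%
    summarise(Median = median(Length))
})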
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow