Question
Thanks in advance for the help. I am working with a series of .csv files that contain data in the following format:
ID<-c(1,1,1,1,2,2,3,3,3,4,4,4,4,5,5,6,7,7)
Length<-c(3,3,4,7,6,4,7,8,8,9,3,2,4,3,6,8,5,3)
dummydata<-cbind(ID,Length)
> dummydata
ID Length
[1,] 1 3
[2,] 1 3
[3,] 1 4
[4,] 1 7
[5,] 2 6
[6,] 2 4
[7,] 3 7
[8,] 3 8
[9,] 3 8
[10,] 4 9
[11,] 4 3
[12,] 4 2
[13,] 4 4
[14,] 5 3
[15,] 5 6
[16,] 6 8
[17,] 7 5
[18,] 7 3
What I need to do is find the median Length for each unique ID (1, 2, 3, etc.). I can do this individually with the following code (note that cbind produces a matrix, so rows are selected with [ rather than $):
one<-median(dummydata[dummydata[,"ID"]==1,"Length"])
two<-median(dummydata[dummydata[,"ID"]==2,"Length"])
three<-median(dummydata[dummydata[,"ID"]==3,"Length"])
However, every .csv file contains thousands of IDs, and writing a line like this for each one is not feasible. Is there a way to find the median Length for every unique ID across the entire data set? Ideally, I would also like to create a new column holding these medians.
I would appreciate any insight into this issue!
Solution
Have a look at tapply. For example:
with(as.data.frame(dummydata), tapply(Length, list(ID), median))
# 1 2 3 4 5 6 7
# 3.5 5.0 8.0 3.5 4.5 8.0 4.0
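Since the question also asks for the medians as a new column, base R's ave() is worth a look: unlike tapply, it returns one value per row (the group statistic repeated within each group), so the result can be assigned straight back onto the data. A minimal sketch using the dummy data from the question:

```r
# Data from the question; cbind makes a matrix, so convert to a data frame
ID <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,5,5,6,7,7)
Length <- c(3,3,4,7,6,4,7,8,8,9,3,2,4,3,6,8,5,3)
dd <- as.data.frame(cbind(ID, Length))

# ave() returns the group median repeated for every row of that group,
# so it lines up with the original rows and can be stored as a column
dd$MedianLength <- ave(dd$Length, dd$ID, FUN = median)

head(dd)
```

Every row with ID 1 now carries 3.5, every row with ID 2 carries 5.0, and so on.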
OTHER TIPS
A dplyr solution:
library(dplyr)
as.data.frame(dummydata) %>% group_by(ID) %>% summarise(Median = median(Length))
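If you prefer to stay in base R but want the one-row-per-ID result as a data frame (rather than the named vector tapply returns), aggregate() with a formula does the same summary; a sketch on the question's data:

```r
# Data from the question
ID <- c(1,1,1,1,2,2,3,3,3,4,4,4,4,5,5,6,7,7)
Length <- c(3,3,4,7,6,4,7,8,8,9,3,2,4,3,6,8,5,3)
dd <- as.data.frame(cbind(ID, Length))

# One row per ID, with the median Length for that group
agg <- aggregate(Length ~ ID, data = dd, FUN = median)
agg
#   ID Length
# 1  1    3.5
# 2  2    5.0
# 3  3    8.0
# 4  4    3.5
# 5  5    4.5
# 6  6    8.0
# 7  7    4.0
```

This matches the tapply output above and is easy to merge() back onto the original data if a per-row column is needed.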