Question
I am using plyr to calculate means and standard deviations in R. However, my grouping variable contains a combination of letters and numbers, so I need to either use some kind of wildcard in my grouping variable, or create a new grouping variable by removing the numbers from the original one. For example, with the following data frame:
test5 <- structure(list(A = structure(1:6, .Label = c("JCT1", "JCT2",
"JCT3", "LFR1", "LFR2", "LFR3"), class = "factor"), B = c(4L,
5L, 3L, 7L, 3L, 6L), C = structure(c(1L, 1L, 1L, 2L, 2L, 2L), .Label = c("JCT",
"LFR"), class = "factor")), .Names = c("A", "B", "C"), class = "data.frame", row.names = c(NA,
-6L))
A B C
1 JCT1 4 JCT
2 JCT2 5 JCT
3 JCT3 3 JCT
4 LFR1 7 LFR
5 LFR2 3 LFR
6 LFR3 6 LFR
I can use the following code to calculate means and sd:
library(plyr)
ddply(test5,~A,summarise,mean=mean(B),sd=sd(B))
which gives one row per distinct value of A, so every sd is NA (each group holds a single observation):
A mean sd
1 JCT1 4 NA
2 JCT2 5 NA
3 JCT3 3 NA
4 LFR1 7 NA
5 LFR2 3 NA
6 LFR3 6 NA
However, I really need the groups to be JCT and LFR, so I need to either 1) use a wildcard in the code (so groups are based on JCT and LFR, with the number being the wildcard), or 2) create a new column like C in my original data frame that has the numbers removed from column A. So, for example, if I could create this new column C, then I could use the code
ddply(test5,~C,summarise,mean=mean(B),sd=sd(B))
to produce my desired result of
C mean sd
1 JCT 4.000000 1.000000
2 LFR 5.333333 2.081666
Does anyone know of an easy way to do this? I thought I could use ifelse statements to somehow create a new column C, but this would require a lot of code as I have many different values in my real data frame. I am hoping there is a quicker way.
Thanks!
Solution
Is something like this what you are looking for?
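library(plyr)
# Strip the digits so that JCT1, JCT2, ... collapse to JCT (this overwrites column A)
test5$A <- gsub('[0-9]+', '', test5$A)
# Summarise B within each of the collapsed groups
ddply(test5, .(A), summarise, mean = mean(B, na.rm = TRUE), sd = sd(B, na.rm = TRUE))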
A mean sd
1 JCT 4.000000 1.000000
2 LFR 5.333333 2.081666
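If overwriting column A is not desirable, the same idea can write the stripped labels to a new column instead. The following is only a minimal sketch under a few assumptions: it uses base R's aggregate rather than ddply (so plyr is not needed), and the column name group and the objects means, sds and result are names made up for this illustration.

```r
# Rebuild the example data from the question (columns A and B only).
test5 <- data.frame(
  A = factor(c("JCT1", "JCT2", "JCT3", "LFR1", "LFR2", "LFR3")),
  B = c(4L, 5L, 3L, 7L, 3L, 6L)
)

# Remove every run of digits; gsub() returns a character vector,
# stored here in a new (hypothetical) column so that A stays intact.
test5$group <- gsub("[0-9]+", "", test5$A)

# Base R stand-in for the ddply() call: one summary value per group.
means <- aggregate(B ~ group, data = test5, FUN = mean)
sds   <- aggregate(B ~ group, data = test5, FUN = sd)

# Both aggregate() results are sorted by group, so the rows line up.
result <- data.frame(group = means$group, mean = means$B, sd = sds$B)
print(result)
```

Keeping the original A column makes it easy to check by eye that each ID landed in the intended group.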
Other suggestions
You could use regmatches and regexpr to extract the letters and then summarize based on that:
ddply(test5, .(letter = regmatches(A, regexpr("[A-Za-z]*", A))),
      summarise, mean = mean(B), sd = sd(B))
letter mean sd
1 JCT 4.000000 1.000000
2 LFR 5.333333 2.081666
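The two answers give the same groups for IDs of the form letters-then-digits, but they are not identical in general. The sketch below, using a hypothetical vector ids that is not part of the question, shows what each extraction returns on its own: regexpr finds only the first run of letters, while gsub deletes digits wherever they occur.

```r
# Hypothetical IDs; "A1B2" is included only to expose the difference.
ids <- c("JCT1", "LFR12", "A1B2")

# regexpr() locates the first match per element; regmatches() extracts it.
m <- regexpr("[A-Za-z]*", ids)
prefix <- regmatches(ids, m)

# gsub() removes every run of digits, keeping all remaining characters.
stripped <- gsub("[0-9]+", "", ids)

print(data.frame(ids, prefix, stripped))
```

For "A1B2" the first approach yields "A" and the second yields "AB". Note also that the pattern uses `*`, so an ID starting with a digit produces an empty string rather than no match; with `+`, regmatches would drop unmatched elements and the result could end up shorter than the input.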