R: summarizing data without merge
23-02-2021
Question
I have a data frame (df) of goals scored against various teams by date:
gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0
I wish to produce a summary table showing, for each date, the number of games played and the number of games in which these teams blanked the opposition:
gamedate games blanks
1992-08-22 2 0
1992-08-23 2 1
1992-08-24 1 1
1992-08-25 1 0
1992-08-26 2 2
I can get the games and blanks separately using ddply:
df.a <- ddply(df,"gamedate",function(x) c(count=nrow(x)))
df.b <- ddply(subset(df,Gls==0),"gamedate",function(x) c(count=nrow(x)))
and then merge df.a and df.b to get my answer. However, I am sure there must be a simpler and more elegant solution.
Solution
You just need to use summarise:
Read the data in:
dat <- read.table(textConnection("gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0"),sep = "",header = TRUE)
and then call ddply (after loading the plyr package):
library(plyr)
ddply(dat,.(gamedate),summarise,tot = length(teamID),blanks = length(which(Gls == 0)))
gamedate tot blanks
1 1992-08-22 2 0
2 1992-08-23 2 1
3 1992-08-24 1 1
4 1992-08-25 1 0
5 1992-08-26 2 2
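For comparison, the same table can also be built without plyr. This is only a minimal base-R sketch, not part of the answer above; the names games, blanks and res are illustrative. It relies on the fact that a logical vector sums as 0/1, so summing Gls == 0 within each date counts the blanks directly.

```r
# Same sample data as above, read from an inline string
dat <- read.table(text = "gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0", header = TRUE)

# Rows per date, and rows per date where Gls is 0
games  <- tapply(dat$Gls, dat$gamedate, length)
blanks <- tapply(dat$Gls == 0, dat$gamedate, sum)

# Both results are indexed by the same sorted dates, so they line up
res <- data.frame(gamedate = names(games),
                  games    = as.vector(games),
                  blanks   = as.vector(blanks))
res
```

Inside the ddply call, sum(Gls == 0) should likewise work in place of length(which(Gls == 0)).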
OTHER TIPS
The only thing you are missing is wrapping your functions in a data.frame() call and giving them column names... and the column names are optional :)
I'm using @joran's dat data.frame as it allowed me to test my answer.
ddply( dat, "gamedate", function(x) data.frame(
tot = nrow( x ),
blanks = nrow( subset(x, Gls == 0 ) )
)
)
BTW, my funny formatting above is just to prevent it from scrolling on the screen and to help illustrate how I'm really just bringing together the functions you already created.
Another solution, using a simple aggregate call. I am using joran's dat.
agg <- aggregate(cbind(1, dat$Gls==0), list(dat$gamedate), sum)
names(agg) <- c("gamedate", "games", "blanks")
agg
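The call above works because summing a column of ones counts the rows in each group, and summing the logical Gls == 0 counts the zeros. As a hedged variation (a sketch, not the answerer's code), the columns can be named up front so that the separate names() step is not needed; the name agg2 is just illustrative.

```r
# Same sample data as above, read from an inline string
dat <- read.table(text = "gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0", header = TRUE)

# games: a 1 for every row; blanks: 1 where Gls is 0, else 0
agg2 <- aggregate(
  data.frame(games = 1, blanks = as.integer(dat$Gls == 0)),
  by  = list(gamedate = dat$gamedate),  # a named list sets the grouping column name
  FUN = sum
)
agg2
```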