Question

I have a data frame (df) of goals scored against various teams, by date:

gamedate teamID Gls
 1992-08-22  CHL  3
 1992-08-22  MNU  1
 1992-08-23  ARS  0
 1992-08-23  LIV  2
 1992-08-24  MNU  0
 1992-08-25  LIV  2
 1992-08-26  ARS  0
 1992-08-26  CHL  0

I want to produce a summary table that shows, for each date, the number of games played and the number of games in which these teams blanked the opposition (Gls == 0):

gamedate   games blanks
 1992-08-22   2     0
 1992-08-23   2     1
 1992-08-24   1     1
 1992-08-25   1     0
 1992-08-26   2     2

I can get the games and blanks separately using plyr's ddply:

library(plyr)
df.a <- ddply(df, "gamedate", function(x) c(count = nrow(x)))
df.b <- ddply(subset(df, Gls == 0), "gamedate", function(x) c(count = nrow(x)))

and then merge df.a and df.b to get my answer. However, I am sure there must be a simpler, more elegant solution.
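If you do stick with the two-step approach, note that a default merge() keeps only the dates present in both tables, so dates with no blanks (here 1992-08-22 and 1992-08-25) would silently disappear. A minimal sketch of the merge step, assuming df.a and df.b have the gamedate/count shape the ddply calls return; to keep it runnable without plyr, the two tables are built here with base R's aggregate() as stand-ins:

```r
# Sample data from the question
df <- read.table(text = "gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0", header = TRUE)

# Stand-ins for the ddply results: one row per date with a 'count' column
df.a <- aggregate(data.frame(count = df$Gls),
                  by = list(gamedate = df$gamedate), FUN = length)
zero <- df$Gls == 0
df.b <- aggregate(data.frame(count = df$Gls[zero]),
                  by = list(gamedate = df$gamedate[zero]), FUN = length)

# all.x = TRUE keeps every date from df.a; dates with no blanks get NA
res <- merge(df.a, df.b, by = "gamedate", all.x = TRUE)
names(res) <- c("gamedate", "games", "blanks")

# Replace the NA counts with 0
res$blanks[is.na(res$blanks)] <- 0
res
```

This works, but it needs the extra NA clean-up step, which is one reason a single-pass summary is tidier.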


Solution

You just need to pass plyr's summarise as the function to ddply:

Read the data in:

   dat <- read.table(textConnection("gamedate teamID Gls
  1992-08-22  CHL  3
  1992-08-22  MNU  1
  1992-08-23  ARS  0
  1992-08-23  LIV  2
  1992-08-24  MNU  0
  1992-08-25  LIV  2
  1992-08-26  ARS  0
  1992-08-26  CHL  0"),sep = "",header = TRUE)

and then call ddply:

library(plyr)
ddply(dat, .(gamedate), summarise,
      tot = length(teamID),
      blanks = length(which(Gls == 0)))
    gamedate tot blanks
1 1992-08-22   2      0
2 1992-08-23   2      1
3 1992-08-24   1      1
4 1992-08-25   1      0
5 1992-08-26   2      2
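In the call above, length(which(Gls == 0)) counts the zero-goal rows. sum(Gls == 0) is a common shorter spelling, since TRUE counts as 1, but the two differ if a score is missing: which() drops NA, while sum() returns NA unless na.rm = TRUE. The NA below is a hypothetical value for illustration; the question's data has none.

```r
# Hypothetical Gls values for one date, including a missing score
gls <- c(0, NA, 2, 0)

n_which  <- length(which(gls == 0))       # which() ignores NA, so this is 2
n_sum    <- sum(gls == 0)                 # NA propagates through sum(), so this is NA
n_sum_rm <- sum(gls == 0, na.rm = TRUE)   # dropping NA first gives 2 again
```

So blanks = sum(Gls == 0) is an equivalent choice inside summarise as long as Gls has no missing values.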

OTHER TIPS

The only thing you are missing is wrapping both of your functions in a single data.frame() call and giving the columns names... and the column names are optional :)

I'm using @joran's dat data.frame as it allowed me to test my answer.

ddply( dat, "gamedate", function(x) data.frame( 
                                      tot = nrow( x ), 
                                      blanks = nrow( subset(x, Gls == 0 ) ) 
                                              ) 
     )

BTW, my funny formatting above is just to prevent it from scrolling on the screen and to help illustrate how I'm really just bringing together the functions you already created.
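ddply follows a split-apply-combine pattern here: split dat by gamedate, call the function on each piece, and row-bind the one-row data frames it returns. A rough base-R sketch of those three steps, using the same function body (this illustrates the idea, it is not plyr's actual implementation):

```r
dat <- read.table(text = "gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0", header = TRUE)

# Split: one data frame per date
pieces <- split(dat, dat$gamedate)

# Apply: build a one-row summary for each date
summaries <- lapply(pieces, function(x) data.frame(
  gamedate = x$gamedate[1],
  tot      = nrow(x),
  blanks   = nrow(subset(x, Gls == 0))
))

# Combine: stack the one-row data frames back together
res <- do.call(rbind, summaries)
rownames(res) <- NULL
res
```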

Another solution, using plain aggregate() from base R. I am also using joran's dat.

# Summing a column of 1s counts the games; summing the logical Gls == 0 counts the blanks
agg <- aggregate(cbind(1, dat$Gls == 0), list(dat$gamedate), sum)
names(agg) <- c("gamedate", "games", "blanks")
agg
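A variant of the same aggregate() call, sketched here, names the columns inside the call instead of with names() afterwards: naming the cbind() arguments sets the value column names, and naming the by list sets the grouping column name.

```r
dat <- read.table(text = "gamedate teamID Gls
1992-08-22 CHL 3
1992-08-22 MNU 1
1992-08-23 ARS 0
1992-08-23 LIV 2
1992-08-24 MNU 0
1992-08-25 LIV 2
1992-08-26 ARS 0
1992-08-26 CHL 0", header = TRUE)

# games = 1 is recycled to one per row; the logical Gls == 0 is coerced to 0/1,
# so summing each column per date gives the game count and the blank count
agg <- aggregate(cbind(games = 1, blanks = dat$Gls == 0),
                 by = list(gamedate = dat$gamedate), FUN = sum)
agg
```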
Licensed under: CC-BY-SA with attribution