In your chained sequence of dplyr operations, the summarise call produces only two columns: the grouping variable and the result of the summary function.
df %>%
  group_by(userId) %>%
  summarise(one = max(playCount))
# Source: local data frame [5 x 2]
#
# userId one
# 1 A 85
# 2 B 84
# 3 C 18
# 4 D 72
# 5 E 65
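To see exactly which columns survive, here is a minimal sketch on a small data frame. The question's actual df is not shown, so the user IDs, song IDs and play counts below are hypothetical. It assumes dplyr is installed and uses the %>% pipe that dplyr re-exports.

```r
library(dplyr)

# Hypothetical toy data (the question's real df is not shown)
df <- data.frame(
  userId    = c("A", "A", "B", "B", "C"),
  songId    = c("s1", "s2", "s3", "s4", "s5"),
  playCount = c(10, 85, 84, 3, 18),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(userId) %>%
  summarise(one = max(playCount))

# Only the grouping column and the new summary column remain;
# songId and playCount are no longer part of the result
print(names(res))
# [1] "userId" "one"
```

Any column that is neither a grouping variable nor created inside summarise is collapsed away, which is why a later select cannot find it.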
When you then try to select the songId variable from the data frame generated by summarise, it is not found, because summarise has already dropped it.
df %>%
  group_by(userId) %>%
  summarise(one = max(playCount)) %>%
  select(userId, songId, playCount)
# Error in eval(expr, envir, enclos) : object 'songId' not found
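If you would rather stay with summarise, one option (a different technique from the filter approach below) is to compute songId inside the summarise call itself, taking the song at the position of the maximum play count with which.max. This is a sketch on the same hypothetical toy data; note that which.max returns only the first maximum, so tied songs are silently dropped.

```r
library(dplyr)

# Hypothetical toy data (the question's real df is not shown)
df <- data.frame(
  userId    = c("A", "A", "B", "B", "C"),
  songId    = c("s1", "s2", "s3", "s4", "s5"),
  playCount = c(10, 85, 84, 3, 18),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(userId) %>%
  summarise(
    # Computed first, so playCount here is still the per-row column;
    # which.max() gives the position of the first maximum in each group
    songId    = songId[which.max(playCount)],
    playCount = max(playCount)
  )

print(res)
```

The order of the arguments matters: summarise lets later expressions see columns created earlier in the same call, so playCount is summarised last to keep the original values available for which.max.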
A more suitable dplyr function in this case is filter. Here we keep the rows where the condition playCount == max(playCount) is TRUE within each group.
df %>%
  group_by(userId) %>%
  filter(playCount == max(playCount))
# Source: local data frame [5 x 3]
# Groups: userId
#
# userId songId playCount
# 1 A 568r 85
# 2 C 34n 18
# 3 E 454j 65
# 4 D 663a 72
# 5 B 35d 84
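One behaviour worth knowing: because filter keeps every row equal to the group maximum, tied songs all survive. The sketch below uses hypothetical data in which user A has two songs tied at 85 plays, and contrasts filter with slice_max, which is available from dplyr 1.0.0 onwards and, with with_ties = FALSE, returns exactly one row per group.

```r
library(dplyr)

# Hypothetical toy data: user "A" has two songs tied at 85 plays
df <- data.frame(
  userId    = c("A", "A", "A", "B", "B"),
  songId    = c("s1", "s2", "s3", "s4", "s5"),
  playCount = c(85, 85, 10, 84, 3),
  stringsAsFactors = FALSE
)

# filter() keeps every row equal to its group's maximum, so both tied rows stay
top_filter <- df %>%
  group_by(userId) %>%
  filter(playCount == max(playCount)) %>%
  ungroup()

# slice_max() with with_ties = FALSE keeps a single top row per group
top_one <- df %>%
  group_by(userId) %>%
  slice_max(playCount, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_filter)
print(top_one)
```

Which variant fits depends on whether you want all of a user's most-played songs or just one of them.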
You will find several nice dplyr examples here.