dplyr return only grouping and computed columns

https://stackoverflow.com/questions/22971446

r
dplyr

30-06-2023

Question

I'm wondering if there is a way of doing this:

iris %>% group_by(Species) %>% 
  mutate(v1=Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15) %>% select(Species:v1)

while skipping the select step. I thought the following would work, but it doesn't, for several reasons:

iris %>% group_by(Species) %>% 
  select(Species, v1=Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

Note that in this example I replaced mutate with select, hoping that alone would do it. The following doesn't work either, because summarise expects each expression to return a single value:

iris %>% 
  group_by(Species) %>% 
  summarise(Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

Clearly this is not a huge deal, but I'm wondering whether there is a simpler way to replicate the default data.table behavior:

data.table(iris)[, Sepal.Length / mean(Sepal.Length), by=Species][V1 > 1.15]

Which produces just the by columns and the computed value:

      Species       V1
1:     setosa 1.158610
2: versicolor 1.179245
3: versicolor 1.162399
4:  virginica 1.153613
5:  virginica 1.168792
6:  virginica 1.168792
7:  virginica 1.168792
8:  virginica 1.199150
9:  virginica 1.168792

Solution

This can now be simplified with dplyr's transmute function, which drops all columns except the grouping variable and the computed variables (v1 in this case).

library(dplyr) # requires dplyr >= 0.3.0.2
iris %>% 
  group_by(Species) %>% 
  transmute(v1 = Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

#Source: local data frame [9 x 2]
#Groups: Species
#
#     Species       v1
#1     setosa 1.158610
#2 versicolor 1.179245
#3 versicolor 1.162399
#4  virginica 1.153613
#5  virginica 1.168792
#6  virginica 1.168792
#7  virginica 1.168792
#8  virginica 1.199150
#9  virginica 1.168792
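
For readers on a recent dplyr, the summarise attempt from the question can also be expressed with reframe(), which lets each group return any number of rows. This is only a sketch and assumes dplyr >= 1.1.0 (where reframe() was introduced); it is not part of the original answer. The name res is just an illustrative variable.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0, which provides reframe()

# reframe() works like summarise(), but each group may return many rows.
# The result contains only the grouping column and the computed column,
# and it comes back ungrouped.
res <- iris %>%
  group_by(Species) %>%
  reframe(v1 = Sepal.Length / mean(Sepal.Length)) %>%
  filter(v1 > 1.15)

print(res)
```

Unlike the transmute() version, the result here is no longer grouped by Species, so no ungroup() is needed before further processing.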

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow