Question

I'm wondering if there is a way of doing this:

iris %>% group_by(Species) %>% 
  mutate(v1=Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15) %>% select(Species:v1)

but without the final select step. I thought the following should work, but it doesn't, for many reasons:

iris %>% group_by(Species) %>% 
  select(Species, v1=Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

Note that in this example I replaced mutate with select, hoping that alone would do it. The following also doesn't work, because summarise expects each expression to return a single value:

iris %>% 
  group_by(Species) %>% 
  summarise(Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

Clearly this is not a huge deal, but I'm wondering if there is a simpler way of replicating the default data.table behavior:

data.table(iris)[, Sepal.Length / mean(Sepal.Length), by=Species][V1 > 1.15]

Which produces just the by columns and the computed value:

      Species       V1
1:     setosa 1.158610
2: versicolor 1.179245
3: versicolor 1.162399
4:  virginica 1.153613
5:  virginica 1.168792
6:  virginica 1.168792
7:  virginica 1.168792
8:  virginica 1.199150
9:  virginica 1.168792

Solution

This can now be simplified with dplyr's new transmute function, which drops all columns except the grouping variable and the computed variables (v1 in this case).

require(dplyr) # >= 0.3.0.2
iris %>% 
  group_by(Species) %>% 
  transmute(v1 = Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

#Source: local data frame [9 x 2]
#Groups: Species
#
#     Species       v1
#1     setosa 1.158610
#2 versicolor 1.179245
#3 versicolor 1.162399
#4  virginica 1.153613
#5  virginica 1.168792
#6  virginica 1.168792
#7  virginica 1.168792
#8  virginica 1.199150
#9  virginica 1.168792
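
As a quick sanity check, here is a minimal sketch comparing the transmute version with the mutate-then-select version from the question. The names via_select and via_transmute are just illustrative, and it assumes a dplyr version that provides both transmute and the %>% pipe:

```r
library(dplyr)

# Approach from the question: compute v1, filter, then drop the extra
# columns explicitly with select().
via_select <- iris %>%
  group_by(Species) %>%
  mutate(v1 = Sepal.Length / mean(Sepal.Length)) %>%
  filter(v1 > 1.15) %>%
  select(Species, v1)

# Approach from this answer: transmute() keeps only the grouping
# variable and the newly computed column, so no select() is needed.
via_transmute <- iris %>%
  group_by(Species) %>%
  transmute(v1 = Sepal.Length / mean(Sepal.Length)) %>%
  filter(v1 > 1.15)

# Compare as plain data frames so grouping metadata is not part of
# the comparison.
same <- all.equal(as.data.frame(via_select), as.data.frame(via_transmute))
print(same)
```

If the two pipelines agree, the select step is indeed redundant once transmute is used. On recent dplyr releases transmute is marked as superseded and mutate(v1 = ..., .keep = "none") is the suggested replacement; as far as I know it also keeps the grouping variable, but check the documentation for your installed version.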
Licensed under: CC-BY-SA with attribution