Question

I'm wondering if there is a way of doing this:

iris %.% group_by(Species) %.% 
  mutate(v1=Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15) %.% select(Species:v1)

but without the final select step? I thought the following should work (it doesn't, for several reasons):

iris %.% group_by(Species) %.% 
  select(Species, v1=Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15)

Note that in this example I replaced mutate with select, hoping that alone would do it. The following doesn't work either, because summarise expects each expression to return a single value per group:

iris %.% 
  group_by(Species) %.% 
  summarise(Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15)

Clearly not a huge deal, but I'm wondering if there is a simpler way of replicating the default data.table behavior:

data.table(iris)[, Sepal.Length / mean(Sepal.Length), by=Species][V1 > 1.15]

Which produces just the by columns and the computed value:

      Species       V1
1:     setosa 1.158610
2: versicolor 1.179245
3: versicolor 1.162399
4:  virginica 1.153613
5:  virginica 1.168792
6:  virginica 1.168792
7:  virginica 1.168792
8:  virginica 1.199150
9:  virginica 1.168792

Solution

This can now be simplified with dplyr's new transmute function, which drops every column except the grouping variable and the computed variables (v1 in this case).

require(dplyr) # >= 0.3.0.2
iris %>% 
  group_by(Species) %>% 
  transmute(v1 = Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

#Source: local data frame [9 x 2]
#Groups: Species
#
#     Species       v1
#1     setosa 1.158610
#2 versicolor 1.179245
#3 versicolor 1.162399
#4  virginica 1.153613
#5  virginica 1.168792
#6  virginica 1.168792
#7  virginica 1.168792
#8  virginica 1.199150
#9  virginica 1.168792
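
For readers on a more recent dplyr, here is a hedged sketch of two alternatives to transmute. It assumes dplyr >= 1.1.0 (for reframe() and the .by argument); the names res_keep and res_reframe are made up for illustration. The first option uses mutate() with .keep = "none", which keeps only the grouping column and the newly created column. The second uses reframe(), which, unlike summarise, is meant for expressions that return any number of rows per group, so it is probably the closest match to the data.table call in the question.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 (reframe() and .by)

# Option 1: mutate() keeping only the grouping column and the new column
res_keep <- iris %>%
  group_by(Species) %>%
  mutate(v1 = Sepal.Length / mean(Sepal.Length), .keep = "none") %>%
  ungroup() %>%
  filter(v1 > 1.15)

# Option 2: reframe() may return several rows per group;
# .by groups for this one call only and the result is ungrouped
res_reframe <- iris %>%
  reframe(v1 = Sepal.Length / mean(Sepal.Length), .by = Species) %>%
  filter(v1 > 1.15)

print(res_keep)
print(res_reframe)
```

Both results should contain only Species and v1, with the same rows as the transmute output above.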
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow