dplyr return only grouping and computed columns

https://stackoverflow.com/questions/22971446

r
dplyr

30-06-2023

Question

I'm wondering if there is a way of doing this:

iris %.% group_by(Species) %.% 
  mutate(v1=Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15) %.% select(Species:v1)

While skipping the select bit. I thought the following should work (but doesn't, for many reasons):

iris %.% group_by(Species) %.% 
  select(Species, v1=Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15)

Note that in this example I replaced mutate with select, hoping that alone would do it. The following also doesn't work, because summarise expects each expression to return a single value per group:

iris %.% 
  group_by(Species) %.% 
  summarise(Sepal.Length / mean(Sepal.Length)) %.% 
  filter(v1 > 1.15)

Clearly, not a huge deal, but wondering if there is a simpler way of replicating default data.table behavior:

data.table(iris)[, Sepal.Length / mean(Sepal.Length), by=Species][V1 > 1.15]

Which produces just the by columns and the computed value:

      Species       V1
1:     setosa 1.158610
2: versicolor 1.179245
3: versicolor 1.162399
4:  virginica 1.153613
5:  virginica 1.168792
6:  virginica 1.168792
7:  virginica 1.168792
8:  virginica 1.199150
9:  virginica 1.168792

Solution

This can now be simplified with dplyr's transmute function, which drops all columns except the grouping variable and the computed variables (v1 in this case).

library(dplyr) # >= 0.3.0.2
iris %>% 
  group_by(Species) %>% 
  transmute(v1 = Sepal.Length / mean(Sepal.Length)) %>% 
  filter(v1 > 1.15)

#Source: local data frame [9 x 2]
#Groups: Species
#
#     Species       v1
#1     setosa 1.158610
#2 versicolor 1.179245
#3 versicolor 1.162399
#4  virginica 1.153613
#5  virginica 1.168792
#6  virginica 1.168792
#7  virginica 1.168792
#8  virginica 1.199150
#9  virginica 1.168792
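
For readers on a recent dplyr, the same result can probably be reached in two other ways. The sketch below assumes dplyr >= 1.1.0 (for reframe() and the .by argument; the .keep argument of mutate() needs >= 1.0.0). The object names res_keep and res_reframe are made up for illustration and are not part of the original answer.

```r
library(dplyr)  # assumption: dplyr >= 1.1.0 is installed

# Option 1: mutate() with .keep = "none" retains only the grouping
# variable and the columns created in the call.
res_keep <- iris %>%
  group_by(Species) %>%
  mutate(v1 = Sepal.Length / mean(Sepal.Length), .keep = "none") %>%
  filter(v1 > 1.15) %>%
  ungroup()

# Option 2: reframe() is like summarise() but accepts expressions that
# return any number of rows per group, and it returns ungrouped data.
res_reframe <- iris %>%
  reframe(v1 = Sepal.Length / mean(Sepal.Length), .by = Species) %>%
  filter(v1 > 1.15)

print(res_keep)
print(res_reframe)
```

Both variants should contain only Species and v1, matching the transmute output above. reframe() is the closer analogue of the data.table call in the question, since it was designed for per-group expressions that return more than one value.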

Licensed under: CC-BY-SA with attribution

Not affiliated with Stack Overflow