by() giving error when applying mean function over a data frame. What's happening?

StackOverflow https://stackoverflow.com/questions/21099800

  •  27-09-2022

Question

I am trying to learn by() in R (3.0.1). This is what I am doing:

  1. Open R
  2. attach(iris)
  3. head(iris)
  4. by(iris[,1:4] , Species , mean)

This is what I am getting

> by(iris[,1:4] , Species , mean)

Species: setosa
[1] NA
------------------------------------------------------------ 
Species: versicolor
[1] NA
------------------------------------------------------------ 
Species: virginica
[1] NA
Warning messages:
1: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

2: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

3: In mean.default(data[x, , drop = FALSE], ...) :
  argument is not numeric or logical: returning NA

Solution

The problem here is that mean() doesn't work on a data frame. by() hands each group's rows to FUN as a data frame, so in effect you are calling something like this:

R> mean(iris[iris$Species == "setosa", 1:4])
[1] NA
Warning message:
In mean.default(iris[iris$Species == "setosa", 1:4]) :
  argument is not numeric or logical: returning NA

i.e. you are passing a data frame of 4 columns, containing the rows of the original where Species == "setosa".
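
One way to check this for yourself is to ask by() what it hands to FUN. The sketch below uses only base R; the names chunk_class and setosa are illustrative and not part of the original answer.

```r
# Ask by() for the class of each chunk it passes to FUN.
chunk_class <- by(iris[, 1:4], iris$Species, class)  # 'chunk_class' is an illustrative name
print(chunk_class)

# The same subset built by hand: it is a data frame, not a numeric vector,
# which is why mean() falls through to mean.default() and warns.
setosa <- iris[iris$Species == "setosa", 1:4]
is.data.frame(setosa)
is.numeric(setosa)
```

Every group should report "data.frame", which matches the mean.default() warning in the question.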

For by() you need to do this variable by variable, as in

R> by(iris[,1] , iris$Species , mean)
iris$Species: setosa
[1] 5.006
------------------------------------------------------------ 
iris$Species: versicolor
[1] 5.936
------------------------------------------------------------ 
iris$Species: virginica
[1] 6.588

Or use colMeans() instead of mean() as the FUN applied:

R> by(iris[,1:4] , iris$Species , colMeans)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026

If a canned function like colMeans() doesn't exist for the statistic you want, you can always write a wrapper around sapply(), e.g.

foo <- function(x, ...) sapply(x, mean, ...)
by(iris[, 1:4], iris$Species, foo)

R> by(iris[, 1:4], iris$Species, foo)
iris$Species: setosa
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.006        3.428        1.462        0.246 
------------------------------------------------------------ 
iris$Species: versicolor
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       5.936        2.770        4.260        1.326 
------------------------------------------------------------ 
iris$Species: virginica
Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
       6.588        2.974        5.552        2.026
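
The ... in foo is worth keeping: by() forwards any extra arguments to FUN, and foo passes them on to mean(). A minimal sketch, assuming a copy of iris with one value deliberately set to NA (iris_na, res_default and res_narm are hypothetical names used only for this illustration):

```r
foo <- function(x, ...) sapply(x, mean, ...)

# Illustrative data: a copy of iris with one missing value in the setosa group.
iris_na <- iris
iris_na[1, "Sepal.Length"] <- NA

# Without na.rm, the setosa mean of Sepal.Length is NA.
res_default <- by(iris_na[, 1:4], iris_na$Species, foo)

# na.rm = TRUE travels through by() and foo() down to mean().
res_narm <- by(iris_na[, 1:4], iris_na$Species, foo, na.rm = TRUE)

print(res_default)
print(res_narm)
```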

You might find aggregate() more appealing:

R> with(iris, aggregate(iris[,1:4], list(Species = Species), FUN = mean))
     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa        5.006       3.428        1.462       0.246
2 versicolor        5.936       2.770        4.260       1.326
3  virginica        6.588       2.974        5.552       2.026

Notice how I use with() to access Species directly; this is much better than calling attach(iris) if you don't want to index via iris$Species.
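
aggregate() also has a formula interface with a data argument, which avoids both attach() and with(). This is a sketch of an alternative spelling, not part of the original answer; species_means is an illustrative name.

```r
# "." stands for every column other than Species; columns are looked up in 'data'.
species_means <- aggregate(. ~ Species, data = iris, FUN = mean)
print(species_means)
```

This should give the same per-species means as the with() call above.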

OTHER TIPS

Here is one more solution, which combines split() and sapply(). The result is the same but transposed: species become columns and variables become rows. This may be preferable when you are summarising many variables, because they are listed vertically.

sapply(split(iris, iris$Species), function(x) colMeans(x[, 1:4]))

                setosa versicolor virginica
   Sepal.Length  5.006      5.936     6.588
   Sepal.Width   3.428      2.770     2.974
   Petal.Length  1.462      4.260     5.552
   Petal.Width   0.246      1.326     2.026
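
If you would rather have one row per species, as in the aggregate() output, wrapping the result in t() should flip it back. A minimal sketch in base R (by_species is an illustrative name):

```r
# Split only the numeric columns, take column means per group, then transpose.
by_species <- t(sapply(split(iris[, 1:4], iris$Species), colMeans))
print(by_species)
```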
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow