Question

I have a data.frame consisting of character and numeric columns. I would like to calculate the mean of each numeric column and append the results as a new row at the end of the data frame. For example, I want to go from

class1  1    2    5
class2  2    3    6
class3  2    3    2

to

class1  1     2     5
class2  2     3     6
class3  2     3     2
mean    1.67  2.67  4.33

I tried this with colMeans, but it fails on the character column with the following error:

Error in colMeans(data, na.rm = FALSE) : 'x' must be numeric

I also tried restricting colMeans to the numeric columns with data[2:4], but then I struggle to append the result, because the vector of means is shorter than a row of the original data.frame.
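To make the example reproducible, the table above can be built as a data frame. The column names V1 to V4 are an assumption (they match the names used in one of the answers below). The sketch shows the length mismatch: restricting colMeans to the numeric columns returns three values, while a row of this data frame needs four.

```r
# Sample data from the question (column names V1..V4 are assumed)
mydf <- data.frame(
  V1 = c("class1", "class2", "class3"),
  V2 = c(1, 2, 2),
  V3 = c(2, 3, 3),
  V4 = c(5, 6, 2),
  stringsAsFactors = FALSE
)

# colMeans(mydf) fails because V1 is character; the numeric subset works
means <- colMeans(mydf[2:4])
print(means)

# ...but it has one value fewer than the data frame has columns,
# so it cannot be appended as a row without also supplying a label for V1
length(means)  # 3
ncol(mydf)     # 4
```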

Thanks for your help.


Solution 2

I agree with the comment above that appending the means to the end of your data frame doesn't seem like a good idea.

Anyway, you could take this opportunity to expand your R-pertoire with rapply:

str(iris)
# 'data.frame':  150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

summary(iris)
# Sepal.Length    Sepal.Width     Petal.Length    Petal.Width          Species  
# Min.   :4.300   Min.   :2.000   Min.   :1.000   Min.   :0.100   setosa    :50  
# 1st Qu.:5.100   1st Qu.:2.800   1st Qu.:1.600   1st Qu.:0.300   versicolor:50  
# Median :5.800   Median :3.000   Median :4.350   Median :1.300   virginica :50  
# Mean   :5.843   Mean   :3.057   Mean   :3.758   Mean   :1.199                  
# 3rd Qu.:6.400   3rd Qu.:3.300   3rd Qu.:5.100   3rd Qu.:1.800                  
# Max.   :7.900   Max.   :4.400   Max.   :6.900   Max.   :2.500                  

rapply(iris, mean, classes = c('numeric','integer'))
# Sepal.Length  Sepal.Width Petal.Length  Petal.Width 
# 5.843333     3.057333     3.758000     1.199333 
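If rapply feels too exotic, the same means can be computed by selecting the numeric columns first and passing only those to colMeans. This is a swapped-in base R alternative, not part of the original answer; is.numeric is TRUE for both double and integer columns, which mirrors classes = c('numeric','integer').

```r
# Logical vector: TRUE for numeric (double or integer) columns of iris
num_cols <- vapply(iris, is.numeric, logical(1))

# colMeans only sees the numeric columns, so the factor Species is skipped
means <- colMeans(iris[num_cols])
means
```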

But if you really do need to join them, you could do:

tmp <- rapply(iris, mean, classes = c('numeric','integer'))
# reorder the means to match iris's columns; Species has no mean, so it gets NA
rbind(iris, tmp[match(names(iris), names(tmp))])

tail(rbind(iris, tmp[match(names(iris), names(tmp))]), 5)
#     Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
# 147     6.300000    2.500000        5.000    1.900000 virginica
# 148     6.500000    3.000000        5.200    2.000000 virginica
# 149     6.200000    3.400000        5.400    2.300000 virginica
# 150     5.900000    3.000000        5.100    1.800000 virginica
# 151     5.843333    3.057333        3.758    1.199333      <NA>
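The appended row only carries the numeric row name 151, which is easy to lose track of. One option, which is an addition and not part of the original answer, is to relabel that row after the rbind:

```r
tmp <- rapply(iris, mean, classes = c("numeric", "integer"))
res <- rbind(iris, tmp[match(names(iris), names(tmp))])

# Give the summary row a descriptive row name instead of "151"
rownames(res)[nrow(res)] <- "mean"
tail(res, 2)
```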

I already regret coining "R-pertoire".

OTHER TIPS

If you want to stick with your colMeans attempt, you can try this:

new <- rbind(mydf, c(V1 = "mean", as.list(colMeans(mydf[2:4]))))
new
#       V1       V2       V3       V4
# 1 class1 1.000000 2.000000 5.000000
# 2 class2 2.000000 3.000000 6.000000
# 3 class3 2.000000 3.000000 2.000000
# 4   mean 1.666667 2.666667 4.333333
str(new)
# 'data.frame':  4 obs. of  4 variables:
#  $ V1: chr  "class1" "class2" "class3" "mean"
#  $ V2: num  1 2 2 1.67
#  $ V3: num  2 3 3 2.67
#  $ V4: num  5 6 2 4.33
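The mydf[2:4] index assumes you know in advance which columns are numeric. A minimal sketch that finds them automatically instead, assuming the label lives in a character column named V1 as above (the sample data is re-created here so the block runs on its own):

```r
mydf <- data.frame(
  V1 = c("class1", "class2", "class3"),
  V2 = c(1, 2, 2),
  V3 = c(2, 3, 3),
  V4 = c(5, 6, 2),
  stringsAsFactors = FALSE
)

# One entry per column: the mean for numeric columns, NA for everything else
mean_row <- lapply(mydf, function(col) if (is.numeric(col)) mean(col) else NA)
mean_row$V1 <- "mean"  # label column (assumed to be V1)

# mean_row has the same names and order as mydf's columns,
# so it lines up as one new row
new <- rbind(mydf, mean_row)
new
```

Because every non-numeric column defaults to NA, the same code keeps working if more character columns are added; only the label assignment needs adjusting.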

Depending on how you created your data, you may need to convert V1 to character first (before R 4.0.0, data.frame() and read.table() turned strings into factors by default):

mydf$V1 <- as.character(mydf$V1)
Licensed under: CC-BY-SA with attribution