Question

I have the following matrix - example:

      col1     col2     col3
S01LA "0.0143" "0.1286" "---"
N01AX "0.0088" "---"    "0.343"
N05AG "0.0927" "0.8692" "---"

I want to get the average of each row. I tried doing this by changing the "---" values to NA and then using colSums:

example[example=='---'] <- NA
row_means <- rowMeans(as.numeric(example), na.rm=TRUE)

which gives me the error

Error in colSums(as.numeric(copy_specificity_df), na.rm = TRUE) : 
   'x' must be an array of at least two dimensions 

I think this is because as.numeric flattens the dataframe into a plain vector, dropping its dimensions. How can I get the average of each row in a dataframe, ignoring elements that cannot be converted to floats?


Solution 2

The display of your "example" object and the attempts you've made indicate to me that even though you are calling your object a data.frame, it's actually a matrix.

My hints that you're actually using a matrix?

  1. data.frames don't generally print the quotes around strings.
  2. as.numeric(some_data_frame) will give you an error about coercing a list to double.
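
Both hints can be checked directly. The following is a minimal sketch on a small made-up 2x2 object (the names m, df, flat and err are illustrative, not from the question); it assumes the data frame has more than one row, since that is when as.numeric() refuses to coerce it.

```r
# Hypothetical 2x2 character matrix in the same style as the question's data
m <- matrix(c("0.0143", "0.0088", "0.1286", "---"), nrow = 2,
            dimnames = list(c("S01LA", "N01AX"), c("col1", "col2")))
df <- as.data.frame(m, stringsAsFactors = FALSE)

is.matrix(m)        # TRUE
is.data.frame(df)   # TRUE

# as.numeric() on a matrix silently drops the dimensions
# (the warning about "---" becoming NA is suppressed here)
flat <- suppressWarnings(as.numeric(m))
is.null(dim(flat))  # TRUE

# as.numeric() on a multi-row data frame raises an error instead
err <- tryCatch(as.numeric(df), error = function(e) conditionMessage(e))
err
```

If your object behaves like m rather than like df, it is a matrix.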

With that, here's some example data:

example <- structure(c("0.0143", "0.0088", "0.0927", "0.1286", 
                 "---", "0.8692", "---", "0.343", "---"), 
               .Dim = c(3L, 3L), 
               .Dimnames = list(c("S01LA", "N01AX", "N05AG"), 
                                c("col1", "col2", "col3")))
example
#       col1     col2     col3   
# S01LA "0.0143" "0.1286" "---"  
# N01AX "0.0088" "---"    "0.343"
# N05AG "0.0927" "0.8692" "---"  

Here's an approach you can take if that's the case.

example[example == "---"] <- NA   ## Replace "---" with `NA`
N <- as.numeric(example)          ## Convert to numeric. You can start here
dim(N) <- dim(example)            ## Restore the dimensions
dimnames(N) <- dimnames(example)  ## Restore the dimnames
colMeans(N, na.rm=TRUE)           ## Perform your calculation
#   col1   col2   col3 
# 0.0386 0.4989 0.3430 

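Note: You can actually skip the first line, because as.numeric turns any string it can't parse (such as "---") into NA, but you'll get an "NAs introduced by coercion" warning.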
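
Since the question asks for row averages rather than column averages, here is a sketch of the same idea using rowMeans. It swaps in `mode<-` as a shortcut, which converts the values while keeping dim and dimnames, so the two restore steps are not needed. The data are rebuilt with matrix() only to keep the snippet self-contained.

```r
# Same data as the example above, rebuilt so the snippet runs on its own
example <- matrix(c("0.0143", "0.0088", "0.0927",
                    "0.1286", "---",    "0.8692",
                    "---",    "0.343",  "---"),
                  nrow = 3,
                  dimnames = list(c("S01LA", "N01AX", "N05AG"),
                                  c("col1", "col2", "col3")))

example[example == "---"] <- NA   ## Replace "---" with NA (avoids the warning)
N <- example
mode(N) <- "numeric"              ## Convert in place; dim and dimnames survive
row_means <- rowMeans(N, na.rm = TRUE)
row_means
#   S01LA   N01AX   N05AG 
# 0.07145 0.17590 0.48095 
```

Note that a row consisting only of "---" would come back as NaN, because there would be nothing left to average.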

OTHER TIPS

If you know beforehand what the NA values look like in the raw data, you can use na.strings in read.table. This effectively reads your data as three numeric columns. Make friends with the args.

> dat <- read.table(text = 'col1      col2      col3
  S01LA   "0.0143"  "0.1286"  "---"                          
  N01AX "0.0088"    "---"     "0.343"                         
  N05AG "0.0927"    "0.8692"  "---"', na.strings = "---")
> dat
#         col1   col2  col3
# S01LA 0.0143 0.1286    NA
# N01AX 0.0088     NA 0.343
# N05AG 0.0927 0.8692    NA
> colSums(dat, na.rm = TRUE)
##   col1   col2   col3 
## 0.1158 0.9978 0.3430 
> rowMeans(dat, na.rm = TRUE)
##   S01LA   N01AX   N05AG 
## 0.07145 0.17590 0.48095

Here's one way.

dat <- read.table(text = 'col1      col2      col3
S01LA   "0.0143"  "0.1286"  "---"                          
N01AX "0.0088"    "---"     "0.343"                         
N05AG "0.0927"    "0.8692"  "---"')

First transform the factors to numeric values (you can ignore the warning messages):

dat[] <- lapply(dat, function(x) if (is.factor(x)) as.numeric(as.character(x)) 
                                 else as.numeric(x))

#         col1   col2  col3
# S01LA 0.0143 0.1286    NA
# N01AX 0.0088     NA 0.343
# N05AG 0.0927 0.8692    NA

Second, apply colSums:

colSums(dat, na.rm = TRUE)
#   col1   col2   col3 
# 0.1158 0.9978 0.3430 
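
For the row averages the question actually asks about, the same converted data frame can be passed to rowMeans. A minimal sketch follows, assuming the columns arrive as character (or factor) vectors. The data frame is built by hand here instead of with read.table, and suppressWarnings is an addition to silence the coercion warnings mentioned above.

```r
# Hand-built stand-in for the data frame read above (columns as character)
dat <- data.frame(col1 = c("0.0143", "0.0088", "0.0927"),
                  col2 = c("0.1286", "---",    "0.8692"),
                  col3 = c("---",    "0.343",  "---"),
                  row.names = c("S01LA", "N01AX", "N05AG"),
                  stringsAsFactors = FALSE)

# as.character() first makes this safe for factor columns as well
dat[] <- lapply(dat, function(x) suppressWarnings(as.numeric(as.character(x))))

row_means <- rowMeans(dat, na.rm = TRUE)
row_means
#   S01LA   N01AX   N05AG 
# 0.07145 0.17590 0.48095 
```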
Licensed under: CC-BY-SA with attribution