Question

I'm fairly new to R and I'm encountering a problem in one of my functions. I want to convert three of the columns in a data.frame from character to numeric. They are mostly made up of numbers with a few "Not Available" entries scattered throughout. I am aware of these, and I want them coerced to NAs without seeing the warning, so I'm using the suppressWarnings() function.

Here is my code:

suppressWarnings(class(dataframe[,2]) <- "numeric")
suppressWarnings(class(dataframe[,3]) <- "numeric")
suppressWarnings(class(dataframe[,4]) <- "numeric")
print(apply(dataframe,2,class))

My issue is that the result that gets printed is:

          1           2           3           4 
"character" "character" "character" "character" 

So it doesn't seem to be changing the class! Why is this?

When I do it without suppressing the warnings, like this:

  class(dataframe[,2]) <- "numeric"
  class(dataframe[,3]) <- "numeric"
  class(dataframe[,4]) <- "numeric"
  print(apply(dataframe,2,class))

I get the same output, but with the warning message:

          1           2           3           4 
"character" "character" "character" "character" 
Warning messages:
1: In class(dataframe[, 2]) <- "numeric" : NAs introduced by coercion
2: In class(dataframe[, 3]) <- "numeric" : NAs introduced by coercion
3: In class(dataframe[, 4]) <- "numeric" : NAs introduced by coercion

So it's not the warning suppression that's the problem. It must be the apply() function, but I can't figure out why it would display the classes incorrectly.

Any advice or assistance would be appreciated!


Solution

The problem here is apply; see "Details" in ?apply: "If X is not an array [..], apply attempts to coerce it to an array via as.matrix if it is two-dimensional (e.g., a data frame)." Then have a look at "Details" in ?as.matrix: "The method for data frames will return a character matrix if there is only atomic columns and any non-(numeric/logical/complex) column". Thus, although your conversion to numeric works, using apply to 'loop' over the columns first coerces the whole data frame to a character matrix (because at least one column, here the first, is still character). class is then called on character vectors and reports "character" for every column.

A small example. First create a toy data frame:

df <- data.frame(x1 = c("a", "b"),
                 x2 = c("Not Available", 2),
                 x3 = c("Not Available", 3),
                 x4 = c(4, "Not available"))

Convert the selected columns to numeric as you did in your question, or like this:

df[, 2:4] <- lapply(df[ , 2:4], function(x) as.numeric(x))
str(df)

If the resulting data frame is coerced to a matrix, as apply would do, it becomes a character matrix:

str(as.matrix(df))
# chr [1:2, 1:4] "a" "b" NA " 2" NA " 3" " 4" NA
# - attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:4] "x1" "x2" "x3" "x4"
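
To tie this back to the question, here is a minimal, self-contained sketch of what apply reports in that situation. The two-column data frame and the name via_apply are only illustrative assumptions, not taken from the original data:

```r
# Small illustrative data frame: one character column, one column to convert.
# stringsAsFactors = FALSE keeps x2 as character on R versions before 4.0.
df <- data.frame(x1 = c("a", "b"),
                 x2 = c("Not Available", "2"),
                 stringsAsFactors = FALSE)

# The conversion itself works: x2 is numeric afterwards
df$x2 <- suppressWarnings(as.numeric(df$x2))
is.numeric(df$x2)

# apply() calls as.matrix(df) first, so class() only ever sees
# the columns of a character matrix
via_apply <- apply(df, 2, class)
print(via_apply)
```

Here apply reports "character" for x2 as well, even though is.numeric(df$x2) is TRUE.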

Instead of using apply to check the class of your columns, you may try:

sapply(df, class)
#          x1          x2          x3          x4 
# "character"   "numeric"   "numeric"   "numeric" 

str(df)
# 'data.frame':  2 obs. of  4 variables:
# $ x1: chr  "a" "b"
# $ x2: num  NA 2
# $ x3: num  NA 3
# $ x4: num  4 NA
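
Since the question also asks about silencing the coercion warning, the conversion and the check can be combined. This is a rough sketch under the same toy-data assumption, not the only way to do it; the column indices 2:4 come from the question and the name col_classes is hypothetical:

```r
# Toy data frame as above, rebuilt so this snippet runs on its own
df <- data.frame(x1 = c("a", "b"),
                 x2 = c("Not Available", "2"),
                 x3 = c("Not Available", "3"),
                 x4 = c("4", "Not Available"),
                 stringsAsFactors = FALSE)

# Convert columns 2 to 4 in one step; suppressWarnings() hides the
# "NAs introduced by coercion" warnings raised by as.numeric()
df[2:4] <- suppressWarnings(lapply(df[2:4], as.numeric))

# vapply() visits each column of the data frame directly (no matrix
# conversion) and guarantees one character value per column
col_classes <- vapply(df, function(x) class(x)[1], character(1))
print(col_classes)
```

vapply is used here instead of sapply only because it fixes the shape of the result; sapply(df, class) gives the same answer for these columns.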
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow