Question

I have a data frame whose columns each contain a varying number of values and a varying number of NAs. The data frame looks like this:

    V1 V2 V3 V4 V5 V6
1    0 11  4  0  0 10
2    0 17  3  0  2  2
3   NA  0  4  0  1  9
4   NA 12 NA  1  1  0
<snip>
743 NA NA NA NA  8 NA
744 NA NA NA NA  0 NA

I want to make a boxplot out of this, but when I do

boxplot(dataframe)

I get the error

adding class "factor" to an invalid object

When I do

lapply(dataframe,class)

I get the following output:

$V1
[1] "factor"
$V2
[1] "factor"
<snip>
$V6
[1] "factor"

So how can I change my dataframe so that the columns are seen as numeric?

Solution

You want to apply as.numeric(as.character(...)) to each factor column. The code below shows how to do this so that only the factor columns are converted and the numeric columns are left alone.

## dummy data
df <- data.frame(V1 = factor(sample(1:5, 10, replace = TRUE)),
                 V2 = factor(sample(99:101, 10, replace = TRUE)),
                 V3 = factor(sample(1:2, 10, replace = TRUE)),
                 V4 = 1:10)

df2 <- data.frame(sapply(df, function(x) { if(is.factor(x)) {
                                              as.numeric(as.character(x))
                                           } else {
                                              x
                                           }
                                         }))

This gives:

> df2
   V1  V2 V3 V4
1   4 101  2  1
2   1 100  1  2
3   5  99  2  3
4   4  99  2  4
5   2 100  1  5
6   2 100  2  6
7   2 101  2  7
8   4 100  1  8
9   2 101  2  9
10  4 101  1 10
> str(df2)
'data.frame':   10 obs. of  4 variables:
 $ V1: num  4 1 5 4 2 2 2 4 2 4
 $ V2: num  101 100 99 99 100 100 101 100 101 101
 $ V3: num  2 1 2 2 1 2 2 1 2 1
 $ V4: num  1 2 3 4 5 6 7 8 9 10
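Note that sapply() builds a matrix first, so every column ends up with the same type (V4 is shown as num above although it was created as 1:10). As a minimal sketch, assuming you would rather keep the non-factor columns exactly as they are, the factor columns can be overwritten in place. The data values and the helper name is_fac are made up for illustration:

```r
# Hand-picked dummy data so the result is predictable
df <- data.frame(V1 = factor(c(4, 1, 5)),
                 V2 = factor(c(101, 100, 99)),
                 V4 = 1:3)

# Flag the factor columns (is_fac is a hypothetical helper name)
is_fac <- vapply(df, is.factor, logical(1))

# Overwrite only the flagged columns; V4 is not touched
df[is_fac] <- lapply(df[is_fac], function(x) as.numeric(as.character(x)))

str(df)
```

Here V1 and V2 become numeric while V4 stays an integer column.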

OTHER TIPS

How about

as.data.frame(lapply(dat1,function(x){as.numeric(as.character(x))}))

which converts each column to character first and then to numeric. The intermediate as.character() step matters: calling as.numeric() directly on a factor will generally return the underlying integer codes, not the values you see displayed.
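To make the pitfall concrete, here is a small sketch with a made-up factor f whose labels look like numbers. It compares the direct conversion with the one that goes through character, and also shows the as.numeric(levels(f))[f] idiom mentioned in the R documentation for factor:

```r
# Hypothetical factor; its levels are sorted as strings: "10", "2", "33"
f <- factor(c("10", "2", "10", "33"))

# Direct conversion returns the internal integer codes
codes <- as.numeric(f)

# Converting to character first recovers the displayed values
values <- as.numeric(as.character(f))

# Alternative: convert the levels once, then index by the codes
values_alt <- as.numeric(levels(f))[f]

print(codes)   # 1 2 1 3
print(values)  # 10 2 10 33
```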

With a test data.frame:

testframe <- data.frame(V1 = as.factor(c(0,0,NA,NA)), V2 = as.factor(c(11,17,0,12)))

> sapply(testframe, class)
      V1       V2 
"factor" "factor" 

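You could use the following. Note that calling as.numeric directly on a factor would return the internal level codes (2, 4, 1, 3 for V2 here) rather than the displayed values, so convert to character first:

testframe.n <- as.data.frame(sapply(testframe, function(x) as.numeric(as.character(x))))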

> sapply(testframe.n, class)
       V1        V2 
"numeric" "numeric" 

Now all columns are numeric, and boxplot can be called on testframe.n.
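As a rough end-to-end sketch, assuming the first four rows of V1 to V3 from the question stored as factors (the names dat, num and bp are made up here), the conversion and the boxplot call could look like this:

```r
# First four rows of V1..V3 from the question, stored as factors
dat <- data.frame(V1 = factor(c(0, 0, NA, NA)),
                  V2 = factor(c(11, 17, 0, 12)),
                  V3 = factor(c(4, 3, 4, NA)))

# Convert every column via character so the displayed values are kept
num <- as.data.frame(lapply(dat, function(x) as.numeric(as.character(x))))

# plot = FALSE only computes the statistics; drop it to draw the figure
bp <- boxplot(num, plot = FALSE)

print(sapply(num, class))
print(bp$n)  # number of non-NA values per column
```

NAs are ignored when the box statistics are computed, so columns with different numbers of missing values can be plotted side by side.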

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow