How can I change a dataframe of factors so that the dataframe can be boxplotted?
Question
I have a data frame whose columns each contain a varying number of numeric values and a varying number of NAs. The data frame looks like this:
V1 V2 V3 V4 V5 V6
1 0 11 4 0 0 10
2 0 17 3 0 2 2
3 NA 0 4 0 1 9
4 NA 12 NA 1 1 0
<snip>
743 NA NA NA NA 8 NA
744 NA NA NA NA 0 NA
I want to make a boxplot out of this, but when I do
boxplot(dataframe)
I get the error
adding class "factor" to an invalid object
When I do
lapply(dataframe,class)
I get the following output:
$V1
[1] "factor"
$V2
[1] "factor"
<snip>
$V6
[1] "factor"
So how can I change my dataframe so that the columns are seen as numeric?
Solution
You want to apply as.numeric(as.character(...)) to each factor column. The code below does this for the factor columns only and leaves the numeric columns alone.
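## dummy data: three factor columns and one integer column
df <- data.frame(V1 = factor(sample(1:5, 10, replace = TRUE)),
                 V2 = factor(sample(99:101, 10, replace = TRUE)),
                 V3 = factor(sample(1:2, 10, replace = TRUE)),
                 V4 = 1:10)
## convert factor columns via character; pass other columns through unchanged
## (sapply returns a numeric matrix, so V4 ends up as num rather than int)
df2 <- data.frame(sapply(df, function(x) {
    if (is.factor(x)) {
        as.numeric(as.character(x))
    } else {
        x
    }
}))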
This gives:
> df2
V1 V2 V3 V4
1 4 101 2 1
2 1 100 1 2
3 5 99 2 3
4 4 99 2 4
5 2 100 1 5
6 2 100 2 6
7 2 101 2 7
8 4 100 1 8
9 2 101 2 9
10 4 101 1 10
> str(df2)
'data.frame': 10 obs. of 4 variables:
$ V1: num 4 1 5 4 2 2 2 4 2 4
$ V2: num 101 100 99 99 100 100 101 100 101 101
$ V3: num 2 1 2 2 1 2 2 1 2 1
$ V4: num 1 2 3 4 5 6 7 8 9 10
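As a possible variant, the same conversion can be written with lapply and an in-place assignment, which keeps each untouched column's original type (V4 stays integer). This is only a sketch: the name df3 and the set.seed call are assumptions added for illustration, not part of the original answer.

```r
# Example data mirroring the dummy data above; the seed only makes it reproducible
set.seed(1)
df <- data.frame(V1 = factor(sample(1:5, 10, replace = TRUE)),
                 V2 = factor(sample(99:101, 10, replace = TRUE)),
                 V4 = 1:10)

# df3 is a hypothetical name; df3[] <- ... replaces the columns
# while keeping the data.frame structure and column names
df3 <- df
df3[] <- lapply(df3, function(x) {
  if (is.factor(x)) as.numeric(as.character(x)) else x
})

str(df3)
```

Because lapply returns a list rather than a matrix, the columns are not coerced to a common type, which matters if the data frame also holds character or logical columns.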
OTHER TIPS
How about
as.data.frame(lapply(dat1,function(x){as.numeric(as.character(x))}))
which converts each column to numeric after first converting it to character. The intermediate as.character() step matters: calling as.numeric() directly on a factor generally returns the underlying integer codes, not the values you see displayed.
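To make the difference concrete, here is a small sketch on a made-up factor (the name f and its values are chosen purely for illustration):

```r
# A factor whose levels are sorted as "0", "11", "12", "17"
f <- factor(c(11, 17, 0, 12))

codes  <- as.numeric(f)                 # integer codes of the levels: 2 4 1 3
values <- as.numeric(as.character(f))   # the displayed values: 11 17 0 12

# Equivalent alternative mentioned in ?factor: index the numeric levels by the factor
values2 <- as.numeric(levels(f))[f]

print(codes)
print(values)
print(values2)
```

The codes are positions in levels(f), so they only coincide with the displayed values when the levels happen to be 1, 2, 3, and so on.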
With a test data.frame:
testframe <- data.frame(V1 = as.factor(c(0,0,NA,NA)), V2 = as.factor(c(11,17,0,12)))
> sapply(testframe, class)
V1 V2
"factor" "factor"
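You could apply as.numeric(as.character(...)) to each column (as.numeric alone would return the integer level codes rather than the values):
testframe.n <- as.data.frame(sapply(testframe, function(x) as.numeric(as.character(x))))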
> sapply(testframe.n, class)
V1 V2
"numeric" "numeric"
Now, all columns should be numeric and boxplot can be called.
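A minimal end-to-end sketch of that last step, rebuilding the test data so that it runs on its own. The name stats and the plot = FALSE call are additions for illustration; they are not part of the original answer.

```r
# Test data as above: two factor columns, one of them containing NAs
testframe <- data.frame(V1 = as.factor(c(0, 0, NA, NA)),
                        V2 = as.factor(c(11, 17, 0, 12)))

# Convert every column via character so the values, not the level codes, are kept
testframe.n <- as.data.frame(lapply(testframe, function(x) as.numeric(as.character(x))))

# One box per column; NA values are dropped per column
boxplot(testframe.n)

# plot = FALSE returns the summary statistics instead of drawing
stats <- boxplot(testframe.n, plot = FALSE)
stats$n   # number of non-NA observations used for each column
```

The NAs need no special handling before plotting, since boxplot ignores missing values when it computes the statistics for each column.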