In a way, the problem is with apply, but more appropriately, the problem is with as.matrix and how it handles logical values.
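To see why as.matrix is the real suspect: ?apply documents that a data.frame passed to apply is first coerced with as.matrix. A minimal sketch (the small data.frame df is made up for illustration) showing that the values apply hands to your function are the ones as.matrix produces:

```r
# A small illustrative data.frame (not the one from the question)
df <- data.frame(test = c("a", "b"), logi = c(TRUE, FALSE))

# What the function passed to apply() sees for the "logi" column, row by row
via_apply <- apply(df, 1, function(row) row[["logi"]])

# The same column taken directly from the as.matrix() coercion
via_matrix <- as.matrix(df)[, "logi"]

# Both routes yield the same character values
print(identical(unname(via_apply), unname(via_matrix)))
```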
Here are a few examples to help elaborate on the query I had for Karl.
First, let's create four data.frames to do some tests on.
- Your original data.frame, to demonstrate the behavior.
- A data.frame with a varying number of characters in the "test" column, to look into Karl's explanation of what's going on.
- A data.frame with some numbers, to help us start to understand what actually seems to be going on.
- A data.frame where your "logi" column is explicitly created with as.character.
df1 <- data.frame(test = c("a","b","<",">"),
                  logi = c(TRUE,FALSE,FALSE,TRUE))
df2 <- data.frame(test = c("aa","b","<",">>"),
                  logi = c(TRUE,FALSE,FALSE,TRUE))
df3 <- data.frame(test = c("aa","b","<",">>"),
                  logi = c(TRUE,FALSE,FALSE,TRUE),
                  num = c(1, 12, 123, 2))
df4 <- data.frame(test = c("aa","b","<",">>"),
                  logi = as.character(c(TRUE,FALSE,FALSE,TRUE)))
Now, let's use as.matrix on each of them.
This has a space before TRUE.
as.matrix(df1)
# test logi
# [1,] "a" " TRUE"
# [2,] "b" "FALSE"
# [3,] "<" "FALSE"
# [4,] ">" " TRUE"
This also has a space before TRUE, but the "test" column remains unaffected. Hmm.
as.matrix(df2)
# test logi
# [1,] "aa" " TRUE"
# [2,] "b" "FALSE"
# [3,] "<" "FALSE"
# [4,] ">>" " TRUE"
Ahh... This has a space before TRUE and spaces before the shorter numbers. So it seems that perhaps R is treating TRUE and FALSE like numbers (their underlying values) and padding them to a common width, but calculating that width from the number of characters in "TRUE" and "FALSE". Again, the first "test" column remains unaffected.
as.matrix(df3)
# test logi num
# [1,] "aa" " TRUE" "  1"
# [2,] "b" "FALSE" " 12"
# [3,] "<" "FALSE" "123"
# [4,] ">>" " TRUE" "  2"
Things seem fine here if you tell R that the logi column is a character column.
as.matrix(df4)
# test logi
# [1,] "aa" "TRUE"
# [2,] "b" "FALSE"
# [3,] "<" "FALSE"
# [4,] ">>" "TRUE"
For what it's worth, sapply doesn't seem to have that problem.
sapply(df1, as.matrix)
# test logi
# [1,] "a" "TRUE"
# [2,] "b" "FALSE"
# [3,] "<" "FALSE"
# [4,] ">" "TRUE"
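As far as I can tell, that is because sapply converts each column on its own and then combines the pieces with R's ordinary logical-to-character coercion, which does not pad. A small sketch of that coercion, using standalone vectors rather than the data.frames above:

```r
# Standalone vectors mirroring the two columns of df1
test <- c("a", "b", "<", ">")
logi <- c(TRUE, FALSE, FALSE, TRUE)

# Ordinary coercion of logical to character: no leading space
plain <- as.character(logi)

# Combining character and logical pieces coerces the same way
combined <- unname(unlist(list(test = test, logi = logi)))

print(plain)
print(combined)
```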
Update
In the R Public chat room, Joshua Ulrich points to format being the culprit. as.matrix uses as.vector for factors, which converts them to character (try str(as.vector(df1$test)) to see what I mean); for everything else, it uses format, but unfortunately it doesn't offer a way to pass along any of format's arguments, one of which is trim (which is set to FALSE by default).
Compare the following:
A <- c(TRUE, FALSE)
format(A)
# [1] " TRUE" "FALSE"
format(A, trim = TRUE)
# [1] "TRUE" "FALSE"
format(as.character(A))
# [1] "TRUE " "FALSE"
format(as.factor(A))
# [1] "TRUE " "FALSE"
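The same padding shows up for numbers, which would account for the "num" column in df3 above. A quick sketch using the same values (the variable names are only for illustration):

```r
# The numeric values from the "num" column of df3
num <- c(1, 12, 123, 2)

# Default: every element is padded to the width of the widest one ("123")
padded <- format(num)

# With trim = TRUE the padding is suppressed
trimmed <- format(num, trim = TRUE)

print(padded)
print(trimmed)

# Logical values behave the same way: both strings are 5 characters wide
print(nchar(format(c(TRUE, FALSE))))
```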
So, how can you easily convert the logical columns to character? Maybe something like this (though I would suggest creating a backup of your data first):
df1[sapply(df1, is.logical)] <- lapply(df1[sapply(df1, is.logical)], as.character)
df1
# test logi
# 1 a TRUE
# 2 b FALSE
# 3 < FALSE
# 4 > TRUE
as.matrix(df1)
# test logi
# [1,] "a" "TRUE"
# [2,] "b" "FALSE"
# [3,] "<" "FALSE"
# [4,] ">" "TRUE"
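If you would rather not overwrite your data.frame at all, the same idea can be wrapped in a small function that works on a copy. This is only a sketch: as_matrix_unpadded is a made-up name, not something from base R, and it only deals with logical columns (numeric columns would still be padded).

```r
# Hypothetical helper (not part of base R): convert logical columns to
# character inside the function, so the caller's data.frame is left untouched
as_matrix_unpadded <- function(df) {
  is_lgl <- vapply(df, is.logical, logical(1))
  if (any(is_lgl)) {
    df[is_lgl] <- lapply(df[is_lgl], as.character)
  }
  as.matrix(df)
}

df1 <- data.frame(test = c("a", "b", "<", ">"),
                  logi = c(TRUE, FALSE, FALSE, TRUE))

m <- as_matrix_unpadded(df1)
print(m)

# The original column is still logical
print(is.logical(df1$logi))
```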