Pergunta

I have a problem using data.table: How do I convert column classes? Here is a simple example: With data.frame I don't have a problem converting it, with data.table I just don't know how:

df <- data.frame(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
#One way: http://stackoverflow.com/questions/2851015/r-convert-data-frame-columns-from-factors-to-characters
df <- data.frame(lapply(df, as.character), stringsAsFactors=FALSE)
#Another way
df[, "value"] <- as.numeric(df[, "value"])

library(data.table)
dt <- data.table(ID=c(rep("A", 5), rep("B",5)), Quarter=c(1:5, 1:5), value=rnorm(10))
dt <- data.table(lapply(dt, as.character), stringsAsFactors=FALSE) 
#Error in rep("", ncol(xi)) : invalid 'times' argument
#Produces error, does data.table not have the option stringsAsFactors?
dt[, "ID", with=FALSE] <- as.character(dt[, "ID", with=FALSE]) 
#Produces error: Error in `[<-.data.table`(`*tmp*`, , "ID", with = FALSE, value = "c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)") : 
#unused argument(s) (with = FALSE)

Do I miss something obvious here?

Update due to Matthew's post: I used an older version before, but even after updating to 1.6.6 (the version I use now) I still get an error.

Update 2: Let's say I want to convert every column of class "factor" to a "character" column, but don't know in advance which column is of which class. With a data.frame, I can do the following:

classes <- as.character(sapply(df, class))
colClasses <- which(classes=="factor")
df[, colClasses] <- sapply(df[, colClasses], as.character)

Can I do something similar with data.table?

Update 3:

sessionInfo() R version 2.13.1 (2011-07-08) Platform: x86_64-pc-mingw32/x64 (64-bit)

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] data.table_1.6.6

loaded via a namespace (and not attached):
[1] tools_2.13.1
Foi útil?

Solução

For a single column:

dtnew <- dt[, Quarter:=as.character(Quarter)]
str(dtnew)

Classes ‘data.table’ and 'data.frame':  10 obs. of  3 variables:
 $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
 $ Quarter: chr  "1" "2" "3" "4" ...
 $ value  : num  -0.838 0.146 -1.059 -1.197 0.282 ...

Using lapply and as.character:

dtnew <- dt[, lapply(.SD, as.character), by=ID]
str(dtnew)

Classes ‘data.table’ and 'data.frame':  10 obs. of  3 variables:
 $ ID     : Factor w/ 2 levels "A","B": 1 1 1 1 1 2 2 2 2 2
 $ Quarter: chr  "1" "2" "3" "4" ...
 $ value  : chr  "1.487145280568" "-0.827845218358881" "0.028977182770002" "1.35392750102305" ...

Outras dicas

Try this

DT <- data.table(X1 = c("a", "b"), X2 = c(1,2), X3 = c("hello", "you"))
changeCols <- colnames(DT)[which(as.vector(DT[,lapply(.SD, class)]) == "character")]

DT[,(changeCols):= lapply(.SD, as.factor), .SDcols = changeCols]

Raising Matt Dowle's comment to Geneorama's answer (https://stackoverflow.com/a/20808945/4241780) to make it more obvious (as encouraged).

for (col in names_factors) 
  set(dt, j=col, value=as.factor(dt[[col]]))

Also, noted in another of Matt's comment see https://stackoverflow.com/a/33000778/4241780 for more info.

This is a BAD way to do it! I'm only leaving this answer in case it solves other weird problems. These better methods are the probably partly the result of newer data.table versions... so it's worth while to document this hard way. Plus, this is a nice syntax example for eval substitute syntax.

library(data.table)
dt <- data.table(ID = c(rep("A", 5), rep("B",5)), 
                 fac1 = c(1:5, 1:5), 
                 fac2 = c(1:5, 1:5) * 2, 
                 val1 = rnorm(10),
                 val2 = rnorm(10))

names_factors = c('fac1', 'fac2')
names_values = c('val1', 'val2')

for (col in names_factors){
  e = substitute(X := as.factor(X), list(X = as.symbol(col)))
  dt[ , eval(e)]
}
for (col in names_values){
  e = substitute(X := as.numeric(X), list(X = as.symbol(col)))
  dt[ , eval(e)]
}

str(dt)

which gives you

Classes ‘data.table’ and 'data.frame':  10 obs. of  5 variables:
 $ ID  : chr  "A" "A" "A" "A" ...
 $ fac1: Factor w/ 5 levels "1","2","3","4",..: 1 2 3 4 5 1 2 3 4 5
 $ fac2: Factor w/ 5 levels "2","4","6","8",..: 1 2 3 4 5 1 2 3 4 5
 $ val1: num  0.0459 2.0113 0.5186 -0.8348 -0.2185 ...
 $ val2: num  -0.0688 0.6544 0.267 -0.1322 -0.4893 ...
 - attr(*, ".internal.selfref")=<externalptr> 

I tried several approaches.

# BY {dplyr}
data.table(ID      = c(rep("A", 5), rep("B",5)), 
           Quarter = c(1:5, 1:5), 
           value   = rnorm(10)) -> df1
df1 %<>% dplyr::mutate(ID      = as.factor(ID),
                       Quarter = as.character(Quarter))
# check classes
dplyr::glimpse(df1)
# Observations: 10
# Variables: 3
# $ ID      (fctr) A, A, A, A, A, B, B, B, B, B
# $ Quarter (chr) "1", "2", "3", "4", "5", "1", "2", "3", "4", "5"
# $ value   (dbl) -0.07676732, 0.25376110, 2.47192852, 0.84929175, -0.13567312,  -0.94224435, 0.80213218, -0.89652819...

, or otherwise

# from list to data.table using data.table::setDT
list(ID      = as.factor(c(rep("A", 5), rep("B",5))), 
     Quarter = as.character(c(1:5, 1:5)), 
     value   = rnorm(10)) %>% setDT(list.df) -> df2
class(df2)
# [1] "data.table" "data.frame"

I provide a more general and safer way to do this stuff,

".." <- function (x) 
{
  stopifnot(inherits(x, "character"))
  stopifnot(length(x) == 1)
  get(x, parent.frame(4))
}


set_colclass <- function(x, class){
  stopifnot(all(class %in% c("integer", "numeric", "double","factor","character")))
  for(i in intersect(names(class), names(x))){
    f <- get(paste0("as.", class[i]))
    x[, (..("i")):=..("f")(get(..("i")))]
  }
  invisible(x)
}

The function .. makes sure we get a variable out of the scope of data.table; set_colclass will set the classes of your cols. You can use it like this:

dt <- data.table(i=1:3,f=3:1)
set_colclass(dt, c(i="character"))
class(dt$i)

If you have a list of column names in data.table, you want to change the class of do:

convert_to_character <- c("Quarter", "value")

dt[, convert_to_character] <- dt[, lapply(.SD, as.character), .SDcols = convert_to_character]

try:

dt <- data.table(A = c(1:5), 
                 B= c(11:15))

x <- ncol(dt)

for(i in 1:x) 
{
     dt[[i]] <- as.character(dt[[i]])
}
Licenciado em: CC-BY-SA com atribuição
Não afiliado a StackOverflow
scroll top