Question

I'm writing generic code that loops through every column in a data frame and converts the column to class "factor" if the number of unique values in that column is less than, say, 32. My current progress is:

dfr <- data.frame(x = floor(runif(40, 0, 5)), y = rnorm(40))

colnames(dfr) <- c('y', 'z')

In this example, I want variable 'y' to be converted into a factor variable. So I do:

sapply(dfr, function(x) ifelse(length(unique(x)) <= 32, x <- as.factor(x), x))

But after doing this, the class of 'y' has still not changed:

sapply(dfr, class)

        y         z 
"numeric" "numeric" 

Can anyone give guidance as to where I'm going wrong? I didn't imagine this would be so onerous.

Thanks in advance.


Solution

  1. ifelse returns a vector the same length as its test. Your test has length one, so only a single value comes back (not what you want). Use if(){}else{} instead.

  2. The rnorm column (z after the rename) has 40 unique values, which is more than 32, so your function will not coerce it to factor.

  3. sapply simplifies the results to a vector or matrix, which forces all variables to the same type and drops the "factor" class.
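A minimal sketch of points 1 and 3, using a small made-up vector and data frame (x, d, and the threshold of 2 are assumptions for illustration, not part of the question's data):

```r
x <- c(3, 1, 3, 2)

# ifelse() is vectorised over its test: a length-one test gives a
# length-one result, so the rest of the column is lost.
r1 <- ifelse(length(unique(x)) <= 32, as.factor(x), x)
length(r1)     # a single element
is.factor(r1)  # not a factor either

# if/else returns whichever whole object the branch evaluates to.
r2 <- if (length(unique(x)) <= 32) as.factor(x) else x
length(r2)
is.factor(r2)

# sapply() simplifies equal-length results into one matrix, so the
# per-column classes cannot survive.
d <- data.frame(a = c(1, 1, 2), b = c(0.5, 1.5, 2.5))
m <- sapply(d, function(col) if (length(unique(col)) <= 2) as.factor(col) else col)
is.matrix(m)
```

Only the if/else result keeps both the full length and the factor class.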

What you want to do is use lapply, and then replace the contents of the original.

dfr[] <- lapply(dfr, function(x) if (length(unique(x)) <= 32) as.factor(x) else x)
# It works!
str(dfr)
# 'data.frame': 40 obs. of  2 variables:
#  $ y: Factor w/ 5 levels "0","1","2","3",..: 2 1 2 1 5 3 5 1 5 1 ...
#  $ z: num  0.9036 0.2909 -0.9027 -0.4588 -0.0495 ...
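If you need this for several data frames, the same idea can be wrapped in a small helper. This is only a sketch: the function name factorize_low_cardinality and the max_unique argument are hypothetical names chosen for this example, and the seed is set just to make the run repeatable.

```r
# Hypothetical helper: convert every column with at most `max_unique`
# distinct values to a factor and leave the other columns untouched.
factorize_low_cardinality <- function(df, max_unique = 32) {
  # Assigning into df[] keeps the data.frame structure and column names.
  df[] <- lapply(df, function(col) {
    if (length(unique(col)) <= max_unique) as.factor(col) else col
  })
  df
}

set.seed(1)  # arbitrary seed, only for reproducibility
dfr <- data.frame(y = floor(runif(40, 0, 5)), z = rnorm(40))
out <- factorize_low_cardinality(dfr)
sapply(out, class)
```

Returning a modified copy rather than changing dfr in place leaves the original data available, which can be handy when trying different thresholds.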
Licensed under: CC-BY-SA with attribution