Question

I have a data frame of mostly numeric columns, each with few unique values. I'd like to convert columns with 20 or fewer unique values to factors as is, and convert those with more to factors using gtools::quantcut.

What am I not understanding about the behavior of ifelse within lapply?

library(gtools)  # provides quantcut()

d <- data.frame(a = sample(1:10, 100, replace = TRUE),
                b = sample(1:10, 100, replace = TRUE),
                c = sample(1:30, 100, replace = TRUE),
                d = sample(1:30, 100, replace = TRUE),
                e = sample(1:30, 100, replace = TRUE))

wrong <- as.data.frame(lapply(d[, sapply(d, is.numeric)],
                              function(x) ifelse(length(unique(x)) <= 20,
                                                 as.factor(x),
                                                 quantcut(x))))
dim(wrong)
# [1]  1 5
right <- as.data.frame(lapply(d[, sapply(d, is.numeric)],
                              function(x) {
                                  if (length(unique(x)) <= 20) {
                                      return(as.factor(x))
                                  }
                                  quantcut(x)
                              }))
dim(right)
# [1] 100    5

Solution

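The problem is that you are asking ifelse to return a whole vector when its test argument is a scalar. ifelse is vectorized over test: per the help file, it returns "a value with the same shape as test". Here the test, length(unique(x)) <= 20, is a single TRUE or FALSE, so the result has length one. Your wrong version therefore gets back only the first element of the desired vector (and as a bare integer code, since the factor attributes are not carried over), which is why each column collapses to a single row. To choose between two whole vectors based on a scalar condition, use an ordinary if / else, as in your right version.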
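
The difference is easy to see on a single small vector. The following is a minimal sketch in base R only; the vector x and the x * 10 alternative are made up for illustration and stand in for one column and for quantcut(x):

```r
# Toy column with 3 unique values (hypothetical data for illustration)
x <- c(3, 1, 2, 3, 1)

# The test is a single TRUE, so ifelse() builds a result of length 1:
# only the first element of as.factor(x) is used, and the factor
# class is dropped.
short <- ifelse(length(unique(x)) <= 20, as.factor(x), x * 10)
length(short)     # 1
is.factor(short)  # FALSE

# A plain if / else returns whichever whole object is selected.
full <- if (length(unique(x)) <= 20) as.factor(x) else x * 10
length(full)      # 5
is.factor(full)   # TRUE
```

Inside lapply, the same if / else form returns a full-length factor for every column, which is why as.data.frame then produces 100 rows instead of 1.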

Licensed under: CC-BY-SA with attribution