If your goal is to have the same variables with the same levels, then you need to avoid using dataframeReduce
a second time, and instead use the same columns as produced by the dataframeReduce
operation on hte train-set and apply factor reduction logic to the test-set in a manner that results in whatever degree of homology is needed of subsequent comparison operations. If it is a predict
operation that is planned then you need to get the levels to be the same and you need to modify the code in dataframeReduce
that works on the levels:
if (is.category(x) || length(unique(x)) == 2) {
tab <- table(x)
if ((min(tab)/n) < minprev) {
if (is.category(x)) {
x <- combine.levels(x, minlev = minprev)
s <- "grouped categories"
if (length(levels(x)) < 2)
s <- paste("prevalence<", minprev, sep = "")
}
else s <- paste("prevalence<", minprev, sep = "")
}
}
So a better problem statement is likely to produce a better strategy. This will probably require both knowing what levels are in the entire set and in the train and test sets as well as what testing or predictions are anticipated (but not yet stated).