There are two parts to this problem:
- checking that inputs are valid
- coercing a list to a vector
Checking valid inputs
First, I'd avoid is() because it's known to be slow. That gives:
check_valid <- function(elem, mode) {
  if (length(elem) != 1) stop("Must be length 1")
  if (mode(elem) != mode) stop("Not desired type")
  TRUE
}
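To make the behaviour concrete, here is a small usage sketch. The definition is repeated only so the snippet runs on its own; the variable names msg_len and msg_type are just illustrative.

```r
# Same definition as above, repeated so the snippet is self-contained.
check_valid <- function(elem, mode) {
  if (length(elem) != 1) stop("Must be length 1")
  if (mode(elem) != mode) stop("Not desired type")
  TRUE
}

check_valid(1.5, "numeric")  # TRUE
check_valid(2L, "numeric")   # TRUE: mode() reports "numeric" for integers too

# Invalid inputs raise an error; tryCatch() captures the message for inspection.
msg_len  <- tryCatch(check_valid(1:2, "numeric"), error = conditionMessage)
msg_type <- tryCatch(check_valid("a", "numeric"), error = conditionMessage)
msg_len   # "Must be length 1"
msg_type  # "Not desired type"
```

Note that mode() does not distinguish integers from doubles, so both pass a "numeric" check.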
Now we need to figure out whether a loop or an apply variant is faster. We'll benchmark the worst case, where all inputs are valid and so every element has to be checked.
worst <- as.list(0:101)
library(microbenchmark)
options(digits = 3)
microbenchmark(
  `for` = for (i in seq_along(worst)) check_valid(worst[[i]], "numeric"),
  lapply = lapply(worst, check_valid, "numeric"),
  vapply = vapply(worst, check_valid, "numeric", FUN.VALUE = logical(1))
)
## Unit: microseconds
##    expr min  lq median  uq  max neval
##     for 278 293    301 318 1184   100
##  lapply 274 282    291 310 1041   100
##  vapply 273 284    288 298 1062   100
The three methods are basically tied. lapply() and vapply() are very slightly faster than the for loop (medians of 291 and 288 against 301), probably because of the special C tricks they use.
Coercing list to vector
Now let's look at a few ways of coercing a list to a vector:
change_mode <- function(x, mode) {
  mode(x) <- mode
  x
}
microbenchmark(
  change_mode = change_mode(worst, "numeric"),
  unlist = unlist(worst),
  as.vector = as.vector(worst, "numeric")
)
## Unit: microseconds
##         expr   min    lq median   uq    max neval
##  change_mode 19.13 20.83  22.36 23.9 167.51   100
##       unlist  2.42  2.75   3.11  3.3  22.58   100
##    as.vector  1.79  2.13   2.37  2.6   8.05   100
So it looks like you're already using the fastest method, and the total cost is dominated by the check (about 2 µs for as.vector() against roughly 290 µs for the validation pass).
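Speed aside, the candidates are not fully interchangeable. The sketch below reuses the benchmark input (the names via_unlist and via_as_vec are only illustrative) to show that unlist() keeps the integer storage type while as.vector(, "numeric") returns doubles, and that unlist() silently flattens longer elements, which is one reason the separate validity check matters.

```r
worst <- as.list(0:101)  # a list of 102 length-1 integers, as in the benchmark

via_unlist <- unlist(worst)
via_as_vec <- as.vector(worst, "numeric")

typeof(via_unlist)  # "integer": unlist() keeps the elements' common type
typeof(via_as_vec)  # "double": as.vector(, "numeric") converts to double

# The values agree once the storage type is ignored.
all(via_unlist == via_as_vec)  # TRUE

# unlist() does not complain about elements longer than 1; it flattens them.
length(unlist(list(1, 2:3)))  # 3
```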
Alternative approach
Another idea is that we might be able to get a little faster by looping over the list once, instead of once to check and once to coerce:
as_atomic_for <- function(x, mode) {
  out <- vector(mode, length(x))
  for (i in seq_along(x)) {
    check_valid(x[[i]], mode)
    out[i] <- x[[i]]
  }
  out
}
microbenchmark(
  as_atomic_for(worst, "numeric")
)
## Unit: microseconds
##                             expr min  lq median  uq  max neval
##  as_atomic_for(worst, "numeric") 497 524    557 685 1279   100
That's definitely worse: a median of 557 µs, roughly twice the cost of checking and coercing separately.
All in all, I think this suggests that if you want to make this function faster, you should try vectorising the check function in Rcpp.
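Before reaching for Rcpp, it may be worth trying a partially vectorised check in base R. The following is only a sketch under assumptions: check_valid_all is a hypothetical helper, not part of the code above, it relies on lengths() (available from R 3.2.0), and it has not been benchmarked here, so whether it actually beats the per-element check is an open question.

```r
# Hypothetical helper: validates the whole list at once instead of per element.
check_valid_all <- function(x, mode) {
  # lengths() returns every element's length in a single call
  if (any(lengths(x) != 1L)) stop("Must be length 1")
  # Collect each element's mode, then compare once. base::mode is spelled out
  # because the argument `mode` shadows the function name inside this body.
  modes <- vapply(x, base::mode, character(1))
  if (any(modes != mode)) stop("Not desired type")
  TRUE
}

worst <- as.list(0:101)
check_valid_all(worst, "numeric")  # TRUE
```

The mode comparison still calls an R function per element, so most of any gain would come from the length check; a compiled version would remove that remaining per-element overhead.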