All the functions you used in pcode_normalize are already vectorized, so there's no need to loop using sapply. It also looks like you're using strsplit to look for single spaces; grepl would be faster.

Using fixed=TRUE in your calls to gsub and grepl will be faster still, since you're not actually using regular expressions.

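pcode_normalize <- function(x) {
  # Collapse a double space into a single space (literal match, no regex)
  x <- gsub("  ", " ", x, fixed = TRUE)
  # TRUE for elements that already contain a space
  sp <- grepl(" ", x, fixed = TRUE)
  # Insert a space after the 4th character where none is present
  x[!sp] <- paste(substr(x[!sp], 1, 4), substr(x[!sp], 5, 7))
  x
}

all_postcodes$npcode <- pcode_normalize(all_postcodes$pcode)
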
I couldn't actually test this, since you didn't provide any example data, but it should get you on the right path.
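
If you want a quick sanity check, something along these lines should do. This is only a sketch: the three postcodes are made up, and I'm assuming the first gsub step is meant to collapse a double space into a single one. The function is repeated so the snippet runs on its own.

```r
# Hypothetical sample data; these postcodes are invented for illustration.
all_postcodes <- data.frame(
  pcode = c("AB1  2CD", "AB123CD", "AB12 3CD"),
  stringsAsFactors = FALSE
)

# Vectorized normalization, no sapply needed.
# Assumption: the first step collapses a double space into a single space.
pcode_normalize <- function(x) {
  x <- gsub("  ", " ", x, fixed = TRUE)
  sp <- grepl(" ", x, fixed = TRUE)   # which elements already have a space
  x[!sp] <- paste(substr(x[!sp], 1, 4), substr(x[!sp], 5, 7))
  x
}

all_postcodes$npcode <- pcode_normalize(all_postcodes$pcode)
print(all_postcodes)
```

The whole column is processed in one call. Elements that already contain a space are left alone after the double-space cleanup, and the rest get a space inserted after the fourth character.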