That happens because sapply returns a vector, and a vector can't hold mixed types. If you use lapply, you get a list, which can hold mixed types. The same code with lapply instead of sapply works the way you want.
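As a minimal sketch of that swap, here is the question's function passed to lapply. The names num_pattern and res are made up for illustration, and the dot in the pattern is escaped here on the assumption that a literal decimal point was intended (the unescaped dot in the question matches any character).

```r
# Pattern from the question, with the decimal point escaped (an assumption
# about the intended behavior, not part of the original code).
num_pattern <- "^[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?$"

# lapply always returns a list, so each element keeps its own type.
res <- lapply(c("test", "6", "-99.99", "test2"), function(v) {
  if (grepl(num_pattern, v)) as.numeric(v) else v
})

# Inspect the structure: character and numeric elements side by side.
str(res)
```

Unlike sapply, lapply does not use the input strings as element names, so the list here is unnamed.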
Applying Regex Across a Vector
Question
I'm at a loss why the following code doesn't work. The intention is to input a vector of strings, some of which can be converted to a number, some can't. The following 'sapply' function should use a regex to match numbers and then return the number or (if not) return the original.
sapply(c("test","6","-99.99","test2"), function(v){
if(grepl("^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$",v)){as.numeric(v)} else {v}
})
Which returns the following result:
"test" "6" "-99.99" "test2"
Edit: What I expect the code to return:
"test" 6 -99.99 "test2"
I can run the if statement on each element successfully.
> if(grepl("^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$","test")){as.numeric("test")} else {"test"}
[1] "test"
> if(grepl("^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$","6")){as.numeric("6")} else {"6"}
[1] 6
And so on.
I don't understand why this is happening. I guess I have two questions. One: Why is this happening? And two: Usually I'm pretty good at troubleshooting, but I have no idea where to even look for this. If you know the problem, how did you find/know the solution? Should I open up the internal lapply function code?
Solution
OTHER TIPS
@Jeremy points in the right direction: you can use lapply, which returns a list. Or you can tell sapply not to simplify the result.
If simplification occurs, the output type is determined from the highest type of the return values in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression, after coercion of pairlists to lists.
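The hierarchy quoted above can be seen in a short sketch. The variable names mixed and simplified are illustrative only. When a number and a string end up in the same atomic vector, the number is converted to character, which is what happens to 6 and -99.99 in the question.

```r
# Combining a double and a character string coerces to the higher type.
mixed <- c(6, "test")
class(mixed)         # "character"

# sapply simplifies its results the same way when each one has length 1.
simplified <- sapply(list(6, "test"), identity)
class(simplified)    # "character"

# Lower in the hierarchy: logical < double, so TRUE becomes 1.
class(c(TRUE, 2.5))  # "numeric"
```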
out <- sapply(c("test","6","-99.99","test2"), function(v){
  if(grepl("^[-+]?[0-9]*.?[0-9]+([eE][-+]?[0-9]+)?$",v)){
    as.numeric(v)
  } else {
    v
  }
}, simplify = FALSE)
> out
$test
[1] "test"
$`6`
[1] 6
$`-99.99`
[1] -99.99
$test2
[1] "test2"
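One way to confirm that each element kept its own type is to ask for the class of every element. This is only a sketch: the list below is rebuilt by hand to mirror the printed result, and the name types is made up for illustration.

```r
# Hand-built copy of the result shown above, so the sketch is self-contained.
out <- list(test = "test", `6` = 6, `-99.99` = -99.99, test2 = "test2")

# vapply is used because every result is known to be a single string.
types <- vapply(out, class, character(1))
types
```

Printing types gives one class name per element, labelled with the original strings.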