String extraction: Understanding weird output

Question 1

hrbrmstr and Jake Burkhead give you the explanation: what is not matched is not replaced.

Since the last two column names don't contain digits, they are not matched (and therefore not replaced).

A way to solve the problem is to replace all that is not a digit with nothing:

numbers <- gsub(pattern = "\\D+", "", c)
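One side effect worth knowing: with this pattern, an element that contains no digits at all comes back as an empty string rather than unchanged. Here is a minimal sketch in base R; the vector `x` is only an illustrative stand-in for your own data.

```r
# Illustrative input; `x` stands in for your own character vector.
x <- c("Variable182predict", "Variable123Target", "Timestamp", "TargetVariable")

# Remove every run of non-digit characters.
digits <- gsub(pattern = "\\D+", replacement = "", x = x)
print(digits)
# Elements without digits become "" (not NA, and not the original string).

# Converting to numeric turns those empty strings into NA.
nums <- as.numeric(digits)
print(nums)
```

If you only want the elements that actually had digits, you can drop the `NA` values afterwards, for example with `nums[!is.na(nums)]`.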

Question 2

gsub() is going to take the vector, look for the pattern, replace it where found and return each element whether it was replaced or not. You can use something like this:

library(stringr)

c.names <- c("Variable182predict", "Variable123Target", "Timestamp", "TargetVariable")
as.numeric(na.omit(str_extract(c.names, "\\d+")))

which will return

## [1] 182 123

(I made the assumption you only wanted the numeric output and nothing else)

stringr is a pretty handy package to have around if you do a lot with character vectors.

Question 3

From ?gsub:

 Elements of character vectors ‘x’ which are not
 substituted will be returned unchanged

So if the regex doesn't match one of the input elements it does nothing to that element. The last 2 elements of your input vector don't match the pattern since they don't contain an e followed by one or more digits, so they are returned untouched.
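To see this behaviour directly, here is a small sketch. The pattern `"e\\d+"` is my assumption of what "an e followed by one or more digits" looks like as a regex, and `x` is an illustrative input vector, not necessarily your exact data.

```r
# Illustrative input; the last two elements contain no "e<digits>" sequence.
x <- c("Variable182predict", "Variable123Target", "Timestamp", "TargetVariable")

# Assumed pattern: a literal "e" followed by one or more digits.
pattern <- "e\\d+"

# Which elements contain a match at all?
matched <- grepl(pattern, x)
print(matched)

# gsub() only changes the elements that matched.
replaced <- gsub(pattern, "", x)
print(replaced)

# The unmatched elements come back exactly as they went in.
print(identical(replaced[!matched], x[!matched]))
```

Checking with `grepl()` first is a handy way to find out which elements `gsub()` will leave untouched.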

Question 4

If you want to extract all digits from text, use this function from the stringi package. "Nd" is the Unicode class of decimal digits.

    library(stringi)
    stri_extract_all_charclass(c(123, 43, "66ala123", "kot"), "\\p{Nd}")
[[1]]
[1] "123"

[[2]]
[1] "43"

[[3]]
[1] "66"  "123"

[[4]]
[1] NA

Note that here the numbers 66 and 123 are extracted separately, whereas with gsub they are pasted together into 66123.
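If you would rather not add a package, the same contrast can be sketched in base R. This is a swapped-in alternative using `gregexpr()` and `regmatches()`, not the stringi function above, and it differs in one detail: an element with no digits yields `character(0)` instead of `NA`.

```r
# Same illustrative input as above, written as strings.
x <- c("123", "43", "66ala123", "kot")

# gsub() deletes the non-digits, so separate runs get glued together.
glued <- gsub("\\D+", "", x)
print(glued[3])

# gregexpr() + regmatches() keep each run of digits as its own element.
runs <- regmatches(x, gregexpr("[0-9]+", x))
print(runs[[3]])

# No digits at all: an empty character vector, not NA.
print(length(runs[[4]]))
```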