Question

I have a data.frame in R with a column containing character strings of the form {some letters}-{a number}{a letter}, e.g. x <- 'KFKGDLDSKFDSKJJFDI-4567W'. I want to get a column with the numbers, e.g. '4567' for that particular example/row. There's only one number per string, but it can be of any reasonable length. How can I extract the number from each row in the data.frame?


Solution

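Use a regular expression to extract the digits: regexpr finds the first run of digits ('\\d+') in each string, and regmatches returns the matched text. Then use as.numeric to convert the resulting character string to a number: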

string = 'KFKGDLDSKFDSKJJFDI-4567W'
as.numeric(regmatches(string, regexpr('\\d+', string)))
# 4567

You can easily use this to create a new column in your data frame:

# example data: ten copies of the sample string
data = data.frame(x = rep(string, 10))
transform(data, y = as.numeric(regmatches(x, regexpr('\\d+', x))))
#                           x    y
# 1  KFKGDLDSKFDSKJJFDI-4567W 4567
# 2  KFKGDLDSKFDSKJJFDI-4567W 4567
# 3  KFKGDLDSKFDSKJJFDI-4567W 4567
# 4  KFKGDLDSKFDSKJJFDI-4567W 4567
…
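
One caveat: regmatches drops elements that have no match, so the one-liner above fails with a length mismatch if any row contains no digits. A minimal sketch of a workaround follows, assuming you want NA for such rows; extract_number is a hypothetical helper name, not part of base R.

```r
# Hypothetical helper: returns the first run of digits in each string as a
# number, or NA where a string contains no digits (or is itself NA).
extract_number <- function(x) {
  m <- regexpr("[0-9]+", x)            # position of first match, -1 if none
  ok <- !is.na(m) & m != -1            # elements that actually matched
  out <- rep(NA_character_, length(x)) # start with all NA
  out[ok] <- regmatches(x, m)          # regmatches returns only the matches
  as.numeric(out)
}

x <- c("KFKGDLDSKFDSKJJFDI-4567W", "NODIGITS", "AB-12C")
extract_number(x)
# [1] 4567   NA   12
```

It can be used in the same way as before, e.g. transform(data, y = extract_number(x)).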

OTHER TIPS

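Alternatively, match the whole pattern and keep only the captured digits with a backreference. The result is still a character string, so wrap it in as.numeric if you need a number: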

gsub("[a-zA-Z]+-([0-9]+)[a-zA-Z]","\\1", "KFKGDLDSKFDSKJJFDI-4567W")
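
Because sub and gsub are vectorized, the same idea can be applied to a whole column. This is only a sketch: the data frame df and its column x are made up for illustration, and the pattern is anchored with ^ and $ on the assumption that every value follows the letters-number-letter form exactly.

```r
# Hypothetical example data; df and x are illustrative names.
df <- data.frame(x = c("KFKGDLDSKFDSKJJFDI-4567W", "AB-12C"),
                 stringsAsFactors = FALSE)

# Replace each full string with its captured digit group, then convert.
df$y <- as.numeric(sub("^[a-zA-Z]+-([0-9]+)[a-zA-Z]$", "\\1", df$x))
df$y
# [1] 4567   12
```

Note that a string that does not match the pattern is returned unchanged by sub, so as.numeric would then yield NA with a warning.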
Licensed under: CC-BY-SA with attribution