Question

I have many strings of the form name1, name2 and name3, 0, 1, 2 or name1, name2, name3 and name4, 0, 1, 2, and I would like to split each string into 4 elements, where the first element is the whole text string of names. The problem is that strsplit doesn't differentiate between text and numbers, so it splits the string into 5 elements in the first case and 6 elements in the second. How can I tell R to dynamically skip the text part of the string when the number of names varies?


Solution

You have two main options:
(1) grep for the numbers and extract those.
(2) split on the comma, then coerce each piece to numeric and check for NAs.

I prefer the second:

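x <- "name1, name2 and name3, 0, 1, 2"  # one example string

splat <- strsplit(x, ",")[[1]]
# TRUE for pieces that parse as numbers; names coerce to NA
numbs <- !is.na(suppressWarnings(as.numeric(splat)))

# re-join the name pieces, then append the numeric pieces
c(paste(splat[!numbs], collapse=","), splat[numbs])
# [1] "name1, name2 and name3" " 0" " 1" " 2"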
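
Option (1) is not shown above, so here is a minimal sketch of one way it could look. It assumes the numbers are non-negative integers that always form the trailing comma-separated fields of the string. The helper name split_tail_numbers is hypothetical, not something from the original answer.

```r
# Hypothetical helper: split one string into c(names, num1, num2, ...)
split_tail_numbers <- function(s) {
  # locate the trailing run of ", <digits>" fields at the end of the string
  m <- regexpr("(,[[:space:]]*[0-9]+)+$", s)
  if (m == -1) return(s)  # no trailing numbers: return the input unchanged
  tail_part <- substring(s, m)
  # everything before the run is the names part; pull the digits out of the run
  c(substr(s, 1, m - 1),
    regmatches(tail_part, gregexpr("[0-9]+", tail_part))[[1]])
}

x <- c("name1, name2 and name3, 0, 1, 2",
       "name1, name2, name3 and name4, 0, 1, 2")
res <- lapply(x, split_tail_numbers)  # one 4-element vector per input string
```

Because the numbers are extracted by pattern rather than by splitting, the pieces come back without the leading spaces that strsplit leaves behind.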

OTHER TIPS

You could also insert a delimiter in the right places, and then split on that:

strr <- "name1, name2 and name3, 0, 1, 2"  # example input
delimmed <- gsub('(.*[a-z][0-9]+| [0-9]+),','\\1%',strr)
strsplit(delimmed,'%')

The first alternative in the regular expression (to the left of the |) matches everything (.*) up to the final letter-number-comma combination, and the second matches any space-number-comma combination. In each case the comma is dropped, since it sits outside the parentheses, and replaced by %. This relies on % not otherwise appearing in the strings.
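
To make the delimiter approach concrete, here is a small sketch applied to both example strings from the question. It assumes every name ends in a lowercase letter followed by digits, as the regular expression requires, and that % never occurs in the data. The trimws step is an addition to remove the leading spaces and is not part of the original tip.

```r
strr <- c("name1, name2 and name3, 0, 1, 2",
          "name1, name2, name3 and name4, 0, 1, 2")

# assumption: "%" is free to use as a delimiter, so check that first
stopifnot(!any(grepl("%", strr, fixed = TRUE)))

# swap the comma after the last name, and after each number, for "%"
delimmed <- gsub("(.*[a-z][0-9]+| [0-9]+),", "\\1%", strr)

# split on the literal "%" and strip surrounding whitespace from each piece
parts <- lapply(strsplit(delimmed, "%", fixed = TRUE), trimws)
```

Both gsub and strsplit are vectorized, so this handles the whole vector of strings without an explicit loop.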

Licensed under: CC-BY-SA with attribution