Question

I've been trying for some time to split a space-delimited string containing double-quoted fields in R, but without success. An example of such a string:

rainfall snowfall "Channel storage" "Rivulet storage"

It's important for us because these are column headings that must match the subsequent data. There are other suggestions on this site for how to go about this, but they don't seem to work with R. One example:

Regex for splitting a string using space when not surrounded by single or double quotes

Here is some code I've been trying:

str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'
regex <- "[^\\s\"']+|\"([^\"]*)\""
split <- strsplit(str, regex, perl=T)

What I would like is:

[1] "rainfall" "snowfall" "Channel storage" "Rivulet storage"

But what I get is:

[1] ""  " " " " " "

The vector is the right length (which is encouraging) but of course the strings are empty or contain a single space. Any suggestions?

Thanks in advance!


Solution

scan() will do this for you:

scan(text=str, what='character', quiet=TRUE)
[1] "rainfall"        "snowfall"        "Channel storage" "Rivulet storage"
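A slightly longer sketch of the same idea. It relies on scan()'s default quote argument, which treats both " and ' as quote characters when splitting on whitespace, so the single-quoted variant of the header line parses the same way. The data row and the names headers_single, row and named_row are hypothetical, invented only to show the headers lining up with values.

```r
str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'

# scan() splits on whitespace by default and treats both " and ' as quote
# characters, so a quoted field keeps its inner space and loses its quotes.
headers <- scan(text = str, what = "character", quiet = TRUE)

# The single-quoted variant of the same header line parses identically.
str_single <- "rainfall snowfall 'Channel storage' 'Rivulet storage'"
headers_single <- scan(text = str_single, what = "character", quiet = TRUE)

# Hypothetical data row (values invented for illustration), read with the
# default numeric `what`, then labelled with the parsed headers.
row <- scan(text = "12.5 0 3.1 4.7", quiet = TRUE)
stopifnot(length(row) == length(headers))
named_row <- setNames(row, headers)
print(named_row)
```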

OTHER TIPS

As mplourde said, use scan(). That's by far the cleanest solution (unless you want to keep the \" characters, that is...).

If you want to use regexes to do this (or something that scan() doesn't solve as easily), you are still looking at it the wrong way. Your regex matches the fields you want to keep, so when you pass it to strsplit() as the separator, it cuts out exactly those fields and leaves only what lies between them.
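Re-running the question's own strsplit() call makes that inversion visible. Each of the four matches is removed, and only the text in front of each match is returned (an empty string before "rainfall", then a single space before each later field); strsplit() drops the empty remainder after the last match.

```r
str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'
regex <- "[^\\s\"']+|\"([^\"]*)\""

# strsplit() treats every match as a separator and returns the text between
# matches. Because this regex matches the fields themselves, only the
# separating spaces (plus a leading empty string) are left over.
pieces <- strsplit(str, regex, perl = TRUE)[[1]]
print(pieces)
```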

In these scenarios, look at the function gregexpr(), which returns the starting positions of your matches and stores their lengths in a "match.length" attribute. Its result can be passed to regmatches(), like this:

str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'
regex <- "[^\\s\"]+|\"([^\"]+)\""

regmatches(str,gregexpr(regex,str,perl=TRUE))
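The call above returns a list (one element per input string), and the quoted fields still carry their double quotes. A minimal sketch, assuming plain column names are wanted, takes the first list element and strips a leading and trailing quote with gsub(); the names tokens and headers are only for illustration.

```r
str <- 'rainfall snowfall "Channel storage" "Rivulet storage"'
regex <- "[^\\s\"]+|\"([^\"]+)\""

# gregexpr() finds every match; regmatches() extracts the matched text.
# [[1]] takes the result for the single input string.
tokens <- regmatches(str, gregexpr(regex, str, perl = TRUE))[[1]]

# Quoted fields come back as "\"Channel storage\"", so remove one leading
# and one trailing double quote from each token.
headers <- gsub('^"|"$', "", tokens)
print(headers)
```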

But if you just need the character vector that mplourde's solution returns, go for that; most likely that's what you're after anyway.

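You can use strapply() from the gsubfn package. With strapply() you define the pattern to match rather than the pattern to split on. Note that this example uses single quotes, and its pattern only matches quoted fields of exactly two words: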

library(gsubfn)  # provides strapply()
str <- "rainfall snowfall 'Channel storage' 'Rivulet storage'"
strapply(str, "\\w+|'\\w+ \\w+'", c)[[1]]

[1] "rainfall"          "snowfall"          "'Channel storage'" "'Rivulet storage'"
Licensed under: CC-BY-SA with attribution