Question

I have a string in some text of the form "12,34,77", including the quotation marks.

I need to get the values of each of those numbers into a list. I tried using lapply and strsplit:

control2=lapply(strsplit(data$values,","),as.numeric)

but I get the error:

non character argument

What am I doing wrong?


Solution

1) strapply

1a) scalar Here is a one-liner using strapply from the gsubfn package:

library(gsubfn)
x <- '"12,34,567"'

strapply(x, "\\d+", as.numeric, simplify = c)
## [1]  12  34 567

1b) vectorized A vectorized version is even simpler -- just remove the simplify=c like this:

v <- c('"1,2,3"', '"8,9"') # test data
strapply(v, "\\d+", as.numeric)
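
If you would rather not depend on gsubfn, the same "extract every run of digits" idea can be sketched in base R with gregexpr and regmatches. This swaps in a different technique from strapply; the names digits and result are only illustrative.

```r
# Base-R sketch: pull out each run of digits, then convert to numeric.
v <- c('"1,2,3"', '"8,9"')                        # same test data as in 1b

digits <- regmatches(v, gregexpr("[0-9]+", v))    # list of character vectors
result <- lapply(digits, as.numeric)              # list of numeric vectors
```

Because only digit runs are matched, the surrounding quotation marks and the commas are skipped without a separate gsub step. Note that this pattern would also split decimals or drop minus signs, so it assumes non-negative integers as in the question.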

2) gsub and scan

2a) scalar Here is a one-liner using gsub and scan:

scan(text = gsub('"', '', x), what = 0, sep = ",")
## Read 3 items
## [1]  12  34 567

2b) vectorized A vectorized version would involve lapply-ing over the components:

lapply(v, function(x) scan(text = gsub('"', '', x), what = 0, sep = ","))

3) strsplit

3a) scalar Here is a strsplit solution. Note that we split on both " and , so the leading quote produces an empty first element, which [-1] drops:

as.numeric(strsplit(x, '[",]')[[1]][-1])
## [1]  12  34 567

3b) vectorized A vectorized solution would, again, involve lapply-ing over the components:

lapply(v, function(x) as.numeric(strsplit(x, '[",]')[[1]][-1]))

3c) vectorized - slightly simpler Alternatively, strip the quotes first and then split on the commas:

lapply(strsplit(gsub('"', '', v), split = ","), as.numeric)

OTHER TIPS

I think your problem may stem from your source data. In any case, if you want to work with numbers, you will have to get rid of the quotes. I recommend gsub.

> x <- '"1,3,5"'
> x
[1] "\"1,3,5\""
> x <- gsub("\"", "", x)
> x
[1] "1,3,5"
> as.numeric(unlist(strsplit(x, ",")))
[1] 1 3 5
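
As for the "non character argument" error itself: one possible cause, assuming data$values was read in as a factor (the question does not show how data was built), is that strsplit only accepts character vectors. The data frame below is made up for illustration; only the column name values comes from the question.

```r
# Hypothetical data frame; stringsAsFactors = TRUE mimics a column read in as a factor.
data <- data.frame(values = c("12,34,77", "1,2"), stringsAsFactors = TRUE)

is.factor(data$values)            # TRUE, so strsplit() rejects it
# strsplit(data$values, ",")      # fails with "non character argument"

# Convert to character and drop any literal quotes before splitting.
clean <- gsub('"', '', as.character(data$values))
control2 <- lapply(strsplit(clean, ","), as.numeric)
```

If class(data$values) already reports "character", the cause lies elsewhere and this conversion is unnecessary.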

Try this:

x <-  "12,34,77"
sapply(strsplit(x, ",")[[1]], as.numeric, USE.NAMES=FALSE)
[1] 12 34 77

Since strsplit() returns a list with one character vector per input string, you need to extract the first element with [[1]] and pass that to sapply().
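
A small sketch of that return structure may help; the second input string and the variable names here are made up for illustration.

```r
# strsplit() always returns a list, one element per input string.
x <- c("12,34,77", "1,2")
parts <- strsplit(x, ",")

is.list(parts)                          # TRUE
length(parts)                           # 2, one element per input string
first <- as.numeric(parts[[1]])         # numbers from the first string only
all_nums <- lapply(parts, as.numeric)   # numbers from every string
```

With a single input string, [[1]] is all you need; with several, lapply over the whole list keeps the results grouped by string.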


If, however, your string really contains embedded quotes, you need to remove them first. You can use gsub() for this:

x <-  '"12,34,77"'
sapply(strsplit(gsub('"', '', x), ",")[[1]], as.numeric, USE.NAMES=FALSE)
[1] 12 34 77

As has already been pointed out, you need to strip out the quotation marks first.

The destring function in the taRifx package will do that (it removes any non-numeric characters) and then coerces the result to numeric:

test <- '"12,34,77"'
library(taRifx)
lapply(strsplit(test,","),destring)
[[1]]
[1] 12 34 77
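
If installing taRifx is not an option, a rough base-R sketch of the same clean-up follows. It assumes the intent is to keep only digits, the decimal point and the minus sign; strip_to_numeric is a hypothetical helper name, not part of any package.

```r
test <- '"12,34,77"'

# Hypothetical helper: drop every character that cannot be part of a number,
# then coerce what is left to numeric.
strip_to_numeric <- function(s) as.numeric(gsub("[^0-9.-]+", "", s))

result <- lapply(strsplit(test, ","), strip_to_numeric)
```

Here the stray quotes on the first and last pieces are removed by the gsub call before as.numeric sees them.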
Licensed under: CC-BY-SA with attribution