Question

I'm using lapply to try to split up a character string in a data frame. The strings all look similar to "02D_48M_RHD". I'm trying to grab the numbers before the "D" and before the "M".

My use of lapply seems to be working:

a <- lapply(res$description, strsplit, split="[DM]_", fixed=FALSE)

> a[[1]]
[[1]]
[1] "02"  "48"  "RHD"

However I cannot, for the life of me, figure out how to access just the first element of the vector in a[1]. The documentation suggests that a[[1]][1] should give me the first element, but this is what happens:

> a[[1]][1]
[[1]]
[1] "02"  "48"  "RHD"

I don't understand why this doesn't work. R tells me that this is a vector, but it also says that it has a length of one.

> is.vector(a[[1]])
[1] TRUE
> length(a[[1]])
[1] 1

I'm not sure what I'm misunderstanding. Is lapply giving output in some way other than what I expect? I expect a list of vectors of length three, and that's what it looks like. Or, is that what I'm getting but I'm trying to access them wrong?

Eventually, I'd like to add three columns to my data frame, one for each of these pieces of information, so anything that could help me move in that direction would be greatly appreciated.


Solution

strsplit is already vectorized, so there's no need to wrap it in lapply. You're confused because a is a list of lists of vectors, not a list of vectors. I.e. a[[1]] is itself a one-element list that contains a vector.

Also, lists are "vectors" in R, which is why is.vector(a[[1]]) returns TRUE. is.character(a[[1]]) returns FALSE, because a[[1]] is a list rather than a character vector.
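To see the nesting for yourself, here is a small sketch. It uses a made-up vector `desc` in place of res$description (the name and sample values are assumptions, not taken from the question):

```r
# Hypothetical stand-in for res$description
desc <- c("02D_48M_RHD", "34D_98M_AHR")

# The original approach: strsplit is called once per string, and each call
# returns a one-element list, so the overall result is a list of lists.
a <- lapply(desc, strsplit, split = "[DM]_")

str(a[[1]])              # a list of 1 holding a character vector of length 3
is.vector(a[[1]])        # TRUE, because a list counts as a vector
is.character(a[[1]])     # FALSE, because it is a list, not a character vector

a[[1]][1]                # single brackets on a list return a (sub)list
first <- a[[1]][[1]][1]  # double brackets unwrap the inner list first
first                    # "02"
```

So with the lapply version, the first piece is reachable as a[[1]][[1]][1], but dropping lapply removes the extra layer altogether.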

You want something like:

splits    <- strsplit(res$description, "[DM]_", fixed=FALSE)
res$one   <- sapply(splits, "[", 1)
res$two   <- sapply(splits, "[", 2)
res$three <- sapply(splits, "[", 3)
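Since the question mentions grabbing the numbers, here is a hedged, self-contained variant of the same idea. The toy data frame and the helper `pick` are made up for illustration, and it assumes the first two pieces are always digits:

```r
# Toy data frame standing in for `res` (values are made up for illustration)
res <- data.frame(description = c("02D_48M_RHD", "34D_98M_AHR"),
                  stringsAsFactors = FALSE)

splits <- strsplit(res$description, "[DM]_")

# Hypothetical helper: vapply works like sapply, but it also checks that
# every extracted piece is a single string.
pick <- function(i) vapply(splits, `[`, character(1), i)

res$one   <- as.integer(pick(1))  # "02" becomes 2L
res$two   <- as.integer(pick(2))
res$three <- pick(3)

res
```

If you need to keep the leading zeros, skip as.integer and leave the columns as character.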

OTHER TIPS

I don't think your call to lapply is necessary as strsplit already works on vectors. Something like this may help:

a <- "02D_48M_RHD"
#Create a vector of values to split
aa <- c(a,a,a,a,a,a,a)
#rbind them together and make a data.frame
> data.frame(do.call("rbind", strsplit(aa, split="[DM]_", fixed=FALSE)))

  X1 X2  X3
1 02 48 RHD
2 02 48 RHD
3 02 48 RHD
4 02 48 RHD
5 02 48 RHD
6 02 48 RHD
7 02 48 RHD
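If you would rather have meaningful column names than X1, X2, X3, a sketch along the same lines follows. The column names are hypothetical, and it assumes every string splits into exactly three pieces; when the piece counts differ, rbind recycles the shorter vectors with a warning:

```r
aa <- c("02D_48M_RHD", "34D_98M_AHR")

# Each split yields three pieces, so rbind stacks them into a 2 x 3 character matrix
m <- do.call(rbind, strsplit(aa, split = "[DM]_"))

# Name the columns explicitly (hypothetical names) and keep them as character
parts <- data.frame(one = m[, 1], two = m[, 2], three = m[, 3],
                    stringsAsFactors = FALSE)
parts
```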

x <- c('02D_48M_RHD', '34D_98M_AHR')

> lapply(x, strsplit, split='[DM]_', fixed=FALSE)
[[1]]
[[1]][[1]]
[1] "02"  "48"  "RHD"


[[2]]
[[2]][[1]]
[1] "34"  "98"  "AHR"

This produces an awkward nested list: each element is itself a one-element list wrapping the character vector. I think what you want is:

> lapply(strsplit(x, split='[DM]_', fixed=FALSE), '[', 1)
[[1]]
[1] "02"

[[2]]
[1] "34"
Licensed under: CC-BY-SA with attribution