Question

As a newbie in R, how do I correctly handle a variable whose elements can hold multiple values, like this:

x = c("1","1","1/2","2","2/3","1/3")

As you can see, the value 3 only appears in conjunction with others.

To work with x further, the best option would be to obtain 3 vectors like this:

X[1] = c(1,1,1,NA,NA,1)

because "1" appears in the 1st, 2nd, 3rd and 6th positions. The same goes for X[2] and X[3].

All the information seems to be preserved this way. Am I wrong?

I have already tried strsplit, but it does not produce the NA values, since they are not already in my vector.


Solution

This seems to work:

x = c("1","1","1/2","2","2/3","1/3")

#Split on your separator character. If your data uses other separators,
#the pattern needs to be extended to cover them.
xsplit <- strsplit(x, "\\/")
#Find the unique items
xunique <- unique(unlist(xsplit))

#Iterate over each xsplit for all unique values
out <- sapply(xsplit, function(z)  
  sapply(xunique, function(zz) zz %in% z)
)
#convert FALSE to NA
out[out == FALSE] <- NA

#Results in
out
#   [,1] [,2] [,3] [,4] [,5] [,6]
# 1 TRUE TRUE TRUE   NA   NA TRUE
# 2   NA   NA TRUE TRUE TRUE   NA
# 3   NA   NA   NA   NA TRUE TRUE
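
If you would rather have the numeric 1/NA vectors described in the question than a logical matrix, the same idea can be adapted. This is only a sketch: the list name X and the helper name present are assumptions of mine, not part of the answer above.

```r
x <- c("1", "1", "1/2", "2", "2/3", "1/3")

# Split each element on "/" (fixed = TRUE treats it as a literal character)
xsplit <- strsplit(x, "/", fixed = TRUE)

# Every distinct value that occurs anywhere in x
xunique <- sort(unique(unlist(xsplit)))

# Build one numeric vector per distinct value:
# 1 where the value occurs in that element of x, NA otherwise
X <- lapply(xunique, function(v) {
  present <- vapply(xsplit, function(z) v %in% z, logical(1))
  ifelse(present, 1, NA_real_)
})
names(X) <- xunique

X[["1"]]
# [1]  1  1  1 NA NA  1
```

X[["2"]] and X[["3"]] are built the same way, so each vector has the same length as x.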

OTHER TIPS

An alternative is to use cSplit_e from my "splitstackshape" package.

x = c("1","1","1/2","2","2/3","1/3")
library(splitstackshape)
cSplit_e(data.frame(x), "x", "/")
#     x x_1 x_2 x_3
# 1   1   1  NA  NA
# 2   1   1  NA  NA
# 3 1/2   1   1  NA
# 4   2  NA   1  NA
# 5 2/3  NA   1   1
# 6 1/3   1  NA   1

(Note that the results here are transposed in comparison to the results in the accepted answer.)
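
If you prefer this one-row-per-element layout but want to stay in base R, a logical matrix of the kind built in the accepted answer can simply be transposed with t(). A minimal sketch, where the name wide is just an illustrative choice:

```r
x <- c("1", "1", "1/2", "2", "2/3", "1/3")
xsplit <- strsplit(x, "/", fixed = TRUE)
xunique <- unique(unlist(xsplit))

# Rows = distinct values, columns = elements of x
out <- sapply(xsplit, function(z) xunique %in% z)
rownames(out) <- xunique

# t() swaps rows and columns, so each row now describes one element of x
wide <- t(out)
dim(wide)
# [1] 6 3
```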

Licensed under: CC-BY-SA with attribution