Splitting a comma-delimited string into several columns and assigning 0 to empty fields

https://stackoverflow.com/questions/19888053

30-07-2022

Question

In my data.frame, I have a vector x containing text strings, each holding six values (from 0 to 100) separated by commas, in this format:

x[1] "3,2,4,34,2,9"
x[2] "45,,67,,,"
x[3] ",,,,99,"

Here is the link to the actual vector I am having problems with: x.cvs

Unfortunately, a value of "0" is recorded as an empty field: there is nothing at all (not even a space) between two commas, before the first comma, or after the last comma.

First, it would be great to be able to transform it into:

x[1]  "3,2,4,34,2,9"
x[2]  "45,0,67,0,0,0"
x[3]  "0,0,0,0,99,0"
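One way to produce that intermediate form with regular expressions, as a minimal sketch: fill_zeros() is a hypothetical helper name, and the sketch assumes the fields contain only digits (no spaces). It handles the empty first and last fields separately and runs the ",," replacement twice, because gsub() matches do not overlap.

```r
# Hypothetical helper: write "0" into every empty field of a comma-separated string
fill_zeros <- function(x) {
  x <- sub("^,", "0,", x)   # empty first field
  x <- sub(",$", ",0", x)   # empty last field
  # gsub() matches do not overlap, so one pass turns ",,," into ",0,,";
  # a second pass fills the empty fields that remain
  x <- gsub(",,", ",0,", x)
  gsub(",,", ",0,", x)
}

x <- c("3,2,4,34,2,9", "45,,67,,,", ",,,,99,")
fill_zeros(x)
# expected: "3,2,4,34,2,9"  "45,0,67,0,0,0"  "0,0,0,0,99,0"
```

Once the strings are in this form, strsplit(..., ",") returns six pieces for each of them.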

But most importantly, I would like to split this vector into 6 different vectors, x1, x2, x3, x4, x5 and x6, each taking its value from the corresponding position in the string, with every empty field between commas replaced by "0". For example, the result should be:

x1[3] 0
x6[2] 0

I think strsplit() would have worked if there had been a value between every pair of commas, but since there is no value there, not even a space, I am not sure how to proceed without getting NAs.
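strsplit() can still work here; the catch is that it produces no empty piece after a trailing comma, so strings ending in "," come back one element short. A minimal sketch that pads each string with an extra comma, splits, converts to numbers and then turns the resulting NAs into zeros (the names pieces, m and cols are only illustrative):

```r
x <- c("3,2,4,34,2,9", "45,,67,,,", ",,,,99,")

# strsplit() yields no empty piece after a trailing comma, so pad each
# string with one extra comma; every string then splits into 6 pieces
pieces <- strsplit(paste0(x, ","), ",", fixed = TRUE)
m <- do.call(rbind, pieces)                  # 3 x 6 character matrix
m <- matrix(as.numeric(m), nrow = nrow(m))   # empty strings become NA
m[is.na(m)] <- 0                             # NA -> 0

# One numeric vector per position, named x1 ... x6, kept in a list
cols <- setNames(lapply(seq_len(ncol(m)), function(j) m[, j]),
                 paste0("x", seq_len(ncol(m))))
cols$x1[3]  # 0
cols$x6[2]  # 0
```

If separate objects are really wanted, list2env(cols, envir = globalenv()) would create x1 ... x6 from the list.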

I tried the following, but it gives me a lot of warnings:

x <- as.character(x)
x <- gsub(",,", ",0,", x)
x <- gsub(", ,", ",0,", x)
splitx = do.call("rbind", (strsplit(x, ",")))
splitx = data.frame(apply(splitx, 2, as.numeric))
names(splitx) = paste("x", 1:6, sep = "")

I get these warnings:

In rbind(c("51", "59", "59", "60", "51", "51"), c("51", "59", "59",  :
  number of columns of result is not a multiple of vector length (arg 10994)
 In apply(splitx, 2, as.numeric) : NAs introduced by coercion
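A likely cause of the rbind() message, reproduced here on the three sample strings rather than on the real file (the "NAs introduced by coercion" message presumably comes from other content in the real data and is not reproduced): gsub() only fills part of a run of three or more commas, and strsplit() drops the empty field after a trailing comma, so the split rows end up with different lengths and rbind() has to recycle them.

```r
x <- c("3,2,4,34,2,9", "45,,67,,,", ",,,,99,")

# gsub() replaces non-overlapping matches, so ",,," only becomes ",0,,",
# and empty first/last fields are not touched at all
y <- gsub(",,", ",0,", x)
y
# expected: "3,2,4,34,2,9"  "45,0,67,0,,"  ",0,,0,99,"

# No empty piece is produced after a trailing comma, so the rows differ in length
lengths(strsplit(y, ","))
# expected: 6 5 5
```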

Solution

Here are two alternatives to consider, depending on what you are actually expecting as your output.

The first option creates a set of separate vectors, but I find that a little unnecessary, and it can quickly litter your workspace with lots of objects.

The second option, which I prefer, creates a convenient data.frame with each row representing one of the items from your vector "x".

Sample Data

x <- vector()
x[1] <- "3,2,4,34,2,9"
x[2] <- "45,,67,,,"
x[3] <- ",,,,99,"

Option 1

Names <- paste0("A", seq_along(x))
for (i in seq_along(x)) {
  # scan() reads empty fields as NA; replace them with 0
  Z <- scan(text = x[i], sep = ",", quiet = TRUE)
  Z[is.na(Z)] <- 0
  assign(Names[i], Z)
}
A1
# [1]  3  2  4 34  2  9
A2
# [1] 45  0 67  0  0  0
A3
# [1]  0  0  0  0 99  0
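Since the concern with Option 1 is the clutter of separate objects, a variation worth considering keeps the same scan() parsing but collects the results in one named list; a minimal sketch (the name rows is only illustrative):

```r
x <- c("3,2,4,34,2,9", "45,,67,,,", ",,,,99,")

# Same parsing as Option 1, but the results go into one named list
# instead of separate A1, A2, A3 objects in the workspace
rows <- lapply(x, function(s) {
  z <- scan(text = s, sep = ",", quiet = TRUE)  # empty fields are read as NA
  z[is.na(z)] <- 0
  z
})
names(rows) <- paste0("A", seq_along(x))
rows$A2
# expected: 45 0 67 0 0 0
```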

Option 2

Z <- read.csv(text = x, header = FALSE)  # blank numeric fields are read as NA
Z[is.na(Z)] <- 0
Z
#   V1 V2 V3 V4 V5 V6
# 1  3  2  4 34  2  9
# 2 45  0 67  0  0  0
# 3  0  0  0  0 99  0

Extracting values from a data.frame is as easy as specifying the desired rows and columns.

Z[1, 3]
# [1] 4
Z[2, 4]
# [1] 0
Z[3, c(1, 3, 5)]
#   V1 V3 V5
# 3  0  0 99
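To line the columns up with the x1 ... x6 names used in the question, the data.frame columns can simply be renamed; a short sketch reusing the sample data:

```r
x <- c("3,2,4,34,2,9", "45,,67,,,", ",,,,99,")
Z <- read.csv(text = x, header = FALSE)
Z[is.na(Z)] <- 0
names(Z) <- paste0("x", 1:6)   # column names matching the question

Z$x1[3]  # 0
Z$x6[2]  # 0
```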

Licensed under: CC-BY-SA with attribution
