Pregunta

I have in one column of the table this:

paragemcard-resp+insufcardioresp
dpco+pneumonia
posopperfulceragastrica+ards
pos op hematoma #rim direito expontanea
miopatiaduchenne-erb+insuf.resp
dpco+dhca+#femur
posde#subtroncantГ©ricaesqВЄ+complicepidural
dpco+asma

And i want to separate them like this:

paragemcard-resp                            insufcardioresp
dpco                                        pneumonia
posopperfulceragastrica                     ards
pos op hematoma #rim direito expontanea
miopatiaduchenne-erb                        insuf.resp
dpco                                        dhca                   #femur
posde#subtroncantГ©ricaesqВЄ                complicepidural
dpco                                        asma

But the problem is that they don't have the same length. As you can see, in line 3, we have 2 variable and in line 6 we have 3.

And i want to create this string in the same column for further analysis.

Thanks

¿Fue útil?

Solución

You can use read.table, but you should use count.fields or some kind of regex to figure out the correct number of columns first. Using Robert's "text" sample data:

Cols <- max(sapply(gregexpr("+", text, fixed = TRUE), length))+1
## Cols <- max(count.fields(textConnection(text), sep = "+"))

read.table(text = text, comment.char="", header = FALSE, 
           col.names=paste0("V", sequence(Cols)), 
           fill = TRUE, sep = "+")
#                                        V1              V2     V3
# 1                        paragemcard-resp insufcardioresp       
# 2                                    dpco       pneumonia       
# 3                 posopperfulceragastrica            ards       
# 4 pos op hematoma #rim direito expontanea                       
# 5                    miopatiaduchenne-erb      insuf.resp       
# 6                                    dpco            dhca #femur
# 7            posde#subtroncantГ©ricaesqВЄ complicepidural       
# 8                                    dpco            asma       

Also, possibly useful: the "stringi" library makes counting elements easy (as an alternative to the gregexpr step above).

library(stringi)
Cols <- max(stri_count_fixed(x, "+") + 1)

Why the need for the "Cols" step? read.table and family decides how many columns to use either by (1) the maximum number of fields detected within the first 5 rows of data or (2) the length of the col.names argument. In your example row with the most number of fields is the sixth row, so directly using read.csv or read.table would result in incorrectly wrapped data.

Otros consejos

You can use strsplit:

text <- c("paragemcard-resp+insufcardioresp", "dpco+pneumonia", "posopperfulceragastrica+ards", "pos op hematoma #rim direito expontanea", "miopatiaduchenne-erb+insuf.resp", "dpco+dhca+#femur", "posde#subtroncantГ©ricaesqВЄ+complicepidural", "dpco+asma")

strings <- strsplit(text, "+", fixed = TRUE)
maxlen <- max(sapply(strings, length))
strings <- lapply(strings, function(s) { length(s) <- maxlen; s })
strings <- data.frame(matrix(unlist(strings), ncol = maxlen, byrow = TRUE))

and it looks like

                                          X1              X2     X3
   1                        paragemcard-resp insufcardioresp   <NA>
   2                                    dpco       pneumonia   <NA>
   3                 posopperfulceragastrica            ards   <NA>
   4 pos op hematoma #rim direito expontanea            <NA>   <NA>
   5                    miopatiaduchenne-erb      insuf.resp   <NA>
   6                                    dpco            dhca #femur
   7            posde#subtroncantГ©ricaesqВЄ complicepidural   <NA>
   8                                    dpco            asma   <NA>
Licenciado bajo: CC-BY-SA con atribución
No afiliado a StackOverflow
scroll top