I have a large data set in which the variables are separated by the symbol |**|. I tried sep="|", but this did not work when one of the string variables contains |. How can I make R read data with a compound separator?


Solution

(Frankly, I think it would be easier to do this with sed. This may not be very fast in R.)

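# Split each line on the literal "|**|", then rejoin the fields with commas.
# Note: this assumes no field contains a comma.
Lines <- readLines(filename)  # filename: path to your data file
sLines <- strsplit(Lines, "|**|", fixed = TRUE) # Thanks, Richie.
dat <- read.table(text = sapply(sLines, paste, collapse = ","), sep = ",")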

Here's the test on a simple datastring:

Lines <- "a|**|b|**|c\nd|**|e|**|f"
sLines <- strsplit(Lines, "\\|\\*\\*\\|")
dat <- read.table(text = sapply(sLines, paste, collapse = ","), sep = ",")
dat
#-----------
  V1 V2 V3
1  a  b  c
2  d  e  f
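
One caveat with rejoining on commas is that any field which itself contains a comma would be split into extra columns. A minimal sketch of a variant that swaps the separator for a tab instead, assuming no field contains a tab character (the sample records here are made up for illustration):

```r
# Sample input: the second field contains a comma or a lone "|"
Lines <- c("a|**|x,y|**|1", "d|**|p|q|**|2")

# Replace the literal separator with a tab (assumes no field contains a tab)
tabbed <- gsub("|**|", "\t", Lines, fixed = TRUE)

# quote = "" and comment.char = "" keep quotes and "#" in the data literal
dat <- read.table(text = tabbed, sep = "\t", quote = "", comment.char = "",
                  stringsAsFactors = FALSE)
dat
```

With a real file, Lines would come from readLines() as above; a lone | inside a field is left untouched because only the full |**| sequence is replaced.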

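By default strsplit treats its split argument as a regular expression, so the special characters | and * have to be doubly escaped in the R string (\\| and \\*). Alternatively, pass fixed=TRUE, as in the first snippet, to match the separator literally. Reading would be faster if you supplied colClasses in the read.table call. See ?read.table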
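
To make both points concrete, here is a small sketch on made-up data. It checks that the escaped regex and the fixed = TRUE form split identically, then passes colClasses to read.table. The column types chosen are assumptions about this sample only; yours will differ.

```r
Lines <- c("a|**|1|**|2.5", "d|**|4|**|5.5")

# Regex form: "|" and "*" are metacharacters, so each is escaped with "\\"
by_regex <- strsplit(Lines, "\\|\\*\\*\\|")
# Literal form: fixed = TRUE turns regex interpretation off
by_fixed <- strsplit(Lines, "|**|", fixed = TRUE)
identical(by_regex, by_fixed)

# Declaring column types up front spares read.table from guessing them
dat <- read.table(text = sapply(by_fixed, paste, collapse = ","), sep = ",",
                  colClasses = c("character", "integer", "numeric"))
str(dat)
```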

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow