Read a text file with tab and semicolon separators in R

Question 1

Why not first replace each tab (or run of spaces) with a semicolon and then import as usual? For example:

tx<-"Date  ON/OFF  93489;123985;219389;1324;2349
Date  ON/OFF  34536;34566;12346;235346;32567
Date  ON/OFF  6346;235;6547457;2345;4576782"

read.table(text=gsub("[ \t]{2,}|\t",";",tx),header=FALSE,sep=";")   # a tab, or a run of 2+ blanks/tabs, becomes ;

    V1     V2    V3     V4      V5     V6      V7
1 Date ON/OFF 93489 123985  219389   1324    2349
2 Date ON/OFF 34536  34566   12346 235346   32567
3 Date ON/OFF  6346    235 6547457   2345 4576782
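The substitution assumes that fields are separated either by a tab or by a run of two or more spaces. Below is a minimal sketch that checks the same idea on genuinely tab-separated input. The sample string tx.tab is hypothetical and was made up for this check.

```r
# Hypothetical sample: tab-separated leading fields, ;-separated values
tx.tab <- "Date\tON/OFF\t93489;123985;219389;1324;2349
Date\tON/OFF\t34536;34566;12346;235346;32567"

# Replace each tab, or each run of 2+ blanks/tabs, with a semicolon
semi <- gsub("[ \t]{2,}|\t", ";", tx.tab)

# Everything is now ;-separated, so a single read.table call suffices
res <- read.table(text = semi, header = FALSE, sep = ";")
res
```

The alternation tries the longer run first, so mixed runs of blanks and tabs collapse into a single semicolon. A lone tab still matches the second branch.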

Here's a two-step version that copes with rows having different numbers of ;-separated items:

df<-read.table(text=tx,header=FALSE,stringsAsFactors=FALSE)  # whitespace-separated, so the ;-separated values land in one column

x.list<-strsplit(df[,ncol(df)],";")                    # turn the last column into a list, split by ;
max.length<-max(sapply(x.list,length))                 # work out the max length

cbind(df[,1:(ncol(df)-1)],                             # bind all but the last column
  t(                                                   # to the transposed matrix
    sapply(x.list,function(x){length(x)<-max.length    # of the list, with each element expanded
                              x})                      # to max.length items (NAs for missing)
  )
)

With ;43455 appended to the third line of tx, so that the rows are irregular, this gives:

    V1     V2     1      2       3      4       5     6
1 Date ON/OFF 93489 123985  219389   1324    2349  <NA>
2 Date ON/OFF 34536  34566   12346 235346   32567  <NA>
3 Date ON/OFF  6346    235 6547457   2345 4576782 43455
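The padded columns above are character, because strsplit() returns strings. Below is a minimal sketch of the same two-step idea that pads each row with NA, stacks the rows with rbind(), and then uses type.convert() to turn the values into numbers. It uses lengths() for the row lengths, which requires R 3.2.0 or later. The irregular sample string tx.irr and the X1, X2, ... column names are assumptions made for illustration.

```r
# Hypothetical sample whose third row has an extra item
tx.irr <- "Date  ON/OFF  93489;123985;219389;1324;2349
Date  ON/OFF  34536;34566;12346;235346;32567
Date  ON/OFF  6346;235;6547457;2345;4576782;43455"

df <- read.table(text = tx.irr, header = FALSE, stringsAsFactors = FALSE)

x.list <- strsplit(df[[ncol(df)]], ";")     # split the last column on ;
max.length <- max(lengths(x.list))          # number of items in the longest row

# Pad every row with NA up to max.length and stack the rows into a matrix
m <- do.call(rbind, lapply(x.list, function(x) {
  length(x) <- max.length
  x
}))

# Convert the character matrix, column by column, to numeric columns
vals <- as.data.frame(lapply(as.data.frame(m, stringsAsFactors = FALSE),
                             type.convert, as.is = TRUE))
names(vals) <- paste0("X", seq_len(max.length))

out <- cbind(df[, -ncol(df), drop = FALSE], vals)
out
```

Using drop = FALSE keeps the leading columns as a data frame even when there is only one of them.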

Question 2

Suppose we have the test data:

Lines <- "Date\tON/OFF\t93489;123985;219389;1324;2349
Date\tON/OFF\t34536;34566;12346;235346;32567
Date\tON/OFF\t6346;235;6547457;2345;4576782
"

We use this for reproducibility, but in practice you would read from a file, as in the commented-out lines:

1) read.table. Read the data with read.table's default whitespace separator (which includes tabs), re-read the third column using a semicolon separator, and then combine the two:

# d1 <- read.table("myfile", as.is = TRUE)
d1 <- read.table(text = Lines, as.is = TRUE)
d2 <- read.table(text = d1[[3]], sep = ";")
d <- cbind(d1[1:2], d2)

giving:

    V1     V2    V1     V2      V3     V4      V5
1 Date ON/OFF 93489 123985  219389   1324    2349
2 Date ON/OFF 34536  34566   12346 235346   32567
3 Date ON/OFF  6346    235 6547457   2345 4576782
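If some rows carry more ;-separated values than others, the second read.table call needs fill = TRUE. Because read.table only inspects the first five lines to decide how many columns there are, it is also safer to fix the column count up front with count.fields(). Below is a minimal sketch under those assumptions that reads from an actual file and gives the combined data frame unique column names. The helper name read_tab_semi, the temporary file, and the X1, X2, ... names are all hypothetical.

```r
# Hypothetical helper: leading fields tab-separated, last field ;-separated
read_tab_semi <- function(file) {
  d1 <- read.table(file, sep = "\t", as.is = TRUE)
  semi <- d1[[ncol(d1)]]

  # Number of ;-separated items in the longest row
  con <- textConnection(semi)
  n <- max(count.fields(con, sep = ";"))
  close(con)

  # fill = TRUE pads short rows with NA; col.names fixes the column count
  d2 <- read.table(text = semi, sep = ";", fill = TRUE,
                   col.names = paste0("X", seq_len(n)))
  cbind(d1[-ncol(d1)], d2)
}

# Write irregular sample lines to a temporary file and read them back
tmp <- tempfile(fileext = ".txt")
writeLines(c("Date\tON/OFF\t93489;123985;219389;1324;2349",
             "Date\tON/OFF\t34536;34566;12346;235346;32567",
             "Date\tON/OFF\t6346;235;6547457;2345;4576782;43455"), tmp)

d <- read_tab_semi(tmp)
unlink(tmp)
d
```

Naming the second block of columns explicitly also avoids the duplicated V1, V2 names that appear in the output above.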

2) read.pattern. The read.pattern function in the gsubfn package makes this simple to do:

library(gsubfn)

# read.pattern("myfile", pattern = "[^[:space:];]+")
read.pattern(text = Lines, pattern = "[^[:space:];]+")

giving:

    V1 V2  V3    V4     V5      V6     V7      V8
1 Date ON OFF 93489 123985  219389   1324    2349
2 Date ON OFF 34536  34566   12346 235346   32567
3 Date ON OFF  6346    235 6547457   2345 4576782
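For comparison, the same tokenize-by-pattern idea can be done in base R without gsubfn. Here gregexpr() and regmatches() pull out every run of characters that is neither whitespace nor a semicolon. This is a swapped-in base R technique, not a description of what read.pattern does internally. It assumes every line yields the same number of tokens, because rbind() would recycle shorter rows. With this pattern, ON/OFF stays a single token.

```r
Lines <- "Date\tON/OFF\t93489;123985;219389;1324;2349
Date\tON/OFF\t34536;34566;12346;235346;32567
Date\tON/OFF\t6346;235;6547457;2345;4576782
"

# Split into lines and drop any empty ones
L <- strsplit(Lines, "\n")[[1]]
L <- L[nzchar(L)]

# Every maximal run of non-space, non-semicolon characters is one field
toks <- regmatches(L, gregexpr("[^[:space:];]+", L))

m <- do.call(rbind, toks)
DF <- as.data.frame(m, stringsAsFactors = FALSE)
DF[] <- lapply(DF, type.convert, as.is = TRUE)   # numeric-looking columns become numbers
DF
```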
