The fill=TRUE setting among the parameters of read.table (or its derivative, read.csv) is probably what you are looking for.
df <- read.table(dat, sep = ',', header = TRUE, fill = TRUE,
                 colClasses = c("numeric", "numeric", "character", "character"))
df
#   x1 x2      x3      x4
# 1  1  2 present present
# 2  3  4
# 3  5  6
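To make that example self-contained, the ragged input can be rebuilt inline; here dat is a textConnection standing in for whatever object you actually read from:

```r
# Hypothetical reconstruction of the input: rows 2 and 3 are short.
dat <- textConnection("x1,x2,x3,x4
1,2,present,present
3,4
5,6")

df <- read.table(dat, sep = ",", header = TRUE, fill = TRUE,
                 colClasses = c("numeric", "numeric", "character", "character"))
close(dat)

# fill = TRUE pads the short rows: the character columns x3 and x4
# come back as empty strings ("") in rows 2 and 3, rather than an error.
```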
The default for fill is TRUE for read.csv, but your error says you used fill=T, which suggests that you have an object named T in your workspace. The default for read.table is fill = !blank.lines.skip, and since blank.lines.skip defaults to TRUE, the effective default for fill in read.table is FALSE.
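The T pitfall arises because T is an ordinary binding to TRUE, not a reserved word, so it can be shadowed; a quick sketch, which also checks the defaults quoted above:

```r
# T is an ordinary binding and can be shadowed; TRUE cannot be.
T <- FALSE          # legal -- and now fill=T would mean fill=FALSE
identical(T, TRUE)  # FALSE
rm(T)               # removing the shadowing object restores the base binding
identical(T, TRUE)  # TRUE again

# The defaults mentioned above can be inspected directly:
formals(read.csv)$fill    # TRUE
formals(read.table)$fill  # the unevaluated expression !blank.lines.skip
```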
Your edited question suggests you have other problems in your character fields. The usual suspects are unmatched quotes or octothorpes (#), which effectively act as line terminators, so try this instead:
df <- read.table('C.tab', header = TRUE, sep = '\t', fill = TRUE,
                 quote = "",
                 comment.char = "",
                 colClasses = c(rep('numeric', 7), rep('character', 28)))
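To see why comment.char matters, here is a hypothetical set of tab-separated lines with an unquoted # in the middle of a field; with the default comment.char = "#", the remainder of that line is silently discarded:

```r
# Hypothetical input: an unquoted '#' sits inside field 2 of line 2.
tf <- tempfile()
writeLines(c("1\talpha\tbeta",
             "2\tsee #42\tgamma",
             "3\tdelta\tepsilon"), tf)

count.fields(tf, sep = "\t")                     # 3 2 3 -- line 2 truncated at '#'
count.fields(tf, sep = "\t", comment.char = "")  # 3 3 3 -- '#' treated as data
```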
If you are having difficulty with errors related to varying numbers of items per line, it can be very useful to use count.fields. It accepts a similar set of parameters to read.table (though not header; use skip = 1 to pass over a header line). If you have a large number of input lines, it can be useful to wrap the call to count.fields in a table call:
length_tbl <- table( count.fields('C.tab', sep = '\t', skip = 1,
                                  quote = "",
                                  comment.char = "")
                     )
You can then experiment with different options. Once you know what you are looking for, you can also identify the line numbers that are causing problems by wrapping a which call around count.fields:
bad_lines <- which( count.fields('C.tab', sep = '\t', skip = 1,
                                 quote = "",
                                 comment.char = "")
                    != 35  # or whatever is the "correct" length; here 7 + 28 columns
                    )
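Putting the pieces together on a small hypothetical file (a three-column stand-in for C.tab) shows how bad_lines maps back to the raw input:

```r
# Hypothetical stand-in for C.tab: header plus three data rows, one short.
tf <- tempfile()
writeLines(c("a\tb\tc",   # header line
             "1\t2\t3",
             "4\t5",      # <- short line
             "6\t7\t8"), tf)

bad_lines <- which( count.fields(tf, sep = "\t", quote = "",
                                 comment.char = "", skip = 1)
                    != 3 )
# skip = 1 passes over the header, so bad_lines indexes data rows;
# offset by 1 to look the offending lines up in the raw file:
readLines(tf)[bad_lines + 1]   # "4\t5"
```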