Question

I have a large table to read into R; the file is in .txt format. I use the read.table function, but reading it in fails with the following error message:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
  line 28 did not have 23 elements

It seems that line 28 (counting from the first data row, since I skip the header with skip=) has missing elements. I am looking for a way to handle this automatically by filtering out such rows. Right now I cannot even read the file in, so I cannot manipulate it in R... Any suggestions are greatly appreciated :)
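
For reference, my call looks roughly like this (the file name here is just a placeholder; the real file has a header line, which is why I pass skip=):

data = read.table("bigtable.txt", skip = 1)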


Solution

Here is how I would do it: call read.table with the option fill = TRUE, then exclude the lines that do not have all fields filled, using count.fields.

Example:

# 1. Data generation, and saving in 'tempfile'
cat("1 John", "2 Paul", "7 Pierre", '9', file = "tempfile", sep = "\n")

# 2. read the data:
data = read.table('tempfile', fill = TRUE)

# 3. exclude incomplete rows (here, the line containing only '9')
c.fields = count.fields('tempfile')
data = data[-which(c.fields != max(c.fields)), ]

(edited to determine the expected number of fields automatically)
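
Applied to a file like the one in the question, the same idea could look like this (the file name and the skip value are assumptions):

# read everything, padding short lines with NA
raw = read.table("bigtable.txt", skip = 1, fill = TRUE)

# count the fields on each line, skipping the header so the counts
# line up with the data rows
n.fields = count.fields("bigtable.txt", skip = 1)

# keep only the rows whose line had the full number of fields
clean = raw[n.fields == max(n.fields), ]

Logical subsetting with == also behaves sensibly when every line turns out to be complete, whereas -which(...) would drop all rows if which() returned an empty index.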

OTHER TIPS

That error can also occur when your data contain a hash symbol (#), because read.table treats # as a comment character by default.

If that's the case, simply set the option comment.char to an empty string:

read.table("file.txt", comment.char = "")
Licensed under: CC-BY-SA with attribution