read.table seems to force line break

https://stackoverflow.com/questions/22384107

14-06-2023

Question

I have to read in a semicolon-separated CSV file containing 17 rows, each with a different number of strings (see also my previous question). To load the data into R, I use read.table with the fill argument:

read.table("example.csv", sep=";", fill=TRUE)

When looking at the data frame in R, I see that the longest rows were not read in properly. Most rows contain at most 18 elements, but the three longest contain 22, 23 and 61 elements. For these, R seems to force some kind of line break, so that the original 19th element of a long row is loaded as the first element of a new row.

Why is that?

Solution

See ?read.table:

The number of data columns is determined by looking at the first five lines of input (or the whole file if it has less than five lines), or from the length of col.names if it is specified and is longer.

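Since the three longest rows are evidently not among the first five lines, read.table settles on 18 columns and, with fill=TRUE, wraps the surplus fields onto new rows. So you have two options:

1. Specify the col.names argument, with one name for each column of the longest row.
2. Name every column in the file itself before reading it into R.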
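
The first option could be sketched as follows. This is only an illustration under assumptions: the file is a small temporary stand-in rather than the asker's example.csv, and the names path, wrapped, n_cols and full, as well as the generated column names V1, V2, ..., are made up for the example. It uses count.fields(), which scans every line of the file, to find the true maximum number of fields and passes that many names to col.names.

```r
# Stand-in file (hypothetical data): the longest row comes after the
# first five lines, so read.table cannot see it when guessing the width.
path <- tempfile(fileext = ".csv")
writeLines(c(rep("a;b;c", 5), "1;2;3;4;5;6;7"), path)

# Without col.names, only 3 columns are inferred from the first five lines.
wrapped <- read.table(path, sep = ";", fill = TRUE)
ncol(wrapped)

# count.fields() returns the number of fields on each line of the file,
# so its maximum is the width of the longest row.
n_cols <- max(count.fields(path, sep = ";"))

# Supplying that many column names keeps every row on a single line;
# shorter rows are padded because fill = TRUE.
full <- read.table(path, sep = ";", fill = TRUE,
                   col.names = paste0("V", seq_len(n_cols)))
dim(full)
```

For the file in the question, the same pattern would presumably be applied with "example.csv" in place of path.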

Licensed under: CC-BY-SA with attribution
