Why are write.csv and read.csv not consistent? [closed]

https://stackoverflow.com/questions/12512062

03-07-2021

Question

The problem is simple. Consider the following example:

m <- head(iris)
write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv')

The result is that m1 differs from the original object m: it has a new first column named "X". To make them equal, I have to pass additional arguments, as in these two examples:

write.csv(m, file = 'm.csv', row.names = FALSE)
# and then
m1 <- read.csv('m.csv')

write.csv(m, file = 'm.csv')
m1 <- read.csv('m.csv', row.names = 1)

The question is: what is the reason for this difference? In particular, if write.csv and read.csv are supposedly intended to follow the Excel convention, why don't they import the same object that was exported in the first place? To me this is very counterintuitive and highly undesirable behavior.

(The result is exactly the same if I use the csv2 variants of these functions.)

Thanks in advance!

These are the data.frames m and m1 if you prefer not to use R to see the example:

> m
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

> m1
  X Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 1          5.1         3.5          1.4         0.2  setosa
2 2          4.9         3.0          1.4         0.2  setosa
3 3          4.7         3.2          1.3         0.2  setosa
4 4          4.6         3.1          1.5         0.2  setosa
5 5          5.0         3.6          1.4         0.2  setosa
6 6          5.4         3.9          1.7         0.4  setosa

Solution

Here's my guess...

write.table (which write.csv wraps) writes a data.frame to a file, and data.frames always have row names, so not writing row names by default would be throwing away information. (Yes, write.table will also write a matrix, and matrices don't have to have row names, but data.frames are probably used much more often than matrices.)

read.table (which read.csv wraps) returns a data.frame, but CSV files don't have any concept of row names, so one could argue that it would be counterintuitive to assume, by default, that the first column of a CSV holds row names.
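
One way to see both halves of this is to look at the raw file. The following is a minimal sketch that writes to a temporary file (csv_path and m1_rn are names introduced here for illustration, not part of the question). It shows that write.csv stores the row names as an unnamed first column, and that read.csv treats that column as ordinary data unless told otherwise:

```r
m <- head(iris)

# Hypothetical temporary path, used so the example leaves no files behind.
csv_path <- tempfile(fileext = ".csv")
write.csv(m, file = csv_path)

# Inspect the raw text: the header begins with an empty quoted field,
# and each data line begins with the quoted row name.
print(readLines(csv_path, n = 2))

# By default the unnamed first column is read as data and named "X".
m1 <- read.csv(csv_path)
print(names(m1))

# Telling read.csv that column 1 holds row names removes the extra column.
m1_rn <- read.csv(csv_path, row.names = 1)
print(names(m1_rn))
```

Nothing in the file marks the first column as row names, which is why read.csv needs row.names = 1 to interpret it that way.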

Now there may be a way to make these two functions consistent, but I would argue that writing to a text file isn't the best way to output/input data from one R session to another. It's much safer/faster to use save, load, saveRDS, readRDS, etc.
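
As a rough illustration of that suggestion, here is a sketch of the same round trip using saveRDS and readRDS. The file path is a temporary one chosen for this example, and m2 is a name introduced here:

```r
m <- head(iris)

# Hypothetical temporary path for the serialized object.
rds_path <- tempfile(fileext = ".rds")

# saveRDS stores the R object itself, including row names and column types.
saveRDS(m, file = rds_path)

# readRDS returns the stored object; assign it to any name.
m2 <- readRDS(rds_path)

# Compare the restored object with the original.
print(all.equal(m, m2))
```

Because the object is serialized rather than converted to text, no extra arguments are needed to get the row names or the factor column back as they were.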

Licensed under: CC-BY-SA with attribution