Question

I'd like to add metadata to my spreadsheet as comments, and have R ignore these afterwards.

My data are of the form

v1,v2,v3,
1,5,7,
4,2,1,#possible error,

(with the exception that it is much longer. The first comment actually appears well outside the top 5 rows, which scan uses to determine the number of columns.)

I've been trying:

read.table("data.name",header=TRUE,sep=",",stringsAsFactors=FALSE,comment.char="#")

But read.table (and, for that matter, count.fields) thinks that I have one more field than I actually do. My data frame ends up with a blank column called 'X'. I think this is because my spreadsheet program adds commas to the end of every line (as in the above example).

Using flush=TRUE has no effect, even though (according to the help file) it " [...] allows putting comments after the last field [...]"

Using colClasses=c(rep(NA,3),NULL) has no effect either.

I could just delete the column afterwards, but since it seems that this is a common practice I'd like to learn how to do it properly.

Thanks,

Andrew

Solution

Your issue with the comment character and the number of data columns has nothing to do with read.table() itself; it comes from your spreadsheet program (I'm using Excel). By default, read.table treats # as the beginning of a comment and ignores what follows. The reason you are getting an extra column is that there is a trailing comma at the end of each data line, which tells read.table that another field follows. Reading your original example:

> read.table(text="v1, v2, v3,
+  1,5,7,
+  4,2,1,#possible error,", sep=",", header=TRUE)
  v1 v2 v3  X
1  1  5  7 NA
2  4  2  1 NA

The comment is ignored by default, and a fourth column is created and labeled X. You could delete this column after the fact, use the method that @flodel mentions, or remove the trailing comma before reading the file into R. In Excel, the trailing comma is added when you save the file as CSV (comma-separated values) because the comment sits in the fourth column and Excel doesn't recognize it as a comment. If you save the file as space-separated, the problem goes away (remove the sep= argument, since whitespace is the default separator):

> read.table(text="v1 v2 v3 
+    1 5 7 
+    4 2 1#possible error", header=TRUE)
  v1 v2 v3
1  1  5  7
2  4  2  1
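
If re-saving the file is not convenient, the "remove the trailing comma before reading" option can also be done inside R. The following is only a sketch: it assumes the file is small enough for readLines and that # never occurs inside a data value. The vector lines is a stand-in for readLines("data.name"), and the names lines and dat are illustrative.

```r
# Stand-in for: lines <- readLines("data.name")
lines <- c("v1,v2,v3,",
           "1,5,7,",
           "4,2,1,#possible error,")

# Drop everything from the comment character to the end of the line
lines <- sub("#.*$", "", lines)
# Drop a single trailing comma, plus any whitespace after it
lines <- sub(",\\s*$", "", lines)

# The cleaned lines now have exactly three fields each
dat <- read.table(text = lines, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE)
```

The comment has to be stripped before the comma; otherwise the comma in front of # survives and the fourth field is still counted.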

OTHER TIPS

From the doc (?read.table):

colClasses character. A vector of classes to be assumed for the columns. Recycled as necessary, or if the character vector is named, unspecified values are taken to be NA.

Possible values are NA (the default, when type.convert is used), "NULL" (when the column is skipped), one of the atomic vector classes (logical, integer, numeric, complex, character, raw), or "factor", "Date" or "POSIXct". Otherwise there needs to be an as method (from package methods) for conversion from "character" to the specified formal class.

Note that it says to use "NULL", not NULL. Indeed, this works as expected:

con <- textConnection("
v1,v2,v3,
1,5,7,
4,2,1,#possible error,
")

read.table(con, header = TRUE, sep = ",",
           stringsAsFactors = FALSE, comment.char = "#",
           colClasses = c(rep(NA, 3), "NULL"))
#   v1 v2 v3
# 1  1  5  7
# 2  4  2  1
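
When the number of columns is not known in advance, the colClasses vector can be built from count.fields instead of hard-coding 3. This is a sketch under the assumption that every line ends in a trailing comma, so the last counted field is always the empty one; txt, con, n and dat are illustrative names.

```r
# Stand-in for the contents of the CSV file
txt <- c("v1,v2,v3,",
         "1,5,7,",
         "4,2,1,#possible error,")

# Count fields per line; the trailing comma adds one empty field
con <- textConnection(txt)
n <- max(count.fields(con, sep = ",", comment.char = "#"))
close(con)

# Keep the first n - 1 columns (type guessed by R), skip the last one
dat <- read.table(text = txt, header = TRUE, sep = ",",
                  stringsAsFactors = FALSE, comment.char = "#",
                  colClasses = c(rep(NA, n - 1), "NULL"))
```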
Licensed under: CC-BY-SA with attribution