Take a closer look at the documentation for read.csv.sql, specifically at the argument nrows:
nrows: Number of rows used to determine column types. It defaults to 50. Using -1 causes it to use all rows for determining column types.
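As a minimal sketch of what that looks like in practice (the temporary file and its columns are made up for illustration, and this assumes the sqldf package is installed), you can pass nrows = -1 so that every row is considered when the column types are determined:

```r
library(sqldf)

# Hypothetical sample file: the first rows of `price` look like integers,
# but a later row does not.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("id,price", "1,2", "2,3", "3,4.75"), csv_path)

# nrows = -1 asks read.csv.sql to use all rows, not just the first 50,
# when determining column types.
df <- read.csv.sql(csv_path, sql = "select * from file", nrows = -1)

# Inspect the resulting column classes.
str(df)
```

Scanning all rows costs extra time on large files, which is the trade-off against the default of 50.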
Another thing you'll notice in the documentation for read.csv.sql and sqldf is that there is no colClasses parameter. If you read the file.format documentation in sqldf, you'll see that the parameters in the file.format list are passed not to read.table but to sqliteImportFile, which has no understanding of R's data types.
If you'd rather not modify the nrows parameter, you could read the entire data frame as character and then use whatever method you like to work out which class each column should have. Either way, you can't know whether a column is integer or numeric until you have read the whole column. Also, if speed is really the bottleneck here, you may want to consider moving away from CSVs.