fread() cannot yet select rows while reading, the way read.csv.sql() can. Even so, it is still faster to read the entire file (memory permitting) and then subset it by your criteria. For a 200 MB file, fread() + subset() was about 4 times faster than read.csv.sql().
So, using @Arun's suggestion,
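# read every file in full, tagging each row with its source file name in fn
ans = rbindlist(lapply(files, function(x) fread(x)[, fn := x]))
# 'your criteria' is a placeholder: replace it with a logical condition on the columns of ans
subset(ans, 'your criteria')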
is better than the approach in the original question.
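As a minimal end-to-end sketch of the read-then-subset pattern, the example below writes two small CSV files to a temporary directory, reads them back, and filters the combined table. The file names, the columns id and value, and the condition value > 25 are all made up for illustration; substitute your own files and criteria.

```r
library(data.table)

# Hypothetical sample data: two small CSV files in a temporary directory
demo_dir <- tempfile("csvdemo")
dir.create(demo_dir)
fwrite(data.table(id = 1:3, value = c(10, 20, 30)), file.path(demo_dir, "a.csv"))
fwrite(data.table(id = 4:6, value = c(40, 50, 60)), file.path(demo_dir, "b.csv"))

files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file in full and tag every row with its source file name
ans <- rbindlist(lapply(files, function(x) fread(x)[, fn := basename(x)]))

# Filter after reading; `value > 25` stands in for your own criteria
res <- ans[value > 25]
print(res)
```

Here ans[value > 25] is the data.table equivalent of subset(ans, value > 25); either form filters the rows only after everything has been read into memory.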