Reading big data and logistic regression in R
19-06-2021
Question
Situation: a 1 GB CSV file with 100,000 rows, 4,000 independent numeric variables, and 1 dependent variable. R on a Windows Citrix server with 16 GB of memory.
Problem: It took me 2 hours to do:
read.table("full_data.csv", header = TRUE, sep = ",")
and the glm step crashes: the program stops responding and I have to kill it in Task Manager.
Solution
I often resort to the sqldf package to load large .csv files into memory. A good pointer is here.
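A minimal sketch of the sqldf approach, assuming the file is a plain comma-separated CSV and the default SQLite backend: read.csv.sql stages the file in SQLite rather than parsing it with R's slow read.table, and lets you filter rows or columns in SQL before the data ever reaches R.

```r
library(sqldf)

# Load the whole file; sqldf imports it into a temporary SQLite
# database and then pulls the result into a data frame.
df <- read.csv.sql("full_data.csv", header = TRUE, sep = ",")

# Or reduce memory pressure by selecting a subset while loading,
# e.g. only the first 50,000 rows (hypothetical filter for illustration):
# df <- read.csv.sql("full_data.csv", sql = "select * from file limit 50000")
```

For the glm crash itself, a common follow-up (not covered by this answer) is to fit the model in chunks with biglm::bigglm, which keeps memory use roughly constant regardless of row count.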