Situation: a 1 GB CSV file with 100,000 rows, 4,000 independent numeric variables, and 1 dependent variable. R on a Windows Citrix server with 16 GB of memory.

Problem: It took me 2 hours to run:

read.table("full_data.csv", header=TRUE, sep=",")

and the subsequent glm() call crashes: R stops responding and I have to kill it from Task Manager.

Can anyone help?

Solution

I often resort to the sqldf package to load large .csv files into memory. A good pointer is here.
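A minimal sketch of this approach, assuming the file described in the question; it uses sqldf's read.csv.sql, and the variable name and select statement are only illustrative:

library(sqldf)

# Stage the CSV in an on-disk SQLite database and pull it back into a
# data frame; this avoids read.table()'s slow, memory-hungry parsing.
full_data <- read.csv.sql(
  "full_data.csv",
  sql    = "select * from file",  # the CSV is referred to as 'file' in the SQL
  header = TRUE,
  sep    = ","
)

Because the data pass through SQLite, the select statement can also be narrowed to a subset of columns or rows if the full table still does not fit in memory.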
