After recently experimenting with MongoDB, I tried a few different methods of importing/inserting large amounts of data into collections. So far the most efficient method I've found is mongoimport. It works well, but there is still overhead: even after the import completes, the memory isn't released unless I reboot my machine.
Example:
mongoimport -d flightdata -c trajectory_data --type csv --file trjdata.csv --headerline
where my headerline and data look like:
'FID','ACID','FLIGHT_INDEX','ORIG_INDEX','ORIG_TIME','CUR_LAT', ...
'20..','J5','79977','79977','20110116:15:53:11','1967', ...
With 5.3 million rows by 20 columns (about 900 MB), I end up like this:
This won't work for me in the long run; I may not always be able to reboot, and I will eventually run out of memory. What would be a more effective way of importing into MongoDB? I've read about periodic RAM flushing; how could I implement something like that with the example above?
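One alternative I've been considering (just a sketch, not a drop-in replacement for mongoimport) is to stream the CSV myself and insert in fixed-size batches, so only one batch of documents is ever held in application memory at a time. The batch size and the pymongo usage at the bottom are assumptions based on my example above:

```python
import csv
from itertools import islice

def batched_docs(csv_path, batch_size=10_000):
    """Yield lists of row dicts from a CSV file, batch_size rows at a time.

    The file is read lazily, so memory use is bounded by one batch,
    not by the size of the whole file.
    """
    with open(csv_path, newline="") as f:
        # quotechar="'" matches the single-quoted fields in my data sample
        reader = csv.DictReader(f, quotechar="'")
        while True:
            batch = list(islice(reader, batch_size))
            if not batch:
                break
            yield batch

# Hypothetical usage against a local mongod (assumes pymongo is installed):
# from pymongo import MongoClient
# coll = MongoClient()["flightdata"]["trajectory_data"]
# for batch in batched_docs("trjdata.csv", batch_size=10_000):
#     coll.insert_many(batch, ordered=False)
```

Whether this actually helps with the resident-memory growth I'm seeing is a separate question, but at least the client side stays flat.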
Update:
I don't think my case would benefit much from adjusting fsync, syncdelay, or journaling. I'm just curious as to when that would be a good idea and what the best practice is, even if I were running on high-RAM servers.
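For reference, here is roughly what that tuning would look like if it ever did apply; the dbpath and the value of 10 seconds are placeholder assumptions, not recommendations:

```shell
# Config sketch, values are illustrative only.
# --syncdelay sets how often mongod flushes its memory-mapped files to
# disk (default 60 seconds); lowering it shrinks the dirty-page backlog
# during a large import at some cost in write throughput.
# --journal enables write-ahead journaling for crash safety.
mongod --dbpath /data/db --syncdelay 10 --journal
```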