Question

I need to read a huge file (around 20 GB) that contains trade data, and I wonder if there's a good way to read it without exhausting memory.

My current method is to load the file column by column and then join those columns together:

columnA:(" S "; 10 20 30)0:`:filepath
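The join step then looks roughly like this (the second column's parse spec and the names colA/colB are placeholders):

columnB:("F  "; 10 20 30)0:`:filepath                  / another field read the same way
trades:flip `colA`colB!first each (columnA;columnB)    / join the columns into a table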

The problem with this method is that, although it is pretty fast, it uses a huge chunk of memory, and I want to improve its memory usage.

I have also tried to use .Q.fs, but it takes more than 3 hours to load the file...

Is there a way to do this efficiently without consuming tons of memory?

Thanks


Solution

.Q.fsn is a version of .Q.fs that lets you specify, in bytes, the size of the chunks that are read; .Q.fs uses a default chunk size of 131000 bytes. You could increase the chunk size, which would speed things up. .Q.fsn takes three arguments: the first two are the same as for .Q.fs, and the third is the chunk size.
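For example, a rough sketch of streaming the file with .Q.fsn and a larger chunk size (the column names, types, and target table below are assumptions for illustration, not taken from the question):

/ stream the fixed-width file in ~50 MB chunks instead of the 131000-byte default,
/ inserting each parsed chunk into an in-memory table
.Q.fsn[{`trades insert flip `colA`colB`colC!("SFJ";10 20 30)0:x}; `:filepath; 50000000]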

Other tips

Do you need to keep the table in memory, or is this an intermediate step before writing the table to disk?

If you want to keep the table in memory, it sounds like you don't have enough RAM either way. Whether you read each individual column and then join or stream the table using .Q.fs, I suspect the total memory footprint will be similar.

You could follow the steps here, which show how to handle large files, although all of them use .Q.fs. My guess is you've already looked at this.

If you are saving the table directly to disk as a splayed table, you could read in each column and write it out individually, deleting each column from memory before moving on to the next.
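A rough sketch of that column-at-a-time approach, reusing the field widths from the question (the destination directory, column names, and types are assumptions, and symbol columns would additionally need enumerating, e.g. with .Q.en, to make the splay queryable):

widths:10 20 30                     / fixed-width field sizes from the question
types:"SFJ"                         / assumed column types
names:`colA`colB`colC               / assumed column names
dir:`:/db/trades                    / assumed splayed-table destination

/ parse only field i: blank type characters tell 0: to skip the other fields
loadcol:{[i] first (@[count[widths]#" ";i;:;types i];widths)0:`:filepath}

/ write each column straight to disk; nothing is kept in memory between columns
{[i](` sv dir,names i) set loadcol i;}each til count names
(` sv dir,`.d) set names            / record the column order for the splayed table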
