Question

Is there a way to stream-process large CSV or Excel files in pandas without taking up large amounts of memory?

What I do right now is load the file like this:

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False)

Perform some task

data.to_csv('Results.csv', sep=',')

If I were working on a computer with a low amount of memory, is there a way I could stream and process large data files with an iterative function, to do something like the following:

   Load the first 1000 rows and store them in memory

   Perform some task

   Save the data

   Load the next 1000 rows, overwriting the previous chunk in memory

   Perform the task

   Append to the saved file

Solution

Just add the chunksize argument to your code:

data = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", low_memory=False, chunksize=10)

result = []
for chunk in data:  # iterate over chunks of 10 rows each
    result.append(chunk.mean())  # perform some task on each chunk
# then do something with result, e.g. pd.DataFrame(result).to_csv("result.csv")
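
If you also want to save the output chunk by chunk instead of collecting everything in memory, you can append each processed chunk to the results file as you go, which matches the load/process/append workflow described in the question. A minimal sketch, assuming a chunk size of 1000 and a placeholder transformation standing in for "perform some task":

import pandas as pd

reader = pd.read_csv('SUPERLARGEFILE.csv', index_col=0, encoding="ISO-8859-1", chunksize=1000)

for i, chunk in enumerate(reader):   # each chunk is a DataFrame of up to 1000 rows
    processed = chunk * 2            # placeholder for your actual task
    # write the header only for the first chunk, then append without it
    processed.to_csv('Results.csv', sep=',', mode='w' if i == 0 else 'a', header=(i == 0))

This way only one chunk is held in memory at a time, and 'Results.csv' grows incrementally as each chunk is processed.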