Question

I have two CSV files, each several GB in size. Every time I try to merge them, my computer hangs. Is there a way to merge the files in chunks in pandas itself?


Solution

No, there is not: pandas has no built-in way to merge two files chunk by chunk. You will have to use an alternative tool like Dask, Apache Drill, Spark, or a good old-fashioned relational database.
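To make the relational-database route concrete, here is a minimal sketch using SQLite from Python's standard library. It assumes both files share a join column; the function name `merge_csvs_via_sqlite`, the table names `left_t` and `right_t`, and the `db_path` argument are hypothetical names chosen for illustration, not something from the question.

```python
import sqlite3
import pandas as pd


def merge_csvs_via_sqlite(left_csv, right_csv, key, out_csv, db_path, chunksize=100_000):
    """Inner-join two CSV files on `key`, letting SQLite do the join on disk.

    Assumption: db_path points to a fresh database file, because the
    tables are filled with if_exists="append".
    """
    conn = sqlite3.connect(db_path)
    try:
        # Stream each CSV into its own table, one chunk at a time,
        # so only `chunksize` rows are held in memory at once.
        for table, path in (("left_t", left_csv), ("right_t", right_csv)):
            for chunk in pd.read_csv(path, chunksize=chunksize):
                chunk.to_sql(table, conn, if_exists="append", index=False)

        # An index on the join key keeps the join from scanning right_t per row.
        conn.execute(f'CREATE INDEX IF NOT EXISTS idx_right_key ON right_t ("{key}")')

        # With USING, SQLite returns the key column only once in SELECT *.
        query = f'SELECT * FROM left_t JOIN right_t USING ("{key}")'

        # Stream the joined rows back out to a CSV file in chunks.
        first = True
        for part in pd.read_sql_query(query, conn, chunksize=chunksize):
            part.to_csv(out_csv, mode="w" if first else "a", header=first, index=False)
            first = False
    finally:
        conn.close()
```

A call might look like `merge_csvs_via_sqlite('DataSet1.csv', 'DataSet2.csv', 'id', 'merged.csv', 'scratch.db')`, where the column name `id` and the output file names are placeholders. The join result goes straight to disk, so it never has to fit in RAM.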

Other tips

When faced with this situation (loading and appending multi-GB CSV files), I found @user666's suggestion quite feasible: load one data set (e.g. DataSet1) as a pandas DataFrame, then append the other (e.g. DataSet2) to the existing DataFrame in chunks.

Here is the code I use:

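import pandas as pd

path1 = ''  # assumption: directory prefix for the CSV files (not defined in the original)

# Read the file in chunks of 100,000 rows and collect the pieces.
chunks = []
for chunk in pd.read_csv(path1 + 'DataSet1.csv', chunksize=100000, low_memory=False):
    chunks.append(chunk)
# Concatenate once at the end; pd.concat inside the loop re-copies all rows each time.
amgPd = pd.concat(chunks)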
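If "merge" here means a key-based join rather than appending rows, the same chunking idea can be sketched as follows. This is only a sketch under two assumptions: one of the two files fits in memory, and both share a join column. The function name `merge_in_chunks` and its parameters are hypothetical.

```python
import pandas as pd


def merge_in_chunks(small_csv, large_csv, key, out_csv, chunksize=100_000):
    """Join a large CSV against a smaller one, one chunk of the large file at a time."""
    # Assumption: the smaller file fits comfortably in memory.
    small = pd.read_csv(small_csv)

    first = True
    for chunk in pd.read_csv(large_csv, chunksize=chunksize):
        # Each row of the large file is matched independently, so an inner
        # join done per chunk gives the same rows as a join on the whole file.
        merged = chunk.merge(small, on=key, how="inner")
        # Write the header once, then append; the full result never sits in RAM.
        merged.to_csv(out_csv, mode="w" if first else "a", header=first, index=False)
        first = False
```

This works for `how="inner"` or `how="left"` with the large file on the left. A right or outer join is not safe to do per chunk, because unmatched rows from the small file would be emitted once for every chunk.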
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange