Problem

I have two CSV files (each several GB in size). I am trying to merge the two CSV files, but every time I try, my computer hangs. Is there no way to merge the files in chunks in pandas itself?

Solution

No, there is not. You will have to use an alternative tool like dask, Drill, Spark, or a good old-fashioned relational database.
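That said, if the two files share a join key and at least one of them fits in memory, a chunked merge can be hand-rolled in pandas itself. A minimal sketch, assuming hypothetical files big.csv and small.csv joined on an id column (the tiny demo data stands in for the multi-GB inputs):

```python
import pandas as pd

# Demo data -- stand-ins for the actual multi-GB files.
pd.DataFrame({"id": [1, 2, 3, 4], "x": [10, 20, 30, 40]}).to_csv("big.csv", index=False)
pd.DataFrame({"id": [2, 4], "y": ["a", "b"]}).to_csv("small.csv", index=False)

# Load the smaller file fully -- it must fit in memory.
small = pd.read_csv("small.csv")

# Stream the larger file in chunks, merge each chunk against the
# in-memory table, and append the results to an output CSV so the
# full merged result never has to sit in RAM.
first = True
for chunk in pd.read_csv("big.csv", chunksize=2):
    merged = chunk.merge(small, on="id", how="inner")
    merged.to_csv("merged.csv", mode="w" if first else "a", header=first, index=False)
    first = False
```

In real use you would raise chunksize to something like 100,000 rows; the approach only works when the side you hold in memory is the smaller one.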

Other tips

When faced with such situations (loading and appending multi-GB CSV files), I found @user666's option of loading one dataset (e.g. DataSet1) as a pandas DataFrame and appending the other (e.g. DataSet2) in chunks to the existing DataFrame to be quite feasible.

Here is the code I implemented:

import pandas as pd

# Accumulate the chunks in a list and concatenate once at the end;
# calling pd.concat inside the loop re-copies the whole frame on
# every iteration.
chunks = []
for chunk in pd.read_csv(path1 + 'DataSet1.csv', chunksize=100000, low_memory=False):
    chunks.append(chunk)
amgPd = pd.concat(chunks)
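If even the combined frame is too large for memory, the same chunked loop can append straight to a CSV on disk instead of building a DataFrame. A sketch, assuming hypothetical files DataSet1.csv and DataSet2.csv with identical columns (the demo data stands in for the real files):

```python
import pandas as pd

# Demo stand-ins for the actual DataSet1.csv / DataSet2.csv.
pd.DataFrame({"a": [1, 2], "b": [3, 4]}).to_csv("DataSet1.csv", index=False)
pd.DataFrame({"a": [5, 6], "b": [7, 8]}).to_csv("DataSet2.csv", index=False)

# Start the output as a copy of the first file, then stream the
# second file in chunks and append each one without its header.
pd.read_csv("DataSet1.csv").to_csv("Combined.csv", index=False)
for chunk in pd.read_csv("DataSet2.csv", chunksize=1):
    chunk.to_csv("Combined.csv", mode="a", header=False, index=False)
```

This keeps peak memory at roughly one chunk (plus the first file), at the cost of the result living on disk rather than in a DataFrame.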
License: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange