Merging large CSV files in Pandas
Question
I have two CSV files (each several GB in size). Every time I try to merge them, my computer hangs. Is there a way to merge the files in chunks within pandas itself?
OTHER TIPS
When faced with such situations (loading and appending multi-GB CSV files), I found @user666's option quite feasible: load one data set (e.g. DataSet1) as a Pandas DataFrame and append the other (e.g. DataSet2) in chunks to the existing DataFrame.
Here is the code I implemented:
import pandas as pd

# Collect the chunks in a list and concatenate once at the end;
# calling pd.concat inside the loop copies the growing DataFrame
# on every iteration.
chunks = []
for chunk in pd.read_csv(path1 + 'DataSet1.csv', chunksize=100000, low_memory=False):
    chunks.append(chunk)
amgPd = pd.concat(chunks)
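The same pattern extends to reading both files in chunks. Here is a minimal, self-contained sketch using small in-memory stand-ins for the multi-GB files; it assumes both CSVs share the same columns, so their rows can simply be stacked:

```python
import io

import pandas as pd

# In-memory stand-ins for the two large CSV files (assumption:
# identical column layout in both files).
csv1 = io.StringIO("a,b\n1,2\n3,4\n")
csv2 = io.StringIO("a,b\n5,6\n7,8\n")

# Read each source in chunks and collect the pieces in a list;
# a single pd.concat at the end avoids repeated copying.
chunks = []
for source in (csv1, csv2):
    for chunk in pd.read_csv(source, chunksize=1):
        chunks.append(chunk)

merged = pd.concat(chunks, ignore_index=True)
print(len(merged))  # -> 4 (2 rows from each file)
```

With real files you would replace the `StringIO` objects with paths and use a much larger `chunksize` (e.g. 100000), so only one chunk is held in parser memory at a time while reading.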
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange