This is basically the same as the answer here, which I gave recently.

Bottom line: turn off indexing when appending to the store, `store.append('df', df, index=False)`, then build the index once at the end.
Also turn off compression while merging the tables. Indexing is a fairly expensive operation and, if I recall correctly, uses only a single processor.
Finally, make sure that you create the merged store with `mode='w'`, as all of the subsequent operations are appends and you want to start with a clean new file.
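The steps above can be sketched roughly like this (file names such as `part1.h5` and `merged.h5` are placeholders for illustration, and the small stand-in frames just simulate your real input stores):

```python
import numpy as np
import pandas as pd

# Hypothetical input files; stand-ins for the real stores to be merged.
part_files = ['part1.h5', 'part2.h5']
for i, fn in enumerate(part_files):
    df = pd.DataFrame({'A': np.arange(5), 'B': np.arange(5) * i})
    df.to_hdf(fn, key='df', format='table')

# mode='w' starts with a clean file; index=False skips the expensive
# per-append indexing; no complib/complevel, so no compression here.
with pd.HDFStore('merged.h5', mode='w') as out:
    for fn in part_files:
        # read each input in chunks rather than all at once
        for chunk in pd.read_hdf(fn, 'df', chunksize=2):
            out.append('df', chunk, index=False)
    # build the PyTables index once, at the very end
    out.create_table_index('df', optlevel=9, kind='full')
```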
I also would NOT specify the chunkshape upfront. Rather, after you have created the final index, perform the compression using `ptrepack` and specify `--chunkshape=auto`, which will compute it for you. This shouldn't affect write performance, but it will optimize query performance.
You might also try raising the `chunksize` parameter to `append` (this is the writing chunksize) to a larger number.
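As a minimal sketch (the file name and the chunksize value are arbitrary choices, not recommendations):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 4), columns=list('ABCD'))

with pd.HDFStore('big.h5', mode='w') as store:
    # chunksize controls how many rows are written per batch;
    # a larger value means fewer, bigger write calls.
    store.append('df', df, index=False, chunksize=500000)
```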
Obviously, make sure that each of the appended tables has exactly the same structure (`append` will raise if this is not the case).
I created this issue for an enhancement to do this 'internally': https://github.com/pydata/pandas/issues/6837