The wide object columns are definitely the problem: the HDF5 table format stores strings in fixed-width columns, so every cell takes up as much space as the longest string. My workaround has been to truncate the object columns while reading them in. If I truncate to a width of 20 characters, the .h5 file is only about twice the size of the CSV file; if I truncate to 100 characters, it is about six times larger.
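To see why the ratio grows with the truncation width, here is a rough back-of-the-envelope sketch. The helper `storage_estimate` is hypothetical, not part of pandas, and it assumes ASCII text with no missing values while ignoring compression, delimiters, the index and HDF5 overhead. It only compares a CSV-style layout, where each cell costs its own length, with a fixed-width layout, where every cell is padded to the longest string.

```python
import pandas as pd

def storage_estimate(ser, width=None):
    """Rough byte counts for one string column (hypothetical helper).

    Assumes ASCII text (1 byte per character) and no missing values, and
    ignores compression, delimiters, the index and HDF5 overhead.
    Returns (raw, padded): the CSV-style size, where each cell costs its
    own length, and the fixed-width size, where every cell is padded to
    the longest string.
    """
    if width is not None:
        ser = ser.str[:width]  # truncate, as in the approach described here
    lengths = ser.str.len()
    raw = int(lengths.sum())
    padded = len(ser) * int(lengths.max())
    return raw, padded

# 99 short strings plus one 500-character outlier
ser = pd.Series(["a" * 10] * 99 + ["b" * 500])
print(storage_estimate(ser))            # (1490, 50000)
print(storage_estimate(ser, width=20))  # (1010, 2000)
```

A single long outlier makes the padded column roughly 33 times larger than the raw text, while capping the width at 20 brings it down to about 2 times, which matches the pattern described above.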
I've included my code below as an answer, but if anyone knows how to reduce this size disparity without truncating so much text, I'd be grateful.
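```python
import pandas as pd

WIDTH = 100  # maximum number of characters kept per string cell

def truncateCol(ser, width=WIDTH):
    # Truncate only string (object) columns; other dtypes pass through unchanged
    if pd.api.types.is_string_dtype(ser.dtype):
        ser = ser.str[:width] if ser.str.len().max() > width else ser
    return ser

# filepath, f and table (the HDF key) are defined elsewhere
with pd.HDFStore(filepath, mode='w') as store:
    for chunk in pd.read_csv(f, chunksize=5000, sep='\t',
                             na_values="null", on_bad_lines='skip'):
        chunk = chunk.apply(truncateCol)
        # Reserve WIDTH characters up front; otherwise the first chunk fixes
        # the string column size and a longer string later raises ValueError
        store.append(table, chunk, min_itemsize={'values': WIDTH})
```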
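A quick check of what `truncateCol` does on a small, made-up chunk: only the string column is cut to `width`, and extra keyword arguments passed to `DataFrame.apply` are forwarded to the function. The sketch repeats the function so it runs on its own; the sample data is invented for illustration.

```python
import pandas as pd

def truncateCol(ser, width=100):
    # Truncate only string (object) columns; other dtypes pass through unchanged
    if pd.api.types.is_string_dtype(ser.dtype):
        ser = ser.str[:width] if ser.str.len().max() > width else ser
    return ser

# Made-up chunk: one integer column and one string column with a long value
chunk = pd.DataFrame({"id": [1, 2, 3],
                      "text": ["short", "x" * 250, "y" * 15]})

# Keyword arguments to DataFrame.apply are forwarded to truncateCol
out = chunk.apply(truncateCol, width=20)
print(out["text"].str.len().tolist())  # lengths 5, 20 and 15
print(out["id"].tolist())              # integer column left unchanged
```

Note that the `.str.len().max() > width` check only skips work when nothing is too long; calling `ser.str[:width]` unconditionally would give the same result.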