Question

This works fine:

import pandas as pd
from numpy.random import rand

# `store` is assumed to be an already-open pd.HDFStore
cols = ['X', 'Y']
ind = [('A', 1), ('B', 2)]
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(rand(2, 2), columns=cols, index=ind)
store.put('df', df, table=True)
print(store['df'])

               X         Y
foo number                    
A   1       0.015005  0.213427
B   2       0.090311  0.595418

This breaks:

cols = [('X', 1), ('Y', 2)]
cols = pd.MultiIndex.from_tuples(cols, names=['bar', 'number'])
ind = [('A', 1), ('B', 2)]
ind = pd.MultiIndex.from_tuples(ind, names=['foo', 'number'])

df = pd.DataFrame(rand(2, 2), columns=cols, index=ind)
store.put('df', df, table=True)
print(store['df'])

KeyError: u'no item named foo'

I suspect this is a known limitation of using PyTables, but I couldn't find any statement in the pandas docs that a MultiIndex is supported only on the index, not on the columns.

Solution

This is not supported, i.e. having BOTH a column MultiIndex and an index MultiIndex. Either one alone works. However, in general a column MultiIndex is not very useful, as it's impossible to select from it without some really odd syntax (the columns are stored as tuples, so they have to be explicitly selected). So I wouldn't recommend it in any event.
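
To illustrate what "stored as tuples" means, here is a small in-memory sketch (it does not touch the store or its query syntax, and the sample values are made up): each column label of a column MultiIndex is a full tuple, so picking out a single column means spelling out the whole tuple.

```python
import numpy as np
import pandas as pd

# Illustrative frame with a two-level column MultiIndex (values are arbitrary)
cols = pd.MultiIndex.from_tuples([("X", 1), ("Y", 2)], names=["bar", "number"])
df = pd.DataFrame(np.arange(4.0).reshape(2, 2), columns=cols)

labels = list(df.columns)  # every label is a complete tuple
one_col = df[("X", 1)]     # a single column is addressed by its whole tuple
```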

I'll open an issue to support both, since it currently raises; see here: https://github.com/pydata/pandas/issues/5823

OTHER TIPS

Until #5823 is solved, you may collapse the column MultiIndex into flat names prior to storing it, as a workaround (see this SO answer for how: https://stackoverflow.com/a/14508355/548792):

assert isinstance(df.columns, pd.MultiIndex), df
# map(str, ...) so that non-string levels (e.g. the integers above) can be joined
df.columns = ['.'.join(map(str, col)).strip() for col in df.columns.values]
df.to_hdf(store, 'df', table=True)

And to recreate it, assuming no other dot (.) exists anywhere in the original column names (note that every level value comes back as a string):

df = store['/df']
df.columns = pd.MultiIndex.from_tuples([c.split('.') for c in df.columns])
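
As a rough, self-contained sketch of the round trip above, without involving the store: the helper names `flatten_columns` and `restore_columns` and the `SEP` constant are made up for illustration and are not part of pandas. It assumes the separator never occurs in a level label, and non-string labels (such as the integers in the question) come back as strings.

```python
import numpy as np
import pandas as pd

SEP = "."  # assumed separator; must not occur in any level label


def flatten_columns(df):
    """Return a copy whose MultiIndex columns are joined into flat strings."""
    out = df.copy()
    out.columns = [SEP.join(str(level) for level in col) for col in df.columns]
    return out


def restore_columns(df, names=None):
    """Split flat labels back into a MultiIndex; every label becomes a str."""
    out = df.copy()
    out.columns = pd.MultiIndex.from_tuples(
        [tuple(c.split(SEP)) for c in df.columns], names=names
    )
    return out


# Same shape of frame as in the question
cols = pd.MultiIndex.from_tuples([("X", 1), ("Y", 2)], names=["bar", "number"])
ind = pd.MultiIndex.from_tuples([("A", 1), ("B", 2)], names=["foo", "number"])
df = pd.DataFrame(np.random.rand(2, 2), columns=cols, index=ind)

flat = flatten_columns(df)  # this is the frame you would put in the store
restored = restore_columns(flat, names=df.columns.names)
```

If the original types of the column levels matter, they would have to be converted back by hand after restoring.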
Licensed under: CC-BY-SA with attribution