Read csv file with many named column labels with pandas

https://stackoverflow.com/questions/18905057

29-06-2022
Question

I'm brand new to pandas for Python. I have a data file with multiple row labels (per row) and multiple column labels (per column), like the following observation counts for three animals (dog, bat, ostrich) at several recording times (Monday morning, day, night):

   ''    ,    ''      , colLabel:name    , dog   ,    bat     , Ostrich
   ''    ,    ''      , colLabel:genus   , Canis , Chiroptera , Struthio,
   ''    ,    ''      , colLabel:activity, diurnal,  nocturnal,  diurnal
   day   , time of day,  ''              ,        ,           ,         
  Monday , morning    ,    ''            , 17     ,  5        , 2
  Monday , day        ,    ''            , 63     ,  0        , 34
  Monday , night      ,    ''            , 21     ,  68       , 1
  Friday , day        ,    ''            , 72     ,  0        , 34

I'd like to read this data into Pandas where both the rows and columns are hierarchically organized. What is the best way of doing this?

Solution

You can use the header, index_col and tupleize_cols arguments of read_csv:

In [1]: df = pd.read_csv('foo.csv', header=[0, 1, 2], index_col=[0, 1], tupleize_cols=False, sep=r'\s*,\s+')

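Note: in 0.13 tupleize_cols=False will be the default, so you won't need to pass it. The argument was later deprecated and then removed (in pandas 1.0, as far as I know), so on current versions leave it out.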
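To illustrate how header and index_col work together, here is a minimal sketch that writes a small frame to CSV text and reads it back. It assumes a recent pandas (so tupleize_cols is omitted) and uses an in-memory buffer instead of foo.csv; the frame contents mirror the question, while the variable names and the level names "name", "genus" and "activity" are chosen only for this example.

```python
import io

import pandas as pd

# Three column levels (name, genus, activity) and two row levels.
columns = pd.MultiIndex.from_tuples(
    [
        ("dog", "Canis", "diurnal"),
        ("bat", "Chiroptera", "nocturnal"),
        ("Ostrich", "Struthio", "diurnal"),
    ],
    names=["name", "genus", "activity"],
)
index = pd.MultiIndex.from_tuples(
    [
        ("Monday", "morning"),
        ("Monday", "day"),
        ("Monday", "night"),
        ("Friday", "day"),
    ],
    names=["day", "time of day"],
)
original = pd.DataFrame(
    [[17, 5, 2], [63, 0, 34], [21, 68, 1], [72, 0, 34]],
    index=index,
    columns=columns,
)

# to_csv writes three header rows followed by a row of index names.
csv_text = original.to_csv()

# header lists the rows that form the column levels,
# index_col lists the columns that form the row levels.
restored = pd.read_csv(io.StringIO(csv_text), header=[0, 1, 2], index_col=[0, 1])
```

In the text that to_csv produces, the column level names sit in the first index column of the header rows, so read_csv picks them up directly. The file in the question keeps them in a separate third column instead.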

There's a little bit of hacking required to get out the column level names:

In [2]: df.columns.names = df.columns[0]

In [3]: del df[df.columns[0]]

In [4]: df
Out[4]:
colLabel:name           dog         bat    Ostrich
colLabel:genus        Canis  Chiroptera  Struthio,
colLabel:activity   diurnal   nocturnal    diurnal
day    time of day
Monday morning           17           5          2
       day               63           0         34
       night             21          68          1
Friday day               72           0         34
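The renaming step can also be tried without the file. The sketch below builds by hand a frame shaped like what the read_csv call is expected to return for the question's data (an assumption, not a captured result), with the label column still in first position, and then applies the same idea using positional selection and set_names rather than del.

```python
import pandas as pd

# Hand-built stand-in for the parsed file: the first column's three-level
# label holds the level names, and its cells are empty.
columns = pd.MultiIndex.from_tuples(
    [
        ("colLabel:name", "colLabel:genus", "colLabel:activity"),
        ("dog", "Canis", "diurnal"),
        ("bat", "Chiroptera", "nocturnal"),
        ("Ostrich", "Struthio", "diurnal"),
    ]
)
index = pd.MultiIndex.from_tuples(
    [
        ("Monday", "morning"),
        ("Monday", "day"),
        ("Monday", "night"),
        ("Friday", "day"),
    ],
    names=["day", "time of day"],
)
nan = float("nan")
df = pd.DataFrame(
    [[nan, 17, 5, 2], [nan, 63, 0, 34], [nan, 21, 68, 1], [nan, 72, 0, 34]],
    index=index,
    columns=columns,
)

# The first column label is a tuple with one entry per column level.
level_names = list(df.columns[0])

# Drop the label column by position, then name the column levels after it.
df = df.iloc[:, 1:]
df.columns = df.columns.set_names(level_names)
```

Selecting by position avoids looking the label tuple up again, and set_names returns a new index instead of mutating the existing one in place.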

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow