Question

I have a frame_table called 'data' in an HDFStore with a multi-index. As a DataFrame it could look like this:

                      var1   var2  var3  var4  var5  var6
x_coor y_coor date                                        
928    310    2006257   133  14987  7045    18   240   171
              2006273   136      0  7327    30   253   161
              2006289   125      0  -239    83   217   168
              2006305    95  14604  6786    13   215    57
              2006321    84      0  4548    13   133    88

But now I would like to add a column on the right side with the range (starting from 1). My plan:

1. Create a new node with the range
2. Concatenate both nodes into a new node

What I did: first create a new node (stored as a DataFrame), then concatenate the two while resetting the column labels:

store['rindex'] = pd.DataFrame(pd.Series(xrange(1,
                  len(store.root.all_data.table)+1)))
store['rall']=pd.concat([store['all_data'].reset_index(),
              store['rindex'].reset_index()],ignore_index=True,axis=1)

But now both indices are part of my data (in columns 0, 1, 2 and 9):

  0    1        2    3      4     5   6    7    8  9  10
928  310  2006257  133  14987  7045  18  240  171  0   1
928  310  2006273  136      0  7327  30  253  161  1   2
928  310  2006289  125      0  -239  83  217  168  2   3
928  310  2006305   95  14604  6786  13  215   57  3   4
928  310  2006321   84      0  4548  13  133   88  4   5

<class 'pandas.core.frame.DataFrame'>
Int64Index: 203 entries, 0 to 202
Data columns (total 11 columns):
0     203  non-null values
1     203  non-null values
2     203  non-null values
3     203  non-null values
4     203  non-null values
5     203  non-null values
6     203  non-null values
7     203  non-null values
8     203  non-null values
9     203  non-null values
10    203  non-null values
dtypes: int32(7), int64(4)

I tried the following, but it results in empty columns:

>>> store['selection'] = store.select('all_data', [pd.Term('index', '>', '0')])
>>> store['selection'].reindex(columns = ['3','4','5','6','7','8','10'])
<class 'pandas.core.frame.DataFrame'>
Int64Index: 203 entries, 0 to 202
Data columns (total 7 columns):
3     0  non-null values
4     0  non-null values
5     0  non-null values
6     0  non-null values
7     0  non-null values
8     0  non-null values
10    0  non-null values
dtypes: float64(7)

So how can I select these columns without losing the values?


Solution

Your original frame

In [19]: df2
Out[19]: 
                     var1   var2  var3  var4  var5  var6
x_cor y_cor date                                        
928   310   2006257   133  14987  7045    18   240   171
            2006273   136      0  7327    30   253   161
            2006289   125      0  -239    83   217   168
            2006305    95  14604  6786    13   215    57
            2006321    84      0  4548    13   133    88

reset_index and set_index are inverses of each other, so you can always get back to the original frame:

In [20]: df2.reset_index()
Out[20]: 
   x_cor  y_cor     date  var1   var2  var3  var4  var5  var6
0    928    310  2006257   133  14987  7045    18   240   171
1    928    310  2006273   136      0  7327    30   253   161
2    928    310  2006289   125      0  -239    83   217   168
3    928    310  2006305    95  14604  6786    13   215    57
4    928    310  2006321    84      0  4548    13   133    88

In [21]: df2.reset_index().set_index(['x_cor','y_cor','date'])
Out[21]: 
                     var1   var2  var3  var4  var5  var6
x_cor y_cor date                                        
928   310   2006257   133  14987  7045    18   240   171
            2006273   136      0  7327    30   253   161
            2006289   125      0  -239    83   217   168
            2006305    95  14604  6786    13   215    57
            2006321    84      0  4548    13   133    88

To add a column that numbers the rows:

In [23]: df2['range'] = range(len(df2))

In [24]: df2
Out[24]: 
                     var1   var2  var3  var4  var5  var6  range
x_cor y_cor date                                               
928   310   2006257   133  14987  7045    18   240   171      0
            2006273   136      0  7327    30   253   161      1
            2006289   125      0  -239    83   217   168      2
            2006305    95  14604  6786    13   215    57      3
            2006321    84      0  4548    13   133    88      4
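
The question asks for a range starting from 1 rather than 0. A minimal sketch of that variant follows; the three-row frame is hypothetical sample data with the same index names as above, not the asker's actual table.

```python
import pandas as pd

# Hypothetical sample data with the same index level names as df2 above.
index = pd.MultiIndex.from_tuples(
    [(928, 310, 2006257), (928, 310, 2006273), (928, 310, 2006289)],
    names=["x_cor", "y_cor", "date"],
)
df2 = pd.DataFrame({"var1": [133, 136, 125], "var2": [14987, 0, 0]}, index=index)

# Number the rows 1..n instead of 0..n-1; the multi-index is left untouched.
df2["range"] = range(1, len(df2) + 1)
```

The values are assigned by position, so the numbering follows the current row order of the frame.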

You need to store the multi-index frame with the index set (otherwise it's just a frame with a regular index).

Your reindexing step returns empty columns because you are reindexing by strings, not numbers (e.g. '1','2' is NOT the same as 1,2). The column labels in your frame are integers, so none of the string labels match.
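
To illustrate the difference, here is a small sketch with a hypothetical two-row frame whose column labels are integers, as they would be after a concat with ignore_index=True. The names by_strings and by_ints are made up for the example.

```python
import pandas as pd

# Hypothetical frame; without explicit names the column labels are 0, 1, 2.
df = pd.DataFrame([[928, 310, 133], [928, 310, 136]])

# String labels match nothing, so every requested column is created as NaN.
by_strings = df.reindex(columns=["1", "2"])

# Integer labels match the existing columns, so the values are kept.
by_ints = df.reindex(columns=[1, 2])
```

reindex never raises on unknown labels; it silently creates them as missing, which is why the result looked empty instead of failing.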

Is your data REALLY large? Why not just read the frame in from the store, modify it in memory, then write it back (to the original or a new location)?

The strategy you are describing, which essentially creates an indexed column store, only makes sense if you have LOTS of data.
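
The read, modify, write-back approach could look roughly like the sketch below. It assumes PyTables is installed and a pandas version that accepts format="table" (older releases used table=True instead). The file path and the sample frame are hypothetical; only the key 'all_data' is taken from the question.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the asker's table.
index = pd.MultiIndex.from_tuples(
    [(928, 310, 2006257), (928, 310, 2006273), (928, 310, 2006289)],
    names=["x_cor", "y_cor", "date"],
)
df = pd.DataFrame({"var1": [133, 136, 125], "var2": [14987, 0, 0]}, index=index)

# Throwaway file so the sketch does not touch a real store.
path = os.path.join(tempfile.mkdtemp(), "example.h5")

with pd.HDFStore(path) as store:
    # Store the frame with its multi-index set, in table format.
    store.put("all_data", df, format="table")

    # Read it back, add the numbering column in memory, and overwrite the node.
    frame = store["all_data"]
    frame["range"] = range(1, len(frame) + 1)
    store.put("all_data", frame, format="table")

    result = store["all_data"]
```

Because the frame is written with its index set, it comes back from the store with the same three index levels, and no reset_index or concat step is needed.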

Licensed under: CC-BY-SA with attribution