I have a DataFrame (a), from which I want to subtract a list (b), column-wise:

import numpy as np
import pandas as pd

In:a=pd.DataFrame(np.arange(0,20).reshape(5,4))
   print(a)

Out:   0   1   2   3
   0   0   1   2   3
   1   4   5   6   7
   2   8   9  10  11
   3  12  13  14  15
   4  16  17  18  19

In: b=[1,2,3,4,5]

I expect this operation to work:

c=a-b

However I get an error.

The following operation does what I want, but it is inelegant. What is the correct way to do this?

In: c=(a.T-b).T
    print(c)

Out:  0   1   2   3
  0  -1   0   1   2
  1   2   3   4   5
  2   5   6   7   8
  3   8   9  10  11
  4  11  12  13  14

Solution

I'd recommend using sub:

>>> a = pd.DataFrame(np.arange(0,20).reshape(5,4))
>>> b = [1,2,3,4,5]
>>> a.sub(b, axis=0)
    0   1   2   3
0  -1   0   1   2
1   2   3   4   5
2   5   6   7   8
3   8   9  10  11
4  11  12  13  14

[5 rows x 4 columns]
>>> np.allclose(a.sub(b,axis=0), (a.T-b).T)
True
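
For context, a plain a - b fails because pandas matches a list against the column labels by default, and a has 4 columns while b has 5 elements. Passing axis=0 matches the list against the row index instead. A minimal sketch of the difference, assuming a reasonably recent pandas (the exception type and message may differ in older versions):

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(0, 20).reshape(5, 4))
b = [1, 2, 3, 4, 5]

# Default alignment is along the columns: 5 values cannot match 4 columns.
try:
    a - b
    failed = False
except ValueError:
    failed = True

# axis=0 matches the list against the row index instead (5 values, 5 rows).
c = a.sub(b, axis=0)
print(failed)
print(c)
```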

Other tips

I think this may be easier to read and understand:

In [12]:

import numpy as np
import pandas as pd
a=pd.DataFrame(np.arange(0,20).reshape(5,4))
b=[1,2,3,4,5]

In [13]:

print((a.T-np.array(b)).T)
    0   1   2   3
0  -1   0   1   2
1   2   3   4   5
2   5   6   7   8
3   8   9  10  11
4  11  12  13  14

Or maybe this:

a-(np.zeros(a.shape)+np.array(b)[...,np.newaxis])
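
The np.zeros term is probably not needed for the broadcast itself: reshaping b into a column vector is enough for NumPy to stretch it across the columns. A rough sketch of that variant, assuming pandas 0.24 or later for to_numpy(); the name b_col is only illustrative:

```python
import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(0, 20).reshape(5, 4))
b = [1, 2, 3, 4, 5]

# Turn b into a (5, 1) column vector so NumPy broadcasts it across the 4 columns.
b_col = np.asarray(b)[:, np.newaxis]

# Subtract on the raw array, then restore the original row and column labels.
c = pd.DataFrame(a.to_numpy() - b_col, index=a.index, columns=a.columns)
print(c)
```

Unlike the np.zeros version, this keeps the integer dtype, since no float array is mixed in.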

Of these three, the transpose approach is the slowest; the built-in .sub() (from @DSM's answer) and the array broadcasting method are similarly fast:

In [31]:

%timeit a.sub(b, axis=0)
1000 loops, best of 3: 565 us per loop
In [32]:

%timeit a-(np.zeros(a.shape)+np.array(b)[...,np.newaxis])
1000 loops, best of 3: 572 us per loop
In [33]:

%timeit (a.T-np.array(b)).T
1000 loops, best of 3: 896 us per loop

In case you are wondering, the lambda version is the slowest, as is often the case in Python:

In [36]:

%timeit a.apply(lambda x: x-b)
100 loops, best of 3: 2.63 ms per loop
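
The numbers above come from IPython's %timeit. Outside IPython, a rough equivalent can be put together with the standard timeit module. This is only a sketch: the candidates and results dictionaries are illustrative names, and absolute timings will depend on the machine and the pandas version.

```python
import timeit

import numpy as np
import pandas as pd

a = pd.DataFrame(np.arange(0, 20).reshape(5, 4))
b = [1, 2, 3, 4, 5]

# Three of the approaches compared above, wrapped as zero-argument callables.
candidates = {
    "sub": lambda: a.sub(b, axis=0),
    "transpose": lambda: (a.T - np.array(b)).T,
    "apply": lambda: a.apply(lambda x: x - b),
}

# Best of 3 runs of 100 calls each, stored as seconds per call.
results = {}
for name, func in candidates.items():
    results[name] = min(timeit.repeat(func, number=100, repeat=3)) / 100
    print(f"{name}: {results[name] * 1e6:.0f} us per call")
```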

You can use apply and use a lambda to subtract the list values column-wise:

In [11]:

import numpy as np
import pandas as pd
a=pd.DataFrame(np.arange(0,20).reshape(5,4))
b=[1,2,3,4,5]

a

Out[11]:

    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19

[5 rows x 4 columns]

In [12]:

c=a.apply(lambda x: x-b)
c

Out[12]:

    0   1   2   3
0  -1   0   1   2
1   2   3   4   5
2   5   6   7   8
3   8   9  10  11
4  11  12  13  14

[5 rows x 4 columns]
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow