Question

I would like to force matrix multiplication "orientation" using Python Pandas, both between DataFrames against DataFrames, Dataframes against Series and Series against Series.

As an example, I tried the following code:

t = pandas.Series([1, 2])
print(t.T.dot(t))

Which outputs: 5

But I expect this:

[1 2
 2 4]

Pandas is great, but this inability to do matrix multiplications the way I want is what is the most frustrating, so any help would be greatly appreciated.

PS: I know Pandas tries to implicitly use index to find the right way to compute the matrix product, but it seems this behavior can't be switched off!

Was it helpful?

Solution 2

Anyone coming to this now may want to consider: pandas.Series.to_frame(). It's kind of clunky.

Here's the original question's example:

import pandas as pd

t = pd.Series([1, 2])

t.to_frame() @ t.to_frame().T
# or equivalently:
t.to_frame().dot(t.to_frame().T)

Which yields:

In [3]: t.to_frame().dot(t.to_frame().T)                                        
Out[3]: 
   0  1
0  1  2
1  2  4

OTHER TIPS

Here:

In [1]: import pandas

In [2]: t = pandas.Series([1, 2])

In [3]: np.outer(t, t)
Out[3]:
array([[1, 2],
       [2, 4]])

Solution found by y-p:

https://github.com/pydata/pandas/issues/3344#issuecomment-16533461

from pandas.util.testing import makeCustomDataframe as mkdf
a=mkdf(3,5,data_gen_f=lambda r,c: randint(1,100))
b=mkdf(5,3,data_gen_f=lambda r,c: randint(1,100))
c=DataFrame(a.values.dot(b.values),index=a.index,columns=b.columns)
print a
print b
print c
assert  (a.iloc[0,:].values*b.iloc[:,0].values.T).sum() == c.iloc[0,0]

C0       C_l0_g0  C_l0_g1  C_l0_g2  C_l0_g3  C_l0_g4
R0                                                  
R_l0_g0       39       87       88        2       65
R_l0_g1       59       14       76       10       65
R_l0_g2       93       69        4       29       58
C0       C_l0_g0  C_l0_g1  C_l0_g2
R0                                
R_l0_g0       76       88       11
R_l0_g1       66       73       47
R_l0_g2       78       69       15
R_l0_g3       47        3       40
R_l0_g4       54       31       31
C0       C_l0_g0  C_l0_g1  C_l0_g2
R0                                
R_l0_g0    19174    17876     7933
R_l0_g1    15316    13503     4862
R_l0_g2    16429    15382     7284

The assert here is useless, it just does a check that it's indeed a correct matrix multiplication.

The key here seems to be line 4:

c=DataFrame(a.values.dot(b.values),index=a.index,columns=b.columns)

What this does is that it computes the dot product of a and b, but force that the resulting DataFrame c has a's indexes and b's columns, indeed converting the dot product into a matrix multiplication, and in pandas's style since you keep the indexes and columns (you lose the columns of a and indexes of b, but this is semantically correct since in a matrix multiplication you are summing over those rows, so it would be meaningless to keep them).

This is a bit awkward but seems simple enough if it is consistent with the rest of the API (I still have to test what will be the result with Series x Dataframe and Series x Series, I will post here my findings).

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top