Question

I'd like to know how I can do matrix addition in Python, and I'm running into quite a number of roadblocks trying to figure out the best way.

Here's the problem, written as best as I can formulate it right now.

I have a data set, which is an adjacency matrix for a directed graph, in which one isolate of an influenza virus is connected to another isolate via a directed edge, going from Isolate 1 to Isolate 2. The current representation of this adjacency matrix is as follows:

Adjacency Matrix for Part 1
===========================
Isolate 1    Isolate 2    Connected?
---------    ---------    ----------
ID1          ID2          1
ID1          ID3          1
ID2          ID4          1

As seen above, not every isolate is connected to another isolate for a given part. I have another sparse matrix illustrating the same type of connections, but for a different part. Here's what it looks like:

Adjacency Matrix for Part 2
===========================
Isolate 1    Isolate 2    Connected?
---------    ---------    ----------
ID1          ID2          1
ID1          ID3          1
ID1          ID4          1

The difference here is that ID1 is connected to ID4, rather than ID2 being connected to ID4.

So what I'd like to do is to add these two adjacency matrices. What I would expect is the following:

Summed Adjacency Matrix
=======================
Isolate 1    Isolate 2    Connected?
---------    ---------    ----------
ID1          ID2          2
ID1          ID3          2
ID1          ID4          1
ID2          ID4          1

Does anybody know how I can do this efficiently using Python packages? Most of my work has been done in IPython's HTML notebook, and I've been relying heavily on pandas 0.11 to do this analysis. If there's an answer in which I could avoid transforming the data into a huge matrix (500x500), that would be the best!

Thanks everybody!


Solution

Here is a straightforward method (you can call reset_index() at the end if you want).

Create both frames with a MultiIndex on id1 and id2:

In [23]: from pandas import DataFrame

In [24]: df1 = DataFrame([['ID1','ID2',1],['ID1','ID3',1],['ID2','ID4',1]],columns=['id1','id2','value']).set_index(['id1','id2'])

In [25]: df2 = DataFrame([['ID1','ID2',1],['ID1','ID3',1],['ID1','ID4',1]],columns=['id1','id2','value']).set_index(['id1','id2'])

In [26]: df1
Out[26]: 
         value
id1 id2       
ID1 ID2      1
    ID3      1
ID2 ID4      1

In [27]: df2
Out[27]: 
         value
id1 id2       
ID1 ID2      1
    ID3      1
    ID4      1

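Take the union of the two indexes (older pandas versions such as 0.11 also accepted df1.index+df2.index for this; newer versions no longer treat + on indexes as a set union):

In [35]: joined_index = df1.index.union(df2.index)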

Reindex both frames by the joined index, filling missing entries with 0, then add:

In [36]: df1.reindex(joined_index,fill_value=0) + df2.reindex(joined_index,fill_value=0)
Out[36]: 
         value
id1 id2       
ID1 ID2      2
    ID3      2
    ID4      1
ID2 ID4      1

Here is another way, which also lets you choose how the indexes are joined if you pass the join keyword:

In [41]: a1, a2 = df1.align(df2, fill_value=0)

In [42]: a1 + a2
Out[42]: 
         value
id1 id2       
ID1 ID2      2
    ID3      2
    ID4      1
ID2 ID4      1
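
On newer pandas versions the same sum can likely be written in one step with DataFrame.add and its fill_value argument, followed by reset_index() to get back to the three-column layout used in the question. This is only a minimal sketch; the column names id1, id2 and value follow the session above, and the astype(int) call is an assumption that the counts should stay integers.

```python
import pandas as pd

cols = ["id1", "id2", "value"]
df1 = pd.DataFrame(
    [["ID1", "ID2", 1], ["ID1", "ID3", 1], ["ID2", "ID4", 1]], columns=cols
).set_index(["id1", "id2"])
df2 = pd.DataFrame(
    [["ID1", "ID2", 1], ["ID1", "ID3", 1], ["ID1", "ID4", 1]], columns=cols
).set_index(["id1", "id2"])

# add() aligns on the union of both indexes; fill_value=0 stands in
# for edges that are present in only one of the two frames.
summed = df1.add(df2, fill_value=0).astype(int).sort_index().reset_index()
print(summed)
```

The data stays in the long (edge list) form throughout, so no 500x500 dense matrix is ever built.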

Other tips

Assuming you have the adjacency data as a list of connections:

import itertools
from collections import defaultdict

adj1 = [
    ('A', 'B'),
    ('A', 'C'),
    ('B', 'D')
]

adj2 = [
    ('A', 'B'),
    ('A', 'C'),
    ('A', 'D')
]

result = defaultdict(int)
for adjacency in itertools.chain(adj1, adj2):
    result[adjacency] += 1

To allow for an arbitrary number of connections between the same isolates (e.g. 0, 2, 10):

import itertools
from collections import defaultdict

adj1 = [
    ('A', 'B', 0),
    ('A', 'C', 10),
    ('B', 'D', 1)
]

adj2 = [
    ('A', 'B', 3),
    ('A', 'C', 1),
    ('A', 'D', 1)
]
result = defaultdict(int)
for isolate1, isolate2, connections in itertools.chain(adj1, adj2):
    result[(isolate1, isolate2)] += connections

In both cases, result will be a dictionary of the form (isolate1, isolate2) -> sum of adjacencies.
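
For the first case (plain pairs, one connection each), collections.Counter offers a shorter variant of the same idea. This is a sketch rather than a replacement for the loops above: adding two Counters with + drops any entry whose total is zero or negative, so it only fits data where every count is positive.

```python
from collections import Counter

adj1 = [("A", "B"), ("A", "C"), ("B", "D")]
adj2 = [("A", "B"), ("A", "C"), ("A", "D")]

# Each Counter counts how often a pair occurs; + sums the counts per pair.
result = Counter(adj1) + Counter(adj2)

for (isolate1, isolate2), total in sorted(result.items()):
    print(isolate1, isolate2, total)
```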

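scipy.sparse.coo_matrix() constructs a sparse matrix from (row, column, value) triplets. Build a coo_matrix for each adjacency graph, using the same shape and the same mapping from isolate IDs to integer indices for both, and add them: A+B. It is that simple.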
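
A minimal sketch of the coo_matrix approach, using the edge lists from the question. The names edges1, edges2, index and to_coo are made up for this illustration; the one real requirement is that both matrices share a shape and an ID-to-integer mapping so that they can be added.

```python
from scipy.sparse import coo_matrix

edges1 = [("ID1", "ID2", 1), ("ID1", "ID3", 1), ("ID2", "ID4", 1)]
edges2 = [("ID1", "ID2", 1), ("ID1", "ID3", 1), ("ID1", "ID4", 1)]

# Map every isolate ID to an integer position shared by both matrices.
ids = sorted({name for edge in edges1 + edges2 for name in edge[:2]})
index = {name: k for k, name in enumerate(ids)}
n = len(ids)

def to_coo(edges):
    """Build an n x n sparse matrix from (source, target, value) triplets."""
    rows = [index[src] for src, _, _ in edges]
    cols = [index[dst] for _, dst, _ in edges]
    vals = [val for _, _, val in edges]
    return coo_matrix((vals, (rows, cols)), shape=(n, n))

# The sum of two sparse matrices is itself sparse; convert back to COO
# to read the result out as triplets again.
total = (to_coo(edges1) + to_coo(edges2)).tocoo()
summed = {
    (ids[r], ids[c]): int(v)
    for r, c, v in zip(total.row, total.col, total.data)
}
print(summed)
```

Only the non-zero entries are stored, so a 500x500 matrix with a few hundred edges stays small in memory.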

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow