Question

I have a massive dictionary of items in a co-occurrence format; basically, conditional word vectors. The simplified dictionary looks something like this:

reservoir = {
 ('a', 'b'): 2,
 ('a', 'c'): 3,
 ('b', 'a'): 1,
 ('b', 'c'): 3,
 ('c', 'a'): 1,
 ('c', 'b'): 2,
 ('c', 'd'): 5,
}

To save storage, I decided not to store anything when there is no co-occurrence. For example, a and b never occur with d, so the dictionary has no entry for those pairs.

The result I'm trying to get is a matrix in which, for every tuple key, the first element gives the row and the second gives the column, so that it looks like this:

  a b c d
a 0 2 3 0
b 1 0 3 0
c 1 2 0 5
d 0 0 0 0

I have found information in this post: Adjacency List and Adjacency Matrix in Python, but it's just not quite what I'm looking to do. All my attempts thus far have proven to be less than fruitful. Any help would be amazing.

Thanks again.

Solution

You really just need to get the labels for the rows and columns. From there, it's just a few for loops:

from __future__ import print_function

import itertools

reservoir = {
    ('a', 'b'): 2,
    ('a', 'c'): 3,
    ('b', 'a'): 1,
    ('b', 'c'): 3,
    ('c', 'a'): 1,
    ('c', 'b'): 2,
    ('c', 'd'): 5
}

# Collect every label that appears in either position of a key.
fields = sorted(set(itertools.chain.from_iterable(reservoir)))

print(' ', *fields)

for row in fields:
    print(row, end=' ')

    for column in fields:
        print(reservoir.get((row, column), 0), end=' ')

    print()

The table will become misaligned once cell values have more than one digit, so I'll leave that to you to figure out. You just need to find the maximum width of each column (header and values) before printing.
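
One possible way to handle the alignment issue mentioned above is sketched here. It is only an illustration, not part of the original answer: the function name format_matrix and the sample dictionary wide are made up for this example, and it assumes labels and values can be converted with str().

```python
from itertools import chain


def format_matrix(reservoir):
    """Return the co-occurrence table as a string with aligned columns.

    format_matrix is a hypothetical helper name chosen for this sketch.
    """
    # Collect every label that appears in either position of a key.
    fields = sorted(set(chain.from_iterable(reservoir)))

    # Build the dense rows; pairs missing from the dictionary become 0.
    rows = [[reservoir.get((r, c), 0) for c in fields] for r in fields]

    # Width of the row-label column, then of each data column
    # (the wider of the header and the widest cell in that column).
    label_width = max((len(str(f)) for f in fields), default=0)
    widths = [
        max(len(str(col)), *(len(str(row[i])) for row in rows))
        for i, col in enumerate(fields)
    ]

    # Header line: blank label cell followed by right-aligned column labels.
    header = [" " * label_width]
    header += [str(f).rjust(w) for f, w in zip(fields, widths)]
    lines = [" ".join(header)]

    # One line per row: left-aligned label, right-aligned values.
    for label, row in zip(fields, rows):
        cells = [str(v).rjust(w) for v, w in zip(row, widths)]
        lines.append(" ".join([str(label).ljust(label_width)] + cells))

    return "\n".join(lines)


# Made-up sample data with multi-digit counts, to exercise the padding.
wide = {('a', 'b'): 2, ('a', 'c'): 30, ('c', 'd'): 500}
print(format_matrix(wide))
```

Returning a string instead of printing directly makes the result easier to reuse or log; the rows list built inside the function is also the plain nested-list form of the matrix, if that is what you need rather than printed output.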

Licensed under: CC-BY-SA with attribution