Question

I have a list of 4-grams that I want to use to populate a dictionary or shelve object:

['I','go','to','work']
['I','go','there','often']
['it','is','nice','being']
['I','live','in','NY']
['I','go','to','work']

So that we have something like:

four_grams['I']['go']['to']['work']=1

Any newly encountered 4-gram should be stored under its four keys with the value 1, and that value should be incremented each time the same 4-gram is encountered again.

Solution

You could do something like this:

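import shelve
from collections import defaultdict
from functools import reduce  # reduce is not a builtin in Python 3

db = shelve.open('/tmp/db')

grams = [
    ['I','go','to','work'],
    ['I','go','there','often'],
    ['it','is','nice','being'],
    ['I','live','in','NY'],
    ['I','go','to','work'],
]

def f(path, word):
    # Descend one level, creating the sub-dictionary on first use.
    if word not in path:
        path[word] = defaultdict(int)
    return path[word]

for gram in grams:
    # Shelf values are pickled copies: load the subtree stored under the
    # first word, update it in memory, then write it back.
    path = db.get(gram[0], defaultdict(int))

    # Walk the middle words, then increment the count under the last word.
    reduce(f, gram[1:-1], path)[gram[-1]] += 1

    db[gram[0]] = path

print(dict(db))

db.close()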
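
To read a count back out of the shelf, something like the following sketch could work. The names `add_gram` and `count_of` are illustrative helpers, not part of `shelve`, and the temporary directory is only an assumption so the example does not touch an existing file. Because a shelf hands back unpickled copies of its values (unless it is opened with `writeback=True`), the subtree has to be reassigned after every update.

```python
import os
import shelve
import tempfile
from collections import defaultdict
from functools import reduce


def add_gram(db, gram):
    """Increment the count stored for one n-gram (hypothetical helper)."""
    tree = db.get(gram[0], defaultdict(int))
    # Walk the middle words, creating sub-dictionaries as needed.
    node = reduce(lambda d, w: d.setdefault(w, defaultdict(int)), gram[1:-1], tree)
    node[gram[-1]] += 1
    db[gram[0]] = tree  # write the modified copy back to the shelf


def count_of(db, gram):
    """Return the stored count for a full n-gram, or 0 if it was never seen."""
    node = db.get(gram[0], {})
    for word in gram[1:]:
        if word not in node:  # membership test avoids creating keys in a defaultdict
            return 0
        node = node[word]
    return node


grams = [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
]

# Assumed location: a fresh temporary directory.
shelf_path = os.path.join(tempfile.mkdtemp(), 'four_grams')

with shelve.open(shelf_path) as db:
    for gram in grams:
        add_gram(db, gram)

# Reopen the shelf to show that the counts were persisted.
with shelve.open(shelf_path) as db:
    seen_twice = count_of(db, ['I', 'go', 'to', 'work'])
    never_seen = count_of(db, ['I', 'go', 'to', 'school'])

print(seen_twice)  # 2
print(never_seen)  # 0
```

Looking up with `in` rather than indexing matters here: indexing a `defaultdict` with a missing key would silently create that key.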

Other tips

You can create a helper function that inserts the elements into a nested dictionary one level at a time, checking at each level whether the required sub-dictionary already exists:

four_grams = {}   # not named `dict`, which would shadow the built-in type

def insert(fourgram):
    d = four_grams                  # start at the top-level dictionary
    for el in fourgram[:-1]:        # elements 1-3 if fourgram has 4 elements
        if el not in d:
            d[el] = {}              # create new, empty dict
        d = d[el]                   # move into next level dict

    last = fourgram[-1]
    if last in d:
        d[last] += 1                # increment existing, or...
    else:
        d[last] = 1                 # ...create as 1 first time

You can populate it with your dataset like this:

insert(['I','go','to','work'])
insert(['I','go','there','often'])
insert(['it','is','nice','being'])
insert(['I','live','in','NY'])
insert(['I','go','to','work'])

after which you can index into four_grams as desired:

print(four_grams['I']['go']['to']['work'])      # prints 2
print(four_grams['I']['go']['there']['often'])  # prints 1
print(four_grams['it']['is']['nice']['being'])  # prints 1
print(four_grams['I']['live']['in']['NY'])      # prints 1
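
The helper above relies on a module-level dictionary. As a possible variation, here is a short sketch that takes the target dictionary as a parameter and uses `dict.setdefault` and `dict.get` in place of the explicit membership checks. The names `insert_gram` and `counts` are made up for this example.

```python
def insert_gram(tree, gram):
    """Count one n-gram in a nested dict, creating levels as needed."""
    node = tree
    for word in gram[:-1]:
        # Reuse the existing sub-dict, or create an empty one on first use.
        node = node.setdefault(word, {})
    # Start from 0 the first time the final word is seen at this position.
    node[gram[-1]] = node.get(gram[-1], 0) + 1


counts = {}
for gram in [
    ['I', 'go', 'to', 'work'],
    ['I', 'go', 'there', 'often'],
    ['it', 'is', 'nice', 'being'],
    ['I', 'live', 'in', 'NY'],
    ['I', 'go', 'to', 'work'],
]:
    insert_gram(counts, gram)

print(counts['I']['go']['to']['work'])      # 2
print(counts['I']['go']['there']['often'])  # 1
```

Passing the dictionary in makes it easy to keep several independent count tables, and the function works for n-grams of any length of at least one word.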
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow