Question

I am new to dictionaries and am having trouble understanding how to read the contents of a file as dictionaries and access their key-value pairs.

Here is my script, which parses each line of the input as a dictionary:

dicts = {}
for line in sys.stdin:
   d = ast.literal_eval(line)
   for k,v in d.items():
      dicts.setdefault(k, []).append(v)
      charcount = sum(int(d['charcount']) for d in dicts[k])
      output_dict = {k: {'charcount': charcount}}
      print output_dict

Here is the content of the file that the script takes as input:

{ 262968617233162240 : {'@': False, '#': False, 'word': 'good#1st#time#will',    'longword': True, 'title': False, 'charcount': 18, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'be', 'longword': False, 'title': False, 'charcount': 2, 'uppercase': False, 'stop': True, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'going', 'longword': False, 'title': False, 'charcount': 5, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'charcount': 5, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 263790847424880641 : {'@': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'charcount': 33, 'uppercase': False, 'stop': False, 'sscore': False, 'url': True, '!!!': False} }

When I run the script, I get repeated values instead of one result for the entire input.

Thanks.


Solution

I suspect what you're actually looking for here is not one big dict, but rather a list of dicts, one for each line. For example:

dicts = []
for line in sys.stdin:
    dicts.append(eval(line))

I would actually write this with ast.literal_eval (as the eval docs suggest),* and simplify it into a list comprehension:

dicts = [ast.literal_eval(line) for line in sys.stdin]
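As a rough sketch of why ast.literal_eval is the safer choice: it accepts only Python literals, so a line containing a function call is rejected rather than executed. The sample lines below are shortened stand-ins for the input in the question, and the names sample_lines and parse_line are made up for illustration. The code is written to run on Python 3.

```python
import ast

# Shortened stand-ins for the lines in the question (hypothetical sample).
sample_lines = [
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }\n",
    "{ 263790847424880641 : {'word': 'going', 'charcount': 5} }\n",
]

# Same list comprehension as above, fed from a list instead of sys.stdin.
dicts = [ast.literal_eval(line) for line in sample_lines]


def parse_line(line):
    """Return the parsed object, or None if the line is not a plain literal."""
    try:
        return ast.literal_eval(line)
    except (ValueError, SyntaxError):
        return None


# A line containing a call is refused instead of being run, unlike with eval.
rejected = parse_line("__import__('os').getcwd()")
```

Here rejected ends up as None, while every well-formed line becomes an ordinary dict.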

But either way, now each element in dicts is a dict. So, to print them all out:

for d in dicts:
    print d

The only thing is, you wanted to sort them. I'm not sure how you want to sort them. In general, sorting dictionaries doesn't make any sense (which is why Python 2 gives you a meaningless order, and Python 3 gives you a TypeError). There are, of course, particular cases where there is some meaningful order, but each such case is different.

Maybe in your case, you want to rely on the fact that each dict has a single key, and sort on that key? If so:

for d in sorted(dicts, key=lambda d: d.keys()[0]):
    print d
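Note that d.keys()[0] only works on Python 2, where keys() returns a list. A minimal sketch of the same sort that also runs on Python 3, assuming every dict has exactly one key (only_key is a hypothetical helper name, and the data is a cut-down version of the sample input):

```python
# Cut-down stand-ins for the parsed lines (hypothetical sample).
dicts = [
    {263790847424880641: {'charcount': 33}},
    {262968617233162240: {'charcount': 18}},
    {262968617233162240: {'charcount': 2}},
]


def only_key(d):
    """Return the first (and, by assumption, only) key of a dict."""
    return next(iter(d))


# sorted() is stable, so dicts with equal keys keep their input order.
ordered = sorted(dicts, key=only_key)
for d in ordered:
    print(d)
```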

But that's just a guess.


From a comment:

how do I do a count on let say, charcount (it exists in the value part of the dict) of all dictionaries with same key.

If you're trying to do that, you have two options.

First, you can always just search the whole list of dictionaries, like this:

charcounts = []
for d in dicts:
    for k, v in d.items():
        if k == key:
            charcounts.append(v['charcount'])

But in this case, you might be better off with a "multidict" structure—that is, a dict whose values are all lists (of dicts, in this case).

There are two easy ways to build a multidict: the setdefault method on dict, or the defaultdict class in collections. Both are equally simple; the difference is that the first one gives you a regular dict, so it's a KeyError to look for a key that doesn't exist, while the second one gives you a defaultdict, so you'll get an empty list looking for a key that doesn't exist. I'll show the first, but really, you have to decide which one you want.

dicts = {}
for line in sys.stdin:
    d = ast.literal_eval(line)
    for k, v in d.items(): # should only be one
        dicts.setdefault(k, []).append(v)

This is a bit more work to set up, but less work to search through. For example, the whole mess above can be replaced by one line:

charcounts = [d['charcount'] for d in dicts[key]]

… and, if dicts is very big, it'll be a lot faster, because it only has to look through the ones with matching keys, rather than all of them.

To give you an idea of what this looks like, here's dicts with your sample input:

{262968617233162240: 
    [
        {'!!!': False, '#': False, '@': False, 'charcount': 18, 'longword': True, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'good#1st#time#will'},
        {'!!!': False, '#': False, '@': False, 'charcount': 2, 'longword': False, 'sscore': False, 'stop': True, 'title': False, 'uppercase': False, 'url': False, 'word': 'be'},
        {'!!!': False, '#': False, '@': False, 'charcount': 5, 'longword': False, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'going'},
        {'!!!': False, '#': False, '@': False, 'charcount': 5, 'longword': False, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'back#'}
    ],
 263790847424880641: 
    [
        {'!!!': False, '#': False, '@': False, 'charcount': 33, 'longword': True, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': True, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0'}
    ]
}
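For comparison, the defaultdict alternative mentioned earlier might look like the sketch below. The sample lines are shortened stand-ins for the real input, and the key 12345 is just a made-up example of a missing key. One thing to be aware of: looking up a missing key with square brackets also inserts it, so use .get() or "in" if you only want to check.

```python
import ast
from collections import defaultdict

# Shortened stand-ins for the lines in the question (hypothetical sample).
sample_lines = [
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }",
    "{ 262968617233162240 : {'word': 'going', 'charcount': 5} }",
    "{ 263790847424880641 : {'word': 'url', 'charcount': 33} }",
]

dicts = defaultdict(list)
for line in sample_lines:
    d = ast.literal_eval(line)
    for k, v in d.items():  # should only be one
        dicts[k].append(v)  # a missing key starts out as an empty list

# No KeyError for an unknown key, but the lookup creates the entry.
missing = dicts[12345]
```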

From another comment:

So the output that I am looking for is: { 262968617233162240, charcount: 30}

Well, that isn't a valid anything in Python. It looks like something half-way between a set and a dict. A dict is a bunch of key-value pairs, with a colon between each key and value.

Here's something that is valid in Python:

{262968617233162240: {'charcount': 30}}

How would you get that?

Well, I already showed you how to get the list of charcounts for any given key. Before you can add them up, you have to convert them all to numbers:

charcounts = [int(d['charcount']) for d in dicts[key]]

Then, to add them up, just call sum:

charcount = sum(int(d['charcount']) for d in dicts[key])

Now, how do we build the output you wanted?

charcount = sum(int(d['charcount']) for d in dicts[key])
output_dict = {key: {'charcount': charcount}}

If you want to do that for each key in the multidict:

for key, values in dicts.items():
    charcount = sum(int(d['charcount']) for d in values)
    output_dict = {key: {'charcount': charcount}}
    # now do something with output_dict
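Putting the pieces together, here is a small end-to-end sketch. It assumes the input lines look like the sample in the question (trimmed here to the word and charcount fields), and charcount_totals is a hypothetical helper name. The important point is that the summing happens once, after every line has been collected, rather than inside the reading loop, which is what produces repeated partial results.

```python
import ast

# Trimmed versions of the sample lines from the question.
sample_lines = [
    "{ 262968617233162240 : {'word': 'good#1st#time#will', 'charcount': 18} }",
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }",
    "{ 262968617233162240 : {'word': 'going', 'charcount': 5} }",
    "{ 262968617233162240 : {'word': 'back#', 'charcount': 5} }",
    "{ 263790847424880641 : {'word': 'http://instagr.am/p/rx9939civ8/', 'charcount': 33} }",
]


def charcount_totals(lines):
    """Group the per-word dicts by key, then sum charcount for each key."""
    multidict = {}
    for line in lines:
        for k, v in ast.literal_eval(line).items():
            multidict.setdefault(k, []).append(v)
    # Sum only after all lines are read, so each key yields one total.
    return {
        k: {'charcount': sum(int(d['charcount']) for d in values)}
        for k, values in multidict.items()
    }


totals = charcount_totals(sample_lines)
for key in sorted(totals):
    print({key: totals[key]})
```

To run it against real input, pass sys.stdin instead of sample_lines.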

* Or, better yet, change the saving code to use a format actually meant for data interchange, like JSON or pickle.
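As a sketch of the JSON route, assuming you control the code that writes the file: write one JSON object per line with json.dumps and read it back with json.loads. JSON object keys are always strings, so the integer IDs come back as strings and need converting if you want ints again. The record below is a trimmed, made-up example.

```python
import json

# A trimmed record in the same shape as the sample input (hypothetical).
record = {262968617233162240: {'word': 'be', 'charcount': 2, 'stop': True}}

# Saving side: one JSON document per line.
line = json.dumps(record)

# Loading side: keys arrive as strings because JSON has no integer keys.
restored = json.loads(line)
restored_int_keys = {int(k): v for k, v in restored.items()}
```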

OTHER TIPS

You have two main problems:

1)

print dicts[v]

cannot work, because a dict is indexed by key, and v is a value. Since your values are themselves dicts, this call should raise:

TypeError: unhashable type: 'dict'

Change it to

print dicts[k]

and the program will run.

2)

The first four lines in your file have the same key, so each one overwrites the previous value when you update the dictionary. At the end you have only two entries (printed as four lines, since each entry goes through the two print calls):

{'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'url': False, 'sscore': False, 'charcount': 5}
262968617233162240 {'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'url': False, 'sscore': False, 'charcount': 5}
{'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'url': True, 'sscore': False, 'charcount': 33}
263790847424880641 {'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'url': True, 'sscore': False, 'charcount': 33}
Script terminated.
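Both problems can be reproduced in a few lines. This is only an illustrative sketch: rows is a made-up, trimmed stand-in for the parsed input, and the names overwritten and accumulated are hypothetical.

```python
# (key, value) pairs as they would come out of the parsed lines (trimmed).
rows = [
    (262968617233162240, {'word': 'going', 'charcount': 5}),
    (262968617233162240, {'word': 'back#', 'charcount': 5}),
    (263790847424880641, {'word': 'url', 'charcount': 33}),
]

overwritten = {}
accumulated = {}
for k, v in rows:
    overwritten[k] = v                       # later rows replace earlier ones
    accumulated.setdefault(k, []).append(v)  # every row is kept in a list

# Problem 1: indexing with a value that is itself a dict raises TypeError.
try:
    overwritten[rows[0][1]]
    error = None
except TypeError as exc:
    error = str(exc)
```

After the loop, overwritten holds only the last value seen for each key, while accumulated keeps both values for the repeated key.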
Licensed under: CC-BY-SA with attribution