Question

I have been trying to find the frequency distribution of nouns in a given sentence. If I do this:

text = "This ball is blue, small and extraordinary. Like no other ball."
text=text.lower()
token_text= nltk.word_tokenize(text)
tagged_sent = nltk.pos_tag(token_text)
nouns= []
for word,pos in tagged_sent:
    if pos in ['NN',"NNP","NNS"]:
        nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns

It counts "ball" and "ball." as separate words. So I went ahead and split the text into sentences before tokenizing the words:

text = "This ball is blue, small and extraordinary. Like no other ball."
text=text.lower()
sentences = nltk.sent_tokenize(text)                        
words = [nltk.word_tokenize(sent)for sent in sentences]    
tagged_sent = [nltk.pos_tag(sent)for sent in words]
nouns= []
for word,pos in tagged_sent:
    if pos in ['NN',"NNP","NNS"]:
        nouns.append(word)
freq_nouns=nltk.FreqDist(nouns)
print freq_nouns

It gives the following error:

Traceback (most recent call last):
  File "C:\beautifulsoup4-4.3.2\Trial.py", line 19, in <module>
    for word, pos in tagged_sent:
ValueError: too many values to unpack

What am I doing wrong? Please help.

Solution

You were so close!

In this case, you changed tagged_sent from a list of tuples into a list of lists of tuples, because of your list comprehension tagged_sent = [nltk.pos_tag(sent) for sent in words].

Here are some things you can do to discover what type of object you have:

>>> type(tagged_sent), len(tagged_sent)
(<type 'list'>, 2)

This shows that you have a list, in this case containing 2 sentences. You can further inspect one of those sentences like this:

>>> type(tagged_sent[0]), len(tagged_sent[0])
(<type 'list'>, 9)

You can see that the first sentence is another list, containing 9 items. What does one of those items look like? Let's look at the first item of the first list:

>>> tagged_sent[0][0]
('this', 'DT')
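
That single item is a (word, tag) tuple, which is exactly what your for word, pos in ... loop expects to unpack. Your ValueError comes from trying to unpack a whole sentence (a list of 9 such tuples) into just two names, which you can reproduce directly in the interpreter:

>>> word, pos = tagged_sent[0]
Traceback (most recent call last):
  ...
ValueError: too many values to unpack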

If you're curious to see the entire object, as I frequently am, you can ask the pprint (pretty-print) module to make it easier to look at, like this:

>>> from pprint import pprint
>>> pprint(tagged_sent)
[[('this', 'DT'),
  ('ball', 'NN'),
  ('is', 'VBZ'),
  ('blue', 'JJ'),
  (',', ','),
  ('small', 'JJ'),
  ('and', 'CC'),
  ('extraordinary', 'JJ'),
  ('.', '.')],
 [('like', 'IN'), ('no', 'DT'), ('other', 'JJ'), ('ball', 'NN'), ('.', '.')]]

So, the long answer is that your code needs to iterate over the new second layer of lists, like this:

nouns = []
for sentence in tagged_sent:
    for word, pos in sentence:
        if pos in ['NN', 'NNP', 'NNS']:
            nouns.append(word)
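
If you prefer a one-liner, the same double iteration can be written as a nested list comprehension (a sketch equivalent to the loop above):

nouns = [word for sentence in tagged_sent
              for word, pos in sentence
              if pos in ['NN', 'NNP', 'NNS']]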

Of course, this just returns a non-unique list of items, which looks like this:

>>> nouns
['ball', 'ball']
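
Since a frequency distribution was your original goal, this repetition is exactly what nltk.FreqDist needs; a minimal sketch of that final step (assuming NLTK 3, where FreqDist behaves like collections.Counter):

>>> freq_nouns = nltk.FreqDist(nouns)
>>> freq_nouns.most_common()
[('ball', 2)]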

You can unique-ify this list in many different ways, but a quick one is the set() data structure, like so:

>>> unique_nouns = list(set(nouns))
>>> print unique_nouns
['ball']
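
Note that set() does not preserve the original order of the words. If order matters, a common order-preserving pattern (a generic Python sketch, nothing NLTK-specific) is:

seen = set()
unique_nouns = []
for noun in nouns:
    if noun not in seen:
        seen.add(noun)
        unique_nouns.append(noun)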

For an examination of other ways to unique-ify a list of items, see the slightly older but extremely useful benchmark: http://www.peterbe.com/plog/uniqifiers-benchmark

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow