I haven't used nltk, but I suspect the problem is that from_words
expects a sequence of tokens rather than a file or a raw string.
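Something akin to

    import nltk
    from nltk.collocations import BigramCollocationFinder

    # Read the whole file, then split it into word and punctuation tokens.
    with open('MkXVM6ad9nI.txt') as wordfile:
        text = wordfile.read()
    tokens = nltk.wordpunct_tokenize(text)
    finder = BigramCollocationFinder.from_words(tokens)
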
is likely to work, although there's probably a specialised API for files too.
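
To illustrate why the tokenizing step matters, here is a rough sketch in plain Python. It is not nltk's implementation; count_bigrams is a hypothetical helper I made up that simply pairs adjacent items of whatever iterable it receives. Given a raw string it pairs single characters, and given a token list it pairs words.

```python
from collections import Counter

def count_bigrams(items):
    """Count adjacent pairs in any iterable (hypothetical helper, not part of nltk)."""
    items = list(items)
    return Counter(zip(items, items[1:]))

text = "to be or not to be"

# Iterating over a raw string yields characters, so the pairs are character pairs.
char_bigrams = count_bigrams(text)

# Splitting first yields words, so the pairs are word pairs.
# str.split() is a crude stand-in for a real tokenizer here.
word_bigrams = count_bigrams(text.split())

print(word_bigrams.most_common(3))
```

If a bigram finder is handed something other than a list of words, it would presumably count whatever units it gets when iterating, which is why tokenizing first is the safer route.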