Spaceghost is correct: you need to pass a reference to an actual NgramTagger object as the backoff argument, not just an int. Simply using a number as backoff is meaningless. When creating a new tagger, NLTK has no idea where to look for the previously created tagger with a smaller relative context.
This is why you get the AttributeError: 'int' object has no attribute '_taggers'. NLTK is looking for an object of a class inheriting from SequentialBackoffTagger.
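To make the failure concrete, here is a toy sketch of the chaining idea. It is not NLTK's actual code: the class ToyBackoffTagger and its lookup and chain attributes are made up for illustration, and only mirror the point that each tagger builds its chain from an attribute on the backoff object, which an int doesn't have.

```python
class ToyBackoffTagger:
    """Hypothetical stand-in for a sequential backoff tagger (not NLTK code)."""

    def __init__(self, lookup, backoff=None):
        self.lookup = lookup  # word -> tag table for this tagger
        if backoff is None:
            self.chain = [self]
        else:
            # A tagger object is required here: an int has no .chain attribute.
            self.chain = [self] + backoff.chain

    def tag_one(self, word):
        # Ask each tagger in the chain in turn until one knows the word.
        for tagger in self.chain:
            tag = tagger.lookup.get(word)
            if tag is not None:
                return tag
        return None


general = ToyBackoffTagger({"how": "WRB", "are": "BER"})
specific = ToyBackoffTagger({"you": "PPSS"}, backoff=general)

# Words unknown to `specific` fall through to `general`.
result = [specific.tag_one(w) for w in "hi how are you".split()]

# Passing a number instead of a tagger fails while building the chain.
try:
    ToyBackoffTagger({}, backoff=1)
    error = None
except AttributeError as exc:
    error = str(exc)
```

The same kind of AttributeError shows up in the toy version as soon as the constructor tries to read the chain off the number.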
Based on your range(3), I'm going to guess you actually wanted a trigram tagger with backoff to a bigram tagger, with backoff to a unigram tagger.
You can try something like:
from nltk.corpus import brown
from nltk import NgramTagger
trains = brown.tagged_sents(categories="news")
tagger = None  # None is fine here, since it's the default backoff argument anyway
for n in range(1, 4):  # start at unigrams (1), up to and including trigrams (3)
    tagger = NgramTagger(n, trains, backoff=tagger)
NOTE: No need to import nltk multiple times.
>>> tagger.tag('hi how are you'.split())
[('hi', None), ('how', 'WRB'), ('are', 'BER'), ('you', 'PPSS')]
Notice that we get None for the POS of words like "hi", since it doesn't occur in the given corpus (Brown's news category). You can set a default tagger if you want by initially setting tagger (before the for-loop), like so:
from nltk import DefaultTagger
tagger = DefaultTagger('NN')
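Putting the pieces together, a sketch of the full chain with a default tagger at the bottom might look like the following. To keep it quick to run, it trains on a tiny hand-made sample (toy_sents is an assumption for illustration, not part of the question) rather than the Brown corpus; swap in brown.tagged_sents(categories="news") for real use.

```python
from nltk import DefaultTagger, NgramTagger

# Tiny made-up training sample, used only so the example runs
# without downloading a corpus.
toy_sents = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]

# Bottom of the chain: anything no n-gram tagger knows gets 'NN'.
tagger = DefaultTagger("NN")
for n in range(1, 4):  # unigram, then bigram, then trigram
    tagger = NgramTagger(n, toy_sents, backoff=tagger)

# "bird" never appears in the training sample.
tagged = tagger.tag("the bird sleeps".split())
print(tagged)
```

With the default tagger in place, a word the n-gram taggers have never seen falls through to 'NN' instead of coming back as None.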