tagging pos in nltk using backoff ngrams

Question 1

Spaceghost is correct: you need to pass an actual NgramTagger object as the backoff argument, not just an int. Passing a number as backoff is meaningless, because a newly created tagger has no way to find the previously created tagger with the smaller context.

This is why you get the AttributeError: 'int' object has no attribute '_taggers'. NLTK is looking for an object of a class inheriting from SequentialBackoffTagger.
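
To see the failure in isolation, here is a minimal sketch. The one-sentence training set toy_sents is made up for illustration and is not the asker's data; the error is raised in the constructor, before any training happens.

```python
from nltk import NgramTagger

# Hypothetical toy training data (an assumption, not from the question).
toy_sents = [[("how", "WRB"), ("are", "BER"), ("you", "PPSS")]]

try:
    # backoff must be a tagger object; an int has no _taggers attribute
    NgramTagger(2, toy_sents, backoff=1)
    error_message = None
except AttributeError as err:
    error_message = str(err)

print(error_message)
```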

Based on your range(3), I'm going to guess you actually wanted a trigram tagger with backoff to a bigram tagger, with backoff to a unigram tagger.
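
Spelled out without a loop, such a chain might look like the sketch below. The tiny corpus toy_sents is invented for illustration (it is not Brown), and it is built so that the word "can" is ambiguous: the unigram tagger alone would always pick the majority tag, while the bigram tagger can use the previous tag to disambiguate.

```python
from nltk import UnigramTagger, BigramTagger, TrigramTagger

# Hypothetical toy corpus (an assumption): "can" is a noun once, a modal twice.
toy_sents = [
    [("the", "AT"), ("can", "NN"), ("fell", "VBD")],
    [("they", "PPSS"), ("can", "MD"), ("run", "VB")],
    [("they", "PPSS"), ("can", "MD"), ("go", "VB")],
]

# Each tagger receives the next-smaller tagger object as its backoff.
unigram = UnigramTagger(toy_sents)
bigram = BigramTagger(toy_sents, backoff=unigram)
trigram = TrigramTagger(toy_sents, backoff=bigram)

unigram_only = unigram.tag("the can fell".split())
chained = trigram.tag("the can fell".split())
print(unigram_only)
print(chained)
```

On this toy data the unigram tagger labels "can" with its majority tag, while the chained tagger uses the preceding AT tag to choose the noun reading.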

You can try something like this:

from nltk.corpus import brown
from nltk import NgramTagger

trains = brown.tagged_sents(categories="news")
tagger = None         # None here is okay since it's the default argument anyway
for n in range(1,4):  # start at unigrams (1) up to and including trigrams (3)
    tagger = NgramTagger(n, trains, backoff=tagger)

NOTE: No need to import nltk multiple times.

>>> tagger.tag('hi how are you'.split())
[('hi', None), ('how', 'WRB'), ('are', 'BER'), ('you', 'PPSS')]

Notice that we get None as the POS of words like "hi", since they don't occur in the given corpus (the news category of Brown). If you want, you can set a default tagger by initializing tagger (before the for-loop) like this:

from nltk import DefaultTagger
tagger = DefaultTagger('NN')
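
Putting the pieces together, a rough end-to-end sketch could look like this. To keep it runnable without downloading a corpus, it trains on a made-up two-sentence list (toy_sents, an assumption) instead of brown.tagged_sents(categories="news").

```python
from nltk import DefaultTagger, NgramTagger

# Hypothetical toy training data standing in for the Brown news sentences.
toy_sents = [
    [("how", "WRB"), ("are", "BER"), ("you", "PPSS")],
    [("you", "PPSS"), ("are", "BER"), ("here", "RB")],
]

tagger = DefaultTagger("NN")   # last resort: tag everything unknown as NN
for n in range(1, 4):          # unigram, bigram, trigram
    tagger = NgramTagger(n, toy_sents, backoff=tagger)

result = tagger.tag("hi how are you".split())
print(result)
```

With the default tagger at the bottom of the chain, the unseen word "hi" comes back as NN rather than None.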

Question 2

The parameter backoff should point to another tagger, which is used when the current one has done its best and still cannot assign a tag. You need to define a second tagger, or use the default one, and then change your code to pass it in. Something like this:

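import nltk
# _POS_TAGGER is a private name from older NLTK releases and may not exist in newer ones
default_tagger = nltk.data.load(nltk.tag._POS_TAGGER)
tagger = nltk.NgramTagger(n, trains, backoff=default_tagger)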
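
If you would rather define the second tagger yourself, one option is a RegexpTagger as the bottom of the chain. This is only a sketch: the patterns and the one-sentence training set toy_sents are made up for illustration, and real patterns would need tuning against your tagset.

```python
from nltk import NgramTagger, RegexpTagger

# Hypothetical fallback rules (assumptions): guess a tag from the word's shape.
regexp_tagger = RegexpTagger([
    (r".*ing$", "VBG"),   # gerunds
    (r".*ed$", "VBD"),    # simple past
    (r"^\d+$", "CD"),     # cardinal numbers
    (r".*", "NN"),        # everything else: noun
])

# Hypothetical toy training data, not the asker's corpus.
toy_sents = [[("how", "WRB"), ("are", "BER"), ("you", "PPSS")]]

tagger = regexp_tagger
for n in range(1, 4):
    tagger = NgramTagger(n, toy_sents, backoff=tagger)

result = tagger.tag("you are running 42 tests".split())
print(result)
```

Words seen in training are tagged by the n-gram taggers; everything else falls through to the regular expressions.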