Question

I have looked at and tried out scikit-learn's tutorial on its multinomial naive Bayes classifier.

I want to use it to classify text documents, and the catch with NB is that it treats P(document|label) as a product of the probabilities of all its independent features (words). Right now, I need to try out a trigram classifier, where P(document|label) = P(wordX|wordX-1,wordX-2,label) * P(wordX-1|wordX-2,wordX-3,label) * ...

Does scikit-learn support anything that would let me implement this language model and extend the NB classifier to perform classification based on it?

Solution

CountVectorizer will extract trigrams for you (using ngram_range=(3, 3)). The text feature extraction documentation introduces this. Then, just use MultinomialNB exactly like before with the transformed feature matrix.
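For concreteness, here is a minimal sketch of that pipeline; the documents, labels, and variable names below are made-up placeholders, not anything from the original question:

```python
# Minimal sketch: trigram features fed to multinomial naive Bayes.
# The training documents and labels are made-up toy data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "the quick brown fox jumps over the lazy dog",
    "we love to classify text documents with naive bayes",
]
train_labels = ["animals", "ml"]

# ngram_range=(3, 3) makes every feature a word trigram instead of a single word.
clf = make_pipeline(CountVectorizer(ngram_range=(3, 3)), MultinomialNB())
clf.fit(train_docs, train_labels)

print(clf.predict(["the quick brown fox jumps again"]))  # -> ['animals']
```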

Note that this is actually modeling:

P(document | label) = P(wordX, wordX-1, wordX-2 | label) * P(wordX-1, wordX-2, wordX-3 | label) * ...

How different is that? Well, that first term can be written as

P(wordX, wordX-1, wordX-2 | label) = P(wordX | wordX-1, wordX-2, label) * P(wordX-1, wordX-2 | label)

Of course, all the other terms can be written that way too, so you end up with (dropping the subscripts and the conditioning on the label for brevity):

P(X | X-1, X-2) P(X-1 | X-2, X-3) ... P(3 | 2, 1) P(X-1, X-2) P(X-2, X-3) ... P(2, 1)

Now, P(X-1, X-2) can be written as P(X-1 | X-2) P(X-2). So if we do that for all those terms, we have

P(X | X-1, X-2) P(X-1 | X-2, X-3) ... P(3 | 2, 1) P(X-1 | X-2) P(X-2 | X-3) ... P(2 | 1) P(X-2) P(X-1) ... P(1)

So this is actually like using trigrams, bigrams, and unigrams (though not estimating the bigram/unigram terms directly).
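If you want the bigram and unigram evidence in the feature set as well, one hedged variant is to pass ngram_range=(1, 3) so CountVectorizer emits unigrams, bigrams, and trigrams together. Note this still isn't the chain-rule model from the question; NB just multiplies all the n-gram probabilities independently. A minimal sketch with the same made-up placeholder data:

```python
# Sketch: unigram + bigram + trigram features in one vectorizer.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_docs = [
    "the quick brown fox jumps over the lazy dog",
    "we love to classify text documents with naive bayes",
]
train_labels = ["animals", "ml"]

# ngram_range=(1, 3) mirrors the three kinds of terms in the decomposition
# above, although NB still treats every n-gram as an independent feature
# rather than estimating the conditional probabilities directly.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 3)), MultinomialNB())
clf.fit(train_docs, train_labels)
```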
