Scikit-learn: is semi-supervised Naive Bayes implementation available?

Question

Some months ago, I opened an issue on GitHub about this topic. It is possible to add the respective code to the current master branch of scikit-learn.

The user @larsmans added an experimental class SemisupervisedNB to the file sklearn/naive_bayes.py around a year ago. This code resides in the branch emnb of his forked scikit-learn repository and can be accessed here.

The essential code resides in two files:

The file naive_bayes.py in the current master branch has to be replaced by the older one from the emnb branch.
An editing of the class LabelBinarizer is necessary which can be found in the file sklearn/preprocessing.py in the master branch. The entire class has to be replaced by its definition in @larsmans' emnb branch. There, it resides in the file sklearn/preprocessing/__init__.py.

Even though the code for the Naive Bayes classifiers have not changed a lot for a year, some bug fixes were added to them. Therefore it makes sense to keep the current versions of the file naive_bayes.py and the class LabelBinarizer and instead to give the experimental versions different names.

I've just created my own fork of the scikit-learn repository and added the experimental files on top of the current stable branch 0.13.X. This branch is called 0.13.X-emnb and can be accessed here. If you look at my three recent commits (1 and 2 and 3), you see which files I've changed and newly created.

Since SemisupervisedNB does not work together with the most recent versions of the other classifiers, I've just added a new module next to naive_bayes.py called semisupervised_naive_bayes.py. In there, you find the older versions of the classifiers in renamed versions, e.g. SemiMultinomialNB instead of MultinomialNB so that they don't clash with the most recent versions of the classifiers. Likewise, I've added a class SemisupervisedLabelBinarizer next to LabelBinarizer (the choice of the name is a bit unfortunate but at least it's clear what it should be used for).

So, if you want to use the semisupervised versions of the classifiers, use the module sklearn.semisupervised_naive_bayes. For the current versions, use the module sklearn.naive_bayes.

But please keep in mind that this is highly experimental. It's just a setting for getting this old code working. I haven't searched for bugs.