I am trying to use NLTK to train a Naive Bayes classifier for multi-class text classification, but I do not have access to the original texts. All I am provided with is a file in SVM Light format (one instance per line, as feature:value pairs). I simply have to import this file, then train and test a Naive Bayes classifier on this dataset. Is there some way to import this file into NLTK and use it directly for training classifiers?


Solution

According to NLTK's own documentation, this can be done through its wrapper around scikit-learn classifiers:

Excerpt from Documentation:

scikit-learn (http://scikit-learn.org) is a machine learning library for Python. It supports many classification algorithms, including SVMs, Naive Bayes, logistic regression (MaxEnt) and decision trees.

This package implements a wrapper around scikit-learn classifiers. To use this wrapper, construct a scikit-learn estimator object, then use that to construct a SklearnClassifier. E.g., to wrap a linear SVM with default settings:

Example:

>>> from sklearn.svm import LinearSVC
>>> from nltk.classify.scikitlearn import SklearnClassifier
>>> classif = SklearnClassifier(LinearSVC())

See: http://www.nltk.org/api/nltk.classify.html#module-nltk.classify.scikitlearn
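
The excerpt above only shows how to wrap an estimator; it does not show how to get an SVM Light file into the (features, label) pairs that the wrapper's train() method expects. Here is a minimal sketch of one way to do that. The helper names (parse_svmlight_line, load_featuresets, train_naive_bayes) and the "f<index>" feature naming are my own and are not part of NLTK or scikit-learn. I am also assuming that the feature values are non-negative counts or weights, since MultinomialNB rejects negative values, and that labels can be kept as plain strings.

```python
def parse_svmlight_line(line):
    """Turn one SVM Light line into an NLTK-style (features, label) pair.

    Hypothetical helper: returns None for blank or comment-only lines.
    """
    # The SVM Light format allows a trailing "# comment"; drop it.
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    label, *pairs = line.split()
    features = {}
    for pair in pairs:
        index, value = pair.split(":", 1)
        if index == "qid":  # ranking-only field, not a feature
            continue
        # "f<index>" is an arbitrary naming choice for the dict keys.
        features[f"f{index}"] = float(value)
    return features, label


def load_featuresets(path):
    """Read a whole SVM Light file into a list of (features, label) pairs."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_svmlight_line(line) for line in handle)
        return [item for item in parsed if item is not None]


def train_naive_bayes(featuresets):
    """Train scikit-learn's MultinomialNB through NLTK's SklearnClassifier."""
    # Imported here so the parsing helpers work without these packages.
    from nltk.classify.scikitlearn import SklearnClassifier
    from sklearn.naive_bayes import MultinomialNB

    # train() takes a list of (feature dict, label) pairs.
    return SklearnClassifier(MultinomialNB()).train(featuresets)
```

With a training and a test file (the file names are placeholders), this would be used roughly as classifier = train_naive_bayes(load_featuresets("train.txt")), followed by nltk.classify.accuracy(classifier, load_featuresets("test.txt")) to score it. If you do not need NLTK specifically, scikit-learn can also read this format itself with sklearn.datasets.load_svmlight_file, which returns a sparse matrix and a label array that can be passed straight to MultinomialNB.fit().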

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow