I have the following code (based on the scikit-learn samples), but it is not working:

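from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
[...]
def my_analyzer(s):
    return s.split()

my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)  # fails here on 0.10

ch2 = SelectKBest(chi2, k=1)
X_train = ch2.fit_transform(X_train, Y_train)
[...]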

Calling fit_transform raises the following error:

AttributeError: 'function' object has no attribute 'analyze'

According to the documentation, CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".

The scikit-learn version I have installed is 0.10.


Solution

You're looking at the documentation for 0.11 (to be released soon), where the vectorizer has been overhauled. Check the documentation for 0.10, where there is no tokenizer argument and the analyzer should be an object implementing an analyze method:

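from sklearn.feature_extraction.text import CountVectorizer

# In 0.10, `analyzer` must be an object exposing an `analyze` method
class MyAnalyzer(object):
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())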
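If the same script has to run on both sides of the 0.11 overhaul, one option is an analyzer object that offers both interfaces at once. This is a minimal sketch, assuming that 0.10 only looks up an `analyze` method (as described above) and that 0.11 and later accept any callable as `analyzer`; the class name `SplitAnalyzer` is hypothetical.

```python
class SplitAnalyzer(object):
    """Whitespace analyzer meant to satisfy both CountVectorizer styles.

    Assumption: scikit-learn 0.10 calls `analyzer.analyze(doc)`, while
    0.11+ accepts a plain callable and calls `analyzer(doc)` directly.
    """

    def analyze(self, doc):
        # Old-style entry point, looked up by the 0.10 CountVectorizer
        return doc.split()

    def __call__(self, doc):
        # New-style entry point: delegate so both paths tokenize identically
        return self.analyze(doc)
```

It would be passed the same way as before, e.g. `CountVectorizer(analyzer=SplitAnalyzer())`. Delegating `__call__` to `analyze` keeps a single tokenization rule, so results should not differ between versions.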

http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn.org/stable has the documentation for the current stable version.
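To tell which documentation applies to a given installation, the installed version (exposed as `sklearn.__version__`) can be compared with 0.11, where this answer places the `tokenizer` argument. A small sketch follows; the helper names `parse_version` and `vectorizer_accepts_tokenizer` are hypothetical. Parsing into integers matters because a plain string comparison would treat "0.10" as older than "0.9".

```python
def parse_version(version):
    """Turn strings like '0.10' or '0.11-git' into (0, 10) / (0, 11)."""
    parts = []
    for piece in version.split("."):
        # Keep only the leading digits of each component ('11-git' -> 11)
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def vectorizer_accepts_tokenizer(version):
    # Assumption taken from this answer: `tokenizer` arrived with 0.11
    return parse_version(version) >= (0, 11)
```

For example, `vectorizer_accepts_tokenizer(sklearn.__version__)` should be False on the 0.10 install from the question, meaning the 0.10 documentation is the one to follow.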

Licensed under: CC-BY-SA with attribution