Question

I have the following code (based on the samples here), but it is not working:

[...]
def my_analyzer(s):
    return s.split()
my_vectorizer = CountVectorizer(analyzer=my_analyzer)
X_train = my_vectorizer.fit_transform(traindata)

ch2 = SelectKBest(chi2,k=1)
X_train = ch2.fit_transform(X_train,Y_train)
[...]

The following error is given when calling fit_transform:

AttributeError: 'function' object has no attribute 'analyze'

According to the documentation, CountVectorizer should be created like this: vectorizer = CountVectorizer(tokenizer=my_tokenizer). However, if I do that, I get the following error: "got an unexpected keyword argument 'tokenizer'".

My actual scikit-learn version is 0.10.

Was it helpful?

Solution

You're looking at the documentation for 0.11 (to be released soon), where the vectorizer has been overhauled. Check the documentation for 0.10, where there is no tokenizer argument and the analyzer should be an object implementing an analyze method:

class MyAnalyzer(object):
    @staticmethod
    def analyze(s):
        return s.split()

v = CountVectorizer(analyzer=MyAnalyzer())

http://scikit-learn.org/dev is the documentation for the upcoming release (which may change at any time), while http://scikit-learn/stable has the documentation for the current stable version.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top