Question

I am trying to run an example from Stack Overflow, which is here.

I have copied the code here again:

from sklearn.feature_extraction.text import TfidfVectorizer
text_files = ['file1.txt', 'file2.txt']
documents = [open(f) for f in text_files]
tfidf = TfidfVectorizer().fit_transform(documents)
# no need to normalize, since Vectorizer will return normalized tf-idf
pairwise_similarity = tfidf * tfidf.T

The only thing I added is this line:

text_files = ['file1.txt', 'file2.txt']

When I run the code, I get this error:

File "C:\Python33\lib\site-packages\sklearn\feature_extraction\text.py", line 195, in <lambda>
return lambda x: strip_accents(x.lower())
AttributeError: '_io.TextIOWrapper' object has no attribute 'lower'

file1.txt and file2.txt are the input text files. Am I using the wrong format for text_files? What is the reason for this error, and how can I fix it? I would really appreciate any help.


Solution

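open(f) returns an _io.TextIOWrapper object (an open file handle), not a string. TfidfVectorizer expects each document to be a string by default and calls .lower() on it, which is why it fails.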

Try changing

documents = [open(f) for f in text_files]

to

documents = [open(f).read() for f in text_files]
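
The one-liner above works, but it leaves closing the file handles to the garbage collector. A minimal sketch of the same idea with explicit closing follows. The helper name read_documents, the utf-8 encoding, and the temporary demo files are assumptions for illustration and are not part of the original code.

```python
import tempfile
from pathlib import Path


def read_documents(paths, encoding="utf-8"):
    """Return the text content of each file as a list of str."""
    documents = []
    for path in paths:
        # The with-block closes each file handle as soon as it has been read.
        with open(path, encoding=encoding) as handle:
            documents.append(handle.read())
    return documents


# Demo with two throwaway files standing in for file1.txt and file2.txt.
with tempfile.TemporaryDirectory() as tmp:
    text_files = [Path(tmp) / "file1.txt", Path(tmp) / "file2.txt"]
    text_files[0].write_text("the quick brown fox", encoding="utf-8")
    text_files[1].write_text("the lazy dog", encoding="utf-8")
    documents = read_documents(text_files)

# Every element is now a plain str, so it has the .lower() method
# that the vectorizer's preprocessing step calls.
print([type(d).__name__ for d in documents])
```

The resulting list can be passed to TfidfVectorizer().fit_transform(documents) as in the question. Alternatively, TfidfVectorizer accepts an input parameter: with input='filename' it should read the paths in text_files itself, and with input='file' it accepts open file objects.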
Licensed under: CC-BY-SA with attribution
Not affiliated with Stack Overflow