What does this doc2vec-based ML program predict?
31-10-2019
Question
I'm trying to understand what this ML program, which is based on doc2vec, predicts:
import logging, gensim
from gensim.models.doc2vec import TaggedDocument
from gensim.models import Doc2Vec
import re
import os
import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np
model = Doc2Vec.load('reviews_model.d2v') # Already trained
sent = []
answer = []
docvec = []
for fname in ['yelp', 'amazon_cells', 'imdb']:
    with open('sentiment labelled sentences/%s_labelled.txt' % fname, encoding='UTF-8') as f:
        for i, line in enumerate(f):
            # Each line is "<sentence>\t<label>"
            line_split = line.strip().split('\t')
            sent.append(line_split[0])
            words = extract_word(line_split[0])  # extract_word is defined elsewhere (not shown here)
            answer.append(int(line_split[1]))
            # Infer a vector for this sentence (gensim 4.x names this argument `epochs`)
            docvec.append(model.infer_vector(words, steps=10))
            print(str(docvec) + 'time')
combined = list(zip(sent, docvec, answer))
random.shuffle(combined)
sent , docvec, answer= zip(*combined)
clf = KNeighborsClassifier(n_neighbors=9)
score = cross_val_score(clf, docvec, answer, cv =5)
print (str(np.mean(score)) + str(np.std(score)) )
The output is something like:
0.7903333333333334
So what does it actually mean that it's 79% correct? Correct at predicting what, exactly?
P.S.: the documents the model was trained on are positive and negative reviews.
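For context, a rough sketch of what a number like the one above measures. With a classifier and no `scoring` argument, `cross_val_score` reports accuracy on each held-out fold: the share of held-out items whose label (here the integer read from the second column of each line, presumably 0/1 for negative/positive) the classifier got right. The code below is a hand-rolled, standard-library stand-in for `cross_val_score` plus `KNeighborsClassifier`, not the scikit-learn implementation. The helper names `knn_predict` and `cross_val_accuracy`, the 2-D vectors, and the labels are all made up for illustration, and the folds are simple interleaved slices rather than scikit-learn's stratified ones.

```python
import random
from collections import Counter


def knn_predict(train_vecs, train_labels, query, k=9):
    """Return the majority label among the k training vectors closest to `query`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(vec, query)), label)
        for vec, label in zip(train_vecs, train_labels)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(vecs, labels, n_folds=5, k=9):
    """For each fold, fit on the rest and return the share of held-out labels predicted correctly."""
    scores = []
    for fold in range(n_folds):
        # Hypothetical fold layout: every n_folds-th item is held out.
        test_idx = set(range(fold, len(vecs), n_folds))
        train_idx = [i for i in range(len(vecs)) if i not in test_idx]
        train_vecs = [vecs[i] for i in train_idx]
        train_labels = [labels[i] for i in train_idx]
        correct = sum(
            knn_predict(train_vecs, train_labels, vecs[i], k) == labels[i]
            for i in test_idx
        )
        scores.append(correct / len(test_idx))
    return scores


# Made-up "document vectors": label 1 clusters around (+1, +1), label 0 around (-1, -1).
rng = random.Random(0)
labels = [i % 2 for i in range(200)]
vecs = [
    [rng.gauss(1.0 if y else -1.0, 1.0), rng.gauss(1.0 if y else -1.0, 1.0)]
    for y in labels
]

scores = cross_val_accuracy(vecs, labels)
mean_accuracy = sum(scores) / len(scores)
print(mean_accuracy)
```

Read this way, a mean score of about 0.79 would say that roughly 79% of held-out sentences were assigned the same label as in the labelled files, based on the labels of their 9 nearest neighbours in doc2vec vector space.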
No correct solution
Licensed under: CC-BY-SA with attribution
Not affiliated with datascience.stackexchange