Question

I'm trying to understand what this ML program, which is based on doc2vec, predicts:

import logging, gensim 
from gensim.models.doc2vec import TaggedDocument
from gensim.models import Doc2Vec 
import re
import os 
import random
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
import numpy as np


model = Doc2Vec.load('reviews_model.d2v') # Already trained 



sent = []
answer = []
docvec = []

for fname in ['yelp', 'amazon_cells', 'imdb']:
    with open('sentiment labelled sentences/%s_labelled.txt' % fname, encoding='UTF-8') as f:
        for i, line in enumerate(f):
            # Each line is "<sentence>\t<label>"
            line_split = line.strip().split('\t')
            sent.append(line_split[0])
            # extract_word is not defined in this snippet; presumably a tokenizer helper
            words = extract_word(line_split[0])
            answer.append(int(line_split[1]))
            docvec.append(model.infer_vector(words, steps=10))
            print(str(docvec) + 'time')

combined = list(zip(sent, docvec, answer))
random.shuffle(combined)
sent, docvec, answer = zip(*combined)



clf = KNeighborsClassifier(n_neighbors=9)
score = cross_val_score(clf, docvec, answer, cv=5)

print(str(np.mean(score)) + ' ' + str(np.std(score)))

The output is something like:

0.7903333333333334

So what does it actually mean that it's 79% correct? Correct at predicting what, exactly?

P.S.: the documents the model learned from are positive and negative reviews.
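
For reference, here is a minimal sketch of what the number printed by cross_val_score measures. It does not use the doc2vec model or the review files: X and y are synthetic stand-ins (an assumption) for docvec and answer, with y playing the role of the 0/1 label that int(line_split[1]) reads from each line. For a classifier with no scoring argument, cross_val_score reports the estimator's own score, which for KNeighborsClassifier is accuracy, so each fold's value is the fraction of held-out labels predicted correctly.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins (assumption): 200 random 20-dim "document vectors"
# and 0/1 labels, with class 1 shifted so the two classes are separable.
rng = np.random.default_rng(0)
y = np.array([0, 1] * 100)
X = rng.normal(size=(200, 20)) + y[:, None] * 1.5

clf = KNeighborsClassifier(n_neighbors=9)

# Same call as in the question: one accuracy value per fold.
scores = cross_val_score(clf, X, y, cv=5)

# The same computation by hand. With an integer cv and a classifier,
# cross_val_score uses unshuffled stratified folds and clf.score (accuracy).
manual = []
for train_idx, test_idx in StratifiedKFold(n_splits=5).split(X, y):
    clf.fit(X[train_idx], y[train_idx])
    predicted = clf.predict(X[test_idx])
    # Fraction of held-out labels that were predicted correctly
    manual.append(np.mean(predicted == y[test_idx]))
manual = np.array(manual)

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

Under these assumptions, the mean over the five folds is the kind of figure printed above: the share of held-out sentences whose label was predicted correctly from their inferred vectors.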

No correct solution

Licensed under: CC-BY-SA with attribution