I'm trying to simulate out-of-sample prediction with a binary classifier using libsvm in MATLAB. My target variable (i.e. my label) is binary (-1, +1). In my test set there are series for which I don't know the label, so I created a placeholder label for these observations (the value 747). I found that my predicted_label_test vector (see code below) contains this 747 label. Does this mean the predictions I get are influenced by the labels of the data in the test set, which are exactly what I'm supposed to predict? The mistake may be in the way I use the libsvm read and write functions, but I can't find it. Many thanks!
%%%%%%%%%% GET DATA FROM THE CSV FILE AND CONVERT THEM TO LIBSVM
addpath('C:\libsvm1\matlab'); % add the libsvm MATLAB interface folder to the path
ALLDATA = csvread('DATACSV.csv'); % read a csv file
labels = ALLDATA(:, 1); % labels are included in the first column of data
labels_sparse = sparse (labels); %? needed
features = ALLDATA(:, 4:end); % features start at 4th column
features_sparse = sparse(features); % features must be in a sparse matrix
libsvmwrite('TTT.train', labels_sparse, features_sparse); % write the file to libsvm format
[label_vector, predictors_matrix] = libsvmread('C:\libsvm1\matlab\TTT.train'); % read the file that was recorded in Libsvm format
%%%%% DEFINE VECTOR AND MATRIX SIZE
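label_vector_train = label_vector(1:143, :); % rows 1-143: training labels
predictors_matrix_train = predictors_matrix(1:143, :); % rows 1-143: training features
label_vector_test = label_vector(144:193, :); % rows 144-193: test labels (placeholder 747 where unknown)
predictors_matrix_test = predictors_matrix(144:193, :); % rows 144-193: test features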
%PREDICTION
param = '-q -c 2 -g 3'; % quiet mode, cost C = 2, kernel gamma = 3
bestModel = svmtrain(label_vector_test, predictors_matrix_test, param);
[predicted_label_test, accuracy, prob_values] = svmpredict(label_vector_test, predictors_matrix_test, bestModel);
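
One likely cause, sketched here under assumptions: in the code above, svmtrain receives label_vector_test and predictors_matrix_test, so the model is fitted on the very rows that carry the placeholder 747, and 747 becomes a class it can predict. The sketch below uses made-up data (the names X, y, X_train, y_train, X_test and dummy_labels are hypothetical, not from the original code) and trains on the training rows only. As far as the libsvm MATLAB README describes it, the label vector passed to svmpredict is only used to compute the reported accuracy, so arbitrary values can stand in when the true labels are unknown.

```matlab
% Minimal sketch with synthetic data; assumes the libsvm MATLAB interface
% (svmtrain, svmpredict) is already on the MATLAB path.
rng(0);                                  % reproducible toy data
n_train = 143;                           % split sizes taken from the question
n_test  = 50;
X = randn(n_train + n_test, 5);          % hypothetical feature matrix
y = sign(X(:, 1) + 0.1);                 % hypothetical -1/+1 labels
y(y == 0) = 1;                           % guard against an exact zero
y(n_train+1:end) = 747;                  % placeholder for unknown test labels

X_train = X(1:n_train, :);
y_train = y(1:n_train);
X_test  = X(n_train+1:end, :);

% Train on the TRAINING rows only, so the placeholder 747 never reaches svmtrain.
model = svmtrain(y_train, X_train, '-q -c 2 -g 3');

% The labels given to svmpredict only affect the reported accuracy,
% so a dummy vector is passed for the unlabeled rows.
dummy_labels = zeros(n_test, 1);
[predicted_label_test, ~, ~] = svmpredict(dummy_labels, X_test, model);

disp(unique(predicted_label_test)');     % classes that appear in the predictions
```

With this split the accuracy printed by svmpredict is meaningless, because it is computed against the dummy labels; only predicted_label_test is of interest.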