For scikit-learn classifiers, the data is represented as a matrix of shape [n_samples, n_features], as you rightly say. There is no need for a sample index (if you ever need one for other purposes, don't put it in the matrix you intend to pass to scikit-learn).
Taking your example, a matrix holding one sample would be
samples = [[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149]]
This is a Python list of samples. It contains one sample, which is itself a list of your features. However, if you have a lot of numerical data, it is important to keep it contiguous in memory. This is done using numpy.ndarrays. Converting your one-sample matrix to an array is as easy as
import numpy as np
sample_array = np.array(samples)
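As a quick sanity check, here is a minimal sketch that inspects the resulting array. It only uses the values from the example above; the printed attributes are standard NumPy array attributes.

```python
import numpy as np

# One sample with ten features, wrapped in an outer list of samples
samples = [[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149]]
sample_array = np.array(samples)

# The array is two-dimensional: (n_samples, n_features)
print(sample_array.shape)                   # (1, 10)
# The floats are stored as 64-bit values
print(sample_array.dtype)                   # float64
# The data sits in one contiguous (row-major) block of memory
print(sample_array.flags["C_CONTIGUOUS"])   # True
```

Note that the shape is (1, 10) and not (10,): the outer list is what makes this a matrix of samples rather than a single flat feature vector.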
Let's move on to several samples. I will copy the same one several times, but in practice each sample would of course be different. Again, this really should be an array, so
samples = np.array(
[[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149],
[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149],
[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149]])
This sample matrix can be fed into a classifier, where it usually takes the role of X.
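To make this concrete, here is a rough sketch of passing such a matrix to a classifier. The feature values and the labels y are made up for illustration (your question does not give any labels), and LogisticRegression is just one arbitrary choice of scikit-learn classifier.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: 4 samples with 10 features each (random placeholder values)
rng = np.random.default_rng(0)
X = rng.random((4, 10))

# Hypothetical class labels, one per sample (one per row of X)
y = np.array([0, 1, 0, 1])

# Fit the classifier on the sample matrix and its labels
clf = LogisticRegression()
clf.fit(X, y)

# Predicting also takes a matrix of shape (n_samples, n_features)
predictions = clf.predict(X)
print(predictions.shape)  # one predicted label per sample
```

The labels live in a separate array with one entry per row of X, which is another reason not to put a sample index or anything else that is not a feature into the matrix itself.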