How to deal with HOG feature arrays to use for classification in scikit-learn?

https://stackoverflow.com/questions/22997513

01-07-2023

Problem

I want to extract HOG features from an image so that I can put them through a classifier (AdaBoost.SAMME, in particular, using scikit-learn: multiclass Adaboost in scikit-learn ensemble).

To do that, I need to convert the image into an array-like of shape = [n_samples, n_features].

But the output of extracting HOG features from the image is a 1D array. This is the relevant part of my code:

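from skimage import io, color, feature
from skimage.feature import hog

tree = io.imread('C:\\Users\\app\\Pictures\\treephoto1.jpg')
print(tree.shape, tree.dtype)

graytree = color.rgb2gray(tree)
ge = feature.canny(graytree)  # boolean edge map of the grayscale image
# fd is the 1D HOG feature vector; older scikit-image releases spell the flag "visualise"
fd, hog_image = hog(ge, orientations=8, pixels_per_cell=(16, 16),
                    cells_per_block=(1, 1), visualize=True)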

When I check fd, it's an array like this:

[Image: screenshot of the values in fd]

Let's take the first 9 rows and assume that this was the complete output from the hog function, and that this image from which I got the HOG features is the first image in my training dataset.

Then given that I have a thousand training images, how will this HOG information be represented as X = [n_samples, n_features]?

Will the line below be the first row of X that I can use as input to a classifier?

[1, [0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149]]

Will this entire row of HOG values be considered as a single feature? Or is this wrong? From what I understand, a training sample should conform to this format:

[sample_index, feature_1, feature_2, feature_3, ..., feature_x]

Solution

scikit-learn classifiers expect the data as a matrix of shape [n_samples, n_features], as you rightly say. There is no need for a sample index; if you ever need one for other purposes, keep it out of the matrix you pass to scikit-learn.

Taking your example, one sample would be

samples = [[0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149]]

This is a Python list of samples. It contains one sample, which is itself a list of your features. If you have a lot of numerical data, however, it is important to store it contiguously in memory, which is what a numpy.ndarray does. Converting your one-sample matrix to an array is as easy as

import numpy as np
sample_array = np.array(samples)
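As a quick check of the shapes involved, here is a minimal sketch. It assumes the ten values from the question are the complete 1D vector returned by hog (the name fd follows the question's code; single_sample is just an illustrative name). A 1D vector can be turned into a one-row matrix with reshape:

```python
import numpy as np

# The ten HOG values from the question, treated as the full 1D output of hog().
fd = np.array([0.162, 0.168, 0.250, 0.114, 0.106,
               0.044, 0.035, 0.059, 0.200, 0.149])

# One row (one sample), as many columns as there are HOG values.
single_sample = fd.reshape(1, -1)

print(fd.shape)             # (10,)
print(single_sample.shape)  # (1, 10)
```

Each of the ten numbers ends up in its own column, so each one counts as a separate feature.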

Let's move on to several samples. I will copy the same one several times, but in practice each row would come from a different image. Again, this really should be an array, so

samples = np.array([
    [0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149],
    [0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149],
    [0.162, 0.168, 0.250, 0.114, 0.106, 0.044, 0.035, 0.059, 0.200, 0.149],
])

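This sample matrix can be fed into a classifier, where it usually takes the role of X. Each row is the HOG vector of one image and each column is one feature, so a thousand training images with ten HOG values each would give an X of shape (1000, 10).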
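To tie this back to the question, the following is a rough sketch of stacking one feature vector per image into X and fitting an AdaBoost classifier. Everything about the data is an assumption made for illustration: the vectors are synthetic stand-ins for the fd returned by hog, the labels y are made up, and the sizes (30 images, 10 features, 3 classes) are arbitrary. The boosting algorithm is left at the library default, because how the SAMME variant is selected differs between scikit-learn versions.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)
n_images, n_features, n_classes = 30, 10, 3  # arbitrary illustrative sizes

# Hypothetical labels: one class id per training image.
y = np.arange(n_images) % n_classes

# Synthetic stand-ins for the per-image HOG vectors. In real use, each entry
# would be the 1D fd returned by hog() for one image, all of the same length.
feature_vectors = [rng.random(n_features) * 0.1 + 0.2 * label for label in y]

# Stack the 1D vectors row by row: the result has shape (n_samples, n_features).
X = np.vstack(feature_vectors)

clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

predictions = clf.predict(X)
print(X.shape, predictions.shape)
```

All images need to produce HOG vectors of the same length for the stacking to work, which in practice usually means resizing the images to a common size before calling hog.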

License: CC-BY-SA with attribution

Not affiliated with StackOverflow