Concept behind SIFT descriptor

https://stackoverflow.com/questions/23086673

opencv
sift

04-07-2023

Question

I have read some of the literature on SIFT and watched some videos as well. I understand most of the concepts behind SIFT, but one thing that still confuses me is the SIFT descriptor.

In SIFT:

We find a keypoint.
We take a 16 x 16 pixel window around the keypoint.
We divide the 16 x 16 window into sixteen 4 x 4 blocks.
We calculate an 8-bin histogram for each 4 x 4 block.
Therefore, we get a 4 x 4 x 8 = 128-dimensional SIFT descriptor for this keypoint.
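
The steps above can be sketched in plain C++ without OpenCV. This is only an illustration of where the number 128 comes from, not OpenCV's implementation: the names `describePatch`, `Patch`, and `Descriptor` are hypothetical, and the sketch assumes simple clamped central differences for the gradients while leaving out the Gaussian weighting, rotation to the keypoint orientation, interpolation between bins, and normalization that real SIFT applies.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper types for illustration only (not part of OpenCV).
constexpr int kPatch = 16;  // side of the window around the keypoint, in pixels
constexpr int kCells = 4;   // the window is split into a 4 x 4 grid of blocks
constexpr int kBins  = 8;   // orientation bins per block

using Patch      = std::array<std::array<float, kPatch>, kPatch>;
using Descriptor = std::array<float, kCells * kCells * kBins>;  // 128 values

// Builds one 128-value descriptor from one 16 x 16 grayscale patch.
Descriptor describePatch(const Patch& p)
{
    static_assert(kPatch % kCells == 0, "patch must split evenly into blocks");
    const int   cellSize = kPatch / kCells;   // 4 pixels per block side
    const float twoPi    = 6.28318530718f;

    Descriptor desc{};  // all bins start at zero
    for (int y = 0; y < kPatch; ++y) {
        for (int x = 0; x < kPatch; ++x) {
            // Gradient by central differences, clamped at the patch border.
            const int x0 = (x > 0) ? x - 1 : x;
            const int x1 = (x < kPatch - 1) ? x + 1 : x;
            const int y0 = (y > 0) ? y - 1 : y;
            const int y1 = (y < kPatch - 1) ? y + 1 : y;
            const float dx = p[y][x1] - p[y][x0];
            const float dy = p[y1][x] - p[y0][x];

            const float magnitude = std::sqrt(dx * dx + dy * dy);
            float angle = std::atan2(dy, dx);   // in [-pi, pi]
            if (angle < 0.0f) angle += twoPi;   // shift to [0, 2*pi)

            // Pick one of the 8 orientation bins and one of the 16 blocks.
            const int bin  = static_cast<int>(angle / twoPi * kBins) % kBins;
            const int cell = (y / cellSize) * kCells + (x / cellSize);

            // Each pixel votes with its gradient magnitude.
            desc[static_cast<std::size_t>(cell * kBins + bin)] += magnitude;
        }
    }
    return desc;
}
```

Whatever the patch contains, the result always has 16 blocks x 8 bins = 128 entries, so one keypoint corresponds to one 128-element vector.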

My confusion:

Let's say my image has 50 keypoints.
The SIFT descriptor I receive for this image (i.e. the Mat descriptor) has 128 columns and 1 row. Why?
If a single keypoint gives 128 columns and 1 row, then with 50 keypoints shouldn't the result be a matrix with 50 rows and 128 columns?

Solution

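The OpenCV 2.4.8 source says you should get an n x 128 descriptor matrix, where n is the number of keypoints. You can see that calcDescriptors() creates a descriptor for every keypoint by referencing the rows of descriptors: for keypoint i, it passes the pointer to row i, descriptors.ptr<float>((int)i), to calcSIFTDescriptor().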

static void calcDescriptors(const vector<Mat>& gpyr, const vector<KeyPoint>& keypoints,
                            Mat& descriptors, int nOctaveLayers, int firstOctave )
{
    int d = SIFT_DESCR_WIDTH, n = SIFT_DESCR_HIST_BINS;

    for( size_t i = 0; i < keypoints.size(); i++ )
    {
        // [...]
        // irrelevant code omitted

        calcSIFTDescriptor(img, ptf, angle, size*0.5f, d, n, descriptors.ptr<float>((int)i));
    }
}
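
To make the row-per-keypoint layout concrete, here is a minimal sketch in plain C++ that mimics the loop above without OpenCV. Everything in it is a hypothetical stand-in: `DescriptorMatrix` plays the role of a float `Mat`, `rowPtr` plays the role of `descriptors.ptr<float>(i)`, and `fillDescriptor` is a placeholder for calcSIFTDescriptor() that just writes the row index instead of real histogram values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a float cv::Mat: rows x cols values stored
// row-major in one contiguous buffer.
struct DescriptorMatrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<float> data;

    DescriptorMatrix(std::size_t r, std::size_t c)
        : rows(r), cols(c), data(r * c, 0.0f) {}

    // Same role as descriptors.ptr<float>(i): pointer to the start of row i.
    float* rowPtr(std::size_t i)
    {
        assert(i < rows);
        return data.data() + i * cols;
    }
};

// Placeholder for calcSIFTDescriptor(): writes `len` values into one row.
// Here it stores a marker value so the rows can be told apart.
void fillDescriptor(float* dst, std::size_t len, float marker)
{
    for (std::size_t k = 0; k < len; ++k)
        dst[k] = marker;
}

// One row per keypoint, 128 columns per row.
DescriptorMatrix describeAll(std::size_t keypointCount)
{
    const std::size_t descriptorLength = 128;  // 4 x 4 blocks x 8 bins
    DescriptorMatrix descriptors(keypointCount, descriptorLength);
    for (std::size_t i = 0; i < keypointCount; ++i)
        fillDescriptor(descriptors.rowPtr(i), descriptors.cols,
                       static_cast<float>(i));
    return descriptors;
}
```

With 50 keypoints, `describeAll(50)` yields 50 rows and 128 columns. In OpenCV itself, comparing `descriptors.rows` and `descriptors.cols` against `keypoints.size()` is a quick way to check whether the matrix you got back really has only one row.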

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow