Question

Is it possible to extract only the hands from a video with MATLAB? In the video, the hands perform some gestures. Because the first frames contain only the background, I tried this:

readerObj = VideoReader('VideoWithHands.mp4');
nFrames = readerObj.NumberOfFrames;
fr = get(readerObj, 'FrameRate');
writerObj = VideoWriter('Hands.mp4', 'MPEG-4');
set(writerObj, 'FrameRate', fr);
open(writerObj);
bg = read(readerObj, 1);   %background
for k = 1 : nFrames
      frame = read(readerObj, k);
      hands = imabsdiff(frame,bg);
      writeVideo(writerObj,hands);
end
close(writerObj);

But I realized that the colors of the hands are not "real" and the hands look transparent. Is there a better way to extract them from the video, keeping their colors and opacity, by exploiting the first frames (background)?

EDIT: I have found a good setting for the vision.ForegroundDetector object. The hands are now white logical regions, but when I try to visualize them with:

videoSource = vision.VideoFileReader('VideoWithHands.mp4', 'VideoOutputDataType', 'uint8');
detector = vision.ForegroundDetector('NumTrainingFrames', 46, 'InitialVariance', 4000, 'MinimumBackgroundRatio', 0.2);
videoplayer = vision.VideoPlayer();

hands = uint8(zeros(720,1280,3));
while ~isDone(videoSource)
    frame = step(videoSource);
    fgMask = step(detector, frame);

    [m,n] = find(fgMask);
    a = [m n];
    if isempty(a)
        hands(:,:,:) = uint8(zeros(720,1280,3));
    else
        hands(m,n,1) = frame(m,n,1);
        hands(m,n,2) = frame(m,n,2);
        hands(m,n,3) = frame(m,n,3);
    end

    step(videoplayer, hands)
end

release(videoplayer)
release(videoSource)

or write them to a video file with:

readerObj = VideoReader('Video 9.mp4');
nFrames = readerObj.NumberOfFrames;
fr = get(readerObj, 'FrameRate');

writerObj = VideoWriter('hands.mp4', 'MPEG-4');
set(writerObj, 'FrameRate', fr);

detector = vision.ForegroundDetector('NumTrainingFrames', 46, 'InitialVariance', 4000, 'MinimumBackgroundRatio', 0.2);
open(writerObj);

bg = read(readerObj, 1);
frame = uint8(zeros(size(bg)));

for k = 1 : nFrames
    frame = read(readerObj, k);
    fgMask = step(detector, frame);

    [m,n] = find(fgMask);
    hands = uint8(zeros(720,1280));

    if isempty([m n])
        hands(:,:) = uint8(zeros(720,1280));
    else
        hands(m,n) = frame(m,n);
    end

    writeVideo(writerObj,hands);
end

close(writerObj);

...my PC crashes. Any suggestions?


Solution

So you're trying to cancel out the background, making it black, right? The easiest way to do this is to filter it out with a threshold: compare your difference data to a threshold value, then use the result as indices to set a custom background.

filtered = imabsdiff(frame,bg);
bgindex = find( filtered < 10 );
frame(bgindex) = custombackground(bgindex);

where custombackground is whatever image you want to put into the background. If you want it to be just black or white, use 0 or 255 instead of custombackground(bgindex). Note that these numbers depend on your video data's format and could be inaccurate (255 is white for uint8 data, for example, whereas 0 should always be black). If too much gets filtered out, lower the 10 above; if too much remains unfiltered, increase it.

Finally, write the altered frame to the video; it simply takes the place of the hands variable in your code.
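As a rough illustration of the thresholding step, here is a minimal sketch on a tiny synthetic grayscale frame. The pixel values and the variable name threshold are made up for the example, and abs(double(...)) is used in place of imabsdiff only so that the snippet runs without the Image Processing Toolbox.

```matlab
% Synthetic stand-ins (assumptions): a flat background and a frame in which
% the middle column contains a bright object.
bg    = uint8([10  10 10; 10  10 10]);
frame = uint8([12 200 10;  9 180 15]);
threshold = 10;                                  % same role as the 10 above

% Absolute difference per pixel; imabsdiff(frame, bg) yields the same values.
filtered = abs(double(frame) - double(bg));

% Pixels that barely differ from the background are set to black.
bgindex = find(filtered < threshold);
frame(bgindex) = 0;

disp(frame);   % only the object pixels (200 and 180) keep their values
```

The object pixels keep their original values, which is what preserves the "real" colors, while the near-background pixels become 0.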

Also, depending on your format, you might have to do the comparison across RGB values. This is slightly more complicated, as it involves checking 3 values at the same time and doing some magic with the indices. This is the RGB version (it works with any image containing 3 color bands):

filtered = imabsdiff(frame,bg); % differences at each pixel in each color band
totalfiltered = sum(filtered,3); % sums up the differences
                                 % in each color band (RGB)
bgindex = find( totalfiltered < 10 ); % extracts indices of pixels
                                      % with color close to bg
allind = sub2ind( [numel(totalfiltered),3] , repmat(bgindex,1,3) , ...
                  repmat(1:3,numel(bgindex),1) ); % index magic

frame(allind) = custombackground(allind); % copy custom background into frame
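If the index arithmetic feels heavy, a logical mask replicated across the three bands should select the same elements. This is an alternative sketch, not the code above: the name mask3 and the synthetic 2x2 frames are assumptions for illustration, and abs(double(...)) again stands in for imabsdiff.

```matlab
% Synthetic stand-ins (assumptions): uniform gray background, one changed pixel.
bg    = 100 * ones(2, 2, 3, 'uint8');
frame = bg;
frame(1, 1, :) = reshape(uint8([200 150 120]), 1, 1, 3);
custombackground = zeros(2, 2, 3, 'uint8');      % plain black background

% Sum of the per-band differences, one value per pixel.
totalfiltered = sum(abs(double(frame) - double(bg)), 3);

% Logical mask of background pixels, copied to all 3 color bands.
mask  = totalfiltered < 10;
mask3 = repmat(mask, [1 1 3]);

% Logical indexing replaces all 3 bands of every background pixel at once.
frame(mask3) = custombackground(mask3);
```

Here find(mask3) returns the same linear indices that the sub2ind construction produces, just in sorted order.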

EDIT:

Here's a detailed explanation of the index magic.

Let's assume a 50x50 image. Say the pixel at row 2, column 5 is found to be background. Then bgindex will contain the number 202, the linear index corresponding to [2,5]: (5-1)*50+2 = 202. What we need is a set of 3 indices corresponding to the matrix coordinates [2,5,1], [2,5,2] and [2,5,3]. That way, we can change all 3 color bands corresponding to that pixel. To make the calculations easier, this approach treats each color band as linearly indexed, i.e. as a 2500x1 column of pixels, and places the 3 color bands side by side, giving a 2500x3 matrix. We now construct the indices [202,1], [202,2] and [202,3] instead.

To do that, we first construct matrices of indices by repeating our values. repmat does this for us: it creates the matrices [202 202 202] and [1 2 3]. If there were more pixels in bgindex, the first matrix would contain more rows, each repeating one pixel's linear index 3 times, and the second matrix would contain additional [1 2 3] rows. The first argument to sub2ind is the size of the matrix, in this case 2500x3. We get the number of pixels by applying numel to the summed differences (the sum collapses the image's 3 bands into 1 value per pixel) and add a static 3 as the second dimension.

sub2ind now takes each element from the first matrix as a row index and each corresponding element from the second matrix as a column index, and converts them to linear indices into a matrix of the size we determined earlier. In our example, this results in the indices [202 2702 5202]. sub2ind preserves the shape of its inputs, so if we had 10 background pixels, the result would have the size 10x3. Linear indexing doesn't care about the shape of the index matrix, though; it simply uses all of those values.

To confirm this is correct, let's reverse the calculation for the example. The original image data has the size 50x50x3. For an NxMxP matrix, the linear index of the subscript [n m p] can be calculated as ind = (p-1)*M*N + (m-1)*N + n. Using our values, we get the following:

[2 5 1] => 202
[2 5 2] => 2702
[2 5 3] => 5202

ind2sub confirms this.
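The check can be reproduced with a short script. This is a minimal sketch using the 50x50x3 example from the explanation; the names imgSize and nPixels are introduced here only for illustration.

```matlab
imgSize = [50 50 3];                          % rows, columns, color bands
bgindex = sub2ind(imgSize(1:2), 2, 5);        % linear index of pixel [2,5]

% Same construction as in the RGB version, for a single background pixel.
nPixels = prod(imgSize(1:2));                 % 2500 pixels per band
allind = sub2ind([nPixels, 3], ...
                 repmat(bgindex, 1, 3), ...
                 repmat(1:3, numel(bgindex), 1));

% Convert back to [row, column, band] subscripts of the 50x50x3 image.
[r, c, p] = ind2sub(imgSize, allind);

disp(allind);      % 202 2702 5202
disp([r; c; p]);   % rows 2 2 2, columns 5 5 5, bands 1 2 3
```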

OTHER TIPS

Yes, there is a better way. The Computer Vision System Toolbox includes a vision.ForegroundDetector object that does what you need. It implements the Gaussian Mixture Model algorithm for background subtraction.
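Once the detector returns a logical mask, the mask still has to be applied to the color frame. One thing to watch for: indexing with two vectors, as in frame(m,n), selects every combination of the listed rows and columns rather than individual pixels, so with [m,n] = find(fgMask) it builds a numel(m)-by-numel(n) array. With thousands of foreground pixels that is likely very large, which may explain the crash in the question. The sketch below uses a small synthetic frame and mask as stand-ins (assumptions); in the real loop, frame would come from the video reader and fgMask from step(detector, frame).

```matlab
% Synthetic stand-ins (assumptions) for one video frame and its foreground mask.
frame  = uint8(randi([1 255], 4, 5, 3));
fgMask = false(4, 5);
fgMask(2:3, 2:4) = true;                      % 6 "hand" pixels

% Pitfall: vector subscripts form a cross product, not a list of pixels.
[m, n] = find(fgMask);
crossSize = size(frame(m, n, 1));             % 6x6, not 6 pixels

% Keep the original colors where the mask is true, black everywhere else.
hands = frame;
hands(repmat(~fgMask, [1 1 3])) = 0;
```

Starting each iteration from a copy of the current frame also means no foreground pixels from earlier frames linger in hands.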

Licensed under: CC-BY-SA with attribution