Question

I'm working on people tracking in computer vision. I have observations (blobs output by blob detection after background subtraction) and I want to infer the objects that produced these observations.

I have experimented with some Kalman filter code, and it is quite clear to me. My problem is multi-object tracking: sometimes the observations are incomplete or noisy. Let me explain. In a test with clean observations, I have 1 blob for each person, and a Kalman filter can help me smooth the person's noisy path into a smooth curve. But this is not my problem. The problem is that blob detection is sometimes imperfect: I get 2 blobs for 1 person (for example, if the person I want to track is wearing a t-shirt of the same color as the background), or 1 blob for 2 persons (for example, if the 2 persons are hugging or are too close to each other).

I have looked into the theory and found a lot of papers that solve the object tracking problem with a particle filter. So I studied Bayesian filtering, the Monte Carlo method, and importance sampling, and it is somewhat clear (I don't have enough background in probability to understand everything, but the idea is clear).

However, I still don't understand how a particle filter can help me detect cases where 2 blobs correspond to 1 object or 1 blob corresponds to 2 objects.

Can someone help me understand this problem?

Solution 4

A Kalman filter on top of background subtraction is not enough in this case. It cannot handle data association, and it only models Gaussian noise.

In the end I re-implemented the histogram-based particle filter, activated by object detections.

If anyone is interested in that, just ask in a comment!

OTHER TIPS

Well, first of all, the OpenCV VideoSurveillance project is a good starting point for your questions.

It performs data association of detection responses, as you said you do. It also handles false positives (2 blobs for 1 object, as you said) with a simple mechanism (both track initialization and track deletion are based on a frame threshold). The other problem you mentioned, 1 blob corresponding to 2 objects, is usually called occlusion (the VideoSurveillance project uses the term "collision", but it is obsolete nowadays). VideoSurveillance uses a particle filter implementation based on 3D color histogram modeling to resolve this situation.
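To make the frame-threshold idea concrete, here is a minimal sketch of one way such a mechanism could look. It is not VideoSurveillance's actual code: the class name `TrackLifecycle` and the parameters `confirm_after` and `delete_after` are hypothetical names chosen for illustration.

```python
class TrackLifecycle:
    """Hypothetical frame-threshold bookkeeping for one candidate track."""

    def __init__(self, confirm_after=5, delete_after=10):
        self.confirm_after = confirm_after  # consecutive hits needed to confirm
        self.delete_after = delete_after    # consecutive misses before deletion
        self.hits = 0
        self.misses = 0
        self.confirmed = False
        self.deleted = False

    def update(self, detected):
        """Call once per frame with True if a blob was matched to this track."""
        if detected:
            self.hits += 1
            self.misses = 0
            if self.hits >= self.confirm_after:
                self.confirmed = True
        else:
            self.misses += 1
            if not self.confirmed:
                # An unconfirmed track must restart its run of consecutive hits.
                self.hits = 0
            if self.misses >= self.delete_after:
                self.deleted = True
```

A short-lived spurious blob never reaches `confirm_after` hits, so it is never reported as an object, while a confirmed track survives a few missed frames before it is dropped.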

A simple explanation: how can you distinguish two different targets based on their appearance (their clothing)? You can store their color histograms and use them in future frames, right? But how do you do the search? You can either search all possible centroids in the next frame or use, let's say, 200 random points scattered around the area where you believe your object is. These 200 points are the particles. How do they work? Each one compares the area it is centered on with the stored histogram and produces a probability that the object is there. The closer a particle is to the object, the higher the probability. At the end, you combine all the probabilities and find the "mean" centroid.

In short, the clothing of each target is modeled inside the probability function, and near real-time computation is achieved thanks to the particle filtering idea.
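The particle idea described above can be sketched in a few lines. This is only an illustration under simplifying assumptions: the image is a grayscale grid stored as a list of rows (the real project uses a 3D color histogram), the similarity measure is the Bhattacharyya coefficient, and all function names and parameters (`motion_std`, `sigma`, patch size) are made up for this example.

```python
import math
import random


def color_histogram(image, cx, cy, half=2, bins=4):
    """Normalized histogram of gray values (0-255) in a square patch around (cx, cy)."""
    hist = [0.0] * bins
    height, width = len(image), len(image[0])
    for y in range(max(0, cy - half), min(height, cy + half + 1)):
        for x in range(max(0, cx - half), min(width, cx + half + 1)):
            hist[image[y][x] * bins // 256] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def bhattacharyya(p, q):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def particle_filter_step(image, particles, reference_hist, rng,
                         motion_std=2.0, sigma=0.2):
    """One predict / weight / estimate / resample cycle for a single target."""
    height, width = len(image), len(image[0])

    # 1. Predict: scatter every particle around its previous position.
    moved = []
    for x, y in particles:
        nx = min(width - 1, max(0.0, x + rng.gauss(0.0, motion_std)))
        ny = min(height - 1, max(0.0, y + rng.gauss(0.0, motion_std)))
        moved.append((nx, ny))

    # 2. Weight: compare the patch under each particle with the stored histogram.
    weights = []
    for x, y in moved:
        hist = color_histogram(image, int(round(x)), int(round(y)))
        distance = 1.0 - bhattacharyya(hist, reference_hist)
        weights.append(math.exp(-distance / (2.0 * sigma ** 2)))
    total = sum(weights)
    weights = [w / total for w in weights]

    # 3. Estimate: the weighted mean of the particles is the "mean" centroid.
    est_x = sum(w * x for w, (x, _) in zip(weights, moved))
    est_y = sum(w * y for w, (_, y) in zip(weights, moved))

    # 4. Resample: particles with a good match are duplicated, poor ones die out.
    resampled = rng.choices(moved, weights=weights, k=len(moved))
    return (est_x, est_y), resampled
```

Calling `particle_filter_step` once per frame with the previous particle set keeps the cloud of particles concentrated around the region whose histogram best matches the stored one.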

Finally, the Kalman filter is a predictor that helps the tracker using only motion data. It "filters" extreme movements in case the particle filter result gets wilder than it should. VideoSurveillance includes this too. It is complementary to appearance, and the tracker is more sophisticated when it uses both.
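As a rough illustration of how a motion-only predictor can reject extreme jumps, here is a one-axis constant-velocity Kalman filter with a simple gate. It is a sketch, not the VideoSurveillance implementation: the class name `GatedKalman1D`, the noise values, and the 3-sigma gate are assumptions made for this example. One instance per image axis would be fed with the particle filter's centroid estimates.

```python
import math


class GatedKalman1D:
    """Constant-velocity Kalman filter for one coordinate, with outlier gating."""

    def __init__(self, position, process_var=0.01, measurement_var=1.0):
        self.x = position  # estimated position
        self.v = 0.0       # estimated velocity (pixels per frame)
        # Covariance [[pxx, pxv], [pxv, pvv]], started large (uncertain).
        self.pxx, self.pxv, self.pvv = 100.0, 0.0, 100.0
        self.q = process_var
        self.r = measurement_var

    def predict(self, dt=1.0):
        """Propagate state and covariance with F = [[1, dt], [0, 1]]."""
        self.x += self.v * dt
        pxx = self.pxx + 2.0 * dt * self.pxv + dt * dt * self.pvv + self.q
        pxv = self.pxv + dt * self.pvv
        pvv = self.pvv + self.q
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv
        return self.x

    def update(self, z):
        """Correct the state with a position measurement z (H = [1, 0])."""
        s = self.pxx + self.r
        kx, kv = self.pxx / s, self.pxv / s
        residual = z - self.x
        self.x += kx * residual
        self.v += kv * residual
        pxx = (1.0 - kx) * self.pxx
        pxv = (1.0 - kx) * self.pxv
        pvv = self.pvv - kv * self.pxv
        self.pxx, self.pxv, self.pvv = pxx, pxv, pvv

    def step(self, z, gate_sigmas=3.0):
        """Predict, then accept z only if it lies within the gate."""
        self.predict()
        innovation_std = math.sqrt(self.pxx + self.r)
        if abs(z - self.x) > gate_sigmas * innovation_std:
            return self.x, False  # implausible jump: keep the prediction
        self.update(z)
        return self.x, True
```

When the appearance-based estimate suddenly jumps far from where the motion model expects the target, `step` returns the prediction instead and reports that the measurement was rejected.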

Edit: How can it be useful for multi-target tracking? Assume we have a simple tracker with data association, and let's say two objects are about to "collide". The tracker works fine until the objects merge. During the merge, the tracker sees only one object; the other one is lost. After a while they split, and the tracker detects the old object as a new one! How can we fix this? Let's start over, with particle filtering and appearance modeling this time:

  • Before the merge, we have 2 objects moving toward each other. The objects are independent and the tracker can see them clearly. During this time, an appearance modeler (a mechanism which "remembers" what an object looks like) is learning what these two objects look like. Of course, as the frames go by, both objects slightly change their appearance. That's why the modeler has a "learning rate", which lets it adapt its "memory" as time goes by.

  • During the merge, this time, we set the tracker to be more patient and not kill the second object as easily as before. The tracker lets both objects stay active. The non-occluded object is tracked successfully as before, and the other object's bounding box tries to relocate its target. If we are lucky*, after a short time the occluded (hidden) object will reappear (split) and the bounding box will be attracted there thanks to the particles.

*As mentioned, the occluded target's bounding box is still being modeled by the modeler. If the occluded person stays hidden too long, the modeler will forget the old object and learn what is in front of the occlusion area (that is, the non-occluded object), or the box will wander around like an orphan (this is called drifting). VideoSurveillance does not have a mechanism for that. One simple solution could be to stop the modeler's adaptation during occlusion. How do we detect occlusion? By checking when the two bounding boxes overlap.
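The suggestion of freezing the appearance model while boxes overlap could be sketched as follows. The helper names `boxes_overlap` and `update_appearance`, the box layout `(x_min, y_min, x_max, y_max)`, and the default learning rate are assumptions for illustration only.

```python
def boxes_overlap(a, b):
    """True if two boxes given as (x_min, y_min, x_max, y_max) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def update_appearance(model_hist, observed_hist, learning_rate=0.1, occluded=False):
    """Blend the stored histogram toward the new observation.

    During occlusion the model is returned unchanged, so the tracker does not
    "forget" the hidden target and learn the object in front of it instead.
    """
    if occluded:
        return list(model_hist)
    return [(1.0 - learning_rate) * m + learning_rate * o
            for m, o in zip(model_hist, observed_hist)]
```

In a tracking loop, `occluded` would be set from `boxes_overlap(box_a, box_b)` for each pair of active targets before the histograms are updated.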

A Kalman filter or particle filter by itself cannot handle the data association problem (the multiple-target tracking problem where several detections must be matched against several tracks).

What you need is a Joint Probabilistic Data Association Filter (JPDAF), which associates each detection with the tracks in a soft way (one detection belongs X % to the first track, Y % to the second track, ...).

The underlying tracking algorithm can be a particle filter or a Kalman filter.
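To give a feel for what "soft" association means, here is a deliberately simplified sketch. It is not a full JPDAF (which enumerates joint association events and also models clutter and missed detections); it only normalizes a Gaussian likelihood per detection across tracks. The function name `soft_association` and the `sigma` parameter are assumptions for illustration.

```python
import math


def soft_association(track_positions, detections, sigma=10.0):
    """Return weights[d][t]: how strongly detection d belongs to track t.

    Each row sums to 1, unless the detection is so far from every track that
    all likelihoods underflow to zero, in which case the row is all zeros.
    """
    weights = []
    for dx, dy in detections:
        likelihoods = [
            math.exp(-((dx - tx) ** 2 + (dy - ty) ** 2) / (2.0 * sigma ** 2))
            for tx, ty in track_positions
        ]
        total = sum(likelihoods)
        if total > 0.0:
            weights.append([value / total for value in likelihoods])
        else:
            weights.append([0.0] * len(track_positions))
    return weights
```

A detection halfway between two predicted track positions gets a 50 % / 50 % split, while a detection close to one track is assigned mostly to that track. Each track's filter can then be updated with all detections, weighted accordingly.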

Take a look at the JPDAF implementation in C#, which is implemented for both the Kalman filter and the particle filter. At this time the working samples cover the Kalman and particle filters; JPDAF samples will come later, but the filter itself is implemented and ready.

Accord.NET Extensions library: https://github.com/dajuric/accord-net-extensions

I think the keyword is "fragmentation". An example paper:

http://people.csail.mit.edu/cielbleu/pubs/BoseEtalCVPR07Multiclass.pdf

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow