How to Handle Occlusion and Fragmentation

Question 1

There is no single "good" answer to this as handling occlusion (and background substraction) are still open problems! There are several pointers that can be given that might help you along with your project.

You want to detect if a "blob" is one person or a group of people. There are several things you could do to handle this.

Use multiple cameras (it's unlikely that a group of people is detected as a single blob from all angles)
Try to detect parts of the human body. If you detect two heads on a single blob, there are multiple people. Same can be said for 3 legs, 5 shoulders, etc.

On the area of tracking a "lost" person (one walking behind another object), is to extrapolate it's position. You know that a person can only move so much in between frames. By holding this into account, you know that it's impossible for a user to be detected in the middle of your image and then suddenly disappear. After several frames of not seeing that person, you can discard the observation, as the person might have had enough time to move away.

Question 2

Tracking people in video surveillance sequences is still an open problem in the research community. However particule filters (PF) (aka sequential monte-carlo) gives good results towards occlusion and complex scene. You should read this. There is also extra links to example source code after biblio.

An advantage on using PF is the gain in computational time towards tracking by detection (only).

If you go this way, feel free to ask for better understanding about the maths behind the PF.