How to configure Probabilistic Occupancy Map people detector

Question 1

In the associated publication, the authors mention that they use the camera calibration to generate the rectangle for a human silhouette at every position in the grid. The code that does this does not seem to be included in the source files, so you will have to write it yourself.

In the calibration data for their datasets, you can see that they use two homographies per camera: the head plane homography and the ground plane homography. You can use these to quickly obtain the required rectangles.

A homography is a 3x3 matrix that describes a mapping from one plane to another. The head plane homography describes the mapping from 2D room coordinates (at head level) to 2D image coordinates. You can determine this homography for your own camera with the function findHomography in OpenCV. All you need to do is mark at least four points on the ground in the room (no three of them on one line), measure their coordinates, and stand an upright pole on each marking. The pole should be as long as the average height of the people you want to track. You can now write a small program that lets you click on the top of the pole in each camera view. This gives you four world points (the coordinates measured in the room) and four image points per camera (the points you clicked), and with findHomography you can determine the head plane homography from them. Do the same for the markings on the ground without the pole to get the ground plane homography, and you have the two homographies per camera.
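For exactly four correspondences, the homography can be found by solving an 8x8 linear system. The sketch below does that in plain Python just to show what the computation involves. It is not OpenCV's implementation, and in practice findHomography does this for you and also accepts more than four points. The function names and the sample coordinates are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_4_points(world_pts, image_pts):
    """Return the 3x3 matrix H (with H[2][2] fixed to 1) mapping world to image."""
    a, b = [], []
    for (x, y), (u, v) in zip(world_pts, image_pts):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve_linear(a, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]


def apply_homography(h, x, y):
    """Map a plane point (x, y) to image coordinates (u, v)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


# Hypothetical measurements: room coordinates in metres, clicks in pixels.
world = [(0, 0), (4, 0), (4, 3), (0, 3)]
clicks = [(100, 400), (500, 420), (450, 200), (150, 190)]
H = homography_from_4_points(world, clicks)
```

Running it once with the pole-top clicks and once with the ground clicks would give the two matrices per camera. Reprojecting the four measured points through the result and comparing them with the clicks is a quick sanity check.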

You can now use the homographies to project the 8 corner points of a cuboid standing at any position in the room onto image coordinates for each camera: the four bottom corners with the ground plane homography and the four top corners with the head plane homography. Take the bounding box of all 8 projected points and you have the rectangle for that room location and that camera.
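A minimal sketch of that projection step, assuming the two homographies are given as 3x3 nested lists and that room coordinates use the same unit the homographies were measured in. The footprint width of 0.5 and the two example matrices are assumptions for illustration, not values from the authors' calibration.

```python
def project(h, x, y):
    """Apply the 3x3 homography h to the plane point (x, y)."""
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    u = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    v = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return u, v


def rectangle_for_location(h_ground, h_head, x, y, width=0.5):
    """Bounding box (xmin, ymin, xmax, ymax) of a cuboid standing at (x, y)."""
    half = width / 2.0
    footprint = [
        (x - half, y - half),
        (x + half, y - half),
        (x + half, y + half),
        (x - half, y + half),
    ]
    # Bottom corners go through the ground plane homography,
    # top corners through the head plane homography.
    points = [project(h_ground, cx, cy) for cx, cy in footprint]
    points += [project(h_head, cx, cy) for cx, cy in footprint]
    us = [p[0] for p in points]
    vs = [p[1] for p in points]
    return min(us), min(vs), max(us), max(vs)


# Made-up homographies for one camera, only to exercise the function.
H_GROUND = [[100.0, 0.0, 0.0], [0.0, -10.0, 400.0], [0.0, 0.0, 1.0]]
H_HEAD = [[100.0, 0.0, 0.0], [0.0, -10.0, 200.0], [0.0, 0.0, 1.0]]
box = rectangle_for_location(H_GROUND, H_HEAD, 2.0, 3.0)
```

The result is in floating-point pixel coordinates and can extend beyond the image, so it still needs rounding and clipping before it is written out.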

The authors of the method mention using a human silhouette, which suggests that their approach may be more accurate than using a cuboid. However, a moving person has no single fixed silhouette, so the solution with the cuboid is likely to be perfectly workable.

Question 2

I've recently been reading this article and digging into the code, and what I understood from the article and the code is pretty much what you guys have discussed.

To sum up, for every camera in the system and every possible grid position, you have to create a rectangle, which POM later compares with the real silhouettes obtained from the background subtraction algorithm (assuming you've already obtained those). Since a camera may not see every grid position in the scene, you put a "notvisible" tag in those cases. As already mentioned, you need to use the calibration files to map a person's size, 175 cm high and 50 cm wide, into the image according to the perspective, i.e. rectangles closer to the camera are supposed to be bigger than those further away.

RECTANGLE 0 414 150 0 159 119 means: for grid position 414, camera 0 would hypothetically see a black rectangle with corners P1(x,y) = (150,0) and P2(x,y) = (159,119). These values are obtained by projecting the 175 cm by 50 cm person-sized shape standing at that grid position into the image of camera 0, using the head plane and ground plane calibration.
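Here is a rough sketch of how such lines could be written out once you have a bounding box per camera and grid position. It assumes that the "notvisible" tag takes the place of the four coordinates, and that a box lying completely outside the image counts as not visible. That visibility rule, the function name and the 360x288 image size in the usage lines are my own assumptions.

```python
def rectangle_line(camera, location, box, image_width, image_height):
    """Format one RECTANGLE line for a camera and grid location."""
    xmin, ymin, xmax, ymax = box
    # Assumption: a box with no overlap with the image is "notvisible".
    if xmax < 0 or ymax < 0 or xmin >= image_width or ymin >= image_height:
        return f"RECTANGLE {camera} {location} notvisible"
    # Round to whole pixels and clip to the image borders.
    xmin = max(0, int(round(xmin)))
    ymin = max(0, int(round(ymin)))
    xmax = min(image_width - 1, int(round(xmax)))
    ymax = min(image_height - 1, int(round(ymax)))
    return f"RECTANGLE {camera} {location} {xmin} {ymin} {xmax} {ymax}"


# Hypothetical boxes for two cameras, with an assumed 360x288 image.
print(rectangle_line(0, 414, (150.2, -3.0, 159.4, 119.1), 360, 288))
print(rectangle_line(1, 7, (400.0, 10.0, 450.0, 90.0), 360, 288))
```

The first call reproduces the example line above; the second box lies to the right of the image, so it gets the notvisible tag.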

UPDATE: I tried what I posted here and yeah, it works.