Position Estimation From Multiple Images

https://stackoverflow.com/questions/22270309

11-06-2023

Question

First off, I'd like to state that I'm very new to this field and apologize if the question is a little too repetitive. I've looked around but in vain. I'm working on reading Hartley and Zisserman's book but it's taking me a while.

My problem is that I have three video sources of an area, and I need to find the camera position at each frame of each video. I do not have any information about the cameras that took the videos (i.e. no intrinsics).

Looking for a solution, I came across SfM (structure from motion) and tried existing software, namely Bundler and VisualSFM; both seem to work quite well. However, I have a couple of questions.

1) Is SfM really required in my case? SfM also produces a sparse reconstruction and the common points between images, but camera positions are all I really need. Are there more suitable or less complex methods that recover the positions without the reconstruction?

2) From what I've read, I need to calibrate the camera and find its intrinsics and extrinsics. How can I do this without knowing either? I've looked at the five-point problem and others, but most of them require knowing the intrinsic parameters of the camera, which I don't have. I also cannot calibrate with a pattern such as a chessboard, since the videos come from a source outside my control.

Thanks for your time!

Solution

Based on my experience, the short answer is:

1) You cannot reliably estimate the 3D pose of the cameras independently of the 3D structure of the scene. Moreover, since your cameras are moving independently, I think SfM is the right way to approach your problem.

2) You need to estimate the cameras' intrinsics in order to obtain useful (i.e. Euclidean) poses and scene reconstruction. If you cannot use the standard calibration procedure with a chessboard or similar pattern, have a look at autocalibration techniques (see also chapter 19 of Hartley and Zisserman's book). This calibration is done independently for each camera and only requires several images taken from different positions, which seems appropriate in your case.
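To make the role of the intrinsics concrete, here is a minimal sketch of how a calibration matrix K turns pixel coordinates into normalized viewing rays, which is what a Euclidean pose estimate works with. The focal lengths, principal point, and 3D point are made-up values for illustration, not anything from the question; with autocalibration, K would be estimated from the images themselves. The sketch assumes square pixels and zero skew.

```python
def make_K(f, cx, cy):
    """Pinhole calibration matrix (assumes square pixels and zero skew)."""
    return [[f, 0.0, cx],
            [0.0, f, cy],
            [0.0, 0.0, 1.0]]

def project(K, X):
    """Project a 3D point X (camera frame) to pixel coordinates."""
    x, y, z = X
    return (K[0][0] * x / z + K[0][2],
            K[1][1] * y / z + K[1][2])

def normalize(K, uv):
    """Undo K: pixel coordinates -> normalized coordinates (x/z, y/z)."""
    u, v = uv
    return ((u - K[0][2]) / K[0][0],
            (v - K[1][2]) / K[1][1])

# Hypothetical camera and point, chosen only for illustration.
K_true = make_K(800.0, 320.0, 240.0)
X = (0.5, -0.25, 2.0)
uv = project(K_true, X)

# With the correct K the ray direction of the point is recovered.
ray_ok = normalize(K_true, uv)

# With a wrong focal length the same pixel maps to a different ray,
# so angles between rays (and any pose derived from them) are distorted.
K_wrong = make_K(400.0, 320.0, 240.0)
ray_bad = normalize(K_wrong, uv)

print(uv, ray_ok, ray_bad)
```

The point of the comparison is that the pixel measurement is identical in both cases; only the assumed K differs, and that alone changes the geometry the pose estimate is built on.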

Other advice

You can actually accomplish your task with one massive bundle adjustment procedure, up to a scale factor. But it is a very complicated thing, even if you aren't a novice. You don't need a 3D reconstruction, just an essential matrix, which can be obtained from 2D projections and decomposed into rotation and translation, but this does require the intrinsic parameters. To get them you need at least three frames. Finally, drop the Zisserman book; it will drive you crazy. Read Simon Prince's "Computer Vision" instead.
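As a rough illustration of the essential-matrix route, the sketch below decomposes an essential matrix into its four candidate (rotation, translation) pairs with the standard SVD construction and picks the one that places the points in front of both cameras. It uses NumPy; the scene, rotation, and translation are synthetic values invented for the example, and E is built directly from the ground truth to keep the sketch short. In a real pipeline E would be estimated from matched points that have already been normalized with the intrinsics. The translation comes back as a unit vector, which is the scale ambiguity mentioned above.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def decompose_essential(E):
    """Return the four (R, t) candidates; t has unit length (scale unknown)."""
    U, _, Vt = np.linalg.svd(E)
    # Flip signs so that the candidates are proper rotations (det = +1).
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    R1, R2 = U @ W @ Vt, U @ W.T @ Vt
    t = U[:, 2]
    return [(R1, t), (R1, -t), (R2, t), (R2, -t)]

def depths(R, t, x1, x2):
    """Depths (d1, d2) of one point, from d2 * x2 = d1 * R @ x1 + t."""
    A = np.column_stack([R @ x1, -x2])
    d, *_ = np.linalg.lstsq(A, -t, rcond=None)
    return d

def recover_pose(E, x1s, x2s):
    """Pick the candidate with the most points in front of both cameras."""
    def votes(candidate):
        R, t = candidate
        return sum(bool(np.all(depths(R, t, a, b) > 0))
                   for a, b in zip(x1s, x2s))
    return max(decompose_essential(E), key=votes)

# Synthetic ground truth (hypothetical values): camera 2 sees X2 = R @ X1 + t.
theta = 0.2
c, s = np.cos(theta), np.sin(theta)
R_true = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
t_true = np.array([1.0, 0.1, 0.2])
points = np.array([[0.3, -0.2, 4.0], [-0.5, 0.4, 5.0], [0.1, 0.6, 6.0],
                   [-0.2, -0.3, 3.5], [0.7, 0.1, 4.5]])

# Normalized image coordinates (pixels already multiplied by inv(K)).
x1s = [X / X[2] for X in points]
x2s = [(R_true @ X + t_true) / (R_true @ X + t_true)[2] for X in points]

E = skew(t_true) @ R_true
R_est, t_est = recover_pose(E, x1s, x2s)
print(R_est)
print(t_est)  # direction of t_true only; its length is not recoverable
```

Libraries such as OpenCV provide this functionality (`cv2.findEssentialMat` and `cv2.recoverPose`), which is probably the more practical choice than hand-written code.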

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow