Anything wrong with P3P, for example as implemented in OpenCV? Or am I misunderstanding the question?
There are, in fact, fast and low-storage algorithms to solve this problem. One became a fairly successful commercial product in 1999, when a fast PC's CPU was clocked at somewhat less than 0.7 GHz, and RAM cost about 10 Mbit/USD. Its name was Canoma.
A very similar system was described in Paul Debevec's PhD thesis at about the same time; see here for details.