`solvePnP` is a function which, given a 3D model of an object (say, a chessboard) and a view of that object in the real world, gives you an approximate position and orientation of the camera relative to the object.
The 3D model and the view are sets of corresponding 3D and 2D points. The more accurately you know the object model (the positions of the object's key points) and the positions of those key points in the camera image, the better the function works.
I'm not an expert in 3D reconstruction, but it seems that by taking each subsequent image of the observed scene, detecting its key points, and matching them against the views you already have, you should be able to iteratively refine the model and, with it, the approximation of the camera's position.
Since you have a disparity map, which gives the distances of the scene's key points as viewed from two different positions, it could indeed be better to use triangulation, provided you know the exact viewpoints, or at least good approximations of them (which you would then refine with subsequent views).