Image stitching usually assumes that the camera center is fixed across all photos, and uses homographies to transform the images so that they seem continuous. When the fixed camera center assumption is not strictly valid, artifacts/distortions may appear due to the 3D of the scene. If the camera center moved by a small distance compared to the relief of the scene, "seamless image blending" techniques may be sufficient to blur out the distortions.
In more extreme cases, ortho-rectification is required. Ortho-rectification (Wikipedia entry) is the task of transforming an image observed from a given perspective camera into an orthographic (Wikipedia entry) and usually vertical point of view. The orthographic property is interesting because it makes the stitching of several images much easier. The following picture from Wikipedia is particularly clear (left is an orthographic or directional projection, right is a perspective or central projection):
The task of ortho-rectification usually requires having a 3D model of the scene, in order to map appropriately intensities observed by the perspective camera to their location with respect to the orthographic camera. In the context of aerial/satellite images, Digital Elevation Models (DEM) are often used for that purpose, but generally have the serious drawback of not including man-made structures (only Earth relief). The NASA provides freely the DEM acquired by the SRTM missions (DEM link).
Another approach, if you have two images acquired at different positions, you could try to do a 3D reconstruction using one of the stereo matching technique, and then to generate the ortho-rectified image by mapping the two images as seen by a third orthographic and vertical camera.
OpenCV has several interesting function for that purpose (e.g. stereo reconstruction, image mapping functions, etc) and might be more appropriate for intensive usage. Matlab probably has interesting functions as well, and might be more appropriate for quick tests.