If I understand you correctly, you have 4 points in the image and you know their world coordinates (that is, you have four 2D-3D correspondences). To find the rotation matrix and translation vector (together known as the camera pose), you can use the solvePnP function.
This function takes as input the 3D coordinates, the 2D coordinates, the camera matrix (focal lengths and principal point) and the distortion coefficients. The camera matrix and distortion coefficients are the ones you should have obtained through an intrinsic calibration process. The output is a rotation vector and a translation vector. The rotation vector can be converted into a rotation matrix with the Rodrigues function.
There is also a version of solvePnP, called solvePnPRansac, that uses a RANSAC algorithm to discard outlier correspondences.
You can find here a document with some theory on pose estimation.
Edit
Obtaining the camera intrinsic parameters is known as camera calibration. You can find instructions on how to do so in this OpenCV document. There is also an older tutorial with source code here.