Question

I'm trying to implement an intuitive pointing mechanism, where the user simply points a finger at an object on-screen. I have most of it ready, but I'm not sure how to write the final part.

Basically, I have a list of calibration points like the following:

typedef struct {
    Point2D pointOnScreen;  // x/y pixel position on the screen
    Point3D pointingFinger; // position of the user's pointing finger, in space
    Point3D usersEyes;      // position of the user's eyes, in space
} CalibrationPoint;

std::vector<CalibrationPoint> calibrationPoints;

Now, the idea is that I could use these calibrationPoints to write a function that would look something like this:

// Should return the corresponding point on screen; this would need to be
// calibrated somehow using the calibrationPoints.
Point2D whereIsTheUserPointing(Point3D pointingFinger, Point3D usersEyes);

But I have trouble figuring out the math. The basic idea is that when you point, you align your finger so that your eyes, your fingertip, and the object you're pointing at lie on a straight line. However, since I don't have the position of the screen in 3D, I thought I could use the calibration points to deduce where the user is pointing. How would I go about writing the whereIsTheUserPointing() function and calibrating the system?

Solution

I'm idealizing, but maybe this will be a start:

  • I assume that you can obtain universal 3D coordinates for the eyes and the tip of the finger.

  • Three points in 3D space span a plane. If we could determine three points on your screen, we could locate the screen plane in 3D space. To be safe, let's locate all four corners, so we don't just know the plane, but also its boundaries.

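  • Two distinct straight lines in 3D that intersect determine a unique point in 3D. With noisy measurements the lines will rarely meet exactly, so the point closest to both lines can be used instead.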

Thus, in order to find the four corners of the screen, produce four pairs of straight lines, two lines through each corner. This could be done by asking the user to point at the four corners, move, and then point at the four corners again.

OTHER TIPS

Let the coordinates of the eyes be (a,b,c) and the coordinates of the tip of the finger be (x,y,z). You can easily visualise the line joining them in 3D. All you need to do now is extend that line until it intersects the plane of your screen.

Parametric coordinates of the line in your case will be:

(a + T(x-a), b + T(y-b), c + T(z-c))

with:

eye at (a,b,c) and finger at (x,y,z).

With T = 0, you get the coordinate of the eye. With T=1 you get the coordinate of the end of the finger. You can "extend" the line with T>1.

Assuming the screen lies in a plane of constant z (that is, it is perpendicular to the z-axis), that you know this z-value, and that z is not equal to c, you can get the value of T with the following formula:

T = (Z_VALUE_OF_PLANE-c)/(z-c)

Substitute this value of T into the parametric form to get the other two coordinates (X,Y) of the intersection point.

The final coordinates on the 2D plane will be:

X = a + ((Z_VALUE_OF_PLANE-c)/(z-c))*(x-a)
Y = b + ((Z_VALUE_OF_PLANE-c)/(z-c))*(y-b)
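
As a minimal sketch of the formulas above, under the same assumption that the screen lies in a plane of constant z: the struct layouts and the function name intersectWithZPlane are made up for this example, and Point2D here holds the hit position in the same units as the 3D input, not pixels.

```cpp
#include <cmath>
#include <optional>

// Assumed layouts for illustration; the question does not show these types.
struct Point2D { double x, y; };
struct Point3D { double x, y, z; };

// Intersects the eye-to-finger line with the plane z = planeZ.
// Returns the (X, Y) hit position in the same 3D units as the inputs,
// or std::nullopt if the line is parallel to the plane or the plane
// lies behind the eyes along the pointing direction (T <= 0).
std::optional<Point2D> intersectWithZPlane(Point3D eyes, Point3D finger, double planeZ) {
    const double dz = finger.z - eyes.z;
    if (std::fabs(dz) < 1e-12) return std::nullopt;  // avoid dividing by zero

    // T = (Z_VALUE_OF_PLANE - c) / (z - c)
    const double T = (planeZ - eyes.z) / dz;
    if (T <= 0.0) return std::nullopt;

    // X = a + T (x - a), Y = b + T (y - b)
    return Point2D{eyes.x + T * (finger.x - eyes.x),
                   eyes.y + T * (finger.y - eyes.y)};
}
```

The result is still a position on the plane in sensor units. Turning it into a pixel position would additionally need the 3D location of a screen corner and the pixels-per-unit scale, which is what the calibration points could be used to estimate.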
Licensed under: CC-BY-SA with attribution