I am working on a task where a customer’s gaze direction is calculated to determine whether they looked at the monitor or outside of it. I drew the following to get an understanding of what needs to be done:
The picture depicts the following (measurements in mm):
- Black rectangle with 530×942 dimensions is a monitor.
- A person is standing 500 mm from the monitor, with their eyes 1675 mm above the ground.
- A blue mark located 50 mm above the top center of the monitor is a camera.
- P1, P2 and P3 are points where a person looks.
- The distance from the camera to the ground is 2000
- d = 537.04 is the distance from the eye to P3, calculated by the Pythagorean theorem (√(500² + 196²))
- Similarly, the distance from the eye to P1 is 761.89 (the same as from the eye to P2), calculated by the Pythagorean theorem (√(540.43² + 537.04²))
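The arithmetic in the last two bullets can be reproduced with a short script. This is only a sketch that repeats the calculation as stated above; it does not check whether the triangles are the right ones to use. The variable names are illustrative, and the observation that 540.43 matches the distance from the monitor center to a top corner (half-width 265, half-height 471) is an inference from the numbers, not something stated in the picture.

```python
import math

# All values in mm, taken from the bullets above.
depth = 500.0        # eye to monitor plane
dy_p3 = 196.0        # vertical offset between the eye and P3

# Eye to P3, as in the first formula.
d_p3 = math.hypot(depth, dy_p3)

# 540.43 appears to be the in-plane distance from P3 (monitor center)
# to a top corner: half-width 265 and half-height 942 / 2 = 471.
# This is an assumption inferred from the numbers.
center_to_corner = math.hypot(265.0, 471.0)

# Eye to P1 (and P2), as in the second formula.
d_p1 = math.hypot(center_to_corner, d_p3)

print(round(d_p3, 2))
print(round(center_to_corner, 2))
print(round(d_p1, 2))
```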
So far, I manually calculated the distances as coordinates of X,Y,Z. They are as follows:
The eye coordinates relative to the camera are:
Eye=(0,−325,−596.34)
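As a cross-check of where these components come from, here is a minimal sketch using only the measurements listed earlier (names are illustrative). The y value is the eye height minus the camera height. The z value 596.34 coincides with √(500² + 325²), which is the straight-line distance from the camera to the eye, whereas the perpendicular distance to the monitor plane given earlier is 500.

```python
import math

# All values in mm, from the description of the picture.
camera_height = 2000.0   # camera to ground
eye_height = 1675.0      # eye to ground
standoff = 500.0         # person to monitor plane

# Vertical position of the eye relative to the camera (negative = below).
eye_y = eye_height - camera_height

# Straight-line distance from the camera to the eye in the side view.
camera_to_eye = math.hypot(standoff, eye_y)

print(eye_y)
print(round(camera_to_eye, 2))
```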
The points on the monitor relative to the camera are:
P1 (Top-left of the monitor):
Horizontal offset from the center: −265 mm
Vertical offset from the camera: −50 mm
Depth: 0 mm (since it’s on the same plane as the camera)
Coordinates:
P1=(−265,−50,0)
P2 (Top-right of the monitor):
Horizontal offset from the center: 265 mm
Vertical offset from the camera: −50 mm
Depth: 0 mm
Coordinates:
P2=(265,−50,0)
P3 (Center of the monitor):
Horizontal offset: 0 mm
Vertical offset from the camera: −521 mm
Depth: 0 mm
Coordinates:
P3=(0,−521,0)
Thus, I derived the following:
Eye to P1:
Vector=P1−Eye=(−265,−50−(−325),0−(−596.34))=(−265,275,596.34)
Eye to P2:
Vector=P2−Eye=(265,−50−(−325),0−(−596.34))=(265,275,596.34)
Eye to P3:
Vector=P3−Eye=(0,−521−(−325),0−(−596.34))=(0,−196,596.34)
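The three subtractions can be checked, and the results turned into unit direction vectors, with a few lines of plain Python. This is a sketch that uses the coordinates exactly as listed above; the helper names `subtract` and `normalize` are made up for illustration.

```python
import math

# Coordinates relative to the camera, in mm, as listed above.
EYE = (0.0, -325.0, -596.34)
TARGETS = {
    "P1": (-265.0, -50.0, 0.0),
    "P2": (265.0, -50.0, 0.0),
    "P3": (0.0, -521.0, 0.0),
}

def subtract(a, b):
    """Component-wise a - b for 3D tuples."""
    return tuple(x - y for x, y in zip(a, b))

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

# Gaze vector = target - eye, then normalized to a direction.
gaze = {name: subtract(p, EYE) for name, p in TARGETS.items()}
unit_gaze = {name: normalize(v) for name, v in gaze.items()}

for name in TARGETS:
    print(name, gaze[name], unit_gaze[name])
```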
Now, I would like to know whether I have calculated the gaze directions (from the person’s eye to P1, P2 and P3, from the camera’s point of view) correctly, based on the following method, which states:
Please note that although the 3D gaze (gaze_dir) is defined as a difference between target’s and subject’s positions (target_pos3d – person_eyes3d) each of them is expressed in different coordinate system, i.e. gaze_dir = M * (target_pos3d - person_eyes3d) where M depends on a normal direction between eyes and the camera.
Also, how do I calculate the transformation matrix M, if it is ever needed?
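The quoted note does not say how M is built, so the following is only one possible reading, not the method's confirmed definition. It assumes M is a pure rotation into a frame whose z-axis is the unit vector from the eyes towards the camera, with the camera at the origin and (0, 1, 0) as an "up" hint. The function name `rotation_to_eye_frame` and the axis convention are hypothetical.

```python
import math

def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotation_to_eye_frame(eye, up=(0.0, 1.0, 0.0)):
    """Rows are the axes of a right-handed frame whose z-axis points
    from the eyes to the camera (camera assumed at the origin).
    ASSUMPTION: this convention is a guess, not taken from the quote."""
    z_axis = normalize(tuple(-c for c in eye))   # eye -> camera
    x_axis = normalize(cross(up, z_axis))
    y_axis = cross(z_axis, x_axis)
    return (x_axis, y_axis, z_axis)

def apply(m, v):
    """Matrix-vector product for a 3x3 matrix stored as row tuples."""
    return tuple(sum(r * c for r, c in zip(row, v)) for row in m)

eye = (0.0, -325.0, -596.34)
M = rotation_to_eye_frame(eye)

# Gaze towards P3, expressed in the assumed eye-to-camera frame.
p3_minus_eye = (0.0, -196.0, 596.34)
gaze_dir = normalize(apply(M, p3_minus_eye))
print(gaze_dir)
```

Under this assumption a gaze aimed straight at the camera maps to (0, 0, 1), which gives a quick sanity check for whichever convention the method actually uses.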