I’ve been working to leverage the front-facing TrueDepth camera on supported iPhones and iPad Pros to calculate the real-world distance between two points, originally detected as Vision points and then converted to screen points.
I can accurately read the depth at the two points, and the values are extremely similar between devices, so I am very confident the original conversion from Vision coordinates to screen coordinates, and the indexing into the depth map, is correct.
However, the resulting distances do not agree between devices (in my case, an iPhone 12 and a current-generation iPad Pro). These results are also NOT real-world accurate, but my thinking is that once the two devices agree, overall accuracy will improve. I do have an alternative calculation that gets me very close to accurate on the iPad.
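For context, here is a simplified sketch of how I understand the depth-lookup step. depthValue(at:in:) is a hypothetical helper, not my actual code, and it assumes a kCVPixelFormatType_DepthFloat32 buffer holding depth (not disparity):

import Vision
import CoreVideo

// Hypothetical helper: map a normalized Vision point into depth-map pixel
// space and read the Float32 depth value at that pixel.
func depthValue(at normalizedPoint: CGPoint, in depthPixelBuffer: CVPixelBuffer) -> Float? {
    let width = CVPixelBufferGetWidth(depthPixelBuffer)
    let height = CVPixelBufferGetHeight(depthPixelBuffer)

    // Vision's origin is bottom-left; flip Y to match the buffer's top-left origin.
    let imagePoint = VNImagePointForNormalizedPoint(
        CGPoint(x: normalizedPoint.x, y: 1 - normalizedPoint.y), width, height)

    let col = Int(imagePoint.x.rounded())
    let row = Int(imagePoint.y.rounded())
    guard (0..<width).contains(col), (0..<height).contains(row) else { return nil }

    CVPixelBufferLockBaseAddress(depthPixelBuffer, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthPixelBuffer, .readOnly) }

    // Assumes a Float32 depth map; bytesPerRow may include row padding.
    guard let base = CVPixelBufferGetBaseAddress(depthPixelBuffer) else { return nil }
    let rowBytes = CVPixelBufferGetBytesPerRow(depthPixelBuffer)
    let rowPointer = (base + row * rowBytes).assumingMemoryBound(to: Float32.self)
    return rowPointer[col]
}

The distance calculation itself is below: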
let scaleFactor = Float(CVPixelBufferGetWidth(depthPixelBuffer)) / Float(CVPixelBufferGetWidth(videoPixelBuffer))
let cameraVideoImageSize = CGSize(width: CVPixelBufferGetWidth(videoPixelBuffer), height: CVPixelBufferGetHeight(videoPixelBuffer))
let depthImageSize = CGSize(width: CVPixelBufferGetWidth(depthPixelBuffer), height: CVPixelBufferGetHeight(depthPixelBuffer))

// Pinhole intrinsics: focal lengths (fx, fy) and principal point (cx, cy).
let fx = cameraIntrinsics.columns.0.x
let fy = cameraIntrinsics.columns.1.y
let cx = cameraIntrinsics.columns.2.x
let cy = cameraIntrinsics.columns.2.y

// Pixel offsets of each screen point from the principal point.
let uPoint1 = Float(convertedCGScreenPointDIP.x) - cx
let vPoint1 = Float(convertedCGScreenPointDIP.y) - cy
let uPoint2 = Float(convertedCGScreenPointWrist.x) - cx
let vPoint2 = Float(convertedCGScreenPointWrist.y) - cy

// Back-project each point into camera space: X = u * Z / fx, Y = v * Z / fy.
let xPoint1 = uPoint1 * distanceValue1 / fx
let yPoint1 = vPoint1 * distanceValue1 / fy
let xPoint2 = uPoint2 * distanceValue2 / fx
let yPoint2 = vPoint2 * distanceValue2 / fy

let newPoint1In3D = simd_float3(xPoint1, yPoint1, distanceValue1)
let newPoint2In3D = simd_float3(xPoint2, yPoint2, distanceValue2)

// Euclidean distance between the two camera-space points, in meters.
let newCalcDist = simd_precise_distance(newPoint1In3D, newPoint2In3D)
Here is a console print for the iPad Pro (the depth values differ between points simply because my hand is not stationary and moved slightly relative to the device while Vision was tracking it):
uPoint1 = -765.81476
vPoint1 = -9.825108
uPoint2 = -573.8772
vPoint2 = 25.938187
xPoint1 = -0.14148907
yPoint1 = -0.0018152503
xPoint2 = -0.10303553
yPoint2 = 0.0046570157
newPoint1In3D = SIMD3<Float>(-0.14148907, -0.0018152503, 0.3380777)
newPoint2In3D = SIMD3<Float>(-0.10303553, 0.0046570157, 0.32853785)
NEW DISTANCE CALCULATED = 0.04014441
Here is a console print for the iPhone 12:
uPoint1 = -1854.1045
vPoint1 = -647.44977
uPoint2 = -1735.0592
vPoint2 = -620.3905
xPoint1 = -0.20297308
yPoint1 = -0.07087782
xPoint2 = -0.1781995
yPoint2 = -0.06371729
newPoint1In3D = SIMD3<Float>(-0.20297308, -0.07087782, 0.30029702)
newPoint2In3D = SIMD3<Float>(-0.1781995, -0.06371729, 0.28173378)
NEW DISTANCE CALCULATED = 0.031774163
My thinking now is that maybe I need some combination of intrinsicMatrixReferenceDimensions and OX/OY, but in my research I have not found a clear answer. There are references to a ratio calculation, and I have tried a scaleFactor between the depth map dimensions and the camera video resolution, but that does not seem to do anything except scale my results (which is helpful for accuracy, but not for agreement between devices).
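To make the question concrete, here is a sketch of what I think the ratio calculation might look like: scaling the intrinsics from their reference dimensions down to the depth map resolution before back-projecting. This is a guess, not verified code; it assumes calibrationData is the AVCameraCalibrationData delivered alongside the depth data:

import AVFoundation
import simd

// Sketch: rescale the intrinsic matrix so it matches the depth map's
// pixel space instead of intrinsicMatrixReferenceDimensions.
func intrinsicsScaledToDepthMap(calibrationData: AVCameraCalibrationData,
                                depthPixelBuffer: CVPixelBuffer) -> simd_float3x3 {
    var K = calibrationData.intrinsicMatrix
    let reference = calibrationData.intrinsicMatrixReferenceDimensions

    let ratioX = Float(CVPixelBufferGetWidth(depthPixelBuffer)) / Float(reference.width)
    let ratioY = Float(CVPixelBufferGetHeight(depthPixelBuffer)) / Float(reference.height)

    // fx and cx scale with width; fy and cy scale with height.
    K.columns.0.x *= ratioX   // fx
    K.columns.1.y *= ratioY   // fy
    K.columns.2.x *= ratioX   // cx
    K.columns.2.y *= ratioY   // cy
    return K
}

If that is the right idea, I assume the screen points fed into uPoint/vPoint would also need to be in depth-map pixel coordinates rather than UIKit display points, so that the points and the intrinsics live in the same pixel space, but I have not confirmed this.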
I appreciate any insight!
K_C