I am working on generating a point cloud from an image, and I have the following data:
1. An image (height x width).
2. Intrinsic camera parameters (camera matrix).
3. A depth map of the image (height x width).
The depth map does not provide the real-world z-coordinate of each pixel. Instead, it represents the Euclidean distance from the camera to the 3D point corresponding to each pixel. I obtained this depth map from DepthAnythingV2.
I understand that the typical approach transforms 2D pixel coordinates (u, v) into 3D coordinates (X, Y, Z) using the camera's intrinsic parameters and the depth. However, since my depth map gives Euclidean distance rather than the z-coordinate, I am unsure how to correctly compute the 3D points.
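For context, this is the back-projection I would normally use. It is only a minimal sketch (the function name backproject_assuming_z_depth is mine), and it assumes the depth map stores the z-coordinate of each pixel, which is exactly the assumption that does not hold for my data:

import numpy as np

def backproject_assuming_z_depth(z_depth, camera_matrix):
    # Standard pinhole back-projection; only valid if z_depth holds the
    # z-coordinate of each pixel, NOT the Euclidean ray distance.
    fx, fy = camera_matrix[0, 0], camera_matrix[1, 1]
    cx, cy = camera_matrix[0, 2], camera_matrix[1, 2]
    h, w = z_depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    X = (u - cx) / fx * z_depth
    Y = (v - cy) / fy * z_depth
    return np.stack((X, Y, z_depth), axis=-1)  # shape (h, w, 3)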
Could someone guide me through the process or provide a formula or algorithm to achieve this transformation given my depth map?
The DepthAnything repository has a method for generating a point cloud. Here is the code snippet:
import numpy as np

h, w = depth_map.shape
# pixel grid: x holds column indices (u), y holds row indices (v)
x, y = np.meshgrid(np.arange(w), np.arange(h))
# normalized image coordinates: (u - cx) / fx and (v - cy) / fy
real_x = (x - camera_matrix[0, 2]) / camera_matrix[0, 0]
real_y = (y - camera_matrix[1, 2]) / camera_matrix[1, 1]
# scale by the depth map and use the depth map directly as the z-coordinate
point_cloud = np.stack(
    (np.multiply(real_x, depth_map), np.multiply(real_y, depth_map), depth_map)
)
I think this is incorrect: the resulting coordinates are definitely not in the camera's coordinate system, at least because the z-coordinate is not simply the depth_map value, given that my depth map stores Euclidean distance rather than z.
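For what it's worth, my current guess, which I would like confirmed or corrected, is that the Euclidean distance first has to be rescaled to a z-coordinate by dividing by the length of the ray direction (real_x, real_y, 1) at each pixel, reusing real_x and real_y from the snippet above:

# my tentative sketch, assuming depth_map stores Euclidean ray distance
ray_length = np.sqrt(real_x**2 + real_y**2 + 1.0)  # length of (real_x, real_y, 1)
z = depth_map / ray_length                         # recover the z-coordinate
point_cloud = np.stack((real_x * z, real_y * z, z), axis=-1)  # shape (h, w, 3)

Is this the right way to handle a Euclidean-distance depth map, or am I missing something?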