I am a novice programmer and new to this platform, so I apologise upfront for any ignorance.
Forewarning: the problem I am trying to overcome may not be strictly a programming (Python) issue, but may instead be an error in my logic, so you will likely need some understanding of computer vision.
I am writing a program in Python to create a motion capture system.
As part of the system, I am trying to use extrinsic calibration to localise 4 PS3 Eye cameras. The cameras are positioned in a cross formation, with each camera placed on one leg of the cross and pointed towards the centre. Camera 0 is at ground level, cameras 1 and 3 are raised up, and camera 2 is raised by double that amount.
     2
     |
3----|----1
     |
     0
I am using a single point (an IRED marker) as a world point which the cameras can pick up. I move it around so that it is captured by the cameras in multiple different positions (a different frame for each position).
The core functionality of the extrinsic calibration part of my program is:
1) Determine the extrinsic points for the marker in each frame in each camera:
extrinsic_points = self.process_extrinsic_images()
2) Determine the initial pose estimates using pairwise calibration between cameras (0,1), (1,2), (2,3):
camera_poses = self.pairwise_calibration(extrinsic_points)
self.visualize_camera_poses(camera_poses, "Initial Camera Poses")
3) Bundle adjustment to optimise the initial camera poses:
optimized_camera_poses, optimized_focal_lengths = self.bundle_adjustment(extrinsic_points, camera_poses)
Functions for 1:
Providing the function here would not be beneficial: it reads each image, finds the marker, and adds the marker's pixel coordinates to a dictionary in the format {camera: {frame: [coordinates]}}:
extrinsic points: {0: {1: [289, 232], 2: [296, 227], 3: [304, 222], 4: [306, 246], 5: [294, 264], 6: [285, 259], 7: [281, 258], 8: [264, 248], 9: [214, 242], 10: [153, 244], 11: [84, 204], 12: [56, 169], 15: [93, 212], 16: [254, 245], 17: [414, 255], 19: [572, 217], 20: [630, 188], 23: [581, 206], 28: [417, 271], 29: [386, 275], 30: [361, 302], 31: [347, 311], 32: [344, 312], 33: [334, 307], 34: [322, 302], 35: [296, 307], 36: [264, 305], 37: [240, 297], 38: [220, 297], 39: [203, 306], 40: [193, 300], 41: [182, 290], 42: [179, 273], 43: [192, 273], 47: [247, 150], 48: [173, 168], 49: [101, 219], 50: [65, 288], 51: [86, 346], 52: [133, 387], 53: [163, 415], 54: [186, 422], 57: [197, 282], 58: [160, 234], 59: [120, 192], 60: [90, 157], 61: [64, 152], 62: [73, 184], 63: [97, 149], 64: [26, 44], 65: [42, 48], 70: [63, 156], 71: [172, 172], 72: [271, 183], 75: [501, 209], 76: [540, 199], 82: [265, 429], 84: [264, 419], 94: [275, 332], 95: [264, 348], 96: [253, 384], 97: [243, 423], 98: [238, 461], 99: [246, 459], 100: [251, 401], 101: [258, 346]}, 1: {1: [397, 161], 2: [378, 158], 3: [352, 162], 4: [352, 202], 5: [341, 233], 6: [354, 221], 7: [355, 220], 8: [332, 213], 9: [328, 207], 13: [2, 211], 14: [37, 255], 15: [53, 297], 16: [7, 335]....}
The frame index is used as a method of synchronisation, to ensure the same world point is being compared across cameras.
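For completeness, here is a rough sketch of what that step does; the file layout, threshold value, and blob-centroid approach are just for illustration, not my actual function:

def process_extrinsic_images_sketch(image_paths):
    # Hypothetical illustration: find the bright IRED marker in each frame of each camera.
    # image_paths is assumed to be {camera_id: {frame_id: path_to_image}}.
    extrinsic_points = {}
    for cam, frames in image_paths.items():
        extrinsic_points[cam] = {}
        for frame, path in frames.items():
            img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
            # Threshold the bright marker and take the centroid of the resulting blob
            _, mask = cv2.threshold(img, 200, 255, cv2.THRESH_BINARY)
            M = cv2.moments(mask)
            if M["m00"] > 0:  # marker found; frames without a detection are skipped
                extrinsic_points[cam][frame] = [int(M["m10"] / M["m00"]), int(M["m01"] / M["m00"])]
    return extrinsic_points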
Functions for 2:
def pairwise_calibration(self, extrinsic_points):
    camera_pairs = [(0, 1), (1, 2), (2, 3)]
    camera_poses = {0: {"R": np.eye(3), "t": np.zeros((3, 1))}}  # Setting the first camera as the origin
    for cam1, cam2 in camera_pairs:
        print(f"\nCalibrating camera pair ({cam1}, {cam2})")
        common_frames = set(extrinsic_points[cam1].keys()) & set(extrinsic_points[cam2].keys())
        points1 = np.array([extrinsic_points[cam1][f] for f in common_frames], dtype=np.float32)
        points2 = np.array([extrinsic_points[cam2][f] for f in common_frames], dtype=np.float32)
        # Log the matched point pairs (this produces the "Points:" block in the debug file)
        print("Points:")
        for p1, p2 in zip(points1, points2):
            print(f"Points1: {p1} Points2: {p2}")
        if points1.shape[0] < 8 or points2.shape[0] < 8:
            print(f"Not enough points for fundamental matrix estimation for camera pair ({cam1}, {cam2})")
            continue
        F, mask = cv2.findFundamentalMat(points1, points2, cv2.FM_RANSAC, 1, 0.999)
        # Number of inliers
        num_inliers = np.sum(mask)
        print(f"Number of inliers between {cam1} and {cam2}: {num_inliers}")
        K1 = self.camera_manager.get_camera_params(cam1)["intrinsic_matrix"]
        K2 = self.camera_manager.get_camera_params(cam2)["intrinsic_matrix"]
        E = essential_from_fundamental(F, K1, K2)
        rotations, translations = decompose_essential_matrix(E)
        best_R, best_t = self.choose_best_pose(rotations, translations, K1, K2, points1, points2)
        # Update camera poses by chaining the relative pose onto cam1's pose
        if cam1 in camera_poses:
            R_cam1 = camera_poses[cam1]["R"]
            t_cam1 = camera_poses[cam1]["t"]
            R = R_cam1 @ best_R
            t = t_cam1 + R_cam1 @ best_t
            camera_poses[cam2] = {"R": R, "t": t}
    return camera_poses
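As a sanity check on the helper functions below (essential_from_fundamental, decompose_essential_matrix and my choose_best_pose), the same matched points could also be run through OpenCV's built-in pipeline, which picks the valid (R, t) with an internal cheirality test. A minimal sketch, to be placed inside the pair loop and reusing points1, points2 and K1 from above; it assumes both cameras can be approximated by K1, since cv2.recoverPose takes a single camera matrix:

# Hypothetical cross-check: estimate E and recover the pose directly with OpenCV
E_cv, mask_E = cv2.findEssentialMat(points1, points2, K1, cv2.RANSAC, 0.999, 1.0)
retval, R_cv, t_cv, mask_pose = cv2.recoverPose(E_cv, points1, points2, K1)
print(f"recoverPose result for pair ({cam1}, {cam2}):\nR = {R_cv}\nt = {t_cv}")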
def essential_from_fundamental(F, K1, K2):
    """
    Compute the essential matrix from the fundamental matrix and camera intrinsic matrices.
    Args:
        F (numpy.ndarray): 3x3 fundamental matrix
        K1 (numpy.ndarray): 3x3 intrinsic matrix of the first camera
        K2 (numpy.ndarray): 3x3 intrinsic matrix of the second camera
    Returns:
        numpy.ndarray: 3x3 essential matrix
    """
    E = K2.T @ F @ K1
    # Enforce the essential matrix properties
    U, S, Vt = np.linalg.svd(E)
    S = np.diag([1, 1, 0])  # Force the last singular value to be zero
    E = U @ S @ Vt
    return E
def decompose_essential_matrix(E: np.ndarray) -> Tuple[List[np.ndarray], List[np.ndarray]]:
    """
    Calculate the possible rotations and translations from the essential matrix (E).
    """
    assert E.shape == (3, 3), "Essential matrix must be 3x3."
    # Decompose the essential matrix to obtain rotation and translation
    R1, R2, t = cv2.decomposeEssentialMat(E)
    # Possible rotation matrices and corresponding translations
    rotation_matrices = [R1, R2]
    translations = [t, -t]
    # Ensure that each rotation matrix is a valid rotation matrix
    valid_rotations = []
    valid_translations = []
    for R, t in zip(rotation_matrices, translations):
        if np.linalg.det(R) < 0:
            R = -R
        valid_rotations.append(R)
        valid_translations.append(t)
    # Create the four possible combinations
    rotations = [valid_rotations[0], valid_rotations[0], valid_rotations[1], valid_rotations[1]]
    translations = [valid_translations[0], valid_translations[1], valid_translations[0], valid_translations[1]]
    return rotations, translations
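choose_best_pose itself is not listed above; a minimal sketch of the cheirality test it is meant to perform (triangulate the matches with each candidate (R, t) and keep the pair that puts the most points in front of both cameras). It is written here as a plain function rather than a method, and the details are an assumption rather than my exact code:

def choose_best_pose(rotations, translations, K1, K2, points1, points2):
    # Sketch: camera 1 is placed at the origin
    P1 = K1 @ np.hstack([np.eye(3), np.zeros((3, 1))])
    best_R, best_t, best_count = None, None, -1
    for R, t in zip(rotations, translations):
        P2 = K2 @ np.hstack([R, t])
        pts4d = cv2.triangulatePoints(P1, P2, points1.T, points2.T)
        pts3d = pts4d[:3] / pts4d[3]      # dehomogenise to 3xN
        depth1 = pts3d[2]                 # depth in camera 1
        depth2 = (R @ pts3d + t)[2]       # depth in camera 2
        count = np.sum((depth1 > 0) & (depth2 > 0))
        if count > best_count:            # keep the pose with the most points in front of both cameras
            best_R, best_t, best_count = R, t, count
    return best_R, best_t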
def visualize_camera_poses(self, camera_poses, title="Camera Poses"):
    fig = plt.figure(figsize=(10, 10))
    ax = fig.add_subplot(111, projection='3d')
    colors = ['r', 'g', 'b', 'c', 'm', 'y']
    for i, pose in camera_poses.items():
        R, t = pose['R'], pose['t']
        # Plot camera center
        ax.scatter(t[0], t[1], t[2], c=colors[i % len(colors)], s=100, label=f'Camera {i}')
        # Plot camera axes
        for j, c in enumerate(['r', 'g', 'b']):
            ax.quiver(t[0], t[1], t[2], R[0, j], R[1, j], R[2, j], length=0.1, color=c)
    ax.set_xlabel('X')
    ax.set_ylabel('Y')
    ax.set_zlabel('Z')
    ax.legend()
    plt.title(title)
    # Set equal aspect ratio
    max_range = np.array([ax.get_xlim(), ax.get_ylim(), ax.get_zlim()]).T.flatten()
    max_range = max_range.max() - max_range.min()
    ax.set_box_aspect((1, 1, 1))
    ax.set_xlim(ax.get_xlim()[0] - 0.1*max_range, ax.get_xlim()[1] + 0.1*max_range)
    ax.set_ylim(ax.get_ylim()[0] - 0.1*max_range, ax.get_ylim()[1] + 0.1*max_range)
    ax.set_zlim(ax.get_zlim()[0] - 0.1*max_range, ax.get_zlim()[1] + 0.1*max_range)
    plt.show()
Intrinsic calibration parameters for cameras 0 and 1:
{
  "camera_0": {
    "camera_matrix": [
      [540.2672451298703, 0.0, 306.3805722102162],
      [0.0, 542.571289478974, 206.0797186095177],
      [0.0, 0.0, 1.0]
    ],
    "dist_coeff": [
      [-0.04311827265117092, -0.30777282290190633, -0.0024539081793200123, 0.0006065084084472465, 0.8167139977569211]
    ],
    "reprojection_error": 0.08752886650470533
  },
  "camera_1": {
    "camera_matrix": [
      [533.0270992740851, 0.0, 325.772321445512],
      [0.0, 535.0781711631702, 242.94136395164773],
      [0.0, 0.0, 1.0]
    ],
    "dist_coeff": [
      [-0.10449205993886189, 0.06912993741771066, 0.007694298596172447, 0.009411962830745068, 0.017734278660417623]
    ],
    "reprojection_error": 0.09269225274569631
  }
}
Functions for 3:
I will omit this for now. While I have problems with the results of the bundle adjustment as well, I would like to solve the incorrect initial poses first.
The Problem
Essentially, my camera poses are incorrect: cameras 1 and 3 come out swapped, with camera 1 on the left and camera 3 on the right, and camera 2 does not seem to be positioned correctly either. The closest I have gotten is with the current data set, which gives me these results:
(image: initial camera pose results)
At first glance this looks right, but camera 1 ends up in the negative x direction, whereas according to the right-handed coordinate system it should be in the positive direction. My debug file shows some of the critical parameters calculated along the way.
Here is a debug file my script generates, showing the points for camera pair (0, 1). I could also give the data for the other camera pairs, but I fear this would just complicate matters.
Calibrating camera pair (0, 1)
Points:
Points1: [289. 232.] Points2: [397. 161.]
Points1: [296. 227.] Points2: [378. 158.]
Points1: [304. 222.] Points2: [352. 162.]
Points1: [306. 246.] Points2: [352. 202.]
Points1: [294. 264.] Points2: [341. 233.]
Points1: [285. 259.] Points2: [354. 221.]
Points1: [281. 258.] Points2: [355. 220.]
Points1: [264. 248.] Points2: [332. 213.]
Points1: [214. 242.] Points2: [328. 207.]
Points1: [ 93. 212.] Points2: [ 53. 297.]
Points1: [254. 245.] Points2: [ 7. 335.]
Points1: [386. 275.] Points2: [573. 192.]
Points1: [361. 302.] Points2: [572. 243.]
Points1: [240. 297.] Points2: [612. 190.]
Points1: [220. 297.] Points2: [562. 202.]
Points1: [203. 306.] Points2: [522. 224.]
Points1: [193. 300.] Points2: [481. 227.]
Points1: [182. 290.] Points2: [428. 229.]
Points1: [179. 273.] Points2: [368. 229.]
Points1: [192. 273.] Points2: [340. 239.]
Points1: [247. 150.] Points2: [303. 83.]
Points1: [173. 168.] Points2: [241. 160.]
Points1: [101. 219.] Points2: [227. 227.]
Points1: [ 65. 288.] Points2: [233. 284.]
Points1: [ 86. 346.] Points2: [256. 324.]
Points1: [133. 387.] Points2: [290. 357.]
Points1: [163. 415.] Points2: [334. 381.]
Points1: [186. 422.] Points2: [422. 384.]
Points1: [197. 282.] Points2: [388. 233.]
Points1: [160. 234.] Points2: [328. 201.]
Points1: [120. 192.] Points2: [279. 175.]
Points1: [ 90. 157.] Points2: [236. 165.]
Points1: [ 64. 152.] Points2: [202. 182.]
Points1: [ 73. 184.] Points2: [176. 221.]
Points1: [ 97. 149.] Points2: [132. 211.]
Points1: [26. 44.] Points2: [ 81. 174.]
Points1: [42. 48.] Points2: [ 51. 194.]
Points1: [ 63. 156.] Points2: [205. 185.]
Points1: [172. 172.] Points2: [213. 180.]
Points1: [271. 183.] Points2: [215. 174.]
Points1: [501. 209.] Points2: [204. 149.]
Points1: [540. 199.] Points2: [247. 44.]
Points1: [275. 332.] Points2: [539. 280.]
Points1: [264. 348.] Points2: [533. 302.]
Points1: [253. 384.] Points2: [535. 351.]
Points1: [243. 423.] Points2: [540. 400.]
Points1: [238. 461.] Points2: [539. 445.]
Points1: [246. 459.] Points2: [547. 446.]
Points1: [251. 401.] Points2: [551. 371.]
Fundamental matrix for camera pair (0, 1):
[[ 3.31777245e-06 5.42194225e-06 -3.54875406e-03]
[ 8.34200208e-06 -2.81037491e-06 -4.90466518e-03]
[-2.83045668e-03 3.67146393e-03 1.00000000e+00]]
Essential matrix for camera pair (0, 1):
[[ 0.29910721 0.49506775 -0.23639575]
[ 0.77336053 -0.27385882 -0.50156684]
[ 0.03784752 0.82249351 -0.08353075]]
Decomposition of essential matrix
Rotation 1:
[[ 0.01168723 0.42510971 0.90506637]
[ 0.04436732 0.90401632 -0.42518942]
[-0.99894692 0.04512465 -0.00829553]]
Translation 1:
[[ 0.78074314]
[-0.27449144]
[-0.56133288]]
Determinant of Rotation 1: 1.0
Rotation 2:
[[ 0.01168723 0.42510971 0.90506637]
[ 0.04436732 0.90401632 -0.42518942]
[-0.99894692 0.04512465 -0.00829553]]
Translation 2:
[[-0.78074314]
[ 0.27449144]
[ 0.56133288]]
Determinant of Rotation 2: 1.0
Rotation 3:
[[ 0.85913497 -0.33387706 0.38783141]
[-0.35052849 -0.93609163 -0.02936388]
[ 0.37284967 -0.11071842 -0.92126248]]
Translation 3:
[[ 0.78074314]
[-0.27449144]
[-0.56133288]]
Determinant of Rotation 3: 1.0
Rotation 4:
[[ 0.85913497 -0.33387706 0.38783141]
[-0.35052849 -0.93609163 -0.02936388]
[ 0.37284967 -0.11071842 -0.92126248]]
Translation 4:
[[-0.78074314]
[ 0.27449144]
[ 0.56133288]]
Determinant of Rotation 4: 1.0
Best Pose:
R[[ 0.01168723 0.42510971 0.90506637]
[ 0.04436732 0.90401632 -0.42518942]
[-0.99894692 0.04512465 -0.00829553]]
t[[-0.78074314]
[ 0.27449144]
[ 0.56133288]]
Camera 1 pose updated:
R:[[ 0.01168723 0.42510971 0.90506637]
[ 0.04436732 0.90401632 -0.42518942]
[-0.99894692 0.04512465 -0.00829553]]
t:[[-0.78074314]
[ 0.27449144]
[ 0.56133288]]
Calibrating camera pair (1, 2)
.
.
.
Can anyone with knowledge on the subject perhaps see where the errors are creeping in? Is my input data simply incorrect, or is there an error in the pipeline?