I’m working on predicting camera motion trajectories from RGB images or intermediate representations like optical flow. I’m predicting relative motion: my targets are the transformation matrices (or any equivalent representation) between consecutive frames (image1 -> image2 -> image3).
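For concreteness, this is roughly what I mean by a relative-motion target (a sketch only; the 4x4 homogeneous poses here are placeholders for whatever ground truth is actually available):

```python
import numpy as np

pose_i = np.eye(4)  # absolute camera pose at frame i (placeholder)
pose_j = np.eye(4)  # absolute camera pose at frame i+1 (placeholder)

# Relative transform taking frame i's camera coordinates to frame i+1's.
T_rel = np.linalg.inv(pose_i) @ pose_j
```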
If I understand correctly, with DistributedSampler in DDP (and shuffle=False; with the default shuffle=True the indices are permuted before being strided), assuming I have n=4 GPUs, the data would be distributed to each GPU as follows:
GPU0: data 0, 4, 8, 12, ...
GPU1: data 1, 5, 9, 13, ...
GPU2: data 2, 6, 10, 14, ...
GPU3: data 3, 7, 11, 15, ...
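This is easy to verify with a toy dataset (a minimal sketch; num_replicas/rank normally come from the process group and are hard-coded here purely for illustration):

```python
import torch
from torch.utils.data import TensorDataset, DistributedSampler

# Toy dataset of 16 items, just to inspect which indices each rank is handed.
dataset = TensorDataset(torch.arange(16))

for rank in range(4):
    sampler = DistributedSampler(dataset, num_replicas=4, rank=rank,
                                 shuffle=False)
    print(f"GPU{rank}:", list(sampler))
# GPU0: [0, 4, 8, 12]
# GPU1: [1, 5, 9, 13]
# GPU2: [2, 6, 10, 14]
# GPU3: [3, 7, 11, 15]
```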
This default sampling messes up the alignment of my images and targets since the targets depend on transformations between consecutive frames.
I am considering two potential solutions:
- Introduce ‘skip’ logic in my transformation calculations to adapt to the strided data distribution.
- Reorder my dataset (sketched in code below). Say I have 133 images, 4 GPUs, and batch size 32. I should now put images
  0-31 at positions 0, 4, 8, 12, ..., 4*31
  32-63 at positions 1, 5, 9, 13, ..., 4*31+1
  64-95 at positions 2, 6, 10, 14, ..., 4*31+2
  96-127 at positions 3, 7, 11, 15, ..., 4*31+3
  and repeat this pattern for each subsequent big batch (batch_size * gpu_count = 128 samples).
I think I can also use drop_last to handle the leftover images that don't fill a complete big batch (images 128-132 in my example).
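To make the second option concrete, here is a rough sketch of the reordering I have in mind (StridedReorderDataset is a made-up name, and it assumes shuffle=False on the sampler):

```python
from torch.utils.data import Dataset, DistributedSampler

class StridedReorderDataset(Dataset):
    """Hypothetical wrapper: permutes indices so that DistributedSampler's
    strided sharding (with shuffle=False) hands each rank a contiguous,
    ordered run of frames within every big batch."""

    def __init__(self, base, batch_size, world_size):
        self.base = base
        self.batch_size = batch_size
        self.world_size = world_size
        self.big = batch_size * world_size
        # Drop the tail that can't fill a complete big batch
        # (images 128-132 in the 133-image example).
        self.length = (len(base) // self.big) * self.big

    def __len__(self):
        return self.length

    def __getitem__(self, idx):
        # Position idx = block*big + world_size*i + r is read by rank r as
        # its i-th sample of big batch `block`; map it back to the original
        # contiguous frame for that rank.
        block, offset = divmod(idx, self.big)
        i, r = divmod(offset, self.world_size)
        return self.base[block * self.big + r * self.batch_size + i]
```

With this wrapper, `DistributedSampler(StridedReorderDataset(base, 32, 4), shuffle=False)` plus a per-rank DataLoader with batch_size=32 should give GPU r frames 32*r .. 32*r+31 of every big batch, in order.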
Are these solutions good enough? Is there a best practice for handling this kind of data distribution in DDP training? Specifically, are there standard methods to ensure that data relevant to sequential transformations remains coherent when distributed across multiple GPUs?