multiprocessing.Queue slowing down after a while, sharing large amounts of data in a PyTorch distributed training setup
I’m running into problems sharing large amounts of data between several processes in a distributed training setup using PyTorch on my local machine. When the program starts, everything works fine, but after some time the operations that transfer the data through multiprocessing.Queue objects (.put(..) and .get(..)) slow down drastically.
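To make the slowdown concrete, here is a minimal sketch of how the put and get latencies could be timed. It is not my actual training code: the helper names (timed_put, timed_get, producer), the payload size, and the queue size are all placeholders chosen for illustration, and it uses plain bytes payloads with the standard-library multiprocessing module rather than tensors.

```python
import multiprocessing as mp
import time


def timed_put(queue, item):
    """Put item on the queue and return the elapsed time in seconds."""
    start = time.perf_counter()
    queue.put(item)
    return time.perf_counter() - start


def timed_get(queue, timeout=10.0):
    """Get one item from the queue; return (item, elapsed seconds)."""
    start = time.perf_counter()
    item = queue.get(timeout=timeout)
    return item, time.perf_counter() - start


def producer(queue, n_items, payload_bytes):
    # Hypothetical producer: sends n_items payloads, then a None sentinel.
    payload = bytes(payload_bytes)
    for i in range(n_items):
        elapsed = timed_put(queue, payload)
        print(f"put {i}: {elapsed:.6f}s")
    queue.put(None)


if __name__ == "__main__":
    # A bounded queue makes put() block once the consumer falls behind.
    q = mp.Queue(maxsize=8)
    p = mp.Process(target=producer, args=(q, 20, 1_000_000))
    p.start()
    i = 0
    while True:
        item, elapsed = timed_get(q, timeout=60.0)
        if item is None:
            break
        print(f"get {i}: {elapsed:.6f}s")
        i += 1
    p.join()
```

Logging the per-call times like this over a long run should show whether the latency grows steadily or jumps at some point, which is the behaviour I am trying to pin down.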