Efficient processing of many small torch.nn modules

I am looking for a more efficient implementation of the following method: