I am trying to increase the effective batch size to work around hardware limitations. To do this, I run N forward/backward passes over small physical mini-batches and accumulate the gradients before each optimizer step. The problem is that BatchNorm keeps running statistics (mean and variance) that are updated from every physical mini-batch. With a very small physical batch size, e.g. 1, the per-batch statistics are degenerate, so there is no way to even reproduce what training with the larger batch size would do. How should I deal with this?
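For concreteness, here is a minimal sketch of the accumulation setup I mean, in PyTorch. The model, data, and hyperparameters are toy placeholders; the real ones are bigger, but the BatchNorm issue is the same:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 16), nn.BatchNorm1d(16), nn.ReLU(), nn.Linear(16, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
# physical batch of 2 here; with 1, BatchNorm1d refuses to even run in train mode
loader = [(torch.randn(2, 10), torch.randint(0, 2, (2,))) for _ in range(32)]

accum_steps = 8  # N physical mini-batches per effective batch

optimizer.zero_grad()
for i, (x, y) in enumerate(loader):
    loss = criterion(model(x), y) / accum_steps  # scale so gradients average
    loss.backward()                              # gradients accumulate in .grad
    if (i + 1) % accum_steps == 0:
        optimizer.step()                         # one step per effective batch
        optimizer.zero_grad()

# Problem: the BN running stats were updated accum_steps times from tiny
# batches, not once from the statistics of the full effective batch.
```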
The following options come to mind, but each has a flaw:

- "Update BatchNorm stats manually" — costly, since it needs an extra forward pass over the effective batch just to refresh the statistics (see the sketch after this list).
- "Disable tracking batch statistics" (e.g. `track_running_stats=False` in PyTorch) — this just throws away the benefits of batch normalization.
- "Decrease momentum" — this only smooths the running statistics; it does nothing for the degenerate in-batch statistics used in the forward pass when the physical batch size is 1.
- "Use ghost batch normalization" — this does the opposite of what I need: it splits a large batch into smaller virtual ones, whereas I want to emulate a large batch from small ones.
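To illustrate why I rejected the manual update as costly, here is roughly what I imagine it would look like; `refresh_bn_stats` and the toy model are my own placeholders, not an established recipe:

```python
import torch
import torch.nn as nn

def refresh_bn_stats(model: nn.Module, effective_batch: torch.Tensor) -> None:
    """Spend one extra forward pass on the whole effective batch,
    gradients off, purely to refresh the BN running mean/var."""
    was_training = model.training
    model.train()               # BN only updates running stats in train mode
    with torch.no_grad():       # no gradients needed for this pass
        model(effective_batch)  # one extra full forward per optimizer step
    model.train(was_training)

model = nn.Sequential(nn.Linear(10, 16), nn.BatchNorm1d(16), nn.Linear(16, 2))
micro_batches = [torch.randn(2, 10) for _ in range(8)]
refresh_bn_stats(model, torch.cat(micro_batches))  # ~doubles forward cost
```

Even then, the BN layers would have to be kept in eval mode during the accumulation passes themselves so the tiny-batch statistics do not leak into the running stats, and the extra pass roughly doubles the forward cost per step, which is what makes this unattractive.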