The PyTorch docs only cover how to tweak memory management for CUDA allocations. There, PyTorch uses a caching allocator that keeps allocated memory blocks around and reuses them instead of freeing them immediately.
On CPUs, however, memory is freed as soon as a tensor is no longer in use. How do I get PyTorch to reproduce the caching allocator behavior on CPUs?
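To make it concrete, below is a rough sketch of the behavior I mean, written by hand in Python. The NaiveCPUTensorCache class and its allocate/release methods are just my own illustration (pooling by exact shape and dtype), not an existing PyTorch API; I'm hoping there is a built-in or configurable way to get this instead of maintaining a pool like this myself.

```python
import torch
from collections import defaultdict

class NaiveCPUTensorCache:
    """Hands out CPU tensors and keeps returned ones around for reuse."""

    def __init__(self):
        # Free tensors, grouped by exact (shape, dtype).
        self._pool = defaultdict(list)

    def allocate(self, shape, dtype=torch.float32):
        key = (tuple(shape), dtype)
        if self._pool[key]:
            # Reuse a previously returned block instead of asking the
            # system allocator for new memory.
            return self._pool[key].pop()
        return torch.empty(shape, dtype=dtype)

    def release(self, tensor):
        # Instead of letting the storage be freed, park it for later reuse.
        key = (tuple(tensor.shape), tensor.dtype)
        self._pool[key].append(tensor)

cache = NaiveCPUTensorCache()
t = cache.allocate((1024, 1024))
# ... use t ...
cache.release(t)                    # storage stays alive in the pool
t2 = cache.allocate((1024, 1024))   # same storage handed back, no new allocation
```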