I am using the cuFFT library to perform FFTs on the GPU.
I want my process to have the smallest possible memory footprint on the GPU, because many processes can run in parallel, but at any given time all but one are paused (SIGTSTP, Ctrl+Z) and later resumed (fg). A fixed memory footprint held on the device by a paused process can therefore degrade the performance of the other running processes, or even crash them.
To achieve this, I am trying to allocate as much memory as possible as unified memory (i.e., with cudaMallocManaged). From the cuFFT documentation and other online sources, I saw that I can use cufftSetWorkArea, but this does not make all of cuFFT's memory unified, and some fixed memory footprint remains. My code currently initializes cuFFT as follows:
CufftErrorCheck(cufftCreate(&plan_c2c_));
size_t workspace_size;
CufftErrorCheck(cufftEstimate3d(dims_.z, dims_.y, dims_.x, CUFFT_C2C, &workspace_size));
cudaMalloc((void**)&cufft_workspace_buffer_, workspace_size);
CufftErrorCheck(cufftSetWorkArea(plan_c2c_, cufft_workspace_buffer_));
CufftErrorCheck(cufftMakePlan3d(plan_c2c_, dims_.z, dims_.y, dims_.x, CUFFT_C2C, &workspace_size));
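For comparison, here is a minimal sketch of the ordering that, as far as I understand the cuFFT documentation, is intended for caller-managed work areas: disable auto-allocation before making the plan, let cufftMakePlan3d report the work size, and only then attach a buffer. The helper name make_plan_with_managed_workarea is hypothetical, and the sketch assumes that cuFFT accepts a cudaMallocManaged pointer as its work area, which I have not seen stated explicitly.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: builds a 3D C2C plan whose work area is caller-owned
// managed memory. Returns true on success. On success the caller must call
// cufftDestroy(*plan) and then cudaFree(*work_area).
bool make_plan_with_managed_workarea(int nz, int ny, int nx,
                                     cufftHandle* plan,
                                     void** work_area,
                                     size_t* work_size) {
    *work_area = nullptr;
    *work_size = 0;

    if (cufftCreate(plan) != CUFFT_SUCCESS) return false;

    // Ask cuFFT not to allocate the work area itself. This has to happen
    // before cufftMakePlan3d, which otherwise allocates it with its own call.
    if (cufftSetAutoAllocation(*plan, 0) != CUFFT_SUCCESS ||
        // The plan still reports how many work-area bytes it needs.
        cufftMakePlan3d(*plan, nz, ny, nx, CUFFT_C2C, work_size) != CUFFT_SUCCESS) {
        cufftDestroy(*plan);
        return false;
    }

    if (*work_size > 0) {
        // Assumption: a managed (unified memory) pointer is a valid work area.
        if (cudaMallocManaged(work_area, *work_size) != cudaSuccess ||
            cufftSetWorkArea(*plan, *work_area) != CUFFT_SUCCESS) {
            cudaFree(*work_area);  // no-op if the allocation itself failed
            *work_area = nullptr;
            cufftDestroy(*plan);
            return false;
        }
    }
    return true;
}
```

This only controls the work area. Whatever cuFFT allocates internally for the plan itself is outside the caller's control in this sketch.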
The call to cufftMakePlan3d still leaves a memory footprint of about 10 MB (which, from what I checked, is fixed and does not depend on dims_), and I'd like to reduce it. Using cufftEstimate3d and cufftSetWorkArea did eliminate most of the footprint, but not all of it.
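One way to reproduce the measurement is to compare free device memory immediately before and after plan creation. The sketch below uses cudaMemGetInfo for that; the helper names free_device_bytes and plan_overhead_bytes are hypothetical, and the number is only meaningful if no other process allocates or frees device memory during the measurement.

```cuda
#include <cuda_runtime.h>
#include <cufft.h>

// Hypothetical helper: free device memory in bytes, or 0 if the query fails.
size_t free_device_bytes() {
    size_t free_b = 0, total_b = 0;
    if (cudaMemGetInfo(&free_b, &total_b) != cudaSuccess) return 0;
    return free_b;
}

// Hypothetical helper: stores in *overhead the drop in free device memory
// caused by cufftMakePlan3d when auto-allocation is disabled, so the work
// area is not part of the number. Returns false if any call fails.
bool plan_overhead_bytes(int nz, int ny, int nx, long long* overhead) {
    cudaFree(0);  // create the CUDA context up front so it is not counted

    cufftHandle plan;
    if (cufftCreate(&plan) != CUFFT_SUCCESS) return false;
    if (cufftSetAutoAllocation(plan, 0) != CUFFT_SUCCESS) {
        cufftDestroy(plan);
        return false;
    }

    const size_t before = free_device_bytes();
    size_t work_size = 0;
    const cufftResult res =
        cufftMakePlan3d(plan, nz, ny, nx, CUFFT_C2C, &work_size);
    const size_t after = free_device_bytes();

    cufftDestroy(plan);
    if (res != CUFFT_SUCCESS) return false;

    // Signed difference: the value can be negative if another process
    // released memory between the two queries.
    *overhead = static_cast<long long>(before) - static_cast<long long>(after);
    return true;
}
```

Calling plan_overhead_bytes with several different dimensions and comparing the results is a quick check of whether the remaining footprint really is independent of dims_.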
Is what I want even possible to achieve? If not, I’d like a concrete reference.
CUDA version – 12.2