Consider the NVIDIA NVML function:
nvmlReturn_t nvmlDeviceSetApplicationsClocks(
    nvmlDevice_t device,
    unsigned int memClockMHz,
    unsigned int graphicsClockMHz
);
The NVML documentation says that
… [the] CUDA driver requests these clocks during context creation, which means this property defines clocks at which CUDA applications will be running…
so, if I want to create a device context with fixed clocks, I call this function, create the context, and then call nvmlDeviceResetApplicationsClocks(). Right?
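For concreteness, here is roughly what that naive sequence might look like, using the CUDA driver API so that context creation is explicit. The helper name, the use of device index 0 for both NVML and CUDA, and the omission of error checking are all simplifications for illustration:

```c++
#include <cuda.h>
#include <nvml.h>

// Hypothetical helper: fix the application clocks, create a context on
// device 0, then restore the default clocks. A sketch, not a definitive
// implementation; setting application clocks may also require sufficient
// permissions on the GPU in question.
CUcontext create_context_with_fixed_clocks(unsigned int mem_clock_mhz,
                                           unsigned int graphics_clock_mhz)
{
    // NVML and CUDA may enumerate devices in different orders; using
    // index 0 for both is an assumption made for brevity.
    nvmlInit();
    nvmlDevice_t nvml_dev;
    nvmlDeviceGetHandleByIndex(0, &nvml_dev);

    // Step 1: request fixed application clocks (in real code, pick values
    // reported by nvmlDeviceGetSupportedMemoryClocks() and
    // nvmlDeviceGetSupportedGraphicsClocks())
    nvmlDeviceSetApplicationsClocks(nvml_dev, mem_clock_mhz, graphics_clock_mhz);

    // Step 2: create the context - per the documentation, the CUDA driver
    // requests the application clocks at this point
    cuInit(0);
    CUdevice cuda_dev;
    cuDeviceGet(&cuda_dev, 0);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, cuda_dev);

    // Step 3: restore the default application clocks
    nvmlDeviceResetApplicationsClocks(nvml_dev);
    return ctx;
}
```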
Well, not so fast. What if I have several threads running in parallel? Each of them, after all, can create CUDA contexts. So how do I prevent race conditions such as:
- Thread 1 sets app clocks
- Thread 2 sets app clocks
- Thread 1 creates context
- Thread 1 resets app clocks
- Thread 2 creates context
- Thread 2 resets app clocks
In this situation, neither thread gets the clocks it wanted for its context. And while this interleaving might be a bit contrived, here's a much simpler example:
- Thread 1 sets app clocks
- Thread 1 creates context
- Thread 2 creates context, naively
- Thread 1 resets app clocks
Here, Thread 2's context ends up running at whatever clocks Thread 1 happened to request, even though it never asked for them. One might say "Oh, just protect the clock setting and context creation with a mutex", and that's fine if you control the code running on all threads. Typically, though, you don't: it's some library which creates its own threads to run its work.
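For completeness, the mutex idea would look roughly like the sketch below (the lock, the helper name, and its parameters are made up for illustration). Note that it only helps if every context in the process is created through this one code path:

```c++
#include <mutex>
#include <cuda.h>
#include <nvml.h>

// A process-wide lock serializing "set clocks, create context, reset clocks".
// This only works if every piece of code that creates contexts goes through
// this helper; a library that creates contexts on its own threads bypasses it.
static std::mutex clocks_and_context_mutex;

CUcontext create_context_locked(nvmlDevice_t nvml_dev, CUdevice cuda_dev,
                                unsigned int mem_clock_mhz,
                                unsigned int graphics_clock_mhz)
{
    std::lock_guard<std::mutex> lock(clocks_and_context_mutex);
    nvmlDeviceSetApplicationsClocks(nvml_dev, mem_clock_mhz, graphics_clock_mhz);
    CUcontext ctx;
    cuCtxCreate(&ctx, 0, cuda_dev);
    nvmlDeviceResetApplicationsClocks(nvml_dev);
    return ctx;
}
```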
So, is there a way to “weld” the context creation and the clock setting together in a race-free manner?