I am trying to collect per-process statistics, such as GPU and memory utilization, for ALL processes running on NVIDIA GPUs. With the NVML C API I can discover and monitor a newly launched application that uses the GPU. DCGM, however, does not discover the PID of such a new process on its own.
I am referring to this documentation: https://docs.nvidia.com/datacenter/dcgm/3.0/dcgm-api/dcgm-api-process-stats.html
dcgmWatchPidFields() enables process monitoring, whereas dcgmGetPidInfo() returns the statistics for one particular PID. But what about a new PID? How is DCGM supposed to learn about a newly started process that uses the GPU? Is there a DCGM API for this, or do I always have to rely on the NVML C API to find the PID first?
So far the following NVML C API calls work properly, but I have had no luck with the DCGM C API: nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetProcessUtilization, nvmlShutdown
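To make the NVML-based fallback concrete, here is a minimal sketch of the new-PID detection step only. It assumes the PIDs of each poll have already been copied into a plain array; `pid_tracker`, `tracker_contains`, `tracker_update`, and `MAX_TRACKED` are hypothetical names for illustration and are not part of NVML or DCGM. The NVML query itself is left out so that the snippet compiles without the GPU libraries.

```c
#include <stddef.h>

/* Hypothetical upper bound on remembered PIDs (an assumption for this sketch). */
#define MAX_TRACKED 1024

/* Hypothetical tracker: remembers every PID already seen on the GPU. */
typedef struct {
    unsigned int pids[MAX_TRACKED];
    size_t count;
} pid_tracker;

/* Returns 1 if `pid` was seen in an earlier poll, 0 otherwise. */
int tracker_contains(const pid_tracker *t, unsigned int pid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->pids[i] == pid) {
            return 1;
        }
    }
    return 0;
}

/*
 * Compares one poll result (`sample`, `n` entries) against the tracker.
 * Each PID not seen before is stored in the tracker and copied into
 * `new_pids` (capacity `new_cap`). Returns the number of new PIDs written.
 * If either buffer is full, the remaining PIDs are left untracked so that
 * a later poll can still report them.
 */
size_t tracker_update(pid_tracker *t,
                      const unsigned int *sample, size_t n,
                      unsigned int *new_pids, size_t new_cap)
{
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (tracker_contains(t, sample[i])) {
            continue; /* already known, nothing to do */
        }
        if (t->count == MAX_TRACKED || found == new_cap) {
            break; /* out of space: report the rest on a later poll */
        }
        t->pids[t->count++] = sample[i];
        new_pids[found++] = sample[i];
    }
    return found;
}
```

In a real polling loop, `sample` would be filled from the `pid` field of the `nvmlProcessUtilizationSample_t` entries returned by nvmlDeviceGetProcessUtilization, and each PID reported in `new_pids` could then be passed to dcgmGetPidInfo(). This sketch never forgets exited PIDs, so a long-running monitor would also need an eviction step.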