Applying HPA based on GPU VRAM usage in GKE
I am using GCP as my cloud platform and have a GKE Standard cluster running on it. The cluster has two GPU node pools with NVIDIA T4 GPUs. I want to set up a Horizontal Pod Autoscaler (HPA) that scales on both CPU usage and GPU memory usage. I have tried many approaches, but none of them has worked for me.
For example, I am trying to set up the NVIDIA DCGM exporter by following the official documentation on its GitHub page. When I deploy it, the pods keep restarting and never run successfully even once. Helm also doesn't seem to work properly for me here. I find many suggested approaches, but each one is either about three years old or its steps don't actually work for me.
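For context, this is how I understand the HPA would combine the two metrics once a GPU memory metric is available. It is a minimal Python sketch, not the controller's actual code, and it assumes: the documented rule desiredReplicas = ceil(currentReplicas * currentValue / targetValue), the default tolerance of 0.1, the HPA picking the largest proposal when several metrics are configured, and GPU memory utilization derived from the DCGM exporter's `DCGM_FI_DEV_FB_USED` and `DCGM_FI_DEV_FB_FREE` gauges. The function names and example numbers are hypothetical.

```python
import math

# Default HPA tolerance: no scaling if current/target is within 1.0 +/- 0.1.
TOLERANCE = 0.1


def proposed_replicas(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count proposed by a single metric (current_value is the per-pod average)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to target: keep the current count
    return math.ceil(current_replicas * ratio)


def gpu_mem_utilization_pct(fb_used_mib: float, fb_free_mib: float) -> float:
    """Assumption: GPU memory usage computed from DCGM framebuffer used/free gauges (MiB)."""
    return 100.0 * fb_used_mib / (fb_used_mib + fb_free_mib)


def desired_replicas(current_replicas: int, metrics, min_replicas: int = 1, max_replicas: int = 10) -> int:
    """metrics: list of (current_value, target_value); the largest proposal wins, then clamp."""
    proposals = [proposed_replicas(current_replicas, cur, tgt) for cur, tgt in metrics]
    return max(min_replicas, min(max_replicas, max(proposals)))


if __name__ == "__main__":
    cpu = (50.0, 60.0)  # hypothetical: 50% CPU utilization vs. a 60% target
    gpu = (gpu_mem_utilization_pct(12000, 3000), 60.0)  # hypothetical: 80% GPU memory vs. 60% target
    print(desired_replicas(2, [cpu, gpu]))
```

With 2 replicas, the CPU metric alone proposes 2 replicas, while GPU memory at 80% against a 60% target proposes 3, so the HPA would scale to 3. In a real cluster the GPU metric would also have to be exposed to the HPA through the custom or external metrics API (for example via a metrics adapter), which depends on the exporter running in the first place.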