I am using Azure AI Studio and would like to deploy the model davidkim205-rhea-72b-v0.5.
On Azure AI Studio, when I try to deploy this model, I can select only one VM size: Standard_NC96ads_A100_v4, with 96 cores, 880 GB RAM and 256 GB storage. That's what I chose (I have enough quota), and I requested 1 instance in total. After the instance is running and the endpoint has been created, I get a CUDA out of memory error. My guess is that the model is too big to be loaded.
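To sanity-check that guess, here is a rough back-of-the-envelope estimate. It rests on a few assumptions: about 72 billion parameters (taken from the model name), 2 bytes per parameter for fp16/bf16 weights, and a VM with 4 A100 GPUs of 80 GB each (my reading of the NC A100 v4 spec, not something the Studio UI shows). It only counts the weights; activations and the KV cache need extra memory on top.

```python
# Rough estimate of the GPU memory needed just to hold the model weights.
# Assumptions (not confirmed): ~72e9 parameters (from the model name),
# and 4 x 80 GB A100 GPUs on Standard_NC96ads_A100_v4.
# Activations and the KV cache are NOT included.

def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9

NUM_PARAMS = 72e9      # assumed parameter count
GPU_MEMORY_GB = 80     # assumed memory per GPU
NUM_GPUS = 4           # assumed GPUs per VM

for dtype, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    need = weight_memory_gb(NUM_PARAMS, nbytes)
    fits_one = need <= GPU_MEMORY_GB
    fits_all = need <= GPU_MEMORY_GB * NUM_GPUS
    print(f"{dtype:>9}: {need:6.0f} GB | fits on 1 GPU: {fits_one} | "
          f"fits across {NUM_GPUS}: {fits_all}")
```

If these numbers are right, the fp16 weights (~144 GB) would not fit on a single 80 GB GPU but would fit across all four, so I wonder whether the deployment is trying to load the whole model onto one GPU.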
How is it possible to deploy such a model? It's not as if I had any choice in terms of VMs. Why is the model suggested if no VM can handle it? Am I doing something wrong?
Thank you for your help!