Inference API (Serverless) Endpoint on Hugging Face
The Serverless Inference API on Hugging Face fails to load my large model, reporting that the model is too large. To work around this, I quantized the model with the bitsandbytes package and uploaded the quantized version. The inference endpoint is now visible; however, when I try to query it, I get the following error: “No package metadata was found for bitsandbytes.”
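For reference, my understanding is that a model quantized with bitsandbytes and pushed to the Hub carries a quantization section in its `config.json`, which the serverless runtime can only honor if bitsandbytes is installed there. The field values below are illustrative of what such a section roughly looks like, not copied from my repo:

```json
{
  "quantization_config": {
    "quant_method": "bitsandbytes",
    "load_in_8bit": true,
    "load_in_4bit": false
  }
}
```

If the error means the serverless backend simply does not ship bitsandbytes, is there a supported way to serve a quantized model there, or is a dedicated Inference Endpoint the only option?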