Response time of local LLM on VM not improving after increasing number of CPUs
I am currently using llama-cpp-python to run Mistral-7B-Instruct-v0.3-GGUF on an Azure Virtual Machine.