I’m interested in running a large language model for inference locally on my computer. I’d like to pick a model that is freely accessible online, but the specific CPU/RAM or GPU/VRAM requirements are often not listed on the Hugging Face or Azure model pages, and I haven’t been able to find them elsewhere online.
How can I estimate the hardware requirements of a given model when no requirement information is provided except for the context length and the number of parameters?
For example: I want to run the largest possible version of the Phi-3 model on my computer, but I can’t find any requirements information on the model pages or on the web. One of the largest versions is this one, with a context length of 128,000 tokens and 14B parameters. How can I infer the hardware requirements?
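My rough back-of-envelope attempt so far is just weights plus KV cache, sketched below in Python. The layer count, KV-head count, and head dimension are placeholders I made up for illustration, not numbers taken from the Phi-3 config, and I'm not sure the formula or the overhead factor are right:

```python
# Rough estimate of inference memory for a decoder-only transformer,
# given parameter count, context length, and (assumed) architecture details.

def estimate_memory_gb(
    n_params: float,                # total parameter count, e.g. 14e9
    context_len: int,               # maximum sequence length you plan to use
    n_layers: int,                  # number of transformer layers (assumed)
    n_kv_heads: int,                # number of key/value heads (assumed)
    head_dim: int,                  # dimension per attention head (assumed)
    bytes_per_weight: float = 2.0,  # 2 = fp16/bf16, 1 = int8, 0.5 = 4-bit quant
    bytes_per_kv: float = 2.0,      # precision of the KV cache
    overhead: float = 1.2,          # ~20% extra for activations, buffers, etc.
) -> float:
    """Approximate memory in GB for weights + KV cache + overhead."""
    weights = n_params * bytes_per_weight
    # KV cache: keys and values (factor 2) for every layer, every KV head,
    # and every token position in the context window.
    kv_cache = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_kv
    return (weights + kv_cache) * overhead / 1e9


# Placeholder architecture numbers for a hypothetical 14B model at 128k context:
print(estimate_memory_gb(14e9, 128_000, n_layers=40, n_kv_heads=10, head_dim=128))
```

With these made-up numbers this prints roughly 65 GB, but I don't know whether this kind of calculation is the right way to estimate requirements, or how quantization changes the picture in practice.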