Tag Archive for python, server, large-language-model, ray

How to implement a Ray server with multiple GPUs?

I’m trying to implement a multi-GPU local server with Ray and vLLM. I have uploaded my full code and commands to this GitHub repository. In short, I want to serve a large model that requires 2 GPUs, but it only uses 1. I have made sure that my CUDA environment is in good shape and that both GPUs are detectable by torch. Thanks in advance for any help.
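Without seeing the linked repository, a common cause of this symptom is that vLLM defaults to `tensor_parallel_size=1`, so the model is loaded onto a single GPU regardless of how many are visible. A minimal sketch, assuming the goal is to shard the model across both GPUs (the model name is a placeholder, and `pick_tensor_parallel_size` is a hypothetical helper added here for illustration):

```python
def pick_tensor_parallel_size(available_gpus: int, required_gpus: int = 2) -> int:
    """Clamp the tensor-parallel degree to the GPUs actually available."""
    if available_gpus < 1:
        raise RuntimeError("no CUDA GPUs detected")
    return min(required_gpus, available_gpus)


if __name__ == "__main__":
    import torch
    from vllm import LLM, SamplingParams

    tp = pick_tensor_parallel_size(torch.cuda.device_count(), required_gpus=2)

    # tensor_parallel_size > 1 shards the model weights across that many
    # GPUs; with the Ray backend, vLLM spawns one worker per GPU.
    llm = LLM(
        model="meta-llama/Llama-2-13b-hf",  # placeholder model name
        tensor_parallel_size=tp,
    )
    outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
    print(outputs[0].outputs[0].text)
```

If `tensor_parallel_size` is already set to 2 in your code, the next things to check are whether `CUDA_VISIBLE_DEVICES` is restricting visibility for the Ray workers, and whether the Ray cluster was started with both GPUs registered (e.g. `ray status` showing 2 GPUs).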