How to load an 8-bit quantized LLaMA-2 model onto a single GPU
I have built a document question-answering system using an 8-bit quantized LLaMA-2 model. I recently migrated the project from the old machine to a new one with 2x Nvidia RTX A6000 GPUs (48 GB each).
When I run the model, it gets split into two parts that load onto the separate GPUs, which increases the response time. How can I force the whole model to load onto a single GPU?
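For reference, this is roughly how I load the model (a minimal sketch, with the model id as a placeholder for my actual checkpoint). I believe the `device_map="auto"` setting is what lets accelerate shard the weights across both GPUs:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-13b-chat-hf"  # placeholder for my actual model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit quantization via bitsandbytes
    device_map="auto",  # <- this appears to split the model across both A6000s
)
```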