Tag Archive for python, huggingface-transformers, torch, llama

CUDA out of memory while using Llama3.1-8B for inference

I have written a simple Python script that uses the HuggingFace transformers library along with torch to run Llama3.1-8B-Instruct purely for inference, after feeding in some longish pieces of text (roughly 10k-20k tokens). It runs fine on my laptop, whose GPU has 12 GB of dedicated RAM but can apparently access up to 28 GB in total (shared from main system RAM, I assume).
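For context, here is a minimal sketch of the kind of script described, assuming the gated meta-llama/Llama-3.1-8B-Instruct checkpoint on the Hugging Face Hub; the model ID, dtype, prompt, and generation settings below are illustrative assumptions, not taken from the original post:

```python
# Minimal sketch of the kind of inference script described above.
# The model ID, dtype, prompt, and generation settings are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"  # assumed gated checkpoint on the HF Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision: ~16 GB of weights for 8B params
    device_map="auto",           # lets accelerate spill layers to CPU RAM if needed
)

long_text = "..."  # the ~10k-20k-token input described in the question
messages = [{"role": "user", "content": f"Summarize this:\n\n{long_text}"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():  # inference only: no gradients, no optimizer state
    output = model.generate(input_ids, max_new_tokens=256)

print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Note that even in bf16 the 8B weights alone take around 16 GB, so on a 12 GB card `device_map="auto"` will offload some layers to system RAM, and the KV cache grows linearly with those 10k-20k input tokens, which is typically what pushes a run like this over the memory limit.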
