CUDA Out of Memory Error with Fine-Tuned LLaMA Model in Streamlit, Works on Colab
I have a fine-tuned language model that I'm testing in a Streamlit application. The model runs without issues on Google Colab with a batch size of 1, but it fails with a CUDA Out of Memory error when I run it in the Streamlit app. Here are the details:
The model was fine-tuned on a JSON file of texts so that it can analyze them: