HuggingFace pipeline doesn’t use multiple GPUs
I made a RAG app that answers user questions based on provided data. It works fine on a single GPU, but when I try to deploy it on multiple GPUs (4 T4s) I always get a CUDA out-of-memory error from the pipeline.
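For context, this is roughly the kind of loading code involved (the task and model name below are placeholders, not the exact ones in my app). My understanding is that passing `device_map="auto"` (with `accelerate` installed) should shard the model across all visible GPUs rather than putting everything on one device:

```python
# Minimal sketch, assuming a text-generation pipeline; the model name is a placeholder.
import torch
from transformers import pipeline

# device_map="auto" (requires the `accelerate` package) is supposed to split
# the model's layers across all available GPUs instead of a single device.
pipe = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)

# Inspect how the layers were actually placed; with 4 T4s I would expect
# entries spread over cuda:0 .. cuda:3.
print(pipe.model.hf_device_map)
```

Even with a setup like this, the pipeline still seems to load onto a single GPU and runs out of memory. Is there something else I need to do for the pipeline to actually use all four GPUs?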