I have a simple FastAPI app which downloads a SentenceTransformers model at startup, like

from sentence_transformers import SentenceTransformer
from fastapi import FastAPI, Request

model = SentenceTransformer("intfloat/multilingual-e5-large-instruct")
app = FastAPI()
# some endpoint code below
and a Dockerfile:

FROM python:3.11.5-slim-bullseye

WORKDIR /opt/python

COPY README.md ./
COPY poetry.lock ./
COPY pyproject.toml ./

RUN pip install \
        poetry==1.5.1 \
        keyring==23.13.1 \
        keyrings.google-artifactregistry-auth==1.1.1 && \
    poetry config virtualenvs.create false && \
    poetry install --no-root --only main --no-interaction --no-ansi

RUN poetry install --only-root --no-interaction --no-ansi

ENTRYPOINT ["poetry", "run", "python", "run_app.py"]
EXPOSE 3001
The issue is that when the app runs, the model download fails. I either get a “Cannot find huggingface.co” error, or, if I specify a cache dir, i.e. adding
# Dockerfile
RUN mkdir /cache_model
RUN chmod 640 /cache_model

# run_app.py
model = SentenceTransformer("intfloat/multilingual-e5-large-instruct", cache_folder="/cache_model")
then I get:
No such file or directory: '../../blobs/f92397ec9462da6e5e34fa22a43cf234093481e8' -> '/root/cache_model/models--intfloat--multilingual-e5-large-instruct/snapshots/baa7be480a7de1539afce709c8f13f833a510e0a/config.json'
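A side observation: mode 640 leaves the directory without the execute (search) bit, which a non-root process needs in order to create or open files inside that directory. A quick check of what the mode bits permit (the helper name is my own):

```python
import stat

def dir_is_searchable(mode: int) -> bool:
    # To create or open files inside a directory, a non-root process
    # needs the execute/search bit, not just read/write.
    return bool(mode & (stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH))

assert not dir_is_searchable(0o640)  # chmod 640: rw-r-----, no search bit
assert dir_is_searchable(0o750)      # chmod 750: rwxr-x---, searchable
```

(The container runs as root, which can bypass directory search permissions, so I am not sure this is the actual cause.)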
What DOES work is downloading the model in the Dockerfile:

RUN poetry run python -c 'from sentence_transformers import SentenceTransformer; SentenceTransformer("...")'

after which run_app.py works fine.
I want to avoid doing that, since the model is more than 2 GB, which results in a very big image.
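What I am considering instead is keeping the model out of the image and caching it in a volume, so the download happens once at first start rather than at build time. A sketch of the deployment side (the image name, volume name, and the use of the SENTENCE_TRANSFORMERS_HOME variable are my assumptions):

```shell
# Cache the model in a named volume instead of baking it into the image.
# The volume survives container restarts, so the ~2 GB download runs once.
docker volume create model-cache
docker run -p 3001:3001 \
    -v model-cache:/cache_model \
    -e SENTENCE_TRANSFORMERS_HOME=/cache_model \
    my-app-image
```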
Note that the SentenceTransformers library is built on top of Transformers, and it is the Transformers library that throws the error.
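One thing I plan to add regardless: a check at startup that the process can actually write to the cache folder, so the app fails fast with a clear message instead of the download dying mid-way with the cryptic rename error above. A minimal sketch (the helper name is made up):

```python
import os
import tempfile

def assert_writable_cache(path: str) -> None:
    # Create the cache dir if needed, then probe that we can write a file
    # into it; raises OSError with a clear path if anything is off.
    os.makedirs(path, exist_ok=True)
    probe = os.path.join(path, ".write_probe")
    try:
        with open(probe, "w") as fh:
            fh.write("ok")
    finally:
        if os.path.exists(probe):
            os.remove(probe)

# Example against a throwaway directory:
with tempfile.TemporaryDirectory() as tmp:
    assert_writable_cache(os.path.join(tmp, "cache_model"))
```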