I’ve deployed an endpoint for a RAG chain in Databricks with scale_to_zero_enabled=True. The problem is: sometimes scaling up from zero works fine, and sometimes it results in an error. It’s also interesting that despite the exception in the logs, the serving endpoint state never changes to Error, but remains Ready (Scaling from zero) instead.
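For context, the endpoint is configured roughly like this (a sketch of the creation payload for the serving-endpoints REST API; the endpoint and model names below are placeholders, not my real ones):

```python
# Hypothetical reconstruction of the endpoint configuration (names are
# placeholders). This builds the JSON payload for Databricks'
# POST /api/2.0/serving-endpoints call; scale_to_zero_enabled is the flag
# mentioned above.

def build_endpoint_config(endpoint_name, model_name, model_version):
    """Return a serving-endpoint creation payload with scale-to-zero enabled."""
    return {
        "name": endpoint_name,
        "config": {
            "served_entities": [
                {
                    "entity_name": model_name,        # e.g. a UC path: catalog.schema.model
                    "entity_version": model_version,
                    "workload_size": "Small",
                    "scale_to_zero_enabled": True,    # endpoint scales to zero when idle
                }
            ]
        },
    }

payload = build_endpoint_config("rag-chain-endpoint", "main.default.rag_chain", "1")
```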
The logs look like this:
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[b2rtc] raise self._exception
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/site-packages/mlflowserving/scoring_server/__init__.py", line 182, in get_model_option_or_exit
[b2rtc] self.model = self.model_future.result()
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/concurrent/futures/_base.py", line 451, in result
[b2rtc] return self.__get_result()
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
[b2rtc] raise self._exception
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/site-packages/mlflowserving/scoring_server/__init__.py", line 182, in get_model_option_or_exit
[b2rtc] self.model = self.model_future.result()
[b2rtc] File "/opt/conda/envs/mlflow-env/lib/python3.10/concurrent/futures/_base.py", line 451, in result
...
...
It goes on and on, but it’s the same six lines repeating over and over. I’ve tried googling it, but I haven’t gotten any closer to the cause of this behaviour. Any ideas?