I have multiple model combinations for preprocessing and inference, along with a separate training pipeline. The models include Stable Diffusion models, and I serve inference through FastAPI.
The model directory is around 22GB, and my environment's Docker image is 30GB (without the models).
I tried reducing the Docker image size with a multi-stage build (separate builder and runtime base images), but that didn't shave off even 1MB!
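For context, here's roughly the multi-stage setup I tried. This is a minimal sketch: the file names, paths, and the `app.main:app` module are placeholders for my actual layout, and in practice my base is a CUDA image rather than `python:3.11-slim`.

```dockerfile
# --- Build stage: install build tooling and pre-build wheels ---
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
# Building wheels here keeps compilers/headers out of the final image.
RUN pip wheel --no-cache-dir --wheel-dir /wheels -r requirements.txt

# --- Runtime stage: fresh slim base, copy only what serving needs ---
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /wheels /wheels
RUN pip install --no-cache-dir --no-index --find-links=/wheels /wheels/* \
    && rm -rf /wheels
# Application code only; the 22GB model dir is deliberately NOT baked in.
COPY app/ ./app/
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
```

My guess as to why this made no difference: a multi-stage build only drops build-time artifacts, and in my case almost all of the 30GB is runtime dependencies (CUDA, PyTorch, diffusers, etc.) that the final stage still has to carry.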
I don't want an expensive option and would like to host on a reliable cloud. Can anyone tell me whether it's possible to host this on a serverless offering from AWS, GCP, or any other cloud provider?
Here's what I tried (not sure if my findings are correct):
- AWS SageMaker Inference: the serverless option isn't available for custom models, and larger configurations require AWS approval.
- GCP Vertex AI: the model size must be under 15GB.