I have successfully trained a YOLOv8 model using the Ultralytics Python package and now aim to run inference on 100 million images stored in an S3 bucket. Currently, I run inference from a Databricks notebook on GPU-accelerated compute, but I don't know how to scale this approach to the full dataset.
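For reference, this is roughly what the current notebook does (the weights path and image location below are placeholders, not my real paths):

```python
# Rough sketch of the current single-notebook setup.
from ultralytics import YOLO

model = YOLO("/dbfs/models/yolov8/best.pt")  # placeholder path to trained YOLOv8 weights

# Images are staged from S3 onto DBFS, then run through the model on the
# notebook's GPU; stream=True returns results as a generator.
results = model.predict(source="/dbfs/tmp/images/", device=0, stream=True)
for r in results:
    boxes = r.boxes.xyxy.cpu().numpy()  # detections for one image
```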
From the Databricks documentation, I gathered that using Auto Loader to ingest the images from S3 and MLflow to manage the model could help scale the batch inference process.
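Based on that reading, I imagine the scaled-out version would look something like the sketch below: Auto Loader reads the image files from S3 as binary records, and a pandas UDF runs the model on the workers. The bucket, weights path, glob filter, checkpoint location, and output table are assumptions on my part, and I have not verified this end to end:

```python
import io
import json

import pandas as pd
from PIL import Image
from pyspark.sql.functions import col, pandas_udf
from ultralytics import YOLO

# Lazily load the model once per Python worker process.
_model = None

def _get_model():
    global _model
    if _model is None:
        _model = YOLO("/dbfs/models/yolov8/best.pt")  # placeholder weights path
    return _model

@pandas_udf("string")
def detect(content: pd.Series) -> pd.Series:
    model = _get_model()

    def run(img_bytes: bytes) -> str:
        img = Image.open(io.BytesIO(img_bytes))
        result = model.predict(img, verbose=False)[0]
        # Serialize detected boxes as JSON: [[x1, y1, x2, y2], ...]
        return json.dumps(result.boxes.xyxy.cpu().numpy().tolist())

    return content.apply(run)

# Auto Loader incrementally reads image files from S3 as binary records.
# `spark` is the session provided by the Databricks notebook.
images_df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "binaryFile")
    .option("pathGlobFilter", "*.jpg")
    .load("s3://my-bucket/images/")  # placeholder bucket/prefix
)

predictions = images_df.withColumn("detections", detect(col("content")))

# availableNow processes the current backlog as a batch job, then stops.
(predictions
 .select("path", "detections")
 .writeStream
 .option("checkpointLocation", "s3://my-bucket/checkpoints/yolo")  # placeholder
 .trigger(availableNow=True)
 .toTable("image_detections"))
```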
How can I efficiently scale the batch inference process for 100 million images in Databricks?
Should I use MLflow to manage and scale the inference jobs?
The current workaround is running multiple copies of the notebook, each on its own dedicated compute, which seems inefficient.
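For the MLflow part of the question, my understanding is that I would wrap the trained weights in a pyfunc model and register it, roughly as sketched below; the wrapper class, input schema, artifact path, and registered model name are my assumptions, not something I have working today:

```python
# Hedged sketch of registering YOLOv8 weights with MLflow as a pyfunc model.
import mlflow
import mlflow.pyfunc


class YOLOv8Wrapper(mlflow.pyfunc.PythonModel):
    def load_context(self, context):
        from ultralytics import YOLO
        self.model = YOLO(context.artifacts["weights"])

    def predict(self, context, model_input):
        # Assumes model_input is a pandas DataFrame with an "image_path" column.
        results = [
            self.model.predict(p, verbose=False)[0]
            for p in model_input["image_path"]
        ]
        return [r.boxes.xyxy.cpu().numpy().tolist() for r in results]


with mlflow.start_run():
    mlflow.pyfunc.log_model(
        artifact_path="yolov8",
        python_model=YOLOv8Wrapper(),
        artifacts={"weights": "/dbfs/models/yolov8/best.pt"},  # placeholder path
        registered_model_name="yolov8_detector",               # placeholder name
    )
```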