I am working with a locally hosted LLM (Ollama with Llama 3.1) to process queries against a large dataset stored in a PostgreSQL database (~1 million rows). I am fetching the data in chunks from the database and passing each chunk to the model for processing.
However, with this setup I get a "too many requests" error when querying the model.
Since the model runs locally, I need to optimize the data flow so it does not get overloaded while still generating responses efficiently.
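For reference, the data flow looks roughly like the sketch below (the connection details, table, columns, and prompt are placeholders, not my exact code):

```python
# Rough sketch of the current pipeline: stream rows from PostgreSQL in chunks
# and send one prompt per chunk to the local Ollama server.
# Assumes psycopg2 and requests are installed; names below are placeholders.
import psycopg2
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama HTTP endpoint
CHUNK_SIZE = 500  # rows fetched per round trip

conn = psycopg2.connect(dbname="mydb", user="me", password="secret", host="localhost")
cur = conn.cursor(name="chunked_cursor")  # server-side cursor so rows stream in chunks
cur.execute("SELECT id, text_column FROM my_table")  # placeholder query

while True:
    rows = cur.fetchmany(CHUNK_SIZE)
    if not rows:
        break
    # Build one prompt per chunk and send it to the local model
    prompt = "Summarise the following records:\n" + "\n".join(r[1] for r in rows)
    resp = requests.post(
        OLLAMA_URL,
        json={"model": "llama3.1", "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json()["response"])

cur.close()
conn.close()
```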
How can I handle such large datasets with this setup?
Are there best practices or techniques for optimizing the interaction between the database and the model to avoid errors like this?
Any suggestions or insights would be appreciated.