I am trying to build a local chatbot for PDFs using RAG, Ollama, llama3, pgvector, and Streamlit. It works, but the time to first token is around 262 s or more. I don't have a GPU; I'm on Windows 11 with 16 GB of RAM, running on CPU only. When I run the app and upload a PDF, it takes almost 7-8 minutes to respond to each query. I was wondering if there is a way to preprocess the PDFs (about 1000 of them) beforehand and insert them into the vector database ahead of time. Any suggestion would be helpful.
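To illustrate what I mean by preprocessing: a one-off ingestion script along these lines, run once before starting the app, that chunks each PDF, embeds the chunks, and inserts them into pgvector. This is only a sketch of the idea; the `documents` table, the `embed` callable, and the choice of `pypdf` as the reader are placeholders for my actual setup, not a working implementation.

```python
# One-off ingestion sketch: chunk each PDF, embed the chunks, and store the
# vectors in pgvector ahead of time, so the chatbot only has to retrieve them.
# Table name, column names, and the embed callable are placeholders.

def chunk_text(text, size=1000, overlap=200):
    """Split text into overlapping character chunks for embedding."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

def ingest_pdfs(pdf_paths, conn, embed):
    """Embed every chunk of every PDF once and insert into a pgvector table.

    `conn` is a psycopg-style connection; `embed` is a callable
    text -> list[float], e.g. a wrapper around an Ollama embedding model.
    """
    from pypdf import PdfReader  # assumed PDF reader; swap in whatever loader you use
    with conn.cursor() as cur:
        for path in pdf_paths:
            text = "".join(page.extract_text() or "" for page in PdfReader(path).pages)
            for chunk in chunk_text(text):
                cur.execute(
                    "INSERT INTO documents (source, content, embedding) VALUES (%s, %s, %s)",
                    (path, chunk, embed(chunk)),
                )
    conn.commit()
```

The point would be that at query time the app only embeds the question and does a similarity search, instead of parsing and embedding PDFs on every upload.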
When I upload only 3 PDFs, for instance, the response time is around 6 minutes. Even when I ask the same question twice, it still takes the full time to respond. Is there any way I can reduce the response time of my local chatbot?
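For the repeated-question case, I was thinking of putting a simple exact-match answer cache in front of the generation call, so asking the same question twice returns instantly. A minimal sketch, where `generate` stands in for my RAG + llama3 pipeline and the normalization is deliberately crude:

```python
# Minimal exact-match answer cache: the expensive generate() call runs only on
# a cache miss, so repeating the same (normalized) question is instant.

class AnswerCache:
    def __init__(self, generate):
        self.generate = generate  # callable question -> answer (the slow LLM call)
        self._store = {}

    @staticmethod
    def _key(question):
        # Normalize case and whitespace so trivially different phrasings match.
        return " ".join(question.lower().split())

    def ask(self, question):
        key = self._key(question)
        if key not in self._store:
            self._store[key] = self.generate(question)  # slow path, runs once per question
        return self._store[key]
```

In Streamlit something like `st.cache_data` on the answering function might achieve the same effect, since Streamlit reruns the whole script on every interaction.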