Why is RAG slower than calling the LLM directly?
I am using RAG with Llama 3 for an AI bot, and I find that RAG with ChromaDB is much slower than calling the LLM by itself.
According to the test results below, with just one simple web page of about 1000 words, retrieval alone takes more than 2 seconds: