How Can I Use a Run Manager to Stream Responses in a RetrievalQA Chain?
I’m building a language model application with the langchain and transformers libraries. I want to integrate a CallbackManagerForLLMRun so that responses stream token by token from my RetrievalQA chain. Below is the code I have so far, including my custom LLAMA class, which loads the /home/llama/LLM/llama/CACHE-Llama-3-8B-chat-merged model via transformers.
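For context, the streaming hook works by having the model's generation loop call the run manager once per token. Here is a minimal, self-contained sketch of that callback pattern in plain Python; the `RunManager` and `generate` names are hypothetical stand-ins (langchain's real `CallbackManagerForLLMRun` exposes a similar `on_llm_new_token` hook, and a real `_call` would pull tokens from the model rather than splitting a string):

```python
from typing import Callable, List, Optional

class RunManager:
    """Hypothetical stand-in for a run manager: forwards each streamed token."""

    def __init__(self, on_token: Callable[[str], None]):
        self._on_token = on_token

    def on_llm_new_token(self, token: str) -> None:
        # langchain's CallbackManagerForLLMRun has a hook of this shape;
        # here we simply forward the token to a user-supplied callback.
        self._on_token(token)

def generate(prompt: str, run_manager: Optional[RunManager] = None) -> str:
    # Pretend each word is one generated token. A real custom LLM's _call
    # would iterate over tokens from the model (e.g. via a streamer) instead.
    tokens = [w + " " for w in ("echo: " + prompt).split()]
    out: List[str] = []
    for tok in tokens:
        if run_manager is not None:
            run_manager.on_llm_new_token(tok)  # stream token as it is produced
        out.append(tok)
    return "".join(out).strip()

streamed: List[str] = []
result = generate("hello world", RunManager(streamed.append))
```

The key point this illustrates: the chain passes a run manager into the LLM call, and the LLM emits partial output through `on_llm_new_token` while still returning the full string at the end.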