from langchain.chains.question_answering import load_qa_chain

chain = load_qa_chain(
    llm=llm,
    # reduce_llm=llm,
    chain_type="map_reduce",
    question_prompt=question_prompt,
    combine_prompt=combine_prompt,
)

query = "What is .......?"

# Running the chain without the built-in asynchronous method:
res_a = chain.invoke(
    {"input_documents": documents, "question": query},
    return_only_outputs=False,
    verbose=True,
)

# Running the chain with the built-in asynchronous method
# (top-level await works in a notebook cell):
res_a = await chain.ainvoke(
    {"input_documents": documents, "question": query},
    return_only_outputs=True,
)
There is no difference in inference time between the two calls when I run this in an AWS SageMaker notebook with a GPU. Is there any reason for this?
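For context, this is roughly how I compared the two. It's a minimal timing sketch: the time.perf_counter harness is illustrative, and chain, documents, and query are the objects defined above.

import time

# Time the synchronous call.
t0 = time.perf_counter()
chain.invoke({"input_documents": documents, "question": query})
print(f"sync:  {time.perf_counter() - t0:.2f}s")

# Time the asynchronous call (top-level await works in a notebook cell).
t0 = time.perf_counter()
await chain.ainvoke({"input_documents": documents, "question": query})
print(f"async: {time.perf_counter() - t0:.2f}s")

Both cells report roughly the same elapsed time.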