I am trying to provide an entire document (not just some parts or a small context) to OpenAI as input and ask questions about that document. This is not the usual QA; it is more like text generation based on the given document, not summarization. OpenAI should return a long answer, not two lines.
I have tried to use RetrievalQA from langchain with OpenAI.
here is my code:
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA

# "data" is the list of Documents returned by my loader
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# "prompt" is my custom PromptTemplate
qa_chain = RetrievalQA.from_chain_type(
    llm, retriever=vectorstore.as_retriever(), chain_type_kwargs={"prompt": prompt}
)
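For context, data and prompt are defined earlier in my script; they look roughly like this (a minimal sketch only, the file name and the template wording are placeholders, not my exact code):

from langchain.document_loaders import TextLoader
from langchain.prompts import PromptTemplate

# Load the raw text file (pipe-separated values) into a list of Documents
data = TextLoader("my_document.txt").load()

# Custom prompt used by the chain; the retrieved chunks are injected as {context}
prompt = PromptTemplate.from_template(
    "Use the following document content to write a detailed answer.\n\n"
    "{context}\n\nQuestion: {question}\nDetailed answer:"
)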
The problem is that when I ask a question, the entire document is not provided to OpenAI; only some part of it is, so the answer is not quite right. I believe this is how I can see which parts are sent to OpenAI:
retriever = vectorstore.as_retriever(search_kwargs={'k': 2})
docs = retriever.invoke('some question')
When I print docs, I can see that only some parts of the document are returned.
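Even if I raise k, I still only get the top-k chunks back rather than the whole file. This is roughly the kind of thing I tried (a sketch; the value of k is arbitrary):

# Retrieving more chunks still only returns the top-k matches, not the whole document
retriever = vectorstore.as_retriever(search_kwargs={'k': 10})
docs = retriever.invoke('some question')
for d in docs:
    print(d.page_content[:100])  # preview each retrieved chunk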
Also, my text document contains pipe-separated values.
Please help me send the full document to OpenAI; I will make sure that I only send a limited number of tokens.
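In other words, something along these lines is what I am imagining, if there is a proper way to do it with langchain (a rough sketch only; the token threshold and the tiktoken encoding are my assumptions, not a working solution):

import tiktoken

# Join every chunk back together so the model sees the whole document
full_text = "\n".join(doc.page_content for doc in all_splits)

# Check the token count before sending, to stay under the model's context limit
enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
num_tokens = len(enc.encode(full_text))
print(f"Document is {num_tokens} tokens")

if num_tokens < 3000:  # leave room for the question and the answer
    answer = llm.predict(
        f"Here is the full document:\n\n{full_text}\n\n"
        f"Based on this document, write a detailed answer to: some question"
    )
    print(answer)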