I have a Streamlit-based question-answering project with the following structure:
files/loader.py
files/pdf.py
files/database.py
files/query.py
app.py
The modules call each other in this order:
app.py (the Streamlit chatbot is built in this file) >>> pdf.py (a Streamlit file-upload function that takes PDF and MP3 files and appends them to a list) >>> loader.py (code to process each file type and return the data to pdf.py)
pdf.py >>> database.py (saves the processed data in a ChromaDB database)
app.py >>> query.py (retrieves data from ChromaDB and has a function that answers queries; app.py returns the answer)
A simplified app.py:

    # import different Python modules
    from files import pdf

    # Other functions to process queries:
    class single_ques:
        def __init__(self):
            pass  # some code

    class multiple_ques:
        def __init__(self):
            pass  # some code

    def main():
        pdf.main()  # Here I call the pdf module's main function, which I want to run only once
        # `button` is pseudocode for whichever button was clicked
        if button == single_ques:
            single_ques()
        if button == multiple_ques:
            multiple_ques()

    if __name__ == '__main__':
        main()
A simplified pdf.py:

    import streamlit as st

    class doc:
        def __init__(self, path):
            self.path = path
            self.document = self.load()
            if self.path.endswith(('.txt', '.docx', '.pdf')):
                self.chunks = self.chunking()

        def load(self):
            if self.path.endswith('.txt'):
                pass  # some code to process it using a function from loader.py
            # same way for other file types

        def chunking(self):
            pass  # chunking function

    def main():
        names = []
        uploaded_files = st.file_uploader("Choose data files", accept_multiple_files=True)
        if uploaded_files:
            for up in uploaded_files:
                names.append(up.name)
                # with open(up.name, mode='wb') as w:
                #     w.write(up.getvalue())
        if names:
            d = {}
            for up in names:
                if up not in d:
                    RAG = doc(up)
                    documents = RAG.load()
                    d[up] = True
        uploaded_files = None

    if __name__ == '__main__':
        main()
When I hardcode the file paths instead of uploading files and run pdf.py on its own (that is, without calling the pdf module from app.py), everything works as expected.
But I want to upload different files each time I run app.py, and each run should create a new database.
The problem is that whenever I click a button or enter a query in the already running chatbot, the file processing is repeated and the database gets replaced. It keeps reprocessing the already uploaded files on every interaction, which is unnecessary. I want the pdf.py module to be called only once, not for the file-processing function to run on every query.
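To make the behaviour I am after concrete, here is a minimal sketch in plain Python (no Streamlit). The names `session_state`, `process_file`, and `run_script` are hypothetical and not part of my project. The dict stands in for some state that survives between reruns. As far as I understand, Streamlit reruns the script from top to bottom on every interaction and `st.session_state` persists across those reruns, so I assume it could play that role, but I have not tried it in my app yet.

```python
# Hypothetical stand-in for state that survives reruns (assumed to be
# something like st.session_state in the real app).
session_state = {}

# Records which files actually went through the expensive step.
process_calls = []


def process_file(name):
    """Placeholder for the load/chunk/store work done in pdf.py."""
    process_calls.append(name)
    return f"chunks of {name}"


def run_script(uploaded_names, state):
    """Simulate one top-to-bottom rerun of app.py."""
    # Unlike a local dict created inside main(), this dict is kept
    # between reruns, so earlier work is remembered.
    processed = state.setdefault("processed", {})
    for name in uploaded_names:
        if name not in processed:  # skip files handled in an earlier rerun
            processed[name] = process_file(name)
    return processed


# First run: both files are processed.
run_script(["a.pdf", "b.txt"], session_state)
# Rerun caused by a button click or a query: nothing is reprocessed.
run_script(["a.pdf", "b.txt"], session_state)
# Rerun after one more upload: only the new file is processed.
run_script(["a.pdf", "b.txt", "c.mp3"], session_state)

print(process_calls)  # ['a.pdf', 'b.txt', 'c.mp3']
```

In my current pdf.py the dict `d` is recreated inside `main()` on every call, so the `if up not in d` check never skips anything. The sketch only differs in where that dict lives.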
Please help.