I am trying to upload a PDF using `st.file_uploader()` from Streamlit and then parse it using `LlamaParse()`. The app is currently running on localhost.

The issue is that I am getting `[]` in the output. This is probably happening because I am directly using the `uploaded_file` returned by the code below. I am not sure whether the returned upload object can be fed directly to `llama_parse`, because the same PDF works when I pass it to `LlamaParse()` straight from a directory on my local machine instead of through `uploaded_file = st.file_uploader()`.

Is there something I need to do with `uploaded_file` before feeding it into `LlamaParse()`?
- Setting up Streamlit:

```python
import os

import nest_asyncio
import pandas as pd
import streamlit as st
from llama_parse import LlamaParse

# LlamaParse runs async code internally; nest_asyncio allows nested event loops
nest_asyncio.apply()

os.environ["LLAMA_CLOUD_API_KEY"] = "my_key"  # typo fixed: was "LLMA_CLOUD_API_KEY"
key_input = "my_key"

############### Setting Configuration ###############
st.set_page_config(page_title="Pdf Parsing", layout='wide', initial_sidebar_state="expanded")

# title
st.markdown("<h1 style='text-align: center; font-size: 70px;color: black;letter-spacing: 4px;'>PDF Parsing</h1>", unsafe_allow_html=True)
```
- Uploading the file and parsing it with LlamaParse:

```python
st.write("checkpoint1")
with st.container(border=True):
    st.write("checkpoint2")
    uploaded_file = st.file_uploader('Choose your .pdf file to upload', type="pdf")
    if uploaded_file is not None:
        st.write(uploaded_file.name)
        st.success("File is uploaded")
    if uploaded_file:
        if st.button('Parse Uploaded pdf file (Powered by AI)'):
            # Passing the UploadedFile object directly: this is what returns []
            doc_parsed = LlamaParse(result_type="markdown", api_key=key_input).load_data(uploaded_file)
            st.write('checkpoint3')
            st.write(doc_parsed)
```
UPDATE: Is this happening because I am not using `st.session_state`? Could it be that when I click the button, the whole app reruns, the file gets lost, and nothing is passed to LlamaParse?
The issue is not caused by `st.file_uploader` but by `LlamaParse()`. Passing the upload's bytes to `load_data()` together with a `file_name` in `extra_info` makes LlamaParse work with the uploaded file inside Streamlit.
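The code below works:

```python
# Assumes the uploader was created with key="uploaded_file" and parsingInstruction is defined elsewhere
doc_parsed = LlamaParse(result_type="markdown", api_key=key_input,
                        parsing_instruction=parsingInstruction
                        ).load_data(st.session_state.uploaded_file.getvalue(),  # raw PDF bytes
                                    extra_info={"file_name": "_"})  # file_name lets LlamaParse accept bytes
```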
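Streamlit's `UploadedFile` is a subclass of `io.BytesIO`, so `.getvalue()` returns the complete file contents no matter where the read cursor is, whereas `.read()` returns only what is left after earlier reads. The sketch below demonstrates this with a plain `io.BytesIO` stand-in (no Streamlit or LlamaParse required) and wraps the conversion in a hypothetical helper, `upload_to_llamaparse_args`, which returns the bytes plus an `extra_info` dict. The fake PDF bytes and the file name are made up for illustration.

```python
import io


def upload_to_llamaparse_args(upload: io.BytesIO, file_name: str) -> tuple[bytes, dict]:
    """Hypothetical helper: return (raw bytes, extra_info) for LlamaParse.load_data().

    getvalue() returns the whole buffer regardless of the cursor position,
    unlike read(), which starts wherever the cursor currently is.
    """
    return upload.getvalue(), {"file_name": file_name}


# Stand-in for st.file_uploader's return value (UploadedFile subclasses io.BytesIO)
fake_upload = io.BytesIO(b"%PDF-1.4 fake pdf bytes")

first_read = fake_upload.read()   # consumes the buffer; cursor is now at the end
second_read = fake_upload.read()  # empty, because nothing is left to read

# getvalue() still sees everything, even after the reads above
data, extra_info = upload_to_llamaparse_args(fake_upload, "report.pdf")
```

In the app, this would presumably become `data, extra_info = upload_to_llamaparse_args(uploaded_file, uploaded_file.name)` followed by `LlamaParse(result_type="markdown", api_key=key_input).load_data(data, extra_info=extra_info)`. Using the real upload name instead of `"_"` is an untested assumption; the answer above only shows that some `file_name` must be present.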
More info: Streamlit discussion post
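The question notes that parsing works when LlamaParse is given a PDF from a local directory. Another possible workaround is therefore to write the uploaded bytes to a temporary `.pdf` file and pass its path instead. Below is a minimal stdlib sketch of that idea. The helper name `save_upload_to_temp_pdf` is hypothetical, and the `load_data(path)` call appears only as a comment because it is not exercised here.

```python
import os
import tempfile


def save_upload_to_temp_pdf(data: bytes) -> str:
    """Hypothetical helper: write PDF bytes to a temporary .pdf file and return its path.

    The caller is responsible for deleting the file afterwards.
    """
    # delete=False keeps the file on disk after the with-block so it can be reopened by path
    with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
        tmp.write(data)
        return tmp.name


path = save_upload_to_temp_pdf(b"%PDF-1.4 fake pdf bytes")
try:
    # In the app this would be something like (assumption, not run here):
    # docs = LlamaParse(result_type="markdown", api_key=key_input).load_data(path)
    with open(path, "rb") as f:
        round_trip = f.read()
finally:
    # Clean up the temporary file whether or not parsing succeeded
    os.remove(path)
```

In Streamlit you would pass `uploaded_file.getvalue()` as `data`. Compared with sending bytes plus `extra_info`, this keeps the exact file-path call that already worked locally, at the cost of touching the disk.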