I have the raw text of invoices, and I'm trying to build a JSON object with a fixed structure from that raw text, using any open-source model or a paid one.
I'm stuck: I have a prompt, but I don't know which LangChain chain I should use for this task.
from langchain.chains import StuffDocumentsChain, LLMChain
from langchain.prompts import FewShotPromptTemplate
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.llms import OpenAI
from langchain_community.vectorstores import Chroma
from langchain_core.example_selectors import SemanticSimilarityExampleSelector
from langchain_core.prompts import PromptTemplate
def json_formatter(texts: str):
    # `few_shots` (list of example dicts) and `PROMPT_SUFFIX` are not shown in this snippet
    llm = OpenAI()
    embeddings = HuggingFaceEmbeddings(
        model_name="sentence-transformers/all-MiniLM-L6-v2"
    )
    # Embed each few-shot example so the most similar ones can be selected
    to_vectorize = [" ".join(example.values()) for example in few_shots]
    vectorstore = Chroma.from_texts(to_vectorize, embeddings, metadatas=few_shots)
    example_selector = SemanticSimilarityExampleSelector(
        vectorstore=vectorstore,
        k=2,
    )
    example_prompt = PromptTemplate(
        input_variables=["raw_texts", "json_structure"],
        template="raw_texts: {raw_texts}\njson_structure: {json_structure}",
    )
    few_shot_prompt = FewShotPromptTemplate(
        example_selector=example_selector,
        example_prompt=example_prompt,
        suffix=PROMPT_SUFFIX,
        input_variables=["raw_texts", "json_structure"],  # These variables are used in the prefix and suffix
    )
    # NOTE: `prompt` is not defined anywhere in this snippet
    llm_chain = LLMChain(llm=llm, prompt=prompt)
    chain = StuffDocumentsChain(
        llm_chain=llm_chain,
        document_prompt=few_shot_prompt,
    )
StuffDocumentsChain here is just an example; I don't know which chain I should use for this task.
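For reference, the task itself looks like a single prompt followed by a parsing step, so it may not need a document chain at all. Below is a minimal sketch in plain Python, assuming a hypothetical `call_llm` callable that takes a prompt string and returns the model's reply (an OpenAI or open-source client could be wrapped that way). `build_prompt`, `parse_json_reply`, `extract_invoice` and the prompt wording are illustrative names, not LangChain APIs.

```python
import json
from typing import Callable


def build_prompt(raw_text: str, structure: str) -> str:
    """Assemble one extraction prompt (the wording is only an assumption)."""
    return (
        "Fill in the JSON structure below using only values found in the "
        "invoice text. Leave a field as an empty string if it is absent. "
        "Reply with JSON only.\n\n"
        f"JSON structure:\n{structure}\n\n"
        f"Invoice text:\n{raw_text}\n"
    )


def parse_json_reply(reply: str) -> dict:
    """Parse the first JSON object found in a model reply."""
    start = reply.find("{")
    if start == -1:
        raise ValueError("no JSON object in model reply")
    # raw_decode parses one JSON value and ignores any text after it
    obj, _ = json.JSONDecoder().raw_decode(reply[start:])
    return obj


def extract_invoice(
    raw_text: str, structure: str, call_llm: Callable[[str], str]
) -> dict:
    """Prompt -> model -> parsed JSON, with the model passed in as a callable."""
    return parse_json_reply(call_llm(build_prompt(raw_text, structure)))


# Stand-in for a real model call so the flow can be run offline
def fake_llm(prompt: str) -> str:
    return 'Here is the result:\n{"invoice": {"invoice_number": "110"}}\nDone.'


result = extract_invoice(
    "Invoice No.: 110", '{"invoice": {"invoice_number": ""}}', fake_llm
)
print(result["invoice"]["invoice_number"])  # prints 110
```

The `fake_llm` stub only stands in for a real model; swapping in an actual client should not change the rest of the flow.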
My JSON format looks like this:
structure = """{
  "invoice": {
    "invoice_number": "",
    "invoice_date": "",
    "irn": "",
    "acknowledgement": {
      "ack_number": "",
      "ack_date": ""
    },
    "customer_details": {
      "name": "",
      "contact": "",
      "address": "",
      "gstin": "",
      "pan": "",
      "state": "",
      "state_code": ""
    },
    "order_details": {
      "order_number": "",
      "order_date": ""
    },
    "delivery_agent": {
      "name": ""
    },
    "payment_terms": "",
    "place_of_supply": "",
    "items": [
      {
        "description": "",
        "hsn/sac": "",
        "quantity": "",
        "rate_per_unit(inclusive_of_tax)": "",
        "rate_per_unit(exclusive_of_tax)": "",
        "amount": ""
      }
    ],
    "taxes": {
      "igst_rate": "",
      "igst_amount": "",
      "cgst_rate": "",
      "cgst_amount": "",
      "sgst_rate/utgst_rate": "",
      "sgst_amount/utgst_amount": "",
      "total_tax": ""
    },
    "total_amount": "",
    "amount_in_words": "",
    "discount_in_%": "",
    "discount_in_amount": "",
    "supplier_details": {
      "gstin": "",
      "pan": "",
      "bank_details": {
        "branch": "",
        "ifsc_code": ""
      }
    },
    "beneficiary_name": "",
    "bank_name": "",
    "Bank_account_number": "",
    "swift_code": "",
    "terms_conditions": ""
  }
}"""
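Since a model is free to drop or rename keys, it can help to compare whatever comes back against this structure. This is a rough sketch, assuming the structure string has been parsed with `json.loads`. `missing_paths` is a hypothetical helper name, and the template in the example is a trimmed copy of the structure above.

```python
import json


def missing_paths(template, candidate, prefix=""):
    """Return dotted paths that exist in `template` but not in `candidate`."""
    missing = []
    if isinstance(template, dict):
        if not isinstance(candidate, dict):
            return [prefix or "<root>"]
        for key, sub_template in template.items():
            path = f"{prefix}.{key}" if prefix else key
            if key not in candidate:
                missing.append(path)
            else:
                missing.extend(missing_paths(sub_template, candidate[key], path))
    elif isinstance(template, list) and template:
        if not isinstance(candidate, list):
            return [prefix or "<root>"]
        # Each output element is checked against the first template element
        for i, item in enumerate(candidate):
            missing.extend(missing_paths(template[0], item, f"{prefix}[{i}]"))
    return missing


# Trimmed copy of the structure, for illustration only
template = json.loads(
    '{"invoice": {"invoice_number": "", '
    '"items": [{"description": "", "amount": ""}], '
    '"taxes": {"igst_rate": ""}}}'
)
model_output = {
    "invoice": {"invoice_number": "110", "items": [{"description": "varistors"}]}
}
print(missing_paths(template, model_output))
# prints ['invoice.items[0].amount', 'invoice.taxes']
```

An empty list would mean every key in the template is present; the values themselves are not checked here.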
The example raw texts look like this:
raw_text = """
vil Original C) Duplicate C) Triplicate
TAX INVOICE
O PTOTEC H GSTIN: 29JS757SE1ZC
State: 29 - Karnataka
NO R8 SHOPE NO 02 TUBINAKERE
A.) 7892088815 &3 _ [email protected] [°] VILLAGE AND POST KOTHATHI HOBLI
MANDYA TALUK MANDYA DIST
Invoice No.: 110 Bill To:
Place of Supply: 37-Andhra Pradesh Lotus Wireless
— —— Technologies india
PO date: 27/09/2022
PO number: wr/po/22-23/0608 +©~CséP Vt. td B-7,B Block,
Industrial
Park, Autonagar
Visakhapatnam,Andra
Pradesh,INDIA Pin-
530012
Lotus Wireless Technologies india pvt.ltd B-7,B
Block, Industrial Park,Autonagar
Visakhapatnam,Andra Pradesh,INDIA Pin-530012
Contact No.: 00918912761678
State: 37-Andhra Pradesh
a
varistors 180V(MOV14D181K) :
Voltage(MAKE-BOURNS)
%7.00, % 252.00 (18%) = 1,652.00
Sub Total % 1,400.00
Invoice Amount In Words IGST@18% 252.00
One Thousand Six Hundred Fifty Two Rupees only
Total = 1,652.00
Terms And Conditions Received % 1,652.00
Thanks for doing business with us! Balance % 0.00
Account No : 1200983850357
IFSC Code : CNRB002603
BANK : CANARA BANK
"""
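Some fields in this sample sit right next to a fixed label (`Invoice No.:`, `PO number:`, `IFSC Code :`), so a few regular expressions could cross-check the model's output for those. This is a sketch only: the patterns are guesses based on this one invoice and would likely need adjusting for other layouts, and `extract_labelled_fields` is a hypothetical helper.

```python
import re

# Label patterns guessed from the single sample invoice; treat them as assumptions
FIELD_PATTERNS = {
    "invoice_number": r"Invoice No\.?\s*:\s*(\S+)",
    "order_number": r"PO number\s*:\s*(\S+)",
    "order_date": r"PO date\s*:\s*(\d{2}/\d{2}/\d{4})",
    "bank_account_number": r"Account No\s*:\s*(\d+)",
    "ifsc_code": r"IFSC Code\s*:\s*([A-Z0-9]+)",
}


def extract_labelled_fields(raw_text: str) -> dict:
    """Return the fields whose label pattern matches somewhere in the text."""
    found = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, raw_text, flags=re.IGNORECASE)
        if match:
            found[name] = match.group(1)
    return found


# A few lines copied from the sample invoice text
sample = """
Invoice No.: 110 Bill To:
PO date: 27/09/2022
PO number: wr/po/22-23/0608 +©~CséP Vt. td B-7,B Block,
Contact No.: 00918912761678
Account No : 1200983850357
IFSC Code : CNRB002603
"""
fields = extract_labelled_fields(sample)
print(fields["invoice_number"], fields["ifsc_code"])  # prints 110 CNRB002603
```

Values found this way could be compared with the model's JSON, with any disagreement flagged for manual review.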
**If I'm doing anything wrong, or if there is a more efficient way to do this, please let me know. Any help would be greatly appreciated.**