I want to extract the individual transactions from a block of text. Each transaction starts with a date (dd/mm/yyyy) and ends just before the next date. A transaction may have one, two, or more description lines following the date.
Example of one transaction:

```
03/04/2024 Payments / Collections 10.08 795.04 ---> Date description cost balance
HU INSURANCE UK ---> Description
G0003406201 56171304 2024-04-17 ---> Description
G3406201 ---> Description
```
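The first line of each transaction carries four fields (date, description, cost, balance). One possible way to tell it apart from the description lines is a regex that requires a leading dd/mm/yyyy date and two trailing amounts. This is only a sketch and assumes amounts always look like `1,234.56`. The names `HEADER_RE` and `parse_header` are hypothetical.

```python
import re

# Assumed layout of a transaction's first line:
# date, description, cost, balance (amounts formatted like 1,234.56)
HEADER_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{4})\s+(.*?)\s+([\d,]+\.\d{2})\s+([\d,]+\.\d{2})$"
)

def parse_header(line):
    """Split a transaction's first line into its four fields, or return None."""
    m = HEADER_RE.match(line.strip())
    return m.groups() if m else None

print(parse_header("03/04/2024 Payments / Collections 10.08 795.04"))
# ('03/04/2024', 'Payments / Collections', '10.08', '795.04')
print(parse_header("HU INSURANCE UK"))
# None
```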
Input string:

```python
input_text = """
02/04/2024 Funds Transfer 56.00 1,805.12
TOP-UP TO WALLET! :
84343571729
02/04/2024 Bill Payment 1,000.00 805.12
UHJN-5520380040396554 : I-BANK
03/04/2024 Payments / Collections 10.08 795.04
HU INSURANCE UK
G0003406201 56171304 2024-04-17
G3406201
04/04/2024 FAST Payment / Receipt 12,000.00 12,795.04
INVEST
20240404CIBBSTSTBRT3273519
OTHER
04/04/2024 Bill Payment 333.00 12,462.04
GBU -09890340922 : I-BANK
30/04/2024 Interest Earned 0.18 2,385.42
"""
```
My code:

```python
import re

# Regex pattern for dates (dd/mm/yyyy)
date_pattern = r"\d{2}/\d{2}/\d{4}"

# Find all date matches
date_matches = re.findall(date_pattern, input_text)

# Initialize an empty list to store the output
output = []

# Extract the text between consecutive dates
for i in range(len(date_matches) - 1):
    start_index = input_text.find(date_matches[i]) + len(date_matches[i])
    end_index = input_text.find(date_matches[i + 1])
    text_between_dates = input_text[start_index:end_index].strip()
    output.append([date_matches[i], text_between_dates])

# Print the result
for item in output:
    print(item)
```
Output:

```
['02/04/2024', '']
['02/04/2024', 'Funds Transfer 56.00 1,805.12\nTOP-UP TO WALLET! :\n84343571729\n02/04/2024 Bill Payment 1,000.00 805.12\nUHJN-5520380040396554 : I-BANK']
['03/04/2024', 'Payments / Collections 10.08 795.04\nHU INSURANCE UK\nG0003406201 56171304 2024-04-17\nG3406201']
['04/04/2024', '']
['04/04/2024', 'FAST Payment / Receipt 12,000.00 12,795.04\nINVEST\n20240404CIBBSTSTBRT3273519\nOTHER\n04/04/2024 Bill Payment 333.00 12,462.04\nGBU -09890340922 : I-BANK']
```
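The empty strings and merged transactions come from `str.find()`, which always returns the index of the *first* occurrence. Because `02/04/2024` and `04/04/2024` each appear twice, both lookups for a repeated date land on the same position: the first slice is empty and the next one spans two transactions. In addition, `range(len(date_matches) - 1)` never produces the final transaction (`30/04/2024`). One possible minimal change, sketched here on a shortened copy of the input, is to use `re.finditer()`, whose match objects carry their own `start()` and `end()` positions:

```python
import re

# Shortened sample: two transactions share the same date
input_text = """
02/04/2024 Funds Transfer 56.00 1,805.12
TOP-UP TO WALLET! :
84343571729
02/04/2024 Bill Payment 1,000.00 805.12
UHJN-5520380040396554 : I-BANK
30/04/2024 Interest Earned 0.18 2,385.42
"""

date_pattern = r"\d{2}/\d{2}/\d{4}"

# finditer yields match objects with their own positions,
# so a repeated date no longer collapses onto its first occurrence.
matches = list(re.finditer(date_pattern, input_text))

output = []
for i, m in enumerate(matches):
    start_index = m.end()
    # The last transaction runs to the end of the text instead of being dropped.
    end_index = matches[i + 1].start() if i + 1 < len(matches) else len(input_text)
    text_between_dates = input_text[start_index:end_index].strip()
    output.append([m.group(), text_between_dates])

for item in output:
    print(item)
```

This still matches a dd/mm/yyyy date anywhere in a line. If a description line could ever contain such a date, anchoring the pattern to the start of a line (`^` with `re.MULTILINE`) would be the safer assumption.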
Desired output:

```
['02/04/2024', 'Funds Transfer 56.00 1,805.12', 'TOP-UP TO WALLET! :', '84343571729']
['02/04/2024', 'Bill Payment 1,000.00 805.12', 'UHJN-5520380040396554 :', 'I-BANK']
['03/04/2024', 'Payments / Collections 10.08 795.04', 'HU INSURANCE UK', 'G0003406201 56171304 2024-04-17', 'G3406201']
```
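To get this shape (date, rest of the first line, then one element per description line), one option is to split the text at every line that starts with a date and then break each chunk into lines. The sketch assumes a transaction always begins at the start of a line with a dd/mm/yyyy date followed by a space. `TXN_START` and `split_transactions` are hypothetical names, and the sample is a subset of the input string in the question.

```python
import re

# Sample input (a subset of the question's text)
input_text = """
03/04/2024 Payments / Collections 10.08 795.04
HU INSURANCE UK
G0003406201 56171304 2024-04-17
G3406201
04/04/2024 FAST Payment / Receipt 12,000.00 12,795.04
INVEST
20240404CIBBSTSTBRT3273519
OTHER
30/04/2024 Interest Earned 0.18 2,385.42
"""

# Zero-width split point: the start of any line that begins with dd/mm/yyyy.
# (re.split on an empty match requires Python 3.7+.)
TXN_START = re.compile(r"^(?=\d{2}/\d{2}/\d{4})", re.MULTILINE)

def split_transactions(text):
    """Return one list per transaction: [date, first-line rest, *description lines]."""
    transactions = []
    for block in TXN_START.split(text):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue  # text before the first date (here just a newline)
        date, _, rest = lines[0].partition(" ")
        transactions.append([date, rest, *lines[1:]])
    return transactions

for txn in split_transactions(input_text):
    print(txn)
```

Because it splits on line breaks only, the Bill Payment transaction on 02/04/2024 would yield `'UHJN-5520380040396554 : I-BANK'` as a single element. Splitting it after ` :` as in the desired output would need an extra, input-specific rule.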