I have the regex below set up to split the text into individual transactions:
transactions = rx.findall(r"(\([A-Z][\s\S]*?\$.*\$[,\d]+)", text)
For the sample text below, I'd expect it to return 6 transactions, one for each ticker, where each ticker is introduced by an opening parenthesis followed by capital letters, e.g. "(GOOGL)".
text string
SP Alphabet Inc. - Class A Common
Stock (GOOGL) [ST]S (partial) 02/01/2024 03/01/2024 $1,001 - $15,000
F S: New
D: Part of my spouse's retirement portfolio.
SP Amazon.com, Inc. - Common Stock
(AMZN) [ST]P 02/12/2024 03/01/2024 $15,001 -
$50,000
F S: New
D: Part of my spouse's retirement portfolio.
SP Koninklijke Philips N.V. NY Registry
Shares (PHG) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000
F S: New
D: Part of my spouse's retirement portfolio.
SP Pfizer, Inc. Common Stock (PFE) [ST] P 02/12/2024 03/01/2024 $1,001 - $15,000
F S: New
D: Part of my spouse's retirement portfolio.
SP QUALCOMM Incorporated -
Common Stock (QCOM) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000
F S: New
D: Part of my spouse's retirement portfolio.
SP Unilever PLC Common Stock (UL)
[ST]S 02/12/2024 03/01/2024 $1,001 - $15,000
F S: New
D: Part of my spouse's retirement portfolio.
unformatted text
"SP Alphabet Inc. - Class A Common\nStock (GOOGL) [ST]S (partial) 02/01/2024 03/01/2024 $1,001 - $15,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\nSP Amazon.com, Inc. - Common Stock\n(AMZN) [ST]P 02/12/2024 03/01/2024 $15,001 -\n$50,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\nSP Koninklijke Philips N.V. NY Registry\nShares (PHG) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\nSP Pfizer, Inc. Common Stock (PFE) [ST] P 02/12/2024 03/01/2024 $1,001 - $15,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\nSP QUALCOMM Incorporated -\nCommon Stock (QCOM) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\nSP Unilever PLC Common Stock (UL)\n[ST]S 02/12/2024 03/01/2024 $1,001 - $15,000\nF\x00\x00\x00\x00\x00 S\x00\x00\x00\x00\x00: New\nD\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00: Part of my spouse's retirement portfolio.\n"
This pattern:
transactions = rx.findall(r"(\([A-Z][\s\S]*?\$.*\$[,\d]+)", text)
only returns 5 transactions: it returns the AMZN and PHG transactions together as one record.
Why is it getting tripped up for security (PHG) here?
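For anyone who wants to reproduce this, here is a minimal sketch. It assumes that `rx` is just an alias for Python's standard `re` module (it could equally be the third-party `regex` package), that the backslashes in the pattern are as reconstructed here, and that the sample text uses plain newlines with the null bytes left out.

```python
import re as rx  # assumption: `rx` in the post is an alias for the standard `re` module

# Sample text from the post, one physical line per string literal.
text = (
    "SP Alphabet Inc. - Class A Common\n"
    "Stock (GOOGL) [ST]S (partial) 02/01/2024 03/01/2024 $1,001 - $15,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
    "SP Amazon.com, Inc. - Common Stock\n"
    "(AMZN) [ST]P 02/12/2024 03/01/2024 $15,001 -\n"
    "$50,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
    "SP Koninklijke Philips N.V. NY Registry\n"
    "Shares (PHG) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
    "SP Pfizer, Inc. Common Stock (PFE) [ST] P 02/12/2024 03/01/2024 $1,001 - $15,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
    "SP QUALCOMM Incorporated -\n"
    "Common Stock (QCOM) [ST]P 02/12/2024 03/01/2024 $1,001 - $15,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
    "SP Unilever PLC Common Stock (UL)\n"
    "[ST]S 02/12/2024 03/01/2024 $1,001 - $15,000\n"
    "F S: New\n"
    "D: Part of my spouse's retirement portfolio.\n"
)

# Pattern from the post, with the backslashes restored.
pattern = r"(\([A-Z][\s\S]*?\$.*\$[,\d]+)"
transactions = rx.findall(pattern, text)

# Print each match on its own line so a merged record is easy to spot.
print(len(transactions))
for t in transactions:
    print(repr(t))
```

In this reproduction the second match starts at (AMZN) and runs through the end of the PHG line. AMZN is the only record whose amount range wraps onto a second line, which may be relevant.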