I am using Python to parse emails to feed into an LLM, and I need to truncate an email if its text is too long. I am using tiktoken to check the length, and I want to strip out text one sentence at a time, where a sentence can start with anything but always ends with a period, exclamation point, question mark, newline (\n) or carriage return (\r).
No matter what I attempt, the code keeps stripping off one letter at a time and ignores the sentence endings. Can somebody please provide some insight or help?
Here is the test:
convers_text = "This is a test period. This is a test surprise? Is this still a test? I hate ending things\n I hate old endings\r"
while "." in convers_text or "\n" in convers_text:
    print(convers_text)
    convers_text = convers_text[: convers_text.rfind("\b[^.!?\s]+[$.!?\n\r]")]
print("Done")
Thank you!
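A note on the one-letter-at-a-time behaviour: str.rfind searches for a literal substring, not a regular expression, so the pattern is never found and rfind returns -1. Slicing with [: -1] then drops only the last character. Below is a minimal sketch of the sentence-by-sentence truncation described above, using re instead. The names SENTENCE_END, drop_last_sentence and truncate_to_limit are hypothetical, and the count parameter is an assumption standing in for a real token counter (it defaults to len, which counts characters).

```python
import re

# A sentence terminator as described in the question:
# '.', '!', '?', newline or carriage return.
SENTENCE_END = re.compile(r"[.!?\n\r]")


def drop_last_sentence(text: str) -> str:
    """Remove the final sentence, keeping everything up to the previous terminator."""
    # Ignore trailing terminators and spaces so the last sentence's own
    # ending is not mistaken for the cut point.
    body = text.rstrip(" .!?\n\r")
    ends = [m.end() for m in SENTENCE_END.finditer(body)]
    # No earlier terminator means only one sentence was left.
    return text[: ends[-1]] if ends else ""


def truncate_to_limit(text: str, limit: int, count=len) -> str:
    """Drop trailing sentences until count(text) <= limit.

    `count` is a placeholder (assumption): swap in a token counter,
    e.g. lambda s: len(enc.encode(s)) with enc = tiktoken.get_encoding("cl100k_base").
    """
    while text and count(text) > limit:
        text = drop_last_sentence(text)
    return text


if __name__ == "__main__":
    convers_text = (
        "This is a test period. This is a test surprise? Is this still a test? "
        "I hate ending things\n I hate old endings\r"
    )
    while convers_text:
        print(repr(convers_text))
        convers_text = drop_last_sentence(convers_text)
    print("Done")
```

This simple terminator set also splits on periods inside numbers or abbreviations (for example "3.14" or "e.g."), so real emails may need a more careful pattern.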