I need to put a space between sentences like:
"This is one.This is two"
should be:
"This is one. This is two"
In Python, I used the following regular expression:
text = re.sub(r'\.([A-Z])', r'. \1', text)
It worked pretty well until I had links in the text with dots inside, like:
"This is a text with link https://www.website.com/Article.Name.pdf"
which came out as:
"This is a text with link https://www.website.com/Article. Name.pdf"
The case matters and can’t be changed. That is, I need the regex to recognize that it’s a link and ignore it. I’m not sure how this can be done.
Only match words that are surrounded by whitespace.
text = re.sub(r'(\s\w+)\.(\w+\s)', r'\1. \2', text)
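The whitespace-based pattern above needs whitespace on both sides of the two words, so it misses a sentence that starts or ends the string. A possible alternative, sketched here under assumptions, is to match links first and pass them through unchanged, and only touch a period that is followed by an uppercase letter. The name `space_sentences` is hypothetical, and "link" is assumed to mean any whitespace-delimited token starting with `http://`, `https://` or `www.`.

```python
import re

# Assumption: a link is a run of non-whitespace starting with http://, https:// or www.
# The first alternative consumes a whole link; the second matches a sentence-ending
# period that is immediately followed by an uppercase letter.
URL_OR_DOT = re.compile(r'(?:https?://|www\.)\S+|\.(?=[A-Z])')

def space_sentences(text):
    """Insert a space after a period followed by an uppercase letter, skipping links."""
    def fix(match):
        token = match.group(0)
        # A bare "." is a sentence boundary; anything longer is a link, returned as-is.
        return '. ' if token == '.' else token
    return URL_OR_DOT.sub(fix, text)

print(space_sentences("This is one.This is two"))
print(space_sentences("This is a text with link https://www.website.com/Article.Name.pdf"))
```

Because the link alternative runs up to the next whitespace, a sentence glued directly onto the end of a link (for example `www.site.com.Then`) would be treated as part of the link and left unsplit.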