I have the following regex to match something at the beginning of a text (optional), skip a part, and then match the rest (also optional):
(([A-Z]{1,3}[a-z]{1,3}\.-?\s?)*(\s?([A-Z]{2})\s)?([A-Z]{2}\s)?)?(?![A-Z][a-z]\s]+)([A-Z]{3}\s?)*
Here is a link to the regex: regex101
An oversimplified version (note the look-ahead in the middle) is:
somestuff?(?![A-Z][a-z]\s]+)someotherstuff?
It works fine if only one of the two parts matches, or if neither does. If both parts are present, Python reports them as two separate matches instead of one:
import re

regex = r"(([A-Z]{1,3}[a-z]{1,3}\.-?\s?)*(\s?([A-Z]{2})\s)?([A-Z]{2}\s)?)?(?![A-Z][a-z]\s]+)([A-Z]{3}\s?)*"
test_str = "Test.-Ing. (XX) Foo Bar YYY DDD\n"
matches = re.finditer(regex, test_str, re.MULTILINE)
for matchNum, match in enumerate(matches, start=1):
    print("Match {matchNum}: {match}".format(matchNum=matchNum, match=match.group().strip()))
This results in:
Match 1: Test.-Ing.
Match 2: YYY DDD
instead of:
Match 1: Test.-Ing. YYY DDD
What can I do to get both parts in the same match? I’ve tried putting () around everything, but that didn’t work.
PS: I know that it also matches empty strings.
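To make the point about empty strings concrete, here is a minimal sketch that drops the empty matches so only the non-empty ones are listed. It assumes the pattern is written with its escapes spelled out (`\s`, `\.`), and the helper name `non_empty_matches` is made up purely for illustration.

```python
import re

# Pattern from the question, assumed to use \s and \. as escapes.
REGEX = (
    r"(([A-Z]{1,3}[a-z]{1,3}\.-?\s?)*(\s?([A-Z]{2})\s)?([A-Z]{2}\s)?)?"
    r"(?![A-Z][a-z]\s]+)([A-Z]{3}\s?)*"
)


def non_empty_matches(text):
    """Return the stripped text of every match that is not empty (hypothetical helper)."""
    return [m.group().strip() for m in re.finditer(REGEX, text) if m.group().strip()]


test_str = "Test.-Ing. (XX) Foo Bar YYY DDD\n"
for match_num, matched_text in enumerate(non_empty_matches(test_str), start=1):
    print(f"Match {match_num}: {matched_text}")
```

With the test string above this still lists `Test.-Ing.` and `YYY DDD` as two separate matches, so the empty matches are not what splits them.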