I want to match all cases where a hyphenated string (which could be made up of one or multiple hyphenated segments) ends in a consonant that is not the letter m.
In other words, it needs to match strings such as ‘crack-l’, ‘crac-ken’, and ‘cr-ca-cr-cr’, but not ‘crack’ (not hyphenated), ‘br-oom’ (ends in m), ‘br-oo’ (last segment ends in a vowel), or ‘cr-ca-cr-ca’ (last segment ends in a vowel).
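To make the rule concrete, here is a minimal non-regex sketch of the check applied to a single whitespace-free token. The names `FINAL_CONSONANTS` and `should_match` are hypothetical and only illustrate the intent; the consonant set is the one used in my pattern (y and w count as consonants, m is excluded).

```python
# Letters accepted as the final character: consonants other than "m".
# (Hypothetical helper names, used only to illustrate the rule.)
FINAL_CONSONANTS = set("bcdfghjklnpqrstvwxyz")


def should_match(token: str) -> bool:
    """Return True if token is hyphenated and ends in a non-m consonant."""
    return "-" in token and token[-1].lower() in FINAL_CONSONANTS


for token in ["crack-l", "crac-ken", "cr-ca-cr-cr",
              "crack", "br-oom", "br-oo", "cr-ca-cr-ca"]:
    print(token, should_match(token))
```

The first three tokens should print `True` and the last four `False`.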
My current pattern is mostly successful, except when a string contains more than one hyphen: it then matches only part of the string, such as ‘cr-ca-cr’, instead of evaluating the whole string ‘cr-ca-cr-ca’, which should not match at all.
Here is the code I have tried with example data:
import re
dummy_data = """
broom
br-oom
br-oo
crack
crack-l
crac-ken
crack-ed
cr-ca-cr-ca
cr-ca-cr-cr
cr-ca-cr-cr-cr
"""
pattern = r'\b(?:\w+-)+\w*[bcdfghjklnpqrstvwxyz](?<!m)\b'
final_consonant_hyphenated = [
m.group(0)
for m in re.finditer(pattern, dummy_data, flags=re.IGNORECASE)
]
print(final_consonant_hyphenated)
Expected output: ['crack-l', 'crac-ken', 'crack-ed', 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']
Current output: ['crack-l', 'crac-ken', 'crack-ed', 'cr-ca-cr', 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']. The string 'cr-ca-cr' is an incorrect match: it is only part of 'cr-ca-cr-ca', whose final segment ends in a vowel, not a consonant.
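The partial match seems to come from `\b`, which treats the position between a letter and a hyphen as a word boundary, so the match is allowed to stop at 'cr-ca-cr' just before '-ca'. One possible direction, sketched below and checked only against the sample data, is to replace both `\b` anchors with lookarounds that also reject a neighbouring hyphen. The lookaround anchors are an assumption about how tokens are delimited (anything that is not a word character or a hyphen). The `(?<!m)` is dropped because the character class already excludes m.

```python
import re

dummy_data = """
broom
br-oom
br-oo
crack
crack-l
crac-ken
crack-ed
cr-ca-cr-ca
cr-ca-cr-cr
cr-ca-cr-cr-cr
"""

# (?<![\w-])  the match may not start right after a word character or hyphen
# (?:\w+-)+   one or more "segment-" groups
# \w*[...]    final segment ending in a consonant other than m
# (?![\w-])   the match may not be followed by a word character or hyphen
pattern = r'(?<![\w-])(?:\w+-)+\w*[bcdfghjklnpqrstvwxyz](?![\w-])'

matches = [
    m.group(0)
    for m in re.finditer(pattern, dummy_data, flags=re.IGNORECASE)
]
print(matches)
# ['crack-l', 'crac-ken', 'crack-ed', 'cr-ca-cr-cr', 'cr-ca-cr-cr-cr']
```

With these anchors the regex can no longer start or stop in the middle of a hyphenated string, so 'cr-ca-cr-ca' is rejected as a whole rather than yielding 'cr-ca-cr'.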