I’m trying to split a string with a regex pattern so that each match includes everything (possibly nothing) up to the next occurrence of the pattern. However, the “everything in between” ends up being matched as part of the next occurrence. For example, I have the following string:
content = """
/path/to/source.cpp:8:18: error: 'FOO' was not declared in this scope
8 | std::cout << FOO << std::endl;
| ^~~
/path/to/source.cpp:9:18: error: 'BAR' was not declared in this scope; did you mean 'EBADR'?
9 | std::cout << BAR << std::endl;
| ^~~
| EBADR
"""
So far I have come up with the following regex pattern, which uses a lookahead assertion to stop at the next path:line:col match in the string:
import re
re_comp = re.compile((
    r"^((?P<path>.*?):(?P<line>[0-9]*):(?P<column>[0-9]*): )?"
    r"(?P<type>error|warning): "
    r".+?[\r\n]+(?=^.*:[0-9]*:[0-9]*|\Z)"
    ),
    re.MULTILINE | re.DOTALL)
The problem is that, with the current pattern, the lines following a message end up in the path group of the next finding when I iterate over the findings:
for m in re_comp.finditer(content):
    print(m.group(0))
will print
/path/to/source.cpp:8:18: error: 'FOO' was not declared in this scope
for the first iteration and
8 | std::cout << FOO << std::endl;
| ^~~
/path/to/source.cpp:9:18: error: 'BAR' was not declared in this scope; did you mean 'EBADR'?
for the second iteration.
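To make this easier to reproduce, here is a minimal self-contained sketch that prints the named groups instead of the whole match. It assumes the pattern uses `[\r\n]+` and `\Z`, and it writes the sample text as concatenated string literals (my own simplification, so there is no leading blank line and the compiler's indentation is not preserved).

```python
import re

# Sample compiler output, one diagnostic per block (simplified whitespace).
content = (
    "/path/to/source.cpp:8:18: error: 'FOO' was not declared in this scope\n"
    "8 | std::cout << FOO << std::endl;\n"
    "| ^~~\n"
    "/path/to/source.cpp:9:18: error: 'BAR' was not declared in this scope; did you mean 'EBADR'?\n"
    "9 | std::cout << BAR << std::endl;\n"
    "| ^~~\n"
    "| EBADR\n"
)

# Same pattern as in the question, unchanged in logic.
re_comp = re.compile(
    r"^((?P<path>.*?):(?P<line>[0-9]*):(?P<column>[0-9]*): )?"
    r"(?P<type>error|warning): "
    r".+?[\r\n]+(?=^.*:[0-9]*:[0-9]*|\Z)",
    re.MULTILINE | re.DOTALL,
)

matches = list(re_comp.finditer(content))
for m in matches:
    # repr() makes embedded newlines in the path group visible.
    print(repr(m.group("path")), m.group("line"), m.group("column"))
```

With this input I get two matches, and the `path` group of the second one starts with the `8 | std::cout ...` line and only ends with the real path. As far as I can tell, this is because the lazy `.*?` in the `path` group can also cross newlines under `re.DOTALL`, and the lookahead already succeeds on the `::` in `std::cout`, since `[0-9]*` may match nothing.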
Can you help me fix my regex pattern? Thanks in advance