Given a string, is it possible to write a single POSIX Extended regular expression that matches all shortest substrings that start with "a" and end with "b" but do not contain "cb", "cd" or "fg"? I want to use such a regex with the gensub, match or split functions in gawk. For example:
- string "0a3cbsbtacc12bbb": the matching substring is "acc12b";
- string "a4cdddbbb5": no matching substring;
- string "1taa/b///fafgfgcb2abb": the matching substrings are "a/b" and "ab".
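To pin down what I mean by "shortest", here is a minimal reference sketch that finds the expected substrings without a single regex. It assumes (my interpretation, consistent with the three examples above) that a shortest match contains no other a or b between its first and last character, so every candidate is an instance of a[^ab]*b; candidates containing cb, cd or fg are then dropped. The helper name shortest_matches is hypothetical, and it uses only POSIX awk (match, substr, index), not gensub or split.

```shell
#!/bin/sh
# Reference sketch (not the single regex I am asking for).
# ASSUMPTION: a "shortest" match has no other a or b between its first and
# last character, so every candidate is an instance of a[^ab]*b.
# Candidates containing cb, cd or fg are discarded.
# shortest_matches is a hypothetical helper name; matches are printed on one
# line, separated by single spaces.
shortest_matches() {
  printf '%s\n' "$1" | awk '
  {
    s = $0
    out = ""
    # match() is leftmost-longest; a[^ab]*b cannot span another a or b,
    # so candidates never overlap and scanning past each one is safe.
    while (match(s, /a[^ab]*b/)) {
      cand = substr(s, RSTART, RLENGTH)
      if (index(cand, "cb") == 0 && index(cand, "cd") == 0 && index(cand, "fg") == 0)
        out = (out == "" ? cand : out " " cand)
      s = substr(s, RSTART + RLENGTH)
    }
    print out
  }'
}

shortest_matches '0a3cbsbtacc12bbb'       # prints: acc12b
shortest_matches 'a4cdddbbb5'             # prints an empty line
shortest_matches '1taa/b///fafgfgcb2abb'  # prints: a/b ab
```

On its own, a[^ab]*b would also yield "afgfgcb" for the third string; the exclusion step is exactly what a single regex would have to encode.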
The "mark the negative matches" approach (using characters that do not occur anywhere in the text) is not suitable in my situation: I am only interested in a single regular expression. So if I call, for example, split("1taa/b///fafgfgcb2abb", arr, regex, seps), I expect seps[1] == "a/b" and seps[2] == "ab".
If it is not possible to write a suitable regex (or if every suitable regex is too inefficient performance-wise), I would like to know why.