Sample string input: “This is a :key: and this is :anotherKey: but this: is normal text.”
I’m attempting this in PowerShell, but I was hoping to accomplish it all in the regex itself and use the Matches
method to return a collection that I could loop through.
I can’t seem to figure out a Regex pattern that will return the following matches for the above string:
This is a
:key:
and this is
:anotherKey:
but this: is normal text.
Below are the expressions I’ve tried and a description of their output:
(:\S+:(?!\S)|[^:]+)
- The stand-alone colon is not matched
(:\S+:(?!\S)|[^:]+|:)
- The stand-alone colon is matched, but only on its own
(:\S+:(?!\S)|[^:]+(:|$))
- Each colon is matched on its own (this one is way off)
- [More attempts that only get worse]
The simplest regex would be :[^:]+: which means match a colon, followed by one or more non-colon characters, followed by a colon. However, that allows spaces inside the colon-wrapped strings. If the colon-wrapped string must not contain any whitespace, then perhaps use :[^:\s]+: which means match a colon, followed by one or more characters that are neither colons nor whitespace, followed by a colon. In either case, you do not need the negative lookahead.
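To make the difference between the two patterns concrete, here is a small sketch in Python. The sample text is made up for illustration (it is not the string from the question), chosen so that a stray colon appears before a real key.

```python
import re

# Hypothetical sample: a stray colon ("note:") followed by a real key (":key:").
text = "note: see :key: here"

# Allows whitespace between the colons, so it pairs the stray colon
# with the opening colon of :key: and swallows the text in between.
loose = re.findall(r':[^:]+:', text)

# Disallows whitespace, so only the tightly wrapped key matches.
strict = re.findall(r':[^:\s]+:', text)

print(loose)   # [': see :']
print(strict)  # [':key:']
```

On the string from the question both patterns happen to return the same two keys, because the stand-alone colon there comes after the last key and has no closing partner.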
If you want to get back a list of the pieces of the string, split around the colon-wrapped parts, then you may need to use a split function together with the regex. Unfortunately you didn’t indicate which language you are calling your regex from, so I will use Python, which I am most familiar with.
>>> import re
>>> s = "This is a :key: and this is :anotherKey: but this: is normal text."
>>> re.split(r'(:[^:]+:)', s)
['This is a ', ':key:', ' and this is ', ':anotherKey:', ' but this: is normal text.']
As you can see, the re.split function returned a list of the parts split on the matches to the colon-wrapped strings. And because I wrapped the regex in parentheses, the resulting group matches are included in the output. You may need to consult the documentation for whichever language you are using for any slight differences in behavior.
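The effect of the parentheses is easiest to see side by side. This is a sketch in Python only; other regex engines may treat capture groups in a split differently.

```python
import re

s = "This is a :key: and this is :anotherKey: but this: is normal text."

# No capture group: the delimiters (the keys) are discarded.
without_group = re.split(r':[^:]+:', s)

# Capture group: the delimiters are kept in the result list.
with_group = re.split(r'(:[^:]+:)', s)

print(without_group)
# ['This is a ', ' and this is ', ' but this: is normal text.']
print(with_group)
# ['This is a ', ':key:', ' and this is ', ':anotherKey:', ' but this: is normal text.']

# If the string starts (or ends) with a key, Python's re.split returns
# an empty string at that end, which you may want to filter out.
edge = re.split(r'(:[^:]+:)', ":key: rest")
print(edge)                       # ['', ':key:', ' rest']
print([p for p in edge if p])     # [':key:', ' rest']
```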
I was able to approach the problem differently, thanks to the response from @Waylan.
I’m now using a far simpler expression to only match the colon-wrapped segments, and then splitting on those matches as a delimiter.
$inputString = "This is a :key: and this is :anotherkey: but this: is normal text."
$pattern = '(:[^:\s]+:)'
$result = [regex]::Split($inputString, $pattern)
$result
Output
This is a
:key:
and this is
:anotherkey:
but this: is normal text.
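For anyone who still wants a single pattern that can be looped over with a Matches-style call, as the question originally asked, one possible approach is to match either a key or a run of characters that does not start a key. The sketch below is in Python rather than PowerShell and is not taken from either answer above; the second alternative uses a negative lookahead on every character, so it may be slower than the split approach on long inputs.

```python
import re

s = "This is a :key: and this is :anotherKey: but this: is normal text."

# First alternative: a colon-wrapped key with no whitespace inside.
# Second alternative: one or more characters, each of which is NOT
# the start of such a key (checked with a negative lookahead).
pattern = r':[^:\s]+:|(?:(?!:[^:\s]+:).)+'

parts = re.findall(pattern, s)
for part in parts:
    print(repr(part))
```

With the sample string this yields the same five pieces as the split-based version, including the stand-alone colon inside the last piece.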