I’m trying to make a substitution in a file with a one-liner regex. sed failed me, so I turned to Perl. I can, for example, match “setting = value # comment” and extract it in three parts: “setting = ”, “value” and “ # comment”.
For example:
perl -lne 'print "-1-$1-2-$2-3-$3-E-" if /^\s*(setting\s*=\s*)([^\s#]+)(\s*#\s*.*)?/'
and I’ll see a result like -1-setting = -2-value-3- # comment-E-
Now the problem: the value might be wrapped in quotes, either single or double, using the plain ' or " character, which is the same in its opening and closing form.
It’s tempting to use a backreference to detect matching quotes:
perl -lne 'print "-1-$1-2-$2-3-$3-4-$4-5-$5-E-" if /^\s*(setting\s*=\s*)(\N{APOSTROPHE}|\N{QUOTATION MARK})?([^\s#]+)(\2)?(\s*#\s*.*)?/'
The problem here is that group 3 captures the closing quotation and group 4 is empty.
And here we run into a problem. We cannot add the backreference to the negated character class for group 3. Perl doesn’t support that, for the simple reason that the capture group is not a character (or is not guaranteed to be one, even though ours is).
So this raises a conundrum. Is there a way to catch these optional matching quotes in a Perl regex?
My thinking, reading and experimentation so far have led me to consider the following possibilities:
- A negative assertion on strings rather than characters. We have [abc], which matches a, b or c, and we have [^abc], matching any character that is not a, b or c, and we have (string1|string2|string3), which matches any of string1, string2 or string3. But is there a negative form of that assertion: any string that is not string1, string2 or string3?
- Creating a character group: a capture group of one character that can be used in a backreference inside a negated character set?
- Referencing one character of a backreference, like the first character.
- Treating a backreference as a set of characters rather than a string, for inclusion in the negated set.
- Using a lookaround assertion. Alas, we have lookbehind and lookahead but not “look at”.
Maybe this is too hard for a regex?