I’m trying (perhaps foolishly) to make a substitution in a file with a one-liner regex. sed failed me so I turned to perl, and I am oh so close. I can, for example, match “setting = value # comment” and extract it in three parts: “setting = ”, “value” and “ # comment”.
Lovely. For example:
perl -lne 'print "-1-$1-2-$2-3-$3-E-" if /^\s*(setting\s*=\s*)([^\s#]+)(\s*#\s*.*)?/'
and I’ll see a result like -1-setting = -2-value-3- # comment-E-
Just my quick way of spying on those capture groups with clear delimiters that don’t have any special characters in them and are relatively lucid.
Now the problem emerges that the value might be wrapped in quotes, either single or double: the simple ' or " character. Unbiased, that is, the same in the opening and closing form.
It’s tempting to use a backreference to detect matching quotes:
perl -lne 'print "-1-$1-2-$2-3-$3-4-$4-5-$5-E-" if /^\s*(setting\s*=\s*)(\N{APOSTROPHE}|\N{QUOTATION MARK})?([^\s#]+)(\2)?(\s*#\s*.*)?/'
The problem here is that group 3 captures the closing quotation and group 4 is empty.
And here we run into a problem. We cannot add the backreference to the negated character class for group 3; perl doesn’t support that, for the simple reason that the capture group is not a character (or not guaranteed to be one, even though ours is).
So this raises a conundrum. Is there a way to catch these optional matching quotes in a perl regex?
My thinking and reading and experimentation thus far led me to consider the possibility of:
- A negative assertion on strings, not characters. We have [abc] which matches a, b or c, and we have [^abc] matching any character that is not a, b or c, and we have (string1|string2|string3) which matches any of string1, string2, or string3. But is there a negative form of that assertion, any string that is not string1, string2, or string3?
- Creating a character group, a capture group of one character that can be used in a backreference in a negative character set? Fat chance.
- Referencing one character in a backreference, like the first character. Again fat chance.
- Referring to a backreference as a set of characters, not a string … for inclusion in the negative set. Yet again fat chance.
- Using a look around assertion. Alas, we have look behind and look ahead but not look at. Hmmm.
Out of options.
Maybe this is in the too hard basket for a regex? It’s a tricky ask: match me a value with optional quotes, single or double, extracting those if present.