It often happens that after designing my regexp (on regex101.com) I want to paste it into my program. Consider this regexp that matches numbers and strings (but keep in mind this is a general question!):
^("(?:[^"]|\")*"|-?[0-9]+(?:.[0-9]+)?)$
Every " and \ in it has to be escaped before I can paste it into a language that uses " for strings.
Needless to say, doing this manually drives me crazy. I face this problem both at work with a C++ project and at home with Java and JavaScript projects.
How can I deal with this efficiently?
If you feel it is worth the effort, write a small DSL (or perhaps one already exists) so you can write something like this (Java):
// ^("(?:[^"]|\")*"|-?[0-9]+(?:.[0-9]+)?)$
// @formatter:off
Pattern pattern = Patterning.start() // ^
    .group()
        .lookahead()
        ...
        .set("0-9").plus()
        .string("E=m.c^2") // \Q ... \E
        .lookaheadEnd()
    .groupEnd()
    .end() // $
    .build();
// @formatter:on
class Patterning { ... }
That said, most people know regex, or at least it is worth learning, if only to do powerful replacements in an editor.
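To make the idea concrete, here is a minimal sketch of what such a builder could look like. The class name Patterning comes from the snippet above, but every method shown here is a hypothetical illustration covering only a small subset (no lookahead), not an existing library. It assumes the caller passes valid character-class contents to set().

```java
import java.util.regex.Pattern;

// Hypothetical fluent regex builder; names are illustrative only.
final class Patterning {
    private final StringBuilder regex = new StringBuilder();

    private Patterning() {}

    // Starts a pattern anchored at the beginning of the input (emits ^).
    static Patterning start() {
        Patterning p = new Patterning();
        p.regex.append('^');
        return p;
    }

    Patterning group()    { regex.append("(?:"); return this; } // non-capturing group
    Patterning groupEnd() { regex.append(')');   return this; }
    Patterning optional() { regex.append('?');   return this; }
    Patterning plus()     { regex.append('+');   return this; }

    // Character class such as set("0-9"); contents are inserted unchanged.
    Patterning set(String chars) {
        regex.append('[').append(chars).append(']');
        return this;
    }

    // Literal text; Pattern.quote wraps it in \Q ... \E, so no manual escaping.
    Patterning string(String literal) {
        regex.append(Pattern.quote(literal));
        return this;
    }

    // Anchors the pattern at the end of the input (emits $).
    Patterning end() { regex.append('$'); return this; }

    Pattern build() { return Pattern.compile(regex.toString()); }

    public static void main(String[] args) {
        // Builds a pattern for an optionally signed decimal number.
        Pattern number = Patterning.start()
            .group().string("-").groupEnd().optional()
            .set("0-9").plus()
            .group().string(".").set("0-9").plus().groupEnd().optional()
            .end()
            .build();
        System.out.println(number.matcher("-12.5").matches()); // true
        System.out.println(number.matcher("12.").matches());   // false
    }
}
```

The trade-off is that no quote or backslash ever has to be escaped by hand, at the cost of a pattern that is much longer than the regex it produces.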
In C++, use raw string literals (added in C++11). Nothing between the delimiter sequences is treated as an escape:
const char *regex = R"-regexp-(^("(?:[^"]|\")*"|-?[0-9]+(?:.[0-9]+)?)$)-regexp-";
In this case the delimiters are the literal strings -regexp-( and )-regexp-
Use Unicode character escapes instead of literals. For example:
- Java
  // the doubled backslash leaves the escape for the regex engine to interpret
  boolean b = Pattern.matches("\\u0022", "\"");
- JavaScript
  /\u0022/.test('"');
- Perl
  '"' =~ /\N{U+0022}/;
In addition, strings that are compiled to regular expressions can use line breaks for added clarity:
- Java
  boolean phone_mask = Pattern.matches(
      "^[^0-9]*"   /* Optional non-numeric characters */
    + "\\+9{3}"    /* Followed by a plus sign and three nines */
    + "\\s9"       /* Followed by a space and one nine */
    + "\\s9{3}"    /* Followed by a space and three nines */
    + "\\s9{4}"    /* Followed by a space and four nines */
    + "$",
      "Phone: +999 9 999 9999");
- JavaScript
  var phone_mask = RegExp(
      "^[^0-9]*"   /* Optional non-numeric characters */
    + "\\+9{3}"    /* Followed by a plus sign and three nines */
    + "\\s9"       /* Followed by a space and one nine */
    + "\\s9{3}"    /* Followed by a space and three nines */
    + "\\s9{4}"    /* Followed by a space and four nines */
    + "$").test("Phone: +999 9 999 9999");
Write a program that interprets and escapes your regexps for you.
You can either use it to generate the escaped code to paste into your source, or have it work on the fly by keeping your regexp in a separate file.
The big downside of the file version is that your logic no longer lives with your source.
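As a rough illustration of both variants, here is a minimal Java sketch. The class RegexEscaper and its methods are hypothetical names chosen for this example. It assumes the regex contains no control characters such as newlines or tabs (only backslashes and double quotes are handled) and that leading or trailing whitespace in the regex file is not significant.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Pattern;

// Hypothetical helper; class and method names are illustrative.
final class RegexEscaper {
    // Generator variant: turns a raw regex into a Java string literal by
    // escaping backslashes and double quotes and wrapping it in quotes.
    static String toJavaLiteral(String rawRegex) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : rawRegex.toCharArray()) {
            if (c == '\\' || c == '"') {
                out.append('\\');
            }
            out.append(c);
        }
        return out.append('"').toString();
    }

    // On-the-fly variant: the regex stays unescaped in its own file.
    // strip() drops surrounding whitespace, including the final newline.
    static Pattern loadPattern(Path file) throws IOException {
        return Pattern.compile(Files.readString(file).strip());
    }

    public static void main(String[] args) {
        // Pass the raw regex as the first argument; the fallback is \s+"x"
        String raw = args.length > 0 ? args[0] : "\\s+\"x\"";
        System.out.println(toJavaLiteral(raw)); // fallback prints "\\s+\"x\""
    }
}
```

The generator keeps the pattern next to the code that uses it, while loadPattern trades that locality for a regex file that can be pasted into straight from regex101.com.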