In the regex below, \s denotes a whitespace character. I imagine the regex parser goes through the string, sees the \, and knows that the next character is special.
But this is not the case as double escapes are required.
Why is this?
var res = new RegExp('(\\s|^)' + foo).test(moo);
Is there a concrete example of how a single escape could be mis-interpreted as something else?
You are constructing the regular expression by passing a string to the RegExp constructor.
\ is an escape character in string literals.
The \ is consumed by the string literal parsing…
const foo = "foo";
const string = '(\s|^)' + foo;
console.log(string); // logs "(s|^)foo": the backslash is gone
… so the data you pass to the RegExp compiler is a plain s and not \s.
You need to escape the \ (by writing \\) to express the \ as data instead of as an escape character itself.
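To check what actually reaches the regex compiler, one option is to inspect the source property of the resulting RegExp. The following is a minimal sketch; the names singleEscaped and doubleEscaped are illustrative and do not come from the question.

```javascript
// Single backslash: the string literal parser drops it, leaving "(s|^)foo".
const singleEscaped = new RegExp('(\s|^)' + 'foo');
// Double backslash: the string holds one real backslash, giving "(\s|^)foo".
const doubleEscaped = new RegExp('(\\s|^)' + 'foo');

console.log(singleEscaped.source); // (s|^)foo
console.log(doubleEscaped.source); // (\s|^)foo

// The single-escaped pattern wants a literal "s" (or start of input) before "foo".
console.log(singleEscaped.test('a foo')); // false
console.log(singleEscaped.test('sfoo'));  // true
console.log(doubleEscaped.test('a foo')); // true
```

The single-escaped version still compiles without error, which is why the mistake tends to go unnoticed until a match fails.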
Inside the code where you’re creating a string, the backslash is a JavaScript escape character first, which means that escape sequences like \t, \n, \", etc. will be translated into their JavaScript counterparts (tab, newline, quote, etc.), and the result will be made part of the string. A double backslash represents a single backslash in the actual string itself, so if you want a backslash in the string, you escape that first.
So when you generate a string by saying var someString = '(\\s|^)', what you’re really doing is creating an actual string with the value (\s|^).
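This also gives a concrete case where a single escape is read as something else entirely: \b is the backspace character in a JavaScript string literal, but a word boundary in a regex. A small sketch, with illustrative variable names:

```javascript
// In a string literal, \b is the backspace character (U+0008), not a word boundary.
const asBackspace = new RegExp('\bfoo\b');
// Escaping the backslash passes the two characters \ and b through to the regex.
const asWordBoundary = new RegExp('\\bfoo\\b');

console.log('\b'.charCodeAt(0));             // 8
console.log(asBackspace.test('foo bar'));    // false
console.log(asWordBoundary.test('foo bar')); // true
```

Here the single-escaped pattern looks for "foo" surrounded by literal backspace characters, so it does not match ordinary text.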
The regex needs a string representation of \s, which in JavaScript can be produced using the literal "\\s".
Here’s a live example to illustrate why "\s" is not enough:
alert("One backslash: \s\nDouble backslashes: \\s");
Note how an extra \ before \s changes the output.
As has been said, inside a string literal a backslash indicates an escape sequence rather than a literal backslash character. The RegExp constructor often needs literal backslash characters in the string passed to it, so in most cases the code should contain \\ to represent each literal backslash.
A problem is that double-escaping metacharacters is tedious. There is one way to pass a string to new RegExp without having to double escape them: use the String.raw template tag, an ES6 feature, which allows you to write a string that will be parsed by the interpreter verbatim, without any parsing of escape sequences. For example:
console.log('\\'.length); // length 1: an escaped backslash
console.log(`\\`.length); // length 1: an escaped backslash
console.log(String.raw`\\`.length); // length 2: no escaping in String.raw!
So, if you wish to keep your code readable and you have many backslashes, you may use String.raw to type only one backslash when the pattern requires a backslash:
const sentence = 'foo bar baz';
const regex = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);
console.log(regex.test(sentence));
But there’s a better option. Generally, there’s little reason to use new RegExp unless you need to create a regular expression dynamically from existing variables. Otherwise, use regex literals, which do not require double-escaping of metacharacters and do not require writing out String.raw to keep the pattern readable:
const sentence = 'foo bar baz';
const regex = /\bfoo\sbar\sbaz\b/;
console.log(regex.test(sentence));
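As a quick check that the three spellings describe the same pattern, the sketch below compares the source of each resulting RegExp. The names fromLiteral, fromRaw and fromEscaped are illustrative only.

```javascript
// Three ways of writing the same pattern.
const fromLiteral = /\bfoo\sbar\sbaz\b/;                    // regex literal
const fromRaw = new RegExp(String.raw`\bfoo\sbar\sbaz\b`);  // raw template string
const fromEscaped = new RegExp('\\bfoo\\sbar\\sbaz\\b');    // double-escaped string

console.log(fromLiteral.source === fromRaw.source);     // true
console.log(fromLiteral.source === fromEscaped.source); // true

// All three behave identically on the sample sentence.
const allMatch = [fromLiteral, fromRaw, fromEscaped].every((re) => re.test('foo bar baz'));
console.log(allMatch); // true
```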
It is best to use new RegExp only when the pattern must be created on the fly, as in the following snippet:
const sentence = 'foo bar baz';
const wordToFind = 'foo'; // from user input
const regex = new RegExp(String.raw`\b${wordToFind}\b`);
console.log(regex.test(sentence));
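One caveat when the interpolated value really does come from user input: any regex metacharacters it contains are interpreted as part of the pattern. A possible way to handle this, sketched here with a hypothetical helper named escapeRegExp (an assumption for illustration, not something the answer above defines), is to backslash-escape those characters before interpolating:

```javascript
// Hypothetical helper: prefixes regex metacharacters with a backslash.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

const userInput = 'a.b'; // stands in for untrusted user input
const unescaped = new RegExp(String.raw`\b${userInput}\b`);
const escaped = new RegExp(String.raw`\b${escapeRegExp(userInput)}\b`);

console.log(unescaped.test('axb')); // true: "." matches any character
console.log(escaped.test('axb'));   // false: "." is now a literal dot
console.log(escaped.test('a.b'));   // true
```

Note that the helper's replacement string '\\$&' is itself an ordinary string literal, so it needs the doubled backslash for the same reason discussed above.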
\ is used in strings to escape special characters. If you want a backslash in your string (e.g. for the \ in \s), you have to escape it with another backslash. So \ becomes \\.
EDIT: Even had to do it here, because \\ in my answer turned into \.