I have a list of, say, 5 words (foo bar foobar footable somebar). The words will be provided as a single string, separated by spaces and in any order. I need a regex that meets the requirements below:
- the input string should contain only the 5 words listed above
- each word is separated by a space
- each word appears only once
- the regex should match the entire string; if any portion does not meet the criteria above, the match should fail
So far I have this:
(\b(foo|bar|foobar|footable|somebar)\b *){1,5}
- Matches only the mentioned words
- Words separated by space
I need a solution for the 3rd and 4th requirements:
- each word only once
- regex should match the entire string – if any portion of the string does not match above criteria, it should be a failed match
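To see concretely why the attempt above misses the last two requirements, here is a small sketch in Python's `re` module. The test strings and the helper name `matched_text` are made up for illustration; the pattern is the one from the question with its backslashes written out.

```python
import re

# The pattern from the question. It has no anchors and no check for repeats.
ATTEMPT = re.compile(r"(\b(foo|bar|foobar|footable|somebar)\b *){1,5}")

def matched_text(s):
    """Return the substring the unanchored pattern matches, or None."""
    m = ATTEMPT.search(s)
    return m.group(0) if m else None

# A repeated word is accepted, because nothing remembers earlier words.
print(repr(matched_text("foo foo foo")))

# Only the leading part matches, because nothing forces a whole-string match.
print(repr(matched_text("foo stuff bar")))
```

Both calls return a match, so the pattern needs anchors and some way to rule out duplicates.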
Start with a regular expression that matches exactly 5 words:
^(?:\w+\s+){4}\w+$
The anchors ensure that this matches the entire string.
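Then prefix this with lookaheads, one for each required word. Since the string contains exactly 5 words and all 5 required words must be present, each word necessarily appears exactly once.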
(?=.*\bfoo\b)(?=.*\bbar\b)(?=.*\bfoobar\b)(?=.*\bfootable\b)(?=.*\bsomebar\b)^(?:\w+\s+){4}\w+$
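A quick way to try the lookahead pattern is a minimal Python sketch like the one below. The function name `is_valid` and the sample inputs are assumptions for illustration, not part of the original answer.

```python
import re

# Five lookaheads (one per required word) followed by "exactly five words".
PATTERN = re.compile(
    r"(?=.*\bfoo\b)(?=.*\bbar\b)(?=.*\bfoobar\b)"
    r"(?=.*\bfootable\b)(?=.*\bsomebar\b)"
    r"^(?:\w+\s+){4}\w+$"
)

def is_valid(s):
    """True if s consists of the five required words, each exactly once."""
    # Note: in Python, `$` also matches just before a single trailing newline.
    return PATTERN.match(s) is not None

print(is_valid("foobar foo footable somebar bar"))       # all five, any order
print(is_valid("foo bar foo foobar footable"))           # repeat, one missing
print(is_valid("foo bar foobar footable somebar stuff")) # six words
```

The word-count part rejects strings that are too long or too short, and the lookaheads reject any five-word string that is missing one of the required words.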
“Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.”
I understand that this is probably a homework exercise, but don’t use a regexp for this in the real world. Just write a simple little script, e.g. using any awk in any shell on every Unix box. It works for any space-separated strings containing any non-space characters (including regexp metacharacters) that you want to compare:
$ cat tst.awk
BEGIN {
    numAllowed = split(allow,tmp)
    for (i in tmp) {
        allowed[tmp[i]]
    }
    numCheck = split(check,tmp)
    if ( numCheck != numAllowed ) {
        err = err "Error: Invalid number of words: " numCheck+0 ORS
    }
    for (i in tmp) {
        word = tmp[i]
        if ( !(word in allowed) ) {
            err = err "Error: Invalid word: " word ORS
        }
        if ( seen[word]++ ) {
            err = err "Error: Repeated word: " word ORS
        }
    }
    if ( err ) {
        printf "%s", err
        exit 1
    }
    else {
        print "Valid"
    }
}
$ awk -v allow='foo bar foobar footable somebar' -v check='foobar foo footable somebar bar' -f tst.awk
Valid
$ awk -v allow='foo bar foobar footable somebar' -v check='foo bar foo foobar stuff footable somebar' -f tst.awk
Error: Invalid number of words: 7
Error: Repeated word: foo
Error: Invalid word: stuff
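For readers who don't use awk, the same three checks (word count, membership, repeats) can be sketched in Python. This is a loose translation rather than the script above; the function name `check_words` and the choice to return a list of messages are assumptions made for the example.

```python
def check_words(allow, check):
    """Return a list of error messages; an empty list means the input is valid."""
    allowed_list = allow.split()
    allowed = set(allowed_list)
    words = check.split()
    errors = []

    # Same count check as the awk script: number of words must match.
    if len(words) != len(allowed_list):
        errors.append("Invalid number of words: %d" % len(words))

    seen = set()
    for word in words:
        if word not in allowed:
            errors.append("Invalid word: " + word)
        if word in seen:
            errors.append("Repeated word: " + word)
        seen.add(word)
    return errors

allow = "foo bar foobar footable somebar"
print(check_words(allow, "foobar foo footable somebar bar"))
print(check_words(allow, "foo bar foo foobar stuff footable somebar"))
```

Because the words are compared as plain strings, regexp metacharacters in them need no special treatment.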
A subroutine or a pre-defined subroutine can be used to accomplish this task in a regex flavor that supports them, such as PCRE.
- The regex with a subroutine:
^(foobar|footable|somebar|bar|foo)\s+(?!\1\b)((?1))\s+(?!(?:\1|\2)\b)((?1))\s+(?!(?:\1|\2|\3)\b)((?1))\s+(?!(?:\1|\2|\3|\4)\b)(?1)$
An explanation of this regex can be found here.
- The regex with a pre-defined subroutine:
(?(DEFINE)(?<words>foobar|footable|somebar|bar|foo))^((?&words))\s+(?!\2\b)((?&words))\s+(?!(?:\2|\3)\b)((?&words))\s+(?!(?:\2|\3|\4)\b)((?&words))\s+(?!(?:\2|\3|\4|\5)\b)(?&words)$
An explanation of this regex is here.
The demo of using this regex with grep is here.
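Python's standard `re` module has no subroutine calls such as `(?1)` or `(?&words)`, so the sketch below swaps in a different technique: it builds an equivalent pattern by repeating the alternation as text, keeping the same negative lookaheads on earlier captures. The names `build_pattern` and `is_valid` are hypothetical, and the sketch assumes every word ends in a word character so that `\b` behaves as intended.

```python
import re

WORDS = ["foobar", "footable", "somebar", "bar", "foo"]

def build_pattern(words):
    """Expand the subroutine idea by hand: one captured alternation per slot,
    each guarded by a lookahead that forbids repeating an earlier capture."""
    alt = "|".join(re.escape(w) for w in words)
    parts = []
    for i in range(len(words)):
        guard = ""
        if i > 0:
            # e.g. (?!(?:\1|\2)\b) for the third slot
            backrefs = "|".join(r"\%d" % j for j in range(1, i + 1))
            guard = r"(?!(?:%s)\b)" % backrefs
        parts.append(guard + "(" + alt + ")")
    return re.compile(r"\s+".join(parts))

PATTERN = build_pattern(WORDS)

def is_valid(s):
    # fullmatch plays the role of the ^...$ anchors in the regexes above.
    return PATTERN.fullmatch(s) is not None

print(is_valid("foobar foo footable somebar bar"))
print(is_valid("foo foo bar foobar footable"))
```

Generating the pattern from the word list keeps the long expression out of the source and makes it easy to change the set of allowed words.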