I have to write a Java regular expression that checks whether a string contains at least 6 digits. The digits may be separated by spaces, hyphens, or other characters.
The Java pattern that I wrote, "^(?=.*(?:\d[\s-]*){6,}).*$",
is working fine. However, SonarLint says that I should refactor this repetition because it can lead to a stack overflow for large inputs.
String patternString1 = ".*(\\d[\\s-]*){6,}.*";
String patternString2 = "^(?=.*(?:\\d[\\s-]*){6,}).*$";
Both of the above cause the same warning. I need to change the pattern, since a regular expression should not overflow the stack.
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MyClass {
    public static void main(String[] args) {
        String[] strings = { "JamesBond1.zip", "JamesBond 1 2 3456.zip", "JaneBond X 3-4-2024" };
        // Backslashes are doubled because this is a Java string literal.
        String patternString = ".*(\\d[\\s-]*){6,}.*";
        Pattern pattern = Pattern.compile(patternString);
        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);
            if (matcher.matches()) {
                System.out.println("\"" + s + "\" matches the pattern.");
            } else {
                System.out.println("\"" + s + "\" does not match the pattern.");
            }
        }
    }
}
The first string should fail, and the second and third should pass.
Try:
^\D*(?:\d\D?){6}.*$
See regex101
import java.util.regex.Pattern;
import java.util.regex.Matcher;
public class MyClass {
    public static void main(String[] args) {
        String[] strings = { "JamesBond1.zip", "JamesBond 1 2 3456.zip", "JaneBond X 3-4-2024" };
        // Backslashes are doubled because this is a Java string literal.
        String patternString = "^\\D*(?:\\d\\D?){6}.*$";
        Pattern pattern = Pattern.compile(patternString);
        for (String s : strings) {
            Matcher matcher = pattern.matcher(s);
            if (matcher.matches()) {
                System.out.println("\"" + s + "\" matches the pattern.");
            } else {
                System.out.println("\"" + s + "\" does not match the pattern.");
            }
        }
    }
}
which returns
"JamesBond1.zip" does not match the pattern.
"JamesBond 1 2 3456.zip" matches the pattern.
"JaneBond X 3-4-2024" matches the pattern.
See: Online Java Compiler
Explanation
To avoid backtracking, get rid of nested *, + or {n,} constructs, and be more explicit so that several parts of the pattern cannot gobble up the same characters. See character classes:
^ : match the start of the line
\D* : match anything up to the first digit
(?:...){6} : match the group exactly 6 times
\d\D? : a digit followed, optionally, by one non-digit (such as a space, a hyphen, or another character)
.*$ : once the 6 digits are found, match the rest of the line
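One caveat: \D? allows at most one non-digit between two digits, so an input such as "1--2--3--4--5--6" would not match the pattern above. If that matters, a possible alternative is to count digits with Matcher.find() and a plain \d pattern, which contains no quantifier or group at all. This is only a sketch and I have not run it through SonarLint; DigitCounter and hasAtLeastDigits are names made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DigitCounter {
    // A single-digit pattern: no quantifier and no group, so nothing is repeated.
    private static final Pattern DIGIT = Pattern.compile("\\d");

    // Hypothetical helper: returns true once `minimum` digits have been found in `s`.
    static boolean hasAtLeastDigits(String s, int minimum) {
        Matcher matcher = DIGIT.matcher(s);
        int count = 0;
        // Stop searching as soon as enough digits have been seen.
        while (count < minimum && matcher.find()) {
            count++;
        }
        return count >= minimum;
    }

    public static void main(String[] args) {
        String[] strings = {
            "JamesBond1.zip", "JamesBond 1 2 3456.zip", "JaneBond X 3-4-2024", "1--2--3--4--5--6"
        };
        for (String s : strings) {
            System.out.println("\"" + s + "\" -> " + hasAtLeastDigits(s, 6));
        }
    }
}
```

The loop does the counting that {6,} did in the original pattern, so any number of separator characters between digits is accepted.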