I was playing around with regex to learn it better and wondered whether a regex can handle the following use case.
Example String: 1_234_3456_345
Expected result: I want to match just the first 4 digits from this string using regex, meaning just 1234, ignoring the underscores in between.
Another Example: 123_7689_54A
Expected Result: 1237 (only the first 4 digits from the string should be matched, ignoring the characters in between them)
I tried out with something like this:
/\d.*?\d.*?\d.*?\d/
But for the examples above it matches 1_234 (first one) and 123_7 (second one). I want it to omit the _ in between.
What do I do? Is this possible in the first place?
A regex match is always a substring of the string being matched against, so it is impossible for a match to “ignore” certain substrings within the matching substring.
You can instead enclose each of the 4 digits in a capture group so that you can join the captured digits into a string:
puts /(\d)\D*(\d)\D*(\d)\D*(\d)/.match("1_234_3456_345").captures.join
This outputs:
1234
Demo: https://ideone.com/UD7Q2g
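If the number of digits may vary, the same idea can be sketched with String#scan instead of writing one capture group per digit. This is only an illustrative variant, not part of the answer above; the helper name first_digits and its count parameter are made up for the example.

```ruby
# Hypothetical helper: collect the first `count` digits of a string,
# skipping any non-digit characters between them.
def first_digits(str, count = 4)
  # scan(/\d/) returns every digit as a one-character string, in order;
  # first(count) keeps at most `count` of them, and join glues them together.
  str.scan(/\d/).first(count).join
end

puts first_digits("1_234_3456_345")  # prints 1234
puts first_digits("123_7689_54A")    # prints 1237
```

Unlike the capture-group version, this returns a shorter string rather than failing when the input has fewer than four digits.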
Alternative using regex and substring:
string = "1_234_3456_345"
result = string.gsub(/\D/, '')[0, 4]
puts result
Try this:
(?:\d|_(?=\d)){4}
Explanation short:
Digit or underscores+digit four times.
Explanation long:
(?: ... ) is a non-capturing group.
\d matches any digit.
| is the alternation operator, meaning “or”.
_(?=\d) matches an underscore only if it’s followed by a digit (using a positive lookahead assertion).
{4} is a quantifier specifying that the preceding group should match exactly four times.
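It may be worth checking this pattern against the two example strings before relying on it. The sketch below assumes Ruby (as in the other answers) and simply prints what the pattern matches. Each of the four repetitions consumes exactly one character, so an underscore matched by the second alternative counts as one of the four.

```ruby
# Pattern from the answer above.
pattern = /(?:\d|_(?=\d)){4}/

# String#[] with a regex returns the matched substring (or nil).
m1 = "1_234_3456_345"[pattern]
m2 = "123_7689_54A"[pattern]

puts m1  # prints 1_23
puts m2  # prints 123_
```

So the match still contains the underscore and covers only three digits, which is consistent with the point that a match is always a contiguous substring; getting 1234 would still need a post-processing step such as joining captures or stripping non-digits.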