I’m trying to understand why the following regular expression doesn’t do what I expect in PowerShell.
\["(Total|Group1|Group2)",(\d+),\d*,,,,,\d+,\d+\]
I would expect this to return nine results, because three groupings in the test string match this expression and each match has two capture groups (so three entries per match, counting the whole match). Regex101 gives me the results I expect (see here), but PowerShell only returns the first match (three entries):
Name Value
---- -----
2 47
1 Total
0 ["Total",47,,,,,,1,2]
It’s easy enough to split this into three separate expressions and process them one at a time, but that feels unnecessary.
Test string:
<script type="text/javascript">
var resourceFolder = "Test_Results_20240621_0910";
var blockSize = 50000;
block0 = [["Total",47,,,,,,1,2],["Group1",29,0,,,,,3,5],["Group2",52,0,,,,,8,4]];
coverageData = [block0];
</script>
EDIT: I should have pointed out that I was using the -match operator, as in $test_string -match $exp, and then trying to index into the matches using $Matches[i]. It seems the answer is to use $matches = [regex]::Matches($test_string, $exp) instead.
Of course, now I want to know what the difference is between these two methods.
Thank you @user24714692
Your post implies that you’re using PowerShell’s regex-based -match operator and are looking for the capture groups in the automatic $Matches variable that -match populates with the matching results.
The problem is that -match, by design, only ever looks for one match in its input.
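A minimal sketch of this behavior, using only the block0 line from the question’s test string (the variable names $line and $exp are illustrative): -match returns a Boolean, and $Matches then holds just the first match, keyed by group number, which is exactly the three-entry table shown in the question.

```powershell
# The block0 line from the question's test string.
$line = 'block0 = [["Total",47,,,,,,1,2],["Group1",29,0,,,,,3,5],["Group2",52,0,,,,,8,4]];'
$exp  = '\["(Total|Group1|Group2)",(\d+),\d*,,,,,\d+,\d+\]'

# -match stops at the FIRST match; $Matches describes only that one.
if ($line -match $exp) {
  $Matches[0]     # ["Total",47,,,,,,1,2]  (the whole match)
  $Matches[1]     # Total  (capture group 1)
  $Matches[2]     # 47     (capture group 2)
  $Matches.Count  # 3: whole match plus two capture groups, never nine
}
```

The Group1 and Group2 entries are never examined, so there is no way to reach them through $Matches.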
While a new -matchall operator (for finding all matches) has been proposed and green-lit in GitHub issue #7867, no one has stepped up to implement it yet.
Thus, for now (as of PowerShell 7.4.x), you need to use the underlying .NET APIs directly, namely the [regex]::Matches() method:
# Sample input string.
$str = @'
<script type="text/javascript">
var resourceFolder = "Test_Results_20240621_0910";
var blockSize = 50000;
block0 = [["Total",47,,,,,,1,2],["Group1",29,0,,,,,3,5],["Group2",52,0,,,,,8,4]];
coverageData = [block0];
</script>
'@
# Find all matches of the regex in the input string, and
# report information about each.
[regex]::Matches(
$str,
  '\["(Total|Group1|Group2)",(\d+),\d*,,,,,\d+,\d+\]'
) | ForEach-Object {
[pscustomobject] @{
FullMatch = $_.Value
CaptureGroup1 = $_.Groups[1].Value
CaptureGroup2 = $_.Groups[2].Value
}
}
The above outputs:
FullMatch CaptureGroup1 CaptureGroup2
--------- ------------- -------------
["Total",47,,,,,,1,2] Total 47
["Group1",29,0,,,,,3,5] Group1 29
["Group2",52,0,,,,,8,4] Group2 52
Note:
[regex]::Matches() returns a collection of [System.Text.RegularExpressions.Match] instances that contain detailed information about each match: the .Value property contains the matched text, and the .Groups property contains the capture-group matches (which also have a .Value property), as shown above.
Note that the first capture group is in .Groups[1], whereas .Groups[0] reflects the whole match.
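Put together, this yields the nine values the question originally expected: three matches, each contributing .Groups[0] (the whole match) plus two capture groups. A sketch, assuming only the block0 line as input; $allMatches and $values are illustrative names, chosen to avoid overwriting the automatic $Matches variable:

```powershell
$str = 'block0 = [["Total",47,,,,,,1,2],["Group1",29,0,,,,,3,5],["Group2",52,0,,,,,8,4]];'
$exp = '\["(Total|Group1|Group2)",(\d+),\d*,,,,,\d+,\d+\]'

# Unlike -match, this finds every non-overlapping match in the input.
$allMatches = [regex]::Matches($str, $exp)
$allMatches.Count   # 3

# Flatten: for each match, emit Groups[0] (whole match), Groups[1], Groups[2].
$values = foreach ($m in $allMatches) {
  for ($i = 0; $i -lt $m.Groups.Count; $i++) {
    $m.Groups[$i].Value
  }
}
$values.Count       # 9
```

Indexing .Groups by position keeps the order predictable: whole match first, then capture groups 1 and 2, repeated for Total, Group1 and Group2.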