I want to split a semicolon-separated test string with a regular expression in C#.
I know that String.Split would work for this example, but my real pattern and input string are more complicated, and I have simplified them here.
This is the pattern:
string pattern = @"(?:^|;)(([^;]*|)(?=;|$))";
And this is the test string:
string input = @"text1;text2;text3;;text5";
The match results are:
[0] text1
[1] text2
[2] text3
[3]
[4] text5
And this looks good to me!
But if I change the test string to:
string input = @";text2;text3;;text5";
then I get these results:
[0]
[1] text3
[2]
[3] text5
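For reference, here is a minimal sketch of a console program that should reproduce both result lists. It assumes the listed values are capture group 1 of each match (the match value itself would include the leading semicolon); the names SplitRepro and Fields are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SplitRepro
{
    // Pattern copied verbatim from the question.
    private const string Pattern = @"(?:^|;)(([^;]*|)(?=;|$))";

    // Hypothetical helper: returns capture group 1 of every match, in order.
    // Assumption: group 1 is what the result lists in the question show.
    public static List<string> Fields(string input)
    {
        var fields = new List<string>();
        foreach (Match match in Regex.Matches(input, Pattern))
        {
            fields.Add(match.Groups[1].Value);
        }
        return fields;
    }

    public static void Main()
    {
        // The two test strings from the question.
        foreach (string input in new[] { "text1;text2;text3;;text5", ";text2;text3;;text5" })
        {
            Console.WriteLine($"Input: {input}");
            List<string> fields = Fields(input);
            for (int i = 0; i < fields.Count; i++)
            {
                Console.WriteLine($"[{i}] {fields[i]}");
            }
        }
    }
}
```

Printing match.Index and match.Length next to each value may also help, since it shows exactly where each match starts and whether it is empty.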
So my questions are:
Why does the .NET regex engine have a problem when the test string starts with an empty field?
Even if the engine cannot return the first empty string, where is text2?
And what do I have to change so that it also works in C#?
You can test it on Regex101: switching the flavor on the left side between PCRE2 and .NET 7 shows the difference between the two engines.