Consider the following three example lists:
List<string> localPatientsIDs = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
List<string> remotePatientsIDs = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
List<string> archivedFiles = new List<string>{
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\14967_20240703150011899.zip",
    @"G:\Archive\214_20231213150003676.zip",
    @"G:\Archive\34277_20230526200048891.zip",
    @"G:\Archive\48083_20240214150011919.zip" };
Please note that each element in archivedFiles is the full path of a ZIP file whose name begins with a patientID that is in either localPatientsIDs or remotePatientsIDs.
For example, in @"G:\Archive\000-007_20230526175817297.zip" the file name 000-007_20230526175817297.zip begins with 000-007, which is an element of the list remotePatientsIDs.
A patientID cannot be in localPatientsIDs and archivedFiles simultaneously, so no duplicates are allowed between these two lists. However, archivedFiles can contain patientIDs that are also in remotePatientsIDs.
I need to get the elements in archivedFiles whose file names begin with an element that is present in remotePatientsIDs but not present in localPatientsIDs. The end goal is to unzip those files into the directory that contains the localPatientsIDs database.
For the given example, I would expect to have the following result:
archivedFilesToUnzip == {
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\214_20231213150003676.zip" }
So, how can I use LINQ to do this?
With my limited knowledge, I would expect it to be as simple as:
List<string> archivedFilesToUnzip = archivedFiles.Where(name => name.Contains(remotePatients.Except(localPatients)))
I cannot even compile this, presumably because Contains cannot iterate over the list members, and I get the message:
CS1503: Argument 1: cannot convert from 'System.Collections.Generic.IEnumerable<string>' to 'string'
My best attempt so far is the following statement (I confess it seems a little messy to me). It always returns an empty list.
List<string> archivedFilesToUnzip = archivedFiles.Where(name => archivedFiles.Any(x => x.ToString().Contains(remotePatients.Except(localPatients).ToString()))).ToList();
I’ve found these posts, which helped me better understand the differences between Where and Select:
- Difference between Select and Where in Entity Framework and
- Linq: What is the difference between Select and Where
Also, I’ve been looking for directions on using LINQ in:
- Filter Out List with another list using LINQ
- Filter a list by another list C#
- Filter a list based on another list condition
and other links as well, but I still cannot find a working solution.
C# is a statically (and mostly strongly) typed language (see the question "What is the difference between a strongly typed language and a statically typed language?" and the article "The C# type system" if you want to dive deeper). This means the compiler checks variable types and rejects many mistakes, such as comparing a string with a boolean.
remotePatients.Except(localPatients) is a collection of strings, while name in archivedFiles.Where(name => name... is “just” a string. Contains on string accepts either a char (a symbol in a string) or another string, not a collection of strings, hence the compilation error.
Your second attempt compiles but will not achieve anything meaningful: if you assign remotePatients.Except(localPatients).ToString() to a variable and examine it, or print the result to the console, you will see just the type name (System.Linq.Enumerable+<ExceptIterator>d__99`1[System.String], to be exact), which obviously is not part of the file name.
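To make both points concrete, here is a minimal sketch. The shortened lists and the names sampleFile, typeName, containsTypeName and containsAnyId are mine, chosen only for illustration; the sample path is modeled on the question's data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var localPatientsIDs = new List<string> { "1550615", "34277" };
var remotePatientsIDs = new List<string> { "000-007", "34277" };
// Hypothetical sample path, modeled on the question's data.
string sampleFile = @"G:\Archive\000-007_20230526175817297.zip";

// Except returns a lazy IEnumerable<string>, not a string.
IEnumerable<string> missing = remotePatientsIDs.Except(localPatientsIDs);

// ToString() on the sequence yields the iterator's type name, not its elements.
string typeName = missing.ToString() ?? "";
Console.WriteLine(typeName);

// string.Contains takes a single string (or char), so the IDs must be tested one by one.
bool containsTypeName = sampleFile.Contains(typeName);
bool containsAnyId = missing.Any(id => sampleFile.Contains(id));
Console.WriteLine($"{containsTypeName} {containsAnyId}"); // prints "False True"
```

The second check compiles because Any hands each ID to Contains as an individual string. A plain Contains can still match an ID in the middle of a file name, so it is not yet a correct filter.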
As for your question, I would suggest the following:
// requires: using System.Linq; using System.Text.RegularExpressions;
// build the diff hash set for quick lookup of the IDs to add
// (improves performance if there are "many" IDs)
var missing = remotePatientsIDs.Except(localPatientsIDs)
    .ToHashSet();
// regular expression to extract the ID from the file name
// (you can implement this logic without a regex if needed)
var regex = new Regex(@"\\(?<id>[\d-]+)_\d+\.zip");
// the result
List<string> archivedFilesToUnzip = archivedFiles
    .Where(name =>
    {
        var match = regex.Match(name); // check the file name for an ID
        if (match.Success) // ID found
        {
            // extract the ID from the file name
            var id = match.Groups["id"].Value;
            return missing.Contains(id); // check if it should be added
        }
        // failed to match the pattern for the ID;
        // you could throw here instead, to fix the pattern or check the file name
        return false;
    })
    .ToList();
This uses a regular expression to extract the ID from the file name and then looks it up in the “missing” IDs.
An explanation of this particular regular expression can be found at regex101.
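If you prefer to avoid the regex, the same extraction can be sketched with plain string operations. ExtractId is a hypothetical helper name, and the sketch assumes every file name has the form <id>_<timestamp>.zip, with the ID ending at the first underscore:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var localPatientsIDs = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
var remotePatientsIDs = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
var archivedFiles = new List<string>
{
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\14967_20240703150011899.zip",
    @"G:\Archive\214_20231213150003676.zip",
    @"G:\Archive\34277_20230526200048891.zip",
    @"G:\Archive\48083_20240214150011919.zip"
};

// Hypothetical helper: returns the text between the last path separator and
// the first underscore after it, or an empty string if there is no underscore.
static string ExtractId(string path)
{
    int start = path.LastIndexOfAny(new[] { '\\', '/' }) + 1; // 0 when there is no separator
    int underscore = path.IndexOf('_', start);
    return underscore < 0 ? "" : path.Substring(start, underscore - start);
}

// IDs that exist remotely but not locally.
var missing = remotePatientsIDs.Except(localPatientsIDs).ToHashSet();

// Keep only the archives whose extracted ID is one of the missing IDs.
List<string> archivedFilesToUnzip = archivedFiles
    .Where(path => missing.Contains(ExtractId(path)))
    .ToList();

foreach (var file in archivedFilesToUnzip)
    Console.WriteLine(file);
```

The helper splits on both '\' and '/' by hand, so it does not depend on which separator the current platform recognizes.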
Here’s a simple LINQ query doing what you want:
var filtered = archivedFiles
    .Where(file => localPatientsIDs.Any(Path.GetFileName(file).StartsWith))
    .ToArray();
Here’s a little breakdown:
localPatientsIDs.Any(Path.GetFileName(file).StartsWith) checks whether any item in localPatientsIDs matches the criterion, which is Path.GetFileName(file).StartsWith: Path.GetFileName gets the file name from the path, and StartsWith checks whether the file name starts with any element from localPatientsIDs.
To make it clearer, I would define a method like this:
bool FileNameBeginsWithItem(string filePath, IEnumerable<string> prefixes)
{
    var fileName = Path.GetFileName(filePath);
    return prefixes.Any(fileName.StartsWith);
}
and then you can use it like this:
var filtered1 = archivedFiles
    .Where(file => FileNameBeginsWithItem(file, localPatientsIDs))
    .ToArray();
var filtered2 = archivedFiles
    .Where(file => FileNameBeginsWithItem(file, remotePatientsIDs))
    .ToArray();
UPDATE
If you want to check whether the file name begins with a given prefix followed by _, use this code:
var filtered = archivedFiles
    .Where(file => localPatientsIDs.Any(x => Path.GetFileName(file).StartsWith($"{x}_")))
    .ToArray();
and modified method:
bool FileNameBeginsWithItem(string filePath, IEnumerable<string> prefixes)
{
    var fileName = Path.GetFileName(filePath);
    return prefixes.Any(x => fileName.StartsWith($"{x}_"));
}
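To get the files asked for in the question (prefix in remotePatientsIDs but not in localPatientsIDs), the two checks can be combined. This is only a sketch: the name toUnzip is mine. I also take the file name by hand rather than with Path.GetFileName, because on non-Windows platforms '\' is not treated as a directory separator, so the full path would come back unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var localPatientsIDs = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
var remotePatientsIDs = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
var archivedFiles = new List<string>
{
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\14967_20240703150011899.zip",
    @"G:\Archive\214_20231213150003676.zip",
    @"G:\Archive\34277_20230526200048891.zip",
    @"G:\Archive\48083_20240214150011919.zip"
};

static bool FileNameBeginsWithItem(string filePath, IEnumerable<string> prefixes)
{
    // Everything after the last '\' or '/' is treated as the file name.
    string fileName = filePath.Substring(filePath.LastIndexOfAny(new[] { '\\', '/' }) + 1);
    // Require the underscore so that "214" does not also match "2145_...".
    return prefixes.Any(x => fileName.StartsWith(x + "_", StringComparison.Ordinal));
}

// Files whose ID is known remotely but is not present locally.
var toUnzip = archivedFiles
    .Where(file => FileNameBeginsWithItem(file, remotePatientsIDs)
                && !FileNameBeginsWithItem(file, localPatientsIDs))
    .ToList();

foreach (var file in toUnzip)
    Console.WriteLine(file);
```

For the sample data this keeps the archives for 000-007, 002443, 002446 and 214.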
You can try this LINQ query, which returns the expected result:
using System.Text.RegularExpressions;
List<string> localPatientsIDs = new List<string>
    { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
List<string> remotePatientsIDs = new List<string>
    { "000-007", "002443", "002446", "214", "34277", "48083" };
List<string> archivedFiles = new List<string>
{
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\14967_20240703150011899.zip",
    @"G:\Archive\214_20231213150003676.zip",
    @"G:\Archive\34277_20230526200048891.zip",
    @"G:\Archive\48083_20240214150011919.zip"
};
// a helper function: returns the ID between the last backslash and the
// following underscore, or null if the path does not match
var getPatientId = (string input) =>
{
    string pattern = @"\\([^\\_]+)_"; // an appropriate pattern
    Match match = Regex.Match(input, pattern);
    return match.Success ? match.Groups[1].Value : null;
};
var query = from file in archivedFiles
            // elements present in remotePatientsIDs
            where remotePatientsIDs.Contains(getPatientId(file))
            // but not in localPatientsIDs
            && !localPatientsIDs.Contains(getPatientId(file))
            select file;
foreach (var file in query)
    Console.WriteLine(file);
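The query above runs the regex twice per file and searches two lists. As a possible variation (a sketch only; the names remoteOnly, idPattern and result are mine), a let clause can hold the match once and a HashSet can replace the two list lookups:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var localPatientsIDs = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
var remotePatientsIDs = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
var archivedFiles = new List<string>
{
    @"G:\Archive\000-007_20230526175817297.zip",
    @"G:\Archive\002443_20230526183639562.zip",
    @"G:\Archive\002446_20230526183334407.zip",
    @"G:\Archive\14967_20240703150011899.zip",
    @"G:\Archive\214_20231213150003676.zip",
    @"G:\Archive\34277_20230526200048891.zip",
    @"G:\Archive\48083_20240214150011919.zip"
};

// IDs present remotely but not locally, stored for constant-time lookup.
var remoteOnly = new HashSet<string>(remotePatientsIDs.Except(localPatientsIDs));

// Same pattern as above: a backslash, the ID (no backslash or underscore), then an underscore.
var idPattern = new Regex(@"\\([^\\_]+)_");

var query = from file in archivedFiles
            let match = idPattern.Match(file)   // evaluated once per file
            where match.Success && remoteOnly.Contains(match.Groups[1].Value)
            select file;

var result = query.ToList();
foreach (var file in result)
    Console.WriteLine(file);
```

This yields the same four files; the difference only matters once the lists grow large.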