I’m working on an application where I need to insert data into a database while avoiding duplicates. To achieve this, I group the imported data by several fields, extract the key fields, and then query the database to check if records already exist. The problem I’m facing is that EF Core isn’t able to translate my query to SQL, resulting in runtime exceptions.
Here’s the relevant code:
// Group the imports to ensure uniqueness
var items = items
.GroupBy(i => new { i.Field1, i.Field2, i.Field3, i.Field4, i.Field5 })
.Select(g => g.First())
.ToList();
// Extract keys to query the database only on indexed fields
var keys = items.Select(i => new
{
i.Field1,
i.Field2,
i.Field3,
i.Field4,
i.Field5
}).ToList();
// Query the database for existing records using only the key fields
var existingRecords = await _dbContext.DataSet
.Where(dbRecord => keys.Contains(new
{
dbRecord.Field1,
dbRecord.Field2,
dbRecord.Field3,
dbRecord.Field4,
dbRecord.Field5
}))
.ToListAsync();
// Filter out records that already exist
var recordsToInsert = items
.Where(item => !existingRecords.Any(existing =>
existing.Field1 == item.Field1 &&
existing.Field2 == item.Field2 &&
existing.Field3 == item.Field3 &&
existing.Field4 == item.Field4 &&
existing.Field5 == item.Field5))
.ToList();
EF Core throws an exception indicating that the .Contains() operation cannot be translated into SQL. I understand that EF Core has limitations when it comes to certain operations, but I need to optimize this query to avoid pulling all the records into memory.
Could anyone suggest how I can rewrite this query so that it remains translatable by EF Core and performs efficiently?
3