I have this table:
+-----------------------------+---------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+---------------------+------+-----+---------+----------------+
| id | bigint(20) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(255) | YES | MUL | NULL | |
| exact_location_unknown | tinyint(1) | NO | | 0 | |
| lat | decimal(10,8) | YES | | NULL | |
| lng | decimal(11,8) | YES | | NULL | |
| streetname | varchar(255) | YES | MUL | NULL | |
| house_number | varchar(255) | YES | MUL | NULL | |
| house_number_addition | varchar(255) | YES | MUL | NULL | |
| postalcode | varchar(255) | YES | | NULL | |
| city_id | bigint(20) unsigned | YES | MUL | NULL | |
| municipality_id | bigint(20) unsigned | YES | MUL | NULL | |
| internal_notes | longtext | YES | | NULL | |
| created_at | timestamp | YES | | NULL | |
| updated_at | timestamp | YES | | NULL | |
| deleted_at | timestamp | YES | | NULL | |
+-----------------------------+---------------------+------+-----+---------+----------------+
With indexes set to:
- id
- city_id
- municipality_id
- name
- streetname
- house_number
- house_number_addition
Current count of records is nearly 200.000. I need to list duplicates based on name and/or streetname and house_number and/or house_number_addition. I’m using Laravel v10.26.2 and this is in my controller (simplified):
$query = Company::query();
if ($request->has('filter') && !empty($request->filter) && is_array($request->filter)) {
foreach ($request->filter as $filter => $value) {
...
if ($filter === 'show_duplicate_company_names' && !empty($value)) {
$query->whereIn('name', function ($q) {
$q->select('name')
->from('companies')
->groupBy('name')
->havingRaw('COUNT(*) > 1');
});
}
}
}
$query = $query->paginate($pageLength, ['*'], 'page', $request->page);
return new Collection($query);
As curiosity, I also ran this SQL code directly:
select * from `companies` where `name` in (select `name` from `companies` group by `name` having COUNT(`name`) > 1)
Which should return all duplicates. But the performance is … poor. I honestly left running the SQL code for about an hour and still didn’t get back any results. How can I write the SQL query to be performant in my case? Waiting few seconds is an option (will simply add a note in my webapp that selecting this filter will take time), but waiting a minute or more is not an option.