Here are two SQL statements:
- the first is executed as a join
- the second is executed as a dependent subquery
In the slow log, we can see the rows examined and the query time. Both methods examine the same number of rows, yet the second one takes more time.
From performance_schema.events_stages_history_long, we can see that the subquery method goes through stage/sql/executing and stage/sql/Sending data 100001 times each.
The following descriptions are from the official documentation: https://dev.mysql.com/doc/refman/5.7/en/general-thread-states.html
- executing
The thread has begun executing a statement.
- Sending data
The thread is reading and processing rows for a SELECT statement, and sending data to the client. Because operations occurring during this state tend to perform large amounts of disk access (reads), it is often the longest-running state over the lifetime of a given query.
Although there is a lot of information, I don't fully understand the difference. Does this mean that, when using the subquery, the MySQL server layer and the InnoDB engine layer exchange data 100001 times? Why can't it be executed the same way as the join?
Thanks!
Here is all the information I mentioned:
-- MySQL 5.7.23
mysql> explain SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | department_id | NULL | NULL | NULL | 99918 | 100.00 | Using where |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | testdb.e.department_id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York');
+----+--------------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
| 1 | PRIMARY | e | NULL | ALL | NULL | NULL | NULL | NULL | 99918 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | testdb.e.department_id | 1 | 10.00 | Using where |
+----+--------------------+-------+------------+--------+---------------+---------+---------+------------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
In the slow log, we can see the rows examined and the query time:
-- JOIN
# Time: 2024-09-04T10:33:41.266352+08:00
# User@Host: root[root] @ localhost [] Id: 2
# Query_time: 1.809189 Lock_time: 0.000534 Rows_sent: 203 Rows_examined: 200000
SET timestamp=1725417221;
SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
# Time: 2024-09-04T10:33:49.587586+08:00
# User@Host: root[root] @ localhost [] Id: 2
# Query_time: 1.801474 Lock_time: 0.000389 Rows_sent: 203 Rows_examined: 200000
SET timestamp=1725417229;
SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
-- DEPENDENT SUBQUERY
# Time: 2024-09-04T10:33:59.096404+08:00
# User@Host: root[root] @ localhost [] Id: 2
# Query_time: 2.547572 Lock_time: 0.000633 Rows_sent: 203 Rows_examined: 200000
SET timestamp=1725417239;
SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York');
# Time: 2024-09-04T10:34:03.799287+08:00
# User@Host: root[root] @ localhost [] Id: 2
# Query_time: 2.616687 Lock_time: 0.000429 Rows_sent: 203 Rows_examined: 200000
SET timestamp=1725417243;
SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York');
Stage events:
-- join
mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%employees%';
+----------+----------+-----------------------------------------------------------------------------------------------------------+
| EVENT_ID | Duration | SQL_TEXT |
+----------+----------+-----------------------------------------------------------------------------------------------------------+
| 86 | 2.013948 | SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York' |
| 103 | 1.717120 | SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York' |
+----------+----------+-----------------------------------------------------------------------------------------------------------+
mysql>
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration
-> FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=103;
+--------------------------------+----------+
| Stage | Duration |
+--------------------------------+----------+
| stage/sql/starting | 0.000346 |
| stage/sql/checking permissions | 0.000003 |
| stage/sql/checking permissions | 0.000009 |
| stage/sql/Opening tables | 0.000103 |
| stage/sql/init | 0.000088 |
| stage/sql/System lock | 0.000034 |
| stage/sql/optimizing | 0.000034 |
| stage/sql/statistics | 0.000119 |
| stage/sql/preparing | 0.000059 |
| stage/sql/executing | 0.000001 |
| stage/sql/Sending data | 1.716179 |
| stage/sql/end | 0.000009 |
| stage/sql/query end | 0.000031 |
| stage/sql/closing tables | 0.000028 |
| stage/sql/freeing items | 0.000063 |
| stage/sql/cleaning up | 0.000001 |
+--------------------------------+----------+
16 rows in set (0.02 sec)
-- subquery
mysql> SELECT EVENT_ID, TRUNCATE(TIMER_WAIT/1000000000000,6) as Duration, SQL_TEXT
-> FROM performance_schema.events_statements_history_long WHERE SQL_TEXT like '%exists%';
+----------+----------+------------------------------------------------------------------------------------------------------------------------------+
| EVENT_ID | Duration | SQL_TEXT |
+----------+----------+------------------------------------------------------------------------------------------------------------------------------+
| 165 | 2.590511 | SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York') |
+----------+----------+------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql> SELECT event_name AS Stage, TRUNCATE(TIMER_WAIT/1000000000000,6) AS Duration FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=165 limit 100;
+--------------------------------+----------+
| Stage | Duration |
+--------------------------------+----------+
| stage/sql/starting | 0.000309 |
| stage/sql/checking permissions | 0.000003 |
| stage/sql/checking permissions | 0.000008 |
| stage/sql/Opening tables | 0.000082 |
| stage/sql/init | 0.000088 |
| stage/sql/System lock | 0.000039 |
| stage/sql/optimizing | 0.000023 |
| stage/sql/statistics | 0.000173 |
| stage/sql/preparing | 0.000037 |
| stage/sql/optimizing | 0.000022 |
| stage/sql/statistics | 0.000054 |
| stage/sql/preparing | 0.000021 |
| stage/sql/executing | 0.000001 |
| stage/sql/Sending data | 0.000281 |
| stage/sql/executing | 0.000001 |
| stage/sql/Sending data | 0.000161 |
| stage/sql/executing | 0.000001 |
| stage/sql/Sending data | 0.000072 |
...
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000018 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000052 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000021 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000018 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000017 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000017 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000017 |
| stage/sql/executing | 0.000000 |
| stage/sql/Sending data | 0.000108 |
| stage/sql/end | 0.000024 |
| stage/sql/query end | 0.000050 |
| stage/sql/closing tables | 0.000029 |
| stage/sql/freeing items | 0.000075 |
| stage/sql/cleaning up | 0.000002 |
+--------------------------------+----------+
200019 rows in set (0.88 sec)
mysql> SELECT count(*) FROM performance_schema.events_stages_history_long WHERE NESTING_EVENT_ID=165 and event_name='stage/sql/executing';
+----------+
| count(*) |
+----------+
| 100001 |
+----------+
1 row in set (1.18 sec)
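Rather than reading through 200019 stage rows, the per-stage totals could be summarized with a query along these lines. This is only a sketch: it assumes the statement's EVENT_ID is still 165 and that the history table is large enough to still hold every stage row for it. The column aliases are my own.

```sql
-- Occurrences and total duration of each stage for one statement.
-- 165 is the EVENT_ID of the EXISTS query above; substitute your own.
SELECT event_name AS Stage,
       COUNT(*) AS Occurrences,
       TRUNCATE(SUM(TIMER_WAIT)/1000000000000, 6) AS Total_Duration
FROM performance_schema.events_stages_history_long
WHERE NESTING_EVENT_ID = 165
GROUP BY event_name
ORDER BY SUM(TIMER_WAIT) DESC;
```

Running the same query with NESTING_EVENT_ID=103 would give the corresponding totals for the join, so the two can be compared side by side.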
-- MySQL 8.0.35 EXPLAIN ANALYZE
Before executing the SQL, I changed the optimizer settings with SET optimizer_switch='semijoin=off', because in MySQL 8.0 the optimizer can execute EXISTS as a semijoin.
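For anyone reproducing this, the switch can be limited to the current session and put back afterwards. A minimal sketch, assuming the tests are run from a single connection:

```sql
-- Disable the semijoin transformation for this session only.
SET SESSION optimizer_switch = 'semijoin=off';

-- Confirm the current setting.
SELECT @@SESSION.optimizer_switch;

-- ... run the EXPLAIN / EXPLAIN ANALYZE statements here ...

-- Restore the flag to its default value when done.
SET SESSION optimizer_switch = 'semijoin=default';
```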
mysql> explain SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | department_id | NULL | NULL | NULL | 99918 | 100.00 | Using where |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | zld.e.department_id | 1 | 10.00 | Using where |
+----+-------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 1 warning (0.00 sec)
mysql> explain SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York');
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
| 1 | PRIMARY | e | NULL | ALL | NULL | NULL | NULL | NULL | 99918 | 100.00 | Using where |
| 2 | DEPENDENT SUBQUERY | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | zld.e.department_id | 1 | 10.00 | Using where |
+----+--------------------+-------+------------+--------+---------------+---------+---------+---------------------+-------+----------+-------------+
2 rows in set, 2 warnings (0.00 sec)
mysql> explain analyze SELECT e.name FROM employees e JOIN departments d ON e.department_id = d.id WHERE d.location = 'New York';
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Nested loop inner join (cost=45035 rows=9992) (actual time=0.117..309 rows=203 loops=1)
-> Filter: (e.department_id is not null) (cost=10064 rows=99918) (actual time=0.0981..76.3 rows=100000 loops=1)
-> Table scan on e (cost=10064 rows=99918) (actual time=0.0971..58.7 rows=100000 loops=1)
-> Filter: (d.location = 'New York') (cost=0.25 rows=0.1) (actual time=0.00208..0.00208 rows=0.00203 loops=100000)
-> Single-row index lookup on d using PRIMARY (id=e.department_id) (cost=0.25 rows=1) (actual time=0.00152..0.00158 rows=1 loops=100000)
|
+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.31 sec)
mysql> explain analyze SELECT name FROM employees e WHERE exists (SELECT 1 FROM departments d WHERE d.id=e.department_id and location = 'New York');
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: exists(select #2) (cost=10064 rows=99918) (actual time=0.0827..433 rows=203 loops=1)
-> Table scan on e (cost=10064 rows=99918) (actual time=0.0584..52.8 rows=100000 loops=1)
-> Select #2 (subquery in condition; dependent)
-> Limit: 1 row(s) (cost=0.26 rows=0.1) (actual time=0.0029..0.0029 rows=0.00203 loops=100000)
-> Filter: (d.location = 'New York') (cost=0.26 rows=0.1) (actual time=0.00264..0.00264 rows=0.00203 loops=100000)
-> Single-row index lookup on d using PRIMARY (id=e.department_id) (cost=0.26 rows=1) (actual time=0.00203..0.00209 rows=1 loops=100000)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set, 1 warning (0.44 sec)
Use the JOIN formulation and have these indexes:
d: INDEX(location) -- allowing the JOIN to start with d
e: INDEX(department_id, name) -- for JOINing and "covering"
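As a sketch, those two indexes could be created as follows. The index names idx_location and idx_dept_name are hypothetical, and the statements assume that location and name are short enough column types to be indexed without a prefix length (the table definitions are not shown in the question).

```sql
-- Lets the optimizer start with departments, filtered by location.
ALTER TABLE departments ADD INDEX idx_location (location);

-- Supports the join on department_id and covers the selected name column.
ALTER TABLE employees ADD INDEX idx_dept_name (department_id, name);

-- Re-check the plan afterwards.
EXPLAIN SELECT e.name
FROM employees e
JOIN departments d ON e.department_id = d.id
WHERE d.location = 'New York';
```

Whether the optimizer then actually starts with d depends on the table statistics, so it is worth comparing the new EXPLAIN output with the one in the question.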
The “stage” info is mostly useless.