row satisfies join conditions, send to client
MySQL Join Buffer有如下特点:
join buffer可以被使用在表连接类型为ALL,index,range。换句话说,只有索引不可能被使用,或者索引全扫描,索引范围扫描等代价较大的查询才会使用Block Nested-Loop Join算法;
仅仅用于连接的列数据才会被存在连接缓存中,而不是整行数据
join_buffer_size系统变量用来决定每一个join buffer的大小
MySQL为每一个可以被缓存的join语句分配一个join buffer,以便每一个查询都可以使用join buffer。
在执行连接之前分配连接缓冲区,并在查询完成后释放连接缓冲区。
(三)表连接顺序
在关系型数据库中,对于多表连接,位于嵌套循环外部的表我们称为驱动表,位于嵌套循环内部的表我们称为被驱动表,驱动表与被驱动表的顺序对于Join性能影响非常大,接下来我们探索一下MySQL中表连接的顺序。因为RIGHT JOIN和FULL JOIN在MySQL中最终都会转换为LEFT JOIN,所以我们只需讨论INNER JOIN和LEFT JOIN即可。
这里为了确保测试准确,我们使用MySQL提供的测试数据库employees,下载地址为:https://github.com/datacharmer/test_db。其ER图如下:
(3.1)INNER JOIN
对应INNER JOIN,MySQL永远选择结果集小的表作为驱动表。
例子1:查看员工部门对应信息
-- 将employees,dept_manager , departments 3个表进行内连接即可
select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e inner join dept_manager dm on e.emp_no = dm.emp_no
inner join departments d on dm.dept_no = d.dept_no;
我们来看一下3个表的大小,需要注意的是,这里仅仅是MySQL粗略统计行数,在这个例子中,实际行数与之有一定的差距:
+--------------+------------+
| table_name | table_rows |
+--------------+------------+
| departments | 9 |
| dept_manager | 24 |
| employees | 299468 |
+--------------+------------+
最终的执行计划为:
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
| 1 | SIMPLE | d | NULL | index | PRIMARY | dept_name | 42 | NULL | 9 | 100.00 | Using index |
| 1 | SIMPLE | dm | NULL | ref | PRIMARY,dept_no | dept_no | 4 | employees.d.dept_no | 2 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.emp_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+-----------------+-----------+---------+---------------------+------+----------+-------------+
可以看到,在INNER JOIN中,MySQL并不是按照语句中表的出现顺序来按顺序执行的,而是首先评估每个表结果集的大小,选择小的作为驱动表,大的作为被驱动表,不管我们如何调整SQL中的表顺序,MySQL优化器选择表的顺序与上面相同。
这里需要特别说明的是:通常我们所说的"小表驱动大表"是非常不严谨的,在INNER JOIN中,MySQL永远选择结果集小的表作为驱动表,而不是小表。这有什么区别呢?结果集是指表进行了数据过滤后形成的临时表,其数据量小于或等于原表。下面提及的"小表和大表"都是指结果集大小。
例子2:查看工号为110567的员工部门对应信息
select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e inner join dept_manager dm on e.emp_no = dm.emp_no and e.emp_no = 110567
inner join departments d on dm.dept_no = d.dept_no;
最终的执行计划为:
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
| 1 | SIMPLE | e | NULL | const | PRIMARY | PRIMARY | 4 | const | 1 | 100.00 | NULL |
| 1 | SIMPLE | dm | NULL | ref | PRIMARY,dept_no | PRIMARY | 4 | const | 1 | 100.00 | Using index |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.dept_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+-----------------+---------+---------+----------------------+------+----------+-------------+
可以看到,这里驱动表是employees,这个表是数据量最大的表,但是为什么选择它作为驱动表呢?因为他的结果集最小,在执行查询时,MySQL会首先选择employees表中emp_no=110567的数据,而这样的数据只有1条,其结果集也就最小,所以优化器选择了employees作为驱动表。
(3.2)LEFT JOIN
对于LEFT JOIN,执行顺序永远是从左往右,我们可以通过例子来看一下。
例子2:LEFT JOIN表顺序的选择测试
-- 表顺序:e --> dm --> d
mysql> explain select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e left join dept_manager dm on e.emp_no = dm.emp_no
left join departments d on dm.dept_no = d.dept_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 100.00 | NULL |
| 1 | SIMPLE | dm | NULL | ref | PRIMARY | PRIMARY | 4 | employees.e.emp_no | 1 | 100.00 | Using index |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.dept_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
-- 表顺序:dm --> e --> d
mysql> explain select e.emp_no,e.first_name,e.last_name,d.dept_name
from dept_manager dm left join employees e on e.emp_no = dm.emp_no
left join departments d on dm.dept_no = d.dept_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
| 1 | SIMPLE | dm | NULL | index | NULL | dept_no | 4 | NULL | 24 | 100.00 | Using index |
| 1 | SIMPLE | e | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.emp_no | 1 | 100.00 | NULL |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.dept_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+------+----------+-------------+
-- 表顺序:e --> dm --> d
mysql> explain select e.emp_no,e.first_name,e.last_name,d.dept_name
from employees e left join dept_manager dm on e.emp_no = dm.emp_no
left join departments d on dm.dept_no = d.dept_no;
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299468 | 100.00 | NULL |
| 1 | SIMPLE | dm | NULL | ref | PRIMARY | PRIMARY | 4 | employees.e.emp_no | 1 | 100.00 | Using index |
| 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 4 | employees.dm.dept_no | 1 | 100.00 | NULL |
+----+-------------+-------+------------+--------+---------------+---------+---------+----------------------+--------+----------+-------------+
如果右表存在谓词过滤条件,MySQL会将left join转换为inner join,详见本文:(5.3)left join优化
(四)ON和WHERE的思考
在表连接中,我们可以在2个地方写过滤条件,一个是在ON后面,另一个就是WHERE后面了。那么,这两个地方写谓词过滤条件有什么区别呢?我们还是通过INNER JOIN和LEFT JOIN分别看一下。
(4.1)INNER JOIN
使用INNER JOIN,不管谓词条件写在ON部分还是WHERE部分,其结果都是相同的。
-- 将过滤条件写在ON部分
mysql> select e.empno,e.ename,d.dname
from emp e inner join dept d
on e.deptno = d.deptno and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
| 2 | bb | HR |
| 6 | ff | HR |
+-------+-------+-------+
-- 将过滤条件写在WHERE部分
mysql> select e.empno,e.ename,d.dname
from emp e inner join dept d
on e.deptno = d.deptno
where d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
| 2 | bb | HR |
| 6 | ff | HR |
+-------+-------+-------+
-- 使用非标准写法,将表连接条件和过滤条件写在WHERE部分
mysql> select e.empno,e.ename,d.dname
from emp e inner join dept d
where e.deptno = d.deptno
and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
| 2 | bb | HR |
| 6 | ff | HR |
+-------+-------+-------+
实际上,通过trace报告可以看到,在inner join中,不管谓词条件写在ON部分还是WHERE部分,MySQL都会将SQL语句的谓词条件等价改写到where后面。
(4.2)LEFT JOIN
我们继续来看LEFT JOIN中ON与WHERE的区别。
使用ON作为谓词过滤条件:
mysql> select e.empno,e.ename,d.dname
from emp e left join dept d
on e.deptno = d.deptno and d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
| 1 | aa | NULL |
| 2 | bb | HR |
| 3 | cc | NULL |
| 4 | dd | NULL |
| 5 | ee | NULL |
| 6 | ff | HR |
+-------+-------+-------+
我们可以把使用ON的情况用下图来描述,先使用ON条件进行关联,并在关联的时候进行数据过滤:
再看看使用where的结果:
mysql> select e.empno,e.ename,d.dname
from emp e left join dept d
on e.deptno = d.deptno
where d.dname = 'HR';
+-------+-------+-------+
| empno | ename | dname |
+-------+-------+-------+
| 2 | bb | HR |
| 6 | ff | HR |
+-------+-------+-------+
我们可以把使用where的情况用下图来描述,先使用ON条件进行关联,然后对关联的结果进行数据过滤:
可以看到,在LEFT JOIN中,过滤条件放在ON和WHERE之后结果是不同的:
如果过滤条件在ON后面,那么将使用左表与右表每行数据进行连接,然后根据过滤条件判断,如果满足判断条件,则左表与右表数据进行连接,如果不满足判断条件,则返回左表数据,右表数据用NULL值代替;
如果过滤条件在WHERE后面,那么将使用左表与右表每行数据进行连接,然后将连接的结果集进行条件判断,满足条件的行信息保留。
(五)JOIN优化
JOIN语句相对而言比较复杂,我们根据SQL语句的结构考虑优化方法,JOIN相关的主要SQL结构如下:
inner join
inner join + 排序(group by 或者 order by)
left join
(5.1)inner join优化
常规inner join的SQL语法如下:
SELECT <select_list>
FROM <left_table> inner join <right_table> ON <join_condition>
WHERE <where_condition>
优化方法:
1.对于inner join,通常是采用小表驱动大表的方式,即小标作为驱动表,大表作为被驱动表(相当于小表位于for循环的外层,大表位于for循环的内层)。这个过程MySQL数据局优化器以帮助我们完成,通常无需手动处理(特殊情况,表的统计信息不准确)。注意,这里的“小表”指的是结果集小的表。
2.对于inner join,需要对被驱动表的连接条件创建索引
3.对于inner join,考虑对连接条件和过滤条件(ON、WHERE)创建复合索引
例子1:对于inner join,需要对被驱动表的连接条件创建索引
-- ---------- 构造测试表 --------------------------
-- 创建新表employees_new
mysql> create table employees_new like employees;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into empployees_new select * from employees;
Query OK, 300024 rows affected (2.69 sec)
Records: 300024 Duplicates: 0 Warnings: 0
-- 创建新表salaries_new
mysql> create table salaries_new like salaries;
Query OK, 0 rows affected (0.01 sec)
mysql> insert into salaries_new select * from salaries;
Query OK, 2844047 rows affected (13.00 sec)
Records: 2844047 Duplicates: 0 Warnings: 0
-- 删除主键
mysql> alter table employees_new drop primary key;
Query OK, 300024 rows affected (1.84 sec)
Records: 300024 Duplicates: 0 Warnings: 0
mysql> alter table salaries_new drop primary key;
Query OK, 2844047 rows affected (9.58 sec)
Records: 2844047 Duplicates: 0 Warnings: 0
-- 表大小
mysql> select table_name,table_rows
from information_schema.tables a
where a.table_schema = 'employees'
and a.table_name in ('employees_new','salaries_new');
+---------------+------------+
| table_name | table_rows |
+---------------+------------+
| employees_new | 299389 |
| salaries_new | 2837194 |
+---------------+------------+
此时测试表ER关系如下:
进行表连接查询,语句如下:
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e inner join salaries_new s
on e.emp_no = s.emp_no ;
-- 1. 被驱动表没有索引,执行时间:大于800s,(800s未执行完)
-- 执行计划:
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299389 | 100.00 | NULL |
| 1 | SIMPLE | s | NULL | ALL | NULL | NULL | NULL | NULL | 2837194 | 10.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+---------+----------+----------------------------------------------------+
-- 2. 在被驱动表连接条件上创建索引,执行时间: 37s
-- 创建索引语句
create index idx_empno on salaries_new(emp_no);
-- 执行计划:
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 299389 | 100.00 | NULL |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+---------------+-----------+---------+--------------------+--------+----------+-------+
-- 3. 更进一步,在驱动表连接条件上也创建索引,执行时间: 40s
-- 创建索引语句
create index idx_employees_new_empno on employees_new(emp_no);
-- 执行计划:
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
| 1 | SIMPLE | e | NULL | ALL | idx_employees_new_empno | NULL | NULL | NULL | 299389 | 100.00 | NULL |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------+
通过以上测试可见,在被驱动表的连接条件上创建索引是非常有必要的,而在驱动表连接条件上创建索引则不会显著提高速度。
例子2:对于inner join,考虑对连接条件和过滤条件(ON、WHERE)创建复合索引
进行表连接查询,语句如下(以下2个SQL在MySQL优化器中解析为相同SQL):
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e inner join salaries_new s
on e.emp_no = s.emp_no and e.first_name = 'Georgi'
-- 或者
select e.emp_no,e.first_name,e.last_name,s.salary,s.from_date,s.to_date
from employees_new e inner join salaries_new s
on e.emp_no = s.emp_no
where e.first_name = 'Georgi'
-- 1. 未在连接条件和过滤条件上创建复合索引,执行时间: 0.162s
-- 执行计划:
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+
| 1 | SIMPLE | e | NULL | ALL | idx_employees_new_empno | NULL | NULL | NULL | 299389 | 10.00 | Using where |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+-------------------------+-----------+---------+--------------------+--------+----------+-------------+
-- 2.在连接条件和过滤条件上创建复合索引,执行时间: 0.058s
-- 创建索引语句
create index idx_employees_first_name_emp_no on employees_new(first_name,emp_no);
create index idx_employees_emp_no_first_name on employees_new(emp_no,first_name);
-- 执行计划:
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
| 1 | SIMPLE | e | NULL | ref | idx_employees_new_empno,idx_employees_first_name_emp_no,idx_employees_emp_no_first_name | idx_employees_first_name_emp_no | 16 | const | 253 | 100.00 | NULL |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+-----------------------------------------------------------------------------------------+---------------------------------+---------+--------------------+------+----------+-------+
通过以上测试可见,表的连接条件上和过滤条件上创建复合索引可以提高查询速度,从本例子看,速度没有较大提高,因为对employees_new表全表扫描速度很快,但是在非常大的表中,复合索引能够有效提高速度。
(5.2)inner join + 排序(group by 或者 order by)优化
常规inner join+排序的SQL语法如下:
SELECT <select_list>
FROM <left_table> inner join <right_table> ON <join_condition>
WHERE <where_condition>
GROUP BY <group_by_list>
ORDER BY <order_by_list>
优化方法:
1.与inner join一样,在被驱动表的连接条件上创建索引
2.inner join + 排序往往会在执行计划里面伴随着Using temporary Using filesort关键字出现,如果临时表或者排序的数据量很大,那么将会导致查询非常慢,需要特别重视;反之,临时表或者排序的数据量较小,例如只有几百条,那么即使执行计划有Using temporary Using filesort关键字,对查询速度影响也不大。如果说排序操作消耗了大部分的时间,那么可以考虑使用索引的有序性来消除排序,接下来对该优化方法进行讨论。
group by和order by都会对相关列进行排序,根据SQL是否存在GROUP BY或者ORDER BY关键字,分3种情况讨论:
如果SQL语句中同时含有group by和order by,只需考虑group by后面的排序即可。
因为MySQL先执行group by,后执行order by,通常group by之后数据量已经较少了,
后续的order by直接在磁盘上排序即可
对于上面3种情况:
1.如果优化考虑的排序列全部来源于驱动表,则可以考虑:在等值谓词过滤条件上+排序列上创建复合索引,这样可以使用索引先过滤数据,再使用索引按顺序获取数据。
2.如果优化考虑的排序列全部来源于某个被驱动表,则可以考虑:使用表连接hint(Straight_JOIN)控制连接顺序,将排序相关表设置为驱动表,然后按照1创建复合索引;
3.如果优化考虑的排序列来源于多个表,貌似没有好的解决办法,有想法的同学也可以留言,一起进步。
例子1:如果优化考虑的排序列全部来源于驱动表,则可以考虑:在等值谓词过滤条件上+排序列上创建复合索引,这样可以使用索引先过滤数据,再使用索引按顺序获取数据。
-- 1.驱动表e上存在排序
mysql> explain select e.first_name,sum(salary)
from employees_new e inner join salaries_new s on e.emp_no = s.emp_no
where e.last_name = 'Aamodt'
group by e.first_name;
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+
| 1 | SIMPLE | e | NULL | ref | idx_employees_new_empno,idx_lastname_empno_firstname | idx_lastname_empno_firstname | 18 | const | 205 | 100.00 | Using where; Using index; Using temporary; Using filesort |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+------------------------------------------------------+------------------------------+---------+--------------------+------+----------+-----------------------------------------------------------+
-- 2.在驱动表e上的等值谓词过滤条件last_name和排序列first_name上创建索引
mysql> create index idx_lastname_firstname on employees_new (last_name,first_name);
-- 3.可以看到,排序消除
mysql> explain select e.first_name,sum(salary)
from employees_new e inner join salaries_new s on e.emp_no = s.emp_no
where e.last_name = 'Aamodt'
group by e.first_name;
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
| 1 | SIMPLE | e | NULL | ref | idx_employees_new_empno,idx_employees_new_empno_firstname,idx_lastname_firstname | idx_lastname_firstname | 18 | const | 205 | 100.00 | Using index condition |
| 1 | SIMPLE | s | NULL | ref | idx_empno | idx_empno | 4 | employees.e.emp_no | 9 | 100.00 | NULL |
+----+-------------+-------+------------+------+----------------------------------------------------------------------------------+------------------------+---------+--------------------+------+----------+-----------------------+
需要说明的是,消除排序只是提供了一种数据优化的方式,消除排序后,其速度并不一定会比之前快,需要具体问题具体分析测试。
例子2:如果优化考虑的排序列全部来源于某个被驱动表,则可以考虑:使用表连接hint(Straight_JOIN)控制连接顺序,将排序相关表设置为驱动表,然后按照1创建复合索引;
-- 1. 被驱动表s上存在排序
mysql> explain select s.from_date,sum(salary)
from employees_new e inner join salaries_new s on e.emp_no = s.emp_no
where e.last_name = 'Aamodt'
and s.salary = 40000
group by s.from_date;
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+
| id | select_type | table | partitions | type | possible_keys ... | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+
| 1 | SIMPLE | e | NULL | ref | idx_employees_new...stname | idx_lastname_firstname | 18 | const | 205 | 100.00 | Using temporary; Using filesort |
| 1 | SIMPLE | s | NULL | ref | idx_empno ... | idx_empno | 4 | employees.e.emp_no | 9 | 10.00 | Using where |
+----+-------------+-------+------------+------+------------------...-------+------------------------+---------+--------------------+------+----------+---------------------------------+
-- 2. 使用Straight_join改变表的连接顺序
mysql> explain select s.from_date,sum(salary)
from salaries_new s STRAIGHT_JOIN employees_new e on e.emp_no = s.emp_no
where e.last_name = 'Aamodt'
and s.salary = 40000
group by s.from_date;
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+
| id | select_type | table | partitions | type | possible_keys ... | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+
| 1 | SIMPLE | s | NULL | ALL | idx_empno ... | NULL | NULL | NULL | 2837194 | 10.00 | Using where; Using temporary; Using filesort |
| 1 | SIMPLE | e | NULL | ref | idx_employees_ne...firstname | idx_employees_new_empno | 4 | employees.s.emp_no | 1 | 5.00 | Using where |
+----+-------------+-------+------------+------+-----------------...----------+-------------------------+---------+--------------------+---------+----------+----------------------------------------------+
-- 3. 在新的驱动表上创建等值谓词+排序列索引
mysql> create index idx_salary_fromdate on salaries_new(salary,from_date);
Query OK, 0 rows affected (5.39 sec)
Records: 0 Duplicates: 0 Warnings: 0
-- 4. 可以看到,消除排序
mysql> explain select s.from_date,sum(salary)
from salaries_new s STRAIGHT_JOIN employees_new e on e.emp_no = s.emp_no
where e.last_name = 'Aamodt'
and s.salary = 40000
group by s.from_date;
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
| id | select_type | table | partitions | type | possible_keys ... | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
| 1 | SIMPLE | s | NULL | ref | idx_empno,idx_salary_fromdate ... | idx_salary_fromdate | 4 | const | 199618 | 100.00 | Using index condition |
| 1 | SIMPLE | e | NULL | ref | idx_employees_new_empno,idx_empl...e | idx_employees_new_empno | 4 | employees.s.emp_no | 1 | 5.00 | Using where |
+----+-------------+-------+------------+------+---------------------------------...--+-------------------------+---------+--------------------+--------+----------+-----------------------+
需要说明的是,大部分情况下,MySQL优化器会自动选择最优的表连接方式,Straight_join的引入往往会造成大表做驱动表的情况出现,虽然消除了排序,但是又引入了新的麻烦。到底是排序带来的开销大,还是NLJ循环嵌套不合理带来的开销大,需要具体情况具体分析。
(5.3)left join优化
在MySQL中外连接(left join、right join 、full join)会被优化器转换为left join,因此,外连接只需讨论left join即可。常规left join的SQL语法如下:
SELECT <select_list>
FROM <left_table> left join <right_table> ON <join_condition>
WHERE <where_condition>
GROUP BY <group_by_list>
ORDER BY <order_by_list>
优化方法:
1.与inner join一样,在被驱动表的连接条件上创建索引
2.left join的表连接顺序都是从左像右的,我们无法改变表连接顺序。但是如果右表在where条件中存在谓词过滤,则MySQL会将left join自动转换为inner join,其原理图如下:
例子1:.如果右表在where条件中存在谓词过滤,则MySQL会将left join自动转换为inner join
创建测试表:
create table dept
deptno int,
dname varchar(20)
insert into dept values (10, 'sales'),(20, 'hr'),(30, 'product'),(40, 'develop');
create table emp
empno int,
ename varchar(20),
deptno varchar(20)
insert into emp values (1,'aa',10),(2,'bb',10),(3,'cc',20),(4,'dd',30),(5,'ee',30);
执行left join,查看其执行计划,发现并不是左表作为驱动表
mysql> explain select d.dname,e.ename
from dept d left join emp e
on d.deptno = e.deptno
where e.deptno = 30;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
| 1 | SIMPLE | e | NULL | ALL | NULL | NULL | NULL | NULL | 5 | 20.00 | Using where |
| 1 | SIMPLE | d | NULL | ALL | NULL | NULL | NULL | NULL | 4 | 25.00 | Using where; Using join buffer (Block Nested Loop) |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+----------------------------------------------------+
通过trace追踪,发现MySQL对其该语句进行了等价改写,将外连接改为了内连接。
mysql> set optimizer_trace="enabled=on",end_markers_in_JSON=on;
Query OK, 0 rows affected (0.00 sec)
mysql> select d.dname,e.ename
from dept d left join emp e
on d.deptno = e.deptno
where e.deptno = 30;
+---------+-------+
| dname | ename |
+---------+-------+
| product | dd |
| product | ee |
+---------+-------+
2 rows in set (0.03 sec)
mysql> select * from information_schema.optimizer_trace;
| MISSING_BYTES_BEYOND_MAX_MEM_SIZE | INSUFFICIENT_PRIVILEGES |
select d.dname,e.ename
from dept d left join emp e
on d.deptno = e.deptno
where e.deptno = 30 | {
"steps": [
"join_preparation": {
"select#": 1,
"steps": [
"expanded_query": "/* select#1 */ select `d`.`dname` AS `dname`,`e`.`ename` AS `ename` from (`dept` `d` left join `emp` `e` on((`d`.`deptno` = `e`.`deptno`))) where (`e`.`deptno` = 30)"
"transformations_to_nested_joins": {
"transformations": [
"outer_join_to_inner_join",
"JOIN_condition_to_WHERE",
"parenthesis_removal"
] /* transformations */,
"expanded_query": "/* select#1 */ select `d`.`dname` AS `dname`,`e`.`ename` AS `ename` from `dept` `d` join `emp` `e` where ((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
} /* transformations_to_nested_joins */
] /* steps */
} /* join_preparation */
"join_optimization": {
"select#": 1,
"steps": [
"condition_processing": {
"condition": "WHERE",
"original_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))",
"steps": [
"transformation": "equality_propagation",
"resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
"transformation": "constant_propagation",
"resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
"transformation": "trivial_condition_removal",
"resulting_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))"
] /* steps */
} /* condition_processing */
"substitute_generated_columns": {
} /* substitute_generated_columns */
"table_dependencies": [
"table": "`dept` `d`",
"row_may_be_null": false,
"map_bit": 0,
"depends_on_map_bits": [
] /* depends_on_map_bits */
"table": "`emp` `e`",
"row_may_be_null": true,
"map_bit": 1,
"depends_on_map_bits": [
] /* depends_on_map_bits */
] /* table_dependencies */
"ref_optimizer_key_uses": [
] /* ref_optimizer_key_uses */
"rows_estimation": [
"table": "`dept` `d`",
"table_scan": {
"rows": 4,
"cost": 1
} /* table_scan */
"table": "`emp` `e`",
"table_scan": {
"rows": 5,
"cost": 1
} /* table_scan */
] /* rows_estimation */
"considered_execution_plans": [
"plan_prefix": [
] /* plan_prefix */,
"table": "`dept` `d`",
"best_access_path": {
"considered_access_paths": [
"rows_to_scan": 4,
"access_type": "scan",
"resulting_rows": 4,
"cost": 1.8,
"chosen": true
] /* considered_access_paths */
} /* best_access_path */,
"condition_filtering_pct": 100,
"rows_for_plan": 4,
"cost_for_plan": 1.8,
"rest_of_plan": [
"plan_prefix": [
"`dept` `d`"
] /* plan_prefix */,
"table": "`emp` `e`",
"best_access_path": {
"considered_access_paths": [
"rows_to_scan": 5,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 1,
"cost": 2.6007,
"chosen": true
] /* considered_access_paths */
} /* best_access_path */,
"condition_filtering_pct": 100,
"rows_for_plan": 4,
"cost_for_plan": 4.4007,
"chosen": true
] /* rest_of_plan */
"plan_prefix": [
] /* plan_prefix */,
"table": "`emp` `e`",
"best_access_path": {
"considered_access_paths": [
"rows_to_scan": 5,
"access_type": "scan",
"resulting_rows": 1,
"cost": 2,
"chosen": true
] /* considered_access_paths */
} /* best_access_path */,
"condition_filtering_pct": 100,
"rows_for_plan": 1,
"cost_for_plan": 2,
"rest_of_plan": [
"plan_prefix": [
"`emp` `e`"
] /* plan_prefix */,
"table": "`dept` `d`",
"best_access_path": {
"considered_access_paths": [
"rows_to_scan": 4,
"access_type": "scan",
"using_join_cache": true,
"buffers_needed": 1,
"resulting_rows": 4,
"cost": 1.8002,
"chosen": true
] /* considered_access_paths */
} /* best_access_path */,
"condition_filtering_pct": 100,
"rows_for_plan": 4,
"cost_for_plan": 3.8002,
"chosen": true
] /* rest_of_plan */
] /* considered_execution_plans */
"attaching_conditions_to_tables": {
"original_condition": "((`e`.`deptno` = 30) and (`d`.`deptno` = `e`.`deptno`))",
"attached_conditions_computation": [
] /* attached_conditions_computation */,
"attached_conditions_summary": [
"table": "`emp` `e`",
"attached": "(`e`.`deptno` = 30)"
"table": "`dept` `d`",
"attached": "(`d`.`deptno` = `e`.`deptno`)"
] /* attached_conditions_summary */
} /* attaching_conditions_to_tables */
"refine_plan": [
"table": "`emp` `e`"
"table": "`dept` `d`"
] /* refine_plan */
] /* steps */
} /* join_optimization */
"join_execution": {
"select#": 1,
"steps": [
] /* steps */
} /* join_execution */
] /* steps */
} | 0 | 0 |
+----------------------------------------------------------------------------
mysql>
View Code
1.嵌套循环连接算法:https://dev.mysql.com/doc/refman/5.7/en/nested-loop-joins.html
2.外部连接优化:https://dev.mysql.com/doc/refman/5.7/en/outer-join-optimization.html
Note:MySQL菜鸟一枚,文章仅代表个人观点,如有不对,敬请指出,共同进步,谢谢。