最左前缀原理与相关优化

栏目: 数据库 · 发布时间: 4年前

内容简介:MySQL 中的索引可以以一定顺序引用多个列,这种索引叫做联合索引,一般的,一个联合索引是一个有序元组我们在

最左前缀原理与相关优化

MySQL 中的索引可以以一定顺序引用多个列,这种索引叫做联合索引,一般的,一个联合索引是一个有序元组 <a1, a2, …, an> ,其中各个元素均为数据表的一列。另外,单列索引可以看成联合索引元素数为 1 的特例。

我们在 Employees Sample Database 中实验,MySQL 版本 5.7。

以 employees.titles 为例,查看其索引:

SHOW INDEX FROM employees.titles;

+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| Table  | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+
| titles |          0 | PRIMARY  |            1 | emp_no      | A         |      301292 |     NULL | NULL   |      | BTREE      |         |               |
| titles |          0 | PRIMARY  |            2 | title       | A         |      442605 |     NULL | NULL   |      | BTREE      |         |               |
| titles |          0 | PRIMARY  |            3 | from_date   | A         |      442605 |     NULL | NULL   |      | BTREE      |         |               |
+--------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+---------------+

全列匹配

EXPLAIN SELECT * FROM employees.titles
WHERE
	emp_no = '10009'
	AND title = 'Senior Engineer'
	AND from_date = '1995-02-18';

+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref               | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
|  1 | SIMPLE      | titles | NULL       | const | PRIMARY       | PRIMARY | 159     | const,const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+

当按照索引中所有列进行精确匹配(这里精确匹配指 =IN 匹配)时,索引可以被用到。

这里有一点需要注意,理论上索引对顺序是敏感的,但是由于 MySQL查询优化器会自动调整 where 子句的条件顺序 以使用适合的索引,例如我们将 where 中的条件顺序颠倒:

EXPLAIN SELECT * FROM employees.titles
WHERE
	from_date = '1995-02-18'
	AND emp_no IN ( '10009' )
	AND title = 'Senior Engineer';

+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref               | rows | filtered | Extra |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+
|  1 | SIMPLE      | titles | NULL       | const | PRIMARY       | PRIMARY | 159     | const,const,const |    1 |   100.00 | NULL  |
+----+-------------+--------+------------+-------+---------------+---------+---------+-------------------+------+----------+-------+

和上面是一样的。

最左前缀匹配

EXPLAIN SELECT * FROM employees.titles
WHERE
	emp_no = '10009';

+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------+
|  1 | SIMPLE      | titles | NULL       | ref  | PRIMARY       | PRIMARY | 4       | const |    3 |   100.00 | NULL  |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------+

当查询条件精确匹配索引的 左边连续一个或几个列 时,如 <emp_no><emp_no, title> ,所以可以被用到,但是只能用到一部分,即条件所组成的最左前缀。

上面的查询从分析结果看用到了 PRIMARY 索引,但是 key_len 为 4,说明只用到了索引的第一列前缀。

中间某个条件未提供

查询条件用到了索引中列的精确匹配,但是中间某个条件未提供。

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10009' AND from_date='1995-02-18';

+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | ref  | PRIMARY       | PRIMARY | 4       | const |    1 |    10.00 | Using where |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+

此时索引使用情况和情况二相同,因为 title 未提供,所以查询只用到了索引的第一列,而后面的 from_date 虽然也在索引中,但是由于 title 不存在而无法和左前缀连接,因此需要对结果进行扫描过滤 from_date (这里由于 emp_no 唯一,所以不存在扫描)。

如果想让 from_date 也使用索引而不是 where 过滤,可以增加一个 辅助索引 <emp_no, from_date> ,此时上面的查询会使用这个索引。

除此之外,还可以使用一种称之为 隔离列 的优化方法,将 emp_nofrom_date 之间的 _坑_ 填上。

首先我们看下 title 有几种不同的值:

SELECT DISTINCT(title) FROM employees.titles;

只有 7 种。在这种成为 _坑_ 的列值比较少的情况下,可以考虑用 IN 来填补这个 _坑_ 从而形成最左前缀:

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no='10009'
AND title IN ('Senior Engineer', 'Staff', 'Engineer', 'Senior Staff', 'Assistant Engineer', 'Technique Leader', 'Manager')
AND from_date='1995-02-18';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 159     | NULL |    7 |   100.00 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

这次 key_len159 ,说明索引被用全了,但是从 typerows 看出 IN 实际上执行了一个 range 查询,这里检查了 7 个 key。看下两种查询的性能比较:

SET profiling = 1;
SELECT * FROM...  -- 1
SELECT * FROM... -- 2
SHOW PROFILES;

| Query_ID | Duration   | Query   |
+----------+------------+---------+
|        1 | 0.00083950 | SELECT * ..|
|        2 | 0.00063700 | SELECT * ..|

“填坑” 后性能提升了一点。如果经过 emp_no 筛选后余下很多数据,则后者性能优势会更加明显。当然,如果 title 的值很多,用填坑就不合适了,必须建立辅助索引。(笔者:多次测试后发现是有快有慢,可能是数据的原因,效果并不明显)

查询条件没有指定索引第一列

EXPLAIN SELECT * FROM employees.titles WHERE from_date='1995-02-18';

+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 442605 |    10.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+

由于不是最左前缀,索引这样的查询显然用不到索引。

匹配某列的前缀字符串

EXPLAIN SELECT * FROM employees.titles WHERE emp_no=10009 AND title LIKE 'Senior%';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 156     | NULL |    1 |   100.00 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

此时可以用到索引,如果通配符 % 不出现在开头,则可以用到索引,但根据具体情况不同可能只会用其中一个前缀。

范围查询

EXPLAIN SELECT * FROM employees.titles WHERE emp_no < '10010' and title = 'Senior Engineer';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |   14 |    10.00 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

范围列可以用到索引(必须是最左前缀),但是范围列后面的列无法用到索引。同时,索引最多用于一个范围列,因此如果查询条件中有两个范围列则无法全用到索引。

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no < '10010'
AND title = 'Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |   14 |     1.11 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

可以看到索引对第二个范围索引无能为力。这里特别要说明 MySQL 一个有意思的地方,那就是仅用 explain 可能无法区分 范围索引 和 多值匹配,因为在 type 中这两者都显示为 range

同时,用了 between 并不意味着就是范围查询,例如下面的查询:

EXPLAIN SELECT * FROM employees.titles
WHERE emp_no BETWEEN '10001' AND '10010'
AND title='Senior Engineer'
AND from_date BETWEEN '1986-01-01' AND '1986-12-31';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 159     | NULL |   15 |     1.11 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

看起来是用了两个范围查询,但作用于 emp_no 上的 BETWEEN 实际上相当于 IN ,也就是说 emp_no 实际是多值精确匹配。可以看到这个查询用到了索引全部三个列。因此在 MySQL 中要谨慎地区分多值匹配和范围匹配,否则会对 MySQL 的行为产生困惑。

还有个值得注意的事情:

EXPLAIN SELECT * FROM employees.titles where emp_no > 10000 AND emp_no < 10011 AND title='Senior Engineer' AND from_date BETWEEN '1986-01-01' AND '1986-12-31';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 4       | NULL |   15 |     1.11 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

EXPLAIN SELECT * FROM employees.titles where emp_no >= 10001 and emp_no <= 10010 AND title='Senior Engineer' AND from_date BETWEEN '1986-01-01' AND '1986-12-31';

+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
| id | select_type | table  | partitions | type  | possible_keys | key     | key_len | ref  | rows | filtered | Extra       |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | range | PRIMARY       | PRIMARY | 159     | NULL |   15 |     1.11 | Using where |
+----+-------------+--------+------------+-------+---------------+---------+---------+------+------+----------+-------------+

疑问: = 影响 范围索引 还是 多值匹配?

查询条件中含有函数或表达式

很不幸,如果查询条件中含有函数或表达式,则 MySQL 不会为这列使用索引(虽然某些在数学意义上可以使用)。例如:

EXPLAIN SELECT * FROM employees.titles WHERE emp_no='10009' AND left(title, 6)='Senior';

+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key     | key_len | ref   | rows | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | ref  | PRIMARY       | PRIMARY | 4       | const |    3 |   100.00 | Using where |
+----+-------------+--------+------------+------+---------------+---------+---------+-------+------+----------+-------------+

虽然这个查询和情况五中功能相同,但是由于使用了函数 left,则无法为 title 列应用索引,而情况五中用 LIKE 则可以。

EXPLAIN SELECT * FROM employees.titles WHERE emp_no - 1=10000;

+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+
| id | select_type | table  | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra       |
+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+
|  1 | SIMPLE      | titles | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 442605 |   100.00 | Using where |
+----+-------------+--------+------------+------+---------------+------+---------+------+--------+----------+-------------+

显然这个查询等价于查询 emp_no10001 的函数,但是由于查询条件是一个表达式,MySQL 无法为其使用索引。看来 MySQL 还没有智能到自动优化常量表达式的程度,因此在写查询语句时尽量避免表达式出现在查询中,而是先手工私下代数运算,转换为无表达式的查询语句。


以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网

查看所有标签

猜你喜欢:

本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们

Rework

Rework

Jason Fried、David Heinemeier Hansson / Crown Business / 2010-3-9 / USD 22.00

"Jason Fried and David Hansson follow their own advice in REWORK, laying bare the surprising philosophies at the core of 37signals' success and inspiring us to put them into practice. There's no jarg......一起来看看 《Rework》 这本书的介绍吧!

Base64 编码/解码
Base64 编码/解码

Base64 编码/解码

XML 在线格式化
XML 在线格式化

在线 XML 格式化压缩工具

UNIX 时间戳转换
UNIX 时间戳转换

UNIX 时间戳转换