内容简介:在测试环境开启GTID运行一年多之后,我们准备近期上线生产。为了保证GTID顺利的上线,在测试环境模拟各种故障场景,观察GTID 的表现案例采用sandbox测试,GTID的一些关键测试参数如下其中Master的UUID是00021800-1111-1111-1111-111111111111,Slave的UUID是 00021802-3333-3333-3333-333333333333,我们简称 Master的UUID为1111,Slave的UUID为3333。
前言
在测试环境开启GTID运行一年多之后,我们准备近期上线生产。为了保证GTID顺利的上线,在测试环境模拟各种故障场景,观察GTID 的表现
案例采用sandbox测试,GTID的一些关键测试参数如下
master-info-repository=table relay-log-info-repository=table gtid_mode=ON log-slave-updates enforce-gtid-consistency relay_log_recovery=on
案例一
其中Master的UUID是00021800-1111-1111-1111-111111111111,Slave的UUID是 00021802-3333-3333-3333-333333333333,我们简称 Master的UUID为1111,Slave的UUID为3333。
| 时间线 | Master | Slave | 备注 |
|---|---|---|---|
| t1 | CREATE TABLE t1 | 生成的GTID为1111:1856 | |
| t2 | STOP SLAVE SQL_THREAD | 停止从库的SQL THREAD 线程 | |
| t3 | RESET MASTER; | 清空从库的gtid_executed和binlog信息 | |
| t4 | SET GLOBAL GTID_PURGED='1111:1-1857' | 手动让从库认为1111:1857已经执行 | |
| t5 | DROP TABLE t1 | 主库删除t1,GTID为1111:1857 | |
| t6 | CREATE TABLE t1 | 主库创建t1,GTID为1111:1858 | |
| t7 | START SLAVE SQL_THREAD | 启动从库 SQL 线程 | |
| t8 | SHOW SLAVE STATUS\G | Error 'Table 't1' already exists' | |
| t9 | STOP SLAVE SQL_THREAD; | ||
| t10 | RESET MASTER | ||
| t11 | SET GLOBAL GTID_PURGED='1111:1-1856' | 让从库重新去执行1111:1857的事务 | |
| t12 | START SLAVE SQL_THREAD; | Error 'Table 't1' already exists' on query |
我们发现当我们不小心在从库让 SQL THREAD 跳过一个GTID事务的时候,想让从库重新执行这个跳过的GTID,却始终报错,我们把从库的线程的关键信息提供一下
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 2768
Relay_Log_File: mysql-relay.000006
Relay_Log_Pos: 594
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 2606
Last_SQL_Errno: 1050
Last_SQL_Error: Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'create table t1(id int)'
Replicate_Ignore_Server_Ids:
Retrieved_Gtid_Set: 00021800-1111-1111-1111-111111111111:1857-1858
Executed_Gtid_Set: 00021800-1111-1111-1111-111111111111:1-1856
Auto_Position: 1
刚刚我们执行的操作只是关闭了SQL THREAD,那么主库执行的DROP TABLE操作(被从库第一次手动跳过的)以及CREATE 操作都应该在relay log里面,我们解析一下,结果删去了一些不必要的输出。
$ mysqlbinlog mysql-relay.000006 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #190308 10:56:36 server id 102 end_log_pos 123 CRC32 0x73fa3393 Start: binlog v 4, server v 5.7.25-28-log created 190308 10:56:36 # This Format_description_event appears in a relay log and was generated by the slave thread. # at 123 #190308 10:56:36 server id 102 end_log_pos 154 CRC32 0x4c964c64 Previous-GTIDs # [empty] # at 154 #700101 8:00:00 server id 1 end_log_pos 0 CRC32 0x6677862d Rotate to mysql-bin.000009 pos: 4 # at 201 #190308 9:30:10 server id 1 end_log_pos 123 CRC32 0x536ddf69 Start: binlog v 4, server v 5.7.25-28-log created 190308 9:30:10 # at 320 #190308 10:56:36 server id 0 end_log_pos 367 CRC32 0x72b7d6b4 Rotate to mysql-bin.000009 pos: 194 # at 367 #190308 10:56:36 server id 0 end_log_pos 414 CRC32 0x73628a00 Rotate to mysql-bin.000009 pos: 2426 # at 414 #190308 10:57:44 server id 1 end_log_pos 2491 CRC32 0x18c5896c GTID last_committed=13 sequence_number=14 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1857'/*!*/; # at 479 #190308 10:57:44 server id 1 end_log_pos 2606 CRC32 0x7c575de0 Query thread_id=4 exec_time=0 error_code=0 use `test`/*!*/; DROP TABLE `t1` /* generated by server */ 删除表的操作 /*!*/; # at 594 #190308 10:57:48 server id 1 end_log_pos 2671 CRC32 0x7138d3a8 GTID last_committed=14 sequence_number=15 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1858'/*!*/; # at 659 #190308 10:57:48 server id 1 end_log_pos 2768 CRC32 0x3a8eeb36 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1552013868/*!*/; create table t1(id int) 创建表的操作 /*!*/; SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
我们可以看到 GTID 1111:1857 和 GTID 1111:1858 都在relay log里面,但是SQL thread 始终没有执行,我们看看为什么
Relay_Log_Pos = 594 表示的是SQL thread 执行的位点在relay log 的位置
我们看看594 在relay log 是什么event
# at 594 #190308 10:57:48 server id 1 end_log_pos 2671 CRC32 0x7138d3a8 GTID last_committed=14 sequence_number=15 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1858'/*!*/; # at 659 #190308 10:57:48 server id 1 end_log_pos 2768 CRC32 0x3a8eeb36 Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1552013868/*!*/; create table t1(id int)
但是DROP TABLE 的位点是在 414 的位点,而SQL thread 位点却在594,所以始终没办法执行到 414 位点的DROP TABLE 操作,无法修复,那是为啥呢?
因为414 的DROP TABLE 操作的GTID 是 1111:1857 ,被我们人为的认为从库已经执行,当从库线程发现1111:1857已经执行过,则跳过该事务,继续往下执行回放。当执行到594位点,即GTID为1111:1858。这时候,当我们在t11时刻,设置gtid_purged=1111:1-1856',让从库可以重新执行GTID 为1111:1157 的事务,SQL thread 还是从relay log 的594位点开始执行,无法恢复。那怎么修复呢?
解决方案
1.让SQL thread 从414 位点开始执行
STOP SLAVE; CHANGE MASTER TO RELAY_LOG_FILE='mysql-relay.000006' ,RELAY_LOG_POS=414 ,MASTER_AUTO_POSITION=0; CHANGE MASTER TO MASTER_AUTO_POSITION=1; START SLAVE; SHOW SLAVE STATUS\G
我们再观察下slave的状态
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 2768
Relay_Log_File: mysql-relay.000002
Relay_Log_Pos: 756
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Exec_Master_Log_Pos: 2768
Retrieved_Gtid_Set: 00021800-1111-1111-1111-111111111111:1857-1858
Executed_Gtid_Set: 00021800-1111-1111-1111-111111111111:1-1858
我们看到1111:1157 已经被加入Executed_Gtid_Set。
2.重新启动MySQL Slave
因为relay_log_recovery为on的状态,老的relay log 会被删除,重新根据gtid_executed 去主库拉取binlog,这时候就会顺利往下执行了。
案例二
| 时间线 | Master | Slave | 备注 |
|---|---|---|---|
| t1 | CREATE TABLE t1 | 生成的GTID为1111:1858 | |
| t2 | STOP SLAVE | 停止从库的线程 | |
| t3 | RESET MASTER; | 清空从库的gtid_executed和binlog信息 | |
| t4 | SET GLOBAL GTID_PURGED='1111:1-1859' | 手动让从库认为1111:1859已经执行 | |
| t5 | DROP TABLE t1 | 主库删除t1,GTID为1111:1859 | |
| t6 | CREATE TABLE t1 | 主库创建t1,GTID为1111:1860 | |
| t7 | START SLAVE | 启动从库线程 | |
| t8 | SHOW SLAVE STATUS\G | Error 'Table 't1' already exists' | |
| t9 | STOP SLAVE; | ||
| t10 | RESET MASTER | ||
| t11 | SET GLOBAL GTID_PURGED='1111:1-1858' | 让从库重新去执行1111:1859的事务 | |
| t12 | START SLAVE; | Error 'Table 't1' already exists' on query |
这次和案例一不一样的地方是本次停止的是整个从库线程,包括IO线程和SQL线程。在启动前IO线程是没有拿到DROP的GTID为1111:1159的事务的,启动SLAVE线程后 MySQL 会怎么处理呢?
我们查一下官方文档,发现了以下这句话
In the initial handshake, the slave sends a GTID set containing the transactions that it has already received, committed, or both. This GTID set is equal to the union of the set of GTIDs in the gtid_executed system variable (@@GLOBAL.gtid_executed), and the set of GTIDs recorded in the Performance Schema replication_connection_status table as received transactions (the result of the statement SELECT RECEIVED_TRANSACTION_SET FROM PERFORMANCE_SCHEMA.replication_connection_status).
https://dev.mysql.com/doc/refman/5.7/en/replication-gtids-auto-positioning.html
也就是说,当从库连上主库的时候,从库会发送一个GTID集合,可能是已经执行的GTID集合,也可能是已经收到的GTID集合,或者是两者。这个GTID是已经执行的GTID集合和已经收到的GTID集合的并集。
当t7 时刻启动slave的时候,已经执行的GTID集合为1111:1-1859,已经收到的GTID集合为1111:1-1858,发送给主库的GTID集合为1111:1-1859。主库需要发送给从库的GTID为1111:1860,也就是CREATE TABLE语句,我们去relay log里面验证下
当前slave 的状态
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 3110
Relay_Log_File: mysql-relay.000005
Relay_Log_Pos: 414
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: No
Exec_Master_Log_Pos: 2948
Last_SQL_Errno: 1050
Last_SQL_Error: Error 'Table 't1' already exists' on query. Default database: 'test'. Query: 'create table t1(id int)'
Retrieved_Gtid_Set: 00021800-1111-1111-1111-111111111111:1859-1860
Executed_Gtid_Set: 00021800-1111-1111-1111-111111111111:1-1858
我们看下 mysql-relay.000005的内容
$mysqlbinlog mysql-relay.000005 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #190308 12:15:33 server id 102 end_log_pos 123 CRC32 0x1f2e3e22 Start: binlog v 4, server v 5.7.25-28-log created 190308 12:15:33 # This Format_description_event appears in a relay log and was generated by the slave thread. # at 123 #190308 12:15:33 server id 102 end_log_pos 154 CRC32 0x5422dcce Previous-GTIDs # [empty] # at 154 #700101 8:00:00 server id 1 end_log_pos 0 CRC32 0x6677862d Rotate to mysql-bin.000009 pos: 4 # at 201 #190308 9:30:10 server id 1 end_log_pos 123 CRC32 0x536ddf69 Start: binlog v 4, server v 5.7.25-28-log created 190308 9:30:10 # at 320 #190308 12:15:33 server id 0 end_log_pos 367 CRC32 0x182ec41f Rotate to mysql-bin.000009 pos: 194 # at 367 #190308 12:15:33 server id 0 end_log_pos 414 CRC32 0x7c85ab34 Rotate to mysql-bin.000009 pos: 2948 # at 414 #190308 12:15:27 server id 1 end_log_pos 3013 CRC32 0x4ab790eb GTID last_committed=16 sequence_number=17 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1860'/*!*/; # at 479 #190308 12:15:27 server id 1 end_log_pos 3110 CRC32 0xe74d818a Query thread_id=4 exec_time=0 error_code=0 use `test`/*!*/; create table t1(id int) 建表的操作 /*!*/; # at 576 #190308 12:26:37 server id 102 end_log_pos 625 CRC32 0x532b5fe7 Rotate to mysql-relay.000006 pos: 4 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
我们发现 Relay_Log_Pos: 414 是CREATE TABLE的语句,GTID为1111:1860,在relay log mysql-relay.000005未发现GTID 为1111:1859 的DROP语句,我们发现后面还要一个relay log为mysql-relay.000006,我们看看里面的内容
$mysqlbinlog mysql-relay.000006 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #190308 12:26:37 server id 102 end_log_pos 123 CRC32 0x4357741b Start: binlog v 4, server v 5.7.25-28-log created 190308 12:26:37 # This Format_description_event appears in a relay log and was generated by the slave thread. # at 123 #190308 12:26:37 server id 102 end_log_pos 194 CRC32 0xa087d1c5 Previous-GTIDs # 00021800-1111-1111-1111-111111111111:1860 # at 194 #700101 8:00:00 server id 1 end_log_pos 0 CRC32 0x6677862d Rotate to mysql-bin.000009 pos: 4 # at 241 #190308 9:30:10 server id 1 end_log_pos 123 CRC32 0x536ddf69 Start: binlog v 4, server v 5.7.25-28-log created 190308 9:30:10 # at 360 #190308 12:26:37 server id 0 end_log_pos 407 CRC32 0xaea58e04 Rotate to mysql-bin.000009 pos: 194 # at 407 #190308 12:26:37 server id 0 end_log_pos 454 CRC32 0xb19784ba Rotate to mysql-bin.000009 pos: 2768 # at 454 #190308 12:15:23 server id 1 end_log_pos 2833 CRC32 0x2279702d GTID last_committed=15 sequence_number=16 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1859'/*!*/; # at 519 #190308 12:15:23 server id 1 end_log_pos 2948 CRC32 0xe93f8d9d Query thread_id=4 exec_time=0 error_code=0 use `test`/*!*/; DROP TABLE `t1` /* generated by server */ 删表的操作 /*!*/; # at 634 #190308 12:26:37 server id 0 end_log_pos 681 CRC32 0xd7fd7810 Rotate to mysql-bin.000009 pos: 3110 SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
我们发现GTID为1111:1859的事务在1111:1860的后面,这也是符合预期的,因为在t7时刻启动从库线程的时候,主库需要发给从库的GTID只有1111:1860,因此mysql-relay.000005里面只有1111:1860,当我们在t11时刻执行SET GLOBAL GTID_PURGED='1111:1-1858'的时候,从库发送给主库的GTID集合为 1111:1-1858和 1111:1-1858:1860合集,也就是1111:1-1158:1860,这时候主库需要发送给从库的GTID 为1111:1859。
但是由于1111:1859 在 1111:1860 的后面,因此始终无法执行1111:1859的DROP事务,导致一直报错。
解决方案
重启MySQL Slave
在t12操作之后,直接重启从库,此时,mysql-relay.000005 和 mysql-relay.000006 都被删除,这时候由于gtid_executed 变量为1111:1-1858,那么从主库重新拉取的GTID为1111:1859-1860,我们从relay log mysql-relay.000007 和 mysql-relay.000008 里面验证一下
由于mysql-relay.000007 里面没有包含实际的事务信息,我们直接看mysql-relay.000008
$mysqlbinlog mysql-relay.000008 /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #190308 13:22:11 server id 102 end_log_pos 123 CRC32 0xd3d8ddd6 Start: binlog v 4, server v 5.7.25-28-log created 190308 13:22:11 # This Format_description_event appears in a relay log and was generated by the slave thread. # at 123 #190308 13:22:11 server id 102 end_log_pos 154 CRC32 0x5c6327ce Previous-GTIDs # [empty] # at 154 #700101 8:00:00 server id 1 end_log_pos 0 CRC32 0x6677862d Rotate to mysql-bin.000009 pos: 4 # at 201 #190308 9:30:10 server id 1 end_log_pos 123 CRC32 0x536ddf69 Start: binlog v 4, server v 5.7.25-28-log created 190308 9:30:10 BINLOG ' osWBXA8BAAAAdwAAAHsAAAAAAAQANS43LjI1LTI4LWxvZwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAAXwAEGggAAAAICAgCAAAACgoKKioAEjQA AWnfbVM= '/*!*/; # at 320 #190308 13:22:11 server id 0 end_log_pos 367 CRC32 0x08b48d1c Rotate to mysql-bin.000009 pos: 194 # at 367 #190308 13:22:11 server id 0 end_log_pos 414 CRC32 0xc7dd1334 Rotate to mysql-bin.000009 pos: 2768 # at 414 #190308 12:15:23 server id 1 end_log_pos 2833 CRC32 0x2279702d GTID last_committed=15 sequence_number=16 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1859'/*!*/; # at 479 #190308 12:15:23 server id 1 end_log_pos 2948 CRC32 0xe93f8d9d Query thread_id=4 exec_time=0 error_code=0 use `test`/*!*/; DROP TABLE `t1` /* generated by server */ /*!*/; # at 594 #190308 12:15:27 server id 1 end_log_pos 3013 CRC32 0x4ab790eb GTID last_committed=16 sequence_number=17 rbr_only=no SET @@SESSION.GTID_NEXT= '00021800-1111-1111-1111-111111111111:1860'/*!*/; # at 659 #190308 12:15:27 server id 1 end_log_pos 3110 CRC32 0xe74d818a Query thread_id=4 exec_time=0 error_code=0 SET TIMESTAMP=1552018527/*!*/; create table t1(id int) /*!*/; SET @@SESSION.GTID_NEXT= 'AUTOMATIC' /* added by mysqlbinlog */ /*!*/; DELIMITER ; # End of log file /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/;
我们看到mysql-relay.000008 包含了1111:1859-1860,从库SQL线程也恢复正常状态
Master_Log_File: mysql-bin.000009
Read_Master_Log_Pos: 3110
Relay_Log_File: mysql-relay.000008
Relay_Log_Pos: 756
Relay_Master_Log_File: mysql-bin.000009
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Retrieved_Gtid_Set: 00021800-1111-1111-1111-111111111111:1859-1860
Executed_Gtid_Set: 00021800-1111-1111-1111-111111111111:1-1860
结语
GTID在上线生产的时候,需要多做测试,明白其底层的原理,并不是所有未执行的GTID事务一定会被执行,也需要注意GTID的事务的执行顺序。重启大法还是很有用的哈哈哈
以上就是本文的全部内容,希望本文的内容对大家的学习或者工作能带来一定的帮助,也希望大家多多支持 码农网
本站部分资源来源于网络,本站转载出于传递更多信息之目的,版权归原作者或者来源机构所有,如转载稿涉及版权问题,请联系我们。
An Introduction to Genetic Algorithms
Melanie Mitchell / MIT Press / 1998-2-6 / USD 45.00
Genetic algorithms have been used in science and engineering as adaptive algorithms for solving practical problems and as computational models of natural evolutionary systems. This brief, accessible i......一起来看看 《An Introduction to Genetic Algorithms》 这本书的介绍吧!