そもそも5.1から5.7へのレプリケーションはサポートされていないのは知っている
レプリカが
character_set_server = utf8mb4
で起動した後、SET GLOBAL character_set_server = utf8
はしていたけど、mysqldを再起動しないとI/Oスレッドはutf8mb4でコネクションをネゴシエートしようとして、5.1側に「そんなcharsetは知らん」と言われていたようだ。レプリカのmysqldを再起動して解決。
MySQL 5.1からMySQL 5.7にお引越ししようと、5.7のインスタンスを作って5.1をソースにした CHANGE MASTER TO
してから START SLAVE
した。
2022-01-20T17:47:46.873075+09:00 19926 [Note] 'CHANGE MASTER TO FOR CHANNEL '' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='xxxxx', master_port= 3306, master_log_file='binary_log.000005', master_log_pos= 237438337, master_bind=''.
2022-01-20T17:47:46.878668+09:00 19927 [Note] Slave I/O thread: Start semi-sync replication to master 'repluser@xxxxx:3306' in log 'binary_log.000005' at position 237438337
2022-01-20T17:47:46.878742+09:00 19927 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-01-20T17:47:46.880010+09:00 19928 [Note] Slave SQL thread for channel '' initialized, starting replication in log 'binary_log.000005' at position 237438337, relay log '/usr/local/mysql/data/relay_log.000001' position: 4
2022-01-20T17:47:46.887069+09:00 19927 [Note] Slave I/O thread for channel '': connected to master 'repluser@xxxxx:3306',replication started in log 'binary_log.000005' at position 237438337
2022-01-20T17:47:46.890918+09:00 19927 [Warning] Slave I/O for channel '': Notifying master by SET @master_binlog_checksum= @@global.binlog_checksum failed with error: Unknown system variable 'binlog_checksum', Error_code: 1193
2022-01-20T17:47:46.891412+09:00 19927 [Warning] Slave I/O for channel '': Unknown system variable 'SERVER_UUID' on master. A probable cause is that the variable is not supported on the master (version: 5.1.54-log), even though it is on the slave (version: 5.7.27-log), Error_code: 1193
2022-01-20T17:47:46.901204+09:00 19927 [Warning] Master server does not support semi-sync, fallback to asynchronous replication
2022-01-20T17:47:53.468844+09:00 19927 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2022-01-20T17:47:53.470482+09:00 19927 [ERROR] Slave I/O for channel '': error reconnecting to master 'repluser@xxxxx:3306' - retry-time: 60 retries: 1, Error_code: 0
2022-01-20T17:48:53.472305+09:00 19927 [ERROR] Slave I/O for channel '': error reconnecting to master 'repluser@xxxxx:3306' - retry-time: 60 retries: 2, Error_code: 0
2022-01-20T17:49:53.474224+09:00 19927 [ERROR] Slave I/O for channel '': error reconnecting to master 'repluser@xxxxx:3306' - retry-time: 60 retries: 3, Error_code: 0
2022-01-20T17:50:53.476546+09:00 19927 [ERROR] Slave I/O for channel '': error reconnecting to master 'repluser@xxxxx:3306' - retry-time: 60 retries: 4, Error_code: 0
..
折角なので細かく見ていく。
CHANGE MASTER TO
が実行されると吐くログ。Previous state
とNew state
がそれぞれ記録される。べんり。
START SLAVE
した瞬間にI/Oスレッドが動き始めたログ。- semi-syncプラグインを噛ませてる(けれど5.1にはsemi-syncはない)のはあるけど、取り敢えずこの段階ではスルーする
CHANGE MASTER TO
でmaster_user
とmaster_password
を指定してると出るようになったログ。5.6から。
- SQLスレッドが起動したログ
- 起動したI/Oスレッドがレプリケーションソースに接続できたログ
- binlog_checksum はMySQL 5.6で追加されたパラメーターなので、5.6とそれ以降のレプリカが5.5とそれ以前のレプリケーションソースに接続しに行こうとすると出るログ。
- チェックサムが検証できないってだけで、直接レプリケーションが止まるようなものではない
server_uuid がMySQL 5.6で追加された関係で、5.6とそれ以降の略
これもレプリケーションを直接止めるような原因ではない
- semisyncはMySQL 5.5からなので、ソースが5.1なら当然こうなる。
- いつもの(?) asyncレプリケーションに自動でフォールバックするので問題ない
- I/Oスレッドがasyncモードに切り替えて再接続しようとしたのでもう一発同じのが出た
- フツーは
Error_code
のところに失敗理由(ソース由来ならば多くは1000番台か3000番台のエラー、疎通ができなかったとかなら2000万台のエラー)が出るはずなんだけれど、出ない( ゚д゚) - ううむ
ちなみにソース側のエラーログは何も吐いてなかった。
構築してたやつは↓のような形になっていて、緑色の矢印(今回問題になったところ)の他に、赤い矢印(5.7同士で準同期レプリケーション)もある。
確か準同期と非同期って混ぜると上手く動かなくなかったっけな? と思ったので、5.7のNew SourceとNew ReplicaでそれぞれsemisyncをOFFにしてSTOP SLAVE/START SLAVE。
状況は、変わらず。
- ちなみに、俺が疑った準同期と非同期混ぜると上手く動かないのは、マルチソースレプリケーションの場合に混ぜると動かないやつとごっちゃになっただけでした
MySQLプロトコル上の問題かとも思って、 mysql
コマンドラインクライアントでも接続してみる。
$ mysql -h xxxxx -u replicator -pyyyy
mysql> SHOW GRANTS;
+---------------------------------------------------------------------------------------------------------------------+
| Grants for replicator@% |
+---------------------------------------------------------------------------------------------------------------------+
| GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'%' IDENTIFIED BY PASSWORD '*xxx' |
+---------------------------------------------------------------------------------------------------------------------+
mysql> SHOW BINLOG EVENTS;
----------------------------------------------------------------------------------------------------------------------------------------------------------+
| Log_name | Pos | Event_type | Server_id | End_log_pos | Info |
+-------------------+-----------+-------------+-----------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| binary_log.000004 | 4 | Format_desc | 1502 | 106 | Server ver: 5.1.54-log, Binlog ver: 4 |
| binary_log.000004 | 106 | Query | 1502 | 208 | UPDATE ... |
接続もできたし権限も足りてるしI/Oスレッド相当の SHOW BINLOG EVENTS
も動く。
もう訳がわからないので大人しくgdbに頼る。
エラーログを吐いてる関数は error_log_print
.
# gdb -p $(pidof mysqld)
(gdb) b error_log_print
(gdb) c
Breakpoint 1, error_log_print (level=ERROR_LEVEL, format=0x1606eb8 "Slave %s%s: %s%s Error_code: %d", args=0x7f86538ab160) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/log.cc:2119
2119 in /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/log.cc
(gdb) bt
#0 error_log_print (level=ERROR_LEVEL, format=0x1606eb8 "Slave %s%s: %s%s Error_code: %d", args=0x7f86538ab160) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/log.cc:2119
#1 0x0000000000c68727 in sql_print_error (format=<optimized out>) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/log.cc:2145
#2 0x0000000000f24e7e in Slave_reporting_capability::va_report (this=<optimized out>, level=<optimized out>, err_code=0, prefix_msg=0x0, msg=0x1603a90 "error %s to master '%s@%s:%d' - retry-time: %d retries: %lu",
args=0x7f86538ab6f0) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_reporting.cc:170
#3 0x0000000000f24fdf in Slave_reporting_capability::report (this=<optimized out>, level=<optimized out>, err_code=<optimized out>, msg=<optimized out>)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_reporting.cc:113
#4 0x0000000000f153c3 in connect_to_master (thd=0x7f852cdea000, mysql=0x7f852ce6d420, mi=0x7f852f3b5000, reconnect=true, suppress_warnings=false)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:8932
#5 0x0000000000f1c68e in safe_reconnect (suppress_warnings=true, mi=0x7f852f3b5000, mysql=0x7f852ce6d420, thd=0x7f852cdea000) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:8986
#6 try_to_reconnect (thd=0x7f852cdea000, mysql=0x7f852ce6d420, mi=0x7f852f3b5000, retry_count=<optimized out>, suppress_warnings=true, messages=0x1de0e80 <reconnect_messages+96>)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:5515
#7 0x0000000000f1dbc7 in handle_slave_io (arg=0x7f852f3b5000) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:5815
#8 0x0000000000f71ea4 in pfs_spawn_thread (arg=0x7f852e085920) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/storage/perfschema/pfs.cc:2190
#9 0x00007f865540dea5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f8653ec596d in clone () from /lib64/libc.so.6
reportっぽい関数は呼び出し元から err_code
を受け取っているだけっぽかったので、 err_code
を生成してそうな connect_to_master
までさかのぼって読む。
while (!(slave_was_killed = io_slave_killed(thd,mi))
&& (reconnect ? mysql_reconnect(mysql) != 0 :
mysql_real_connect(mysql, mi->host, user,
password, 0, mi->port, 0, client_flag) == 0))
{
/*
SHOW SLAVE STATUS will display the number of retries which
would be real retry counts instead of mi->retry_count for
each connection attempt by 'Last_IO_Error' entry.
*/
last_errno=mysql_errno(mysql);
suppress_warnings= 0;
mi->report(ERROR_LEVEL, last_errno,
"error %s to master '%s@%s:%d'"
" - retry-time: %d retries: %lu",
(reconnect ? "reconnecting" : "connecting"),
mi->get_user(), mi->host, mi->port,
mi->connect_retry, err_count + 1);
と、こうなので mysql_real_connect
( mysql_reconnect
も最終的に mysql_real_connect
に来る) が返すエラー番号 ( mysql_errno
が返すやつ)が何故か0になっていると。
というわけで mysql_real_connect
にブレークポイントを張って
(gdb) b mysql_real_connect
(gdb) c
(gdb) bt
#0 mysql_real_connect (mysql=0x7f86538ab290, host=0x7f852cd812c0 "xxxx", user=0x7f852cc150c0 "replicator", passwd=0x7f852cc150f0 "yyyy", db=0x0, port=3306, unix_socket=0x0, client_flag=2176360965)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql-common/client.c:3970
#1 0x0000000000ded80b in mysql_reconnect (mysql=0x7f852ce6d420) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql-common/client.c:4622
#2 0x0000000000f15334 in connect_to_master (thd=0x7f852cdea000, mysql=0x7f852ce6d420, mi=0x7f852f3b5000, reconnect=true, suppress_warnings=<optimized out>)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:8915
#3 0x0000000000f1c68e in safe_reconnect (suppress_warnings=true, mi=0x7f852f3b5000, mysql=0x7f852ce6d420, thd=0x7f852cdea000) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:8986
#4 try_to_reconnect (thd=0x7f852cdea000, mysql=0x7f852ce6d420, mi=0x7f852f3b5000, retry_count=<optimized out>, suppress_warnings=true, messages=0x1de0e80 <reconnect_messages+96>)
at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:5515
#5 0x0000000000f1dbc7 in handle_slave_io (arg=0x7f852f3b5000) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/sql/rpl_slave.cc:5815
#6 0x0000000000f71ea4 in pfs_spawn_thread (arg=0x7f852e085920) at /export/home/pb2/build/sb_0-34537258-1560179931.8/mysql-5.7.27/storage/perfschema/pfs.cc:2190
#7 0x00007f865540dea5 in start_thread () from /lib64/libpthread.so.0
#8 0x00007f8653ec596d in clone () from /lib64/libc.so.6
mysql
構造体の中身を覗いたところで、電撃が走った。
(gdb) p mysql
$3 = (MYSQL *) 0x7f852ce6d420
(gdb) p *mysql
$4 = {net = {vio = 0x0, buff = 0x0, buff_end = 0x7f852cc0a020 "", write_pos = 0x7f852cc06020 "\001", read_pos = 0x7f852cc06020 "\001", fd = 24, remain_in_buf = 0, length = 0, buf_length = 0, where_b = 0, max_packet = 16384,
max_packet_size = 1073745872, pkt_nr = 12, compress_pkt_nr = 12, write_timeout = 60, read_timeout = 4, retry_count = 10, fcntl = 0, return_status = 0x0, reading_or_writing = 0 '\000', save_char = 0 '\000', unused1 = 0 '\000',
unused2 = 0 '\000', compress = 0 '\000', unused3 = 0 '\000', unused = 0x0, last_errno = 0, error = 2 '\002', unused4 = 0 '\000', unused5 = 0 '\000',
last_error = "\000here is an attribute with the same name already", '\000' <repeats 463 times>, sqlstate = "00000", extension = 0x0}, connector_fd = 0x0, host = 0x7f852cd812c0 "xxxxx",
user = 0x7f852cc150c0 "replicator", passwd = 0x7f852cc150f0 "yyyy", unix_socket = 0x0, server_version = 0x7f852cd812d8 "5.1.54-log", host_info = 0x7f852cd812a0 "xxxxx via TCP/IP", info = 0x0, db = 0x0,
charset = 0x1e63760 <my_charset_utf8mb4_general_ci>, fields = 0x0, field_alloc = {free = 0x0, used = 0x0, pre_alloc = 0x0, min_malloc = 32, block_size = 8160, block_num = 4, first_block_usage = 0, max_capacity = 0,
allocated_size = 0, error_for_capacity_exceeded = 0 '\000', error_handler = 0x0, m_psi_key = 0}, affected_rows = 18446744073709551615, insert_id = 0, extra_info = 0, thread_id = 43147884, packet_length = 7, port = 3306,
client_flag = 2176360965, server_capabilities = 63487, protocol_version = 10, field_count = 0, server_status = 2, server_language = 33, warning_count = 0, options = {connect_timeout = 4, read_timeout = 4, write_timeout = 0,
port = 0, protocol = 0, client_flag = 0, host = 0x0, user = 0x0, password = 0x0, unix_socket = 0x0, db = 0x0, init_commands = 0x0, my_cnf_file = 0x0, my_cnf_group = 0x0,
charset_dir = 0x7f852cdaf080 "/usr/local/mysql/share/charsets/", charset_name = 0x7f852cc15090 "utf8mb4", ssl_key = 0x0, ssl_cert = 0x0, ssl_ca = 0x0, ssl_capath = 0x0, ssl_cipher = 0x0, shared_memory_base_name = 0x0,
max_allowed_packet = 1073745872, use_ssl = 0 '\000', compress = 0 '\000', named_pipe = 0 '\000', unused1 = 0 '\000', unused2 = 0 '\000', unused3 = 0 '\000', unused4 = 0 '\000', methods_to_use = MYSQL_OPT_GUESS_CONNECTION,
ci = {client_ip = 0x0, bind_address = 0x0}, unused5 = 0 '\000', report_data_truncation = 1 '\001', local_infile_init = 0x0, local_infile_read = 0x0, local_infile_end = 0x0, local_infile_error = 0x0,
local_infile_userdata = 0x0, extension = 0x7f852cda9120}, status = MYSQL_STATUS_READY, free_me = 1 '\001', reconnect = 1 '\001', scramble = "1.1Jc4vwt`bL7k:yvfyP", unused1 = 0 '\000', unused2 = 0x0, unused3 = 0x0,
unused4 = 0x0, unused5 = 0x0, stmts = 0x0, methods = 0x1de0360 <client_methods>, thd = 0x0, unbuffered_fetch_owner = 0x0, info_buffer = 0x0, extension = 0x7f852cdc1040}
Σ(゚д゚lll) utf8mb4!!
確かに character_set_server = utf8mb4
で起動しちゃったので、my.cnfを直した上で SET GLOBAL character_set_server = utf8
で再設定していたのだ。
しかしレプリケーションスレッドには効かなかった…ということだろうか。
mysqld再起動したらつながるようになりました。
あー楽しかった!