MySQL node join group replication failed to due extra transactions after failover

3/13/2018

I have a 3-node mysql InnoDB cluster deployed on kubernetes(k8s). Each mysql node is a k8s POD. To test the availability, I manually removed the docker container mysql primary node corresponding to(with command 'docker rm xxxx -f). As expected, k8s recreated the container successfully. But, the newly created one failed to join the cluster(more specifically, failed to join the group replication). It complained the member has transaction not in current group. But, the extra transaction is the operation to set SESSION.GTID_NEXT to AUTOMATIC. I don't know why it is in binlog and made the rejoining failed. Would anyone give me some hint or suggestion? Thanks a lot!

MySQL version is mysql-community-server-minimal-5.7.20-1. And it's installed from rpm.

InnodDB cluster is created by adopting an existent replication group. This say I've created replication group first, then created the cluster.

MySQL log is this:

2018-03-12T08:17:21.208124Z 0 [Note] Plugin group_replication reported: 'new state x_run' 2018-03-12T08:17:21.218361Z 0 [ERROR] Plugin group_replication reported: 'This member has more executed transactions than those present in the group. Local transactions: 925deea1-75db-4e92-86a9-00259c8d8078:1-50 > Group transactions: 925deea1-75db-4e92-86a9-00259c8d8078:1-2' 2018-03-12T08:17:21.218433Z 0 [ERROR] Plugin group_replication reported: 'The member contains transactions not present in the group. The member will now exit the group.' 2018-03-12T08:17:21.218444Z 0 [Note] Plugin group_replication reported: 'To force this member into the group you can use the group_replication_allow_local_disjoint_gtids_join option' 2018-03-12T08:17:21.218493Z 3 [Note] Plugin group_replication reported: 'Going to wait for view modification' 2018-03-12T08:17:21.218546Z 0 [Note] Plugin group_replication reported: 'A new primary with address t-baj3bsrg4cp000a1sp3g-mysql-0.mysql:3306 was elected, enabling conflict detection until the new primary applies all relay logs.' 2018-03-12T08:17:21.218597Z 15 [Note] Plugin group_replication reported: 'This server is working as primary member.' 2018-03-12T08:17:21.218634Z 0 [Note] Plugin group_replication reported: 'Group membership changed to t-baj3bsrg4cp000a1sp3g-mysql-0.mysql:3306, t-baj3bsrg4cp000a1sp3g-mysql-1.mysql:3306, on view 15208422427926210:3.' 2018-03-12T08:17:21.218655Z 0 [Note] Plugin group_replication reported: 'The member with address t-baj3bsrg4cp000a1sp3g-mysql-2.mysql:3306 was declared online within the replication group' 2018-03-12T08:17:21.218825Z 4 [ERROR] Plugin group_replication reported: 'Unable to update certification result on server side, thread_id: 136' 2018-03-12T08:17:21.218838Z 4 [ERROR] Plugin group_replication reported: 'Error at event handling! Got error: 1' 2018-03-12T08:17:21.218849Z 4 [ERROR] Plugin group_replication reported: 'Fatal error during execution on the Applier process of Group Replication. The server will now leave the group.' 2018-03-12T08:17:21.218864Z 4 [Warning] Plugin group_replication reported: 'Skipping leave operation: concurrent attempt to leave the group is on-going.' 2018-03-12T08:17:21.218996Z 7 [Note] Error reading relay log event for channel 'group_replication_applier': slave SQL thread was killed

Bin log events shown on the failed node are:

+-------------------------------------------+-----+----------------+-----------+-------------+-------------------------------------------+ | Log_name | Pos | Event_type | Server_id | End_log_pos | Info | +-------------------------------------------+-----+----------------+-----------+-------------+-------------------------------------------+ | t-baj3bsrg4cp000a1sp3g-mysql-0-bin.000003 | 4 | Format_desc | 100 | 123 | Server ver: 5.7.20-log, Binlog ver: 4 | | t-baj3bsrg4cp000a1sp3g-mysql-0-bin.000003 | 123 | Previous_gtids | 100 | 190 | 925deea1-75db-4e92-86a9-00259c8d8078:1-50 | +-------------------------------------------+-----+----------------+-----------+-------------+-------------------------------------------+

Transaction of 925deea1-75db-4e92-86a9-00259c8d8078:1-50 shown in binlog file: enter image description here

FYI, according to mysql log, setting group_replication_allow_local_disjoint_gtids_join option did work for me. But, refer to latest doc of option group_replication_allow_local_disjoint_gtids_join, it would be removed in mysql 5.7.22. So, I have to figure out the root cause of issue. Thanks!

-- Tony Wen
kubernetes
mysql
mysql-group-replication

1 Answer

12/21/2018

My advice is to recheck your infrastructure, as you see you don't have 1 extra transaction. If you look into the information you pasted

925deea1-75db-4e92-86a9-00259c8d8078:1-50 > 925deea1-75db-4e92-86a9-00259c8d8078:1-2 

1-50 vs 1-2 means you have 48 extra transactions.

Also, you are looking at a new binlog file, as you purged the older ones or there was a rotation. In this new file, the only information you see is that the Previous-GTID information is 925deea1-75db-4e92-86a9-00259c8d8078:1-50, meaning the older binlog files contained information about these 50 transactions.

You can use the commands

 SHOW BINARY LOGS;

And then

SHOW BINLOG EVENTS [IN 'log_name']

To get more information about the older events or use the mysqlbinlog utility to check the other files.

-- Pedro Gomes
Source: StackOverflow