Question

Today I did some experiments to understand MySQL Cluster better.

The MySQL Cluster is working fine:

ndb_mgm> show

Connected to Management Server at: localhost:1186

Cluster Configuration

[ndbd(NDB)]     2 node(s)
id=2    @192.168.56.2  (mysql-5.6.19 ndb-7.3.6, Nodegroup: 0, *)
id=3    @192.168.56.3  (mysql-5.6.19 ndb-7.3.6, Nodegroup: 0)

[ndb_mgmd(MGM)] 1 node(s)
id=1    @192.168.56.1  (mysql-5.6.19 ndb-7.3.6)

[mysqld(API)]   1 node(s)
id=4    @192.168.56.4  (mysql-5.6.19 ndb-7.3.6)

Case 1: I shut down the management node with killall ndb_mgmd, then kill one data node with killall ndbd. I see that the remaining data node shuts itself down even though I did not touch it. Since there is no live data node, the SQL node cannot read or change any data.

Case 2: I keep the management node alive and kill one data node with killall ndbd. The remaining data node keeps running, so the SQL node still works fine.

My question: do I need to keep the management node running to keep the data nodes under control? I have not seen the MySQL Reference Manual mention a case like this.

Thank you in advance.


Solution

The management node is not essential for normal MySQL operations, except for:

  • Installation and other cluster-wide maintenance operations
  • Arbitration: it is by default the arbitrator node, which means it is responsible for deciding whether a particular partition keeps running in the case of a split brain.

Split brain is an important concept in clustering. It happens when one group of nodes sees another group of nodes as down (a group can be a single node), but you cannot tell whether those nodes are really down or only the communication between the groups has failed. To maintain consistency, an algorithm must decide what each partition does; otherwise the separate partitions of the cluster could end up with different states/data sets. The typical rule, and the one used here, is: if your partition holds more than 50% of the nodes, you keep running; if not, you shut yourself down.

In your particular case, without an arbitrator, the surviving node is only 50% of the data nodes, so it decides to "kill itself" - this is not a failure, it is a design decision. With the arbitrator, we know the other node will not keep running on its own (the arbitrator cannot see it), so the survivor is allowed to continue with 50% of the data nodes (one node, which holds all the data, since the default is NoOfReplicas = 2) plus the arbitrator. In general, you want to run with an odd number of nodes to minimize the possibility of a full cluster shutdown.
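For reference, a minimal config.ini matching the show output above could look roughly like this (reconstructed from the node list in the question; the actual file was not posted, so treat it as a sketch):

[ndbd default]
NoOfReplicas=2          # two copies of each fragment, one per data node

[ndb_mgmd]
NodeId=1
HostName=192.168.56.1   # management node, default arbitrator

[ndbd]
NodeId=2
HostName=192.168.56.2

[ndbd]
NodeId=3
HostName=192.168.56.3

[mysqld]
NodeId=4
HostName=192.168.56.4

With NoOfReplicas=2 and both data nodes in the same node group, either data node alone still holds a complete copy of the data, which is why the arbitrator can safely let one of them keep serving.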

Typically, on an NDB setup, your SQL nodes also act as the primary and secondary arbitrator nodes (you can change their arbitration priorities, as sketched below). You normally have more than one API node to avoid a single point of failure. So the most typical setup is 2 API/SQL/arbitrator nodes plus 2 NDB data nodes; that way, in most cases, two nodes have to go down for the whole cluster to fail.
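For example, arbitration priority is controlled per node with ArbitrationRank in config.ini (1 = high priority, 2 = low priority, 0 = never acts as arbitrator). A sketch that prefers the SQL nodes and keeps the management node only as a fallback could look like this (the second mysqld section is hypothetical, since the question has only one SQL node):

[ndb_mgmd]
NodeId=1
HostName=192.168.56.1
ArbitrationRank=2       # low priority: fallback arbitrator only

[mysqld]
NodeId=4
HostName=192.168.56.4
ArbitrationRank=1       # high priority: preferred arbitrator

[mysqld]
NodeId=5                # hypothetical second SQL node for redundancy
HostName=192.168.56.5
ArbitrationRank=1       # high priority: preferred arbitrator

Note that NDB still uses only one arbitrator at a time; the ranks just control which eligible node is chosen first.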

License: CC-BY-SA with attribution. Source: dba.stackexchange