Understanding split brain in a Galera cluster

Split brain is a condition when a cluster gets partitioned and each part is operating independently. This is an unwanted situation that one would always want to avoid. So, how is it handled in a MariaDB Galera cluster? In order to understand this, let’s first start by looking into the logs of a node from a split-brain(ed) cluster.

In the event of a network partition, some nodes of the cluster may no longer be reachable from the other nodes. They try to reconnect to these suspecting nodes and later move them to partitioned list by marking them as inactive when no response is received. A voting for quorum is then taken on each node to see if they belong to the majority partition (Primary Component) using the following formula :

where,

  • : members of the last seen primary component,
  • : members that are known to have left gracefully,
  • : current components members, and
  • : member’s weight

In a Galera cluster, nodes outside the primary component are not allowed to process queries. It is mainly done preserve data consistency.

MariaDB [test]> select 1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

Now, as shown in the logs above, when the cluster gets split into two partitions of equal size, (i.e. both the partitions get equal weight, split-brain), the quorum algorithm fails find the the primary component. As a result, the cluster has no primary component and can no longer process any queries. This can be resolved by finding the node with most recent updates and bootstraping the cluster using that node.

Reference: http://galeracluster.com/documentation-webpages/weightedquorum.html

Leave a Reply

Your email address will not be published. Required fields are marked *