Donor selection in a Galera cluster

In a Galera cluster, a node that joins or rejoins the cluster must acquire the cluster's state in order to sync with it. Galera implements a pretty decent algorithm to elect the node that provides this state, aka the Donor node. In this post, I will attempt to describe the algorithm in a simplified manner.

When a node joins the cluster, it first checks whether it can receive the state via incremental state transfer (IST) instead of a full snapshot state transfer (SST). This is possible when the node was previously part of the cluster (with the same group UUID) and the updates it missed are still cached on at least one of the nodes in the cluster. If none of the nodes have the required updates cached, the joiner falls back to looking for a suitable donor for a full SST.
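
The IST-or-SST decision described above can be sketched in a few lines of Python. This is a simplified model, not Galera's code: `NodeState`, `cache_low_seqno` and `ist_possible` are hypothetical names, and the sketch assumes each node advertises the lowest sequence number still held in its gcache.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical, simplified view of a cluster member (not Galera's real data structure).
@dataclass
class NodeState:
    name: str
    cache_low_seqno: int  # assumed: lowest seqno still held in this node's gcache


def ist_possible(joiner_uuid: str, joiner_seqno: int,
                 group_uuid: str, nodes: List[NodeState]) -> bool:
    """Return True if at least one node could serve the joiner's missing updates via IST."""
    # A joiner whose state UUID differs from the group's (e.g. a fresh node with
    # the all-zero UUID) was not part of this cluster's history: SST only.
    if joiner_uuid != group_uuid:
        return False
    # The joiner already has everything up to joiner_seqno.
    first_missing = joiner_seqno + 1
    # IST works if some node's gcache still starts at or before the first missing seqno.
    return any(n.cache_low_seqno <= first_missing for n in nodes)
```

For a joiner at seqno 5 with the group's UUID, a node whose gcache starts at seqno 3 is enough to make IST possible; a joiner with the all-zero UUID always gets `False`.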

Note: The following algorithm is derived from Galera v3.12. It may change in future versions.

Donor selection algorithm:

  1. First, try to find an IST donor (a node that has the joiner’s missing updates cached in its gcache) by looking for
    1. a SYNCED node from the wsrep-sst-donor list with the highest cache sequence number, else
    2. a SYNCED node that is not stateless (i.e. not garbd), choosing
      1. a local node (in the same segment as the joiner) with the highest cache sequence number, else
      2. a remote node (in a different segment) with the highest cache sequence number

    Example log on the joiner (my_node2) when IST is possible:

    150917 21:48:06 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.0.2.15:4011
    150917 21:48:06 [Note] WSREP: Member 0.0 (my_node2) requested state transfer from '*any*'. Selected 1.0 (my_node1)(SYNCED) as donor.
    150917 21:48:06 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 10)
    150917 21:48:06 [Note] WSREP: Requesting state transfer: success, donor: 1
    150917 21:48:06 [Note] WSREP: 1.0 (my_node1): State transfer to 0.0 (my_node2) complete.

  2. Else, try to find an SST donor by looking for
    1. a SYNCED node from the wsrep-sst-donor list, else
    2. a SYNCED node that is not stateless (i.e. not garbd), choosing
      1. a local node (in the same segment as the joiner), else
      2. a remote node (in a different segment)

    Example log on the joiner (my_node2) when IST is unavailable and SST is used:

    150917 21:42:21 [Warning] WSREP: Failed to prepare for incremental state transfer: Local state UUID (00000000-0000-0000-0000-000000000000) does not match group state UUID (925fb059-1b51-11e5-a295-7739e970a4a4): 1 (Operation not permitted)
         at galera/src/replicator_str.cpp:prepare_for_IST():456. IST will be unavailable.
    150917 21:42:21 [Note] WSREP: Member 1.0 (my_node2) requested state transfer from '*any*'. Selected 0.0 (my_node1)(SYNCED) as donor.
    150917 21:42:21 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 5)
    150917 21:42:21 [Note] WSREP: Requesting state transfer: success, donor: 0
    150917 21:42:23 [Note] WSREP: (8052c243, 'tcp://0.0.0.0:4010') turning message relay requesting off
    150917 21:42:24 [Note] WSREP: 0.0 (my_node1): State transfer to 1.0 (my_node2) complete.
    150917 21:42:24 [Note] WSREP: Member 0.0 (my_node1) synced with group.
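
The selection order in the list above can be condensed into a Python sketch. It is a simplified model under stated assumptions, not Galera's implementation: the `Node` fields, `select_donor`, `_pick` and `first_missing` are hypothetical names; `cache_seqno` is assumed to be the lowest seqno still held in a node's gcache, so a node can serve IST only if that value does not exceed the first seqno the joiner is missing; and, since the list above does not say how SST candidates are ranked, the sketch simply takes the first listed or first eligible node.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical model of what each member advertises; not Galera's real API.
@dataclass
class Node:
    name: str
    state: str          # "SYNCED", "DONOR", "JOINER", ...
    segment: int        # assumed to mirror the node's gmcast.segment
    stateless: bool     # True for an arbitrator such as garbd
    cache_seqno: int    # assumed: lowest seqno still in this node's gcache


def _pick(cands: List[Node], joiner: Node, donor_list: List[str],
          by_cache: bool) -> Optional[Node]:
    """Apply the preference order: donor list, then local, then remote."""
    # Nodes named in wsrep-sst-donor, kept in list order.
    listed = [n for name in donor_list for n in cands if n.name == name]
    stateful = [n for n in cands if not n.stateless]
    local = [n for n in stateful if n.segment == joiner.segment]
    remote = [n for n in stateful if n.segment != joiner.segment]
    for group in (listed, local, remote):
        if group:
            # IST: highest cache seqno wins; SST: first candidate (assumed tie-break).
            return max(group, key=lambda n: n.cache_seqno) if by_cache else group[0]
    return None


def select_donor(nodes: List[Node], joiner: Node, donor_list: List[str],
                 first_missing: Optional[int]) -> Tuple[Optional[Node], Optional[str]]:
    """Return (donor, "IST" or "SST"), or (None, None) if nobody qualifies.

    first_missing is the first seqno the joiner lacks, or None when IST is
    impossible (e.g. the joiner's state UUID differs from the group's).
    """
    synced = [n for n in nodes if n is not joiner and n.state == "SYNCED"]
    if first_missing is not None:
        # Step 1: only nodes whose gcache still covers the joiner's gap.
        ist_cands = [n for n in synced if n.cache_seqno <= first_missing]
        donor = _pick(ist_cands, joiner, donor_list, by_cache=True)
        if donor is not None:
            return donor, "IST"
    # Step 2: fall back to a full snapshot state transfer.
    donor = _pick(synced, joiner, donor_list, by_cache=False)
    return (donor, "SST") if donor is not None else (None, None)
```

For instance, with two local SYNCED nodes whose gcaches start at seqnos 3 and 4, a joiner missing everything from seqno 6 onward gets the second node (the higher cache sequence number) as its IST donor; if both caches started after seqno 6, the same call would fall through to SST.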
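
To confirm which donor was actually chosen, look for the "Selected ... as donor" line that appears in both excerpts. A small Python helper can pull it apart; `parse_donor_selection` is a hypothetical name, and the regular expression is written against the exact wording in these Galera 3.12 logs, so it is an assumption that may not hold for other versions.

```python
import re
from typing import Dict, Optional

# Matches the "requested state transfer ... Selected ... as donor" line
# using the wording seen in the Galera 3.12 excerpts (assumed, may vary by version).
_SELECTED = re.compile(
    r"Member (?P<joiner_idx>\S+) \((?P<joiner>[^)]*)\) requested state transfer "
    r"from '(?P<requested>[^']*)'\. Selected (?P<donor_idx>\S+) "
    r"\((?P<donor>[^)]*)\)\((?P<donor_state>[^)]*)\) as donor\."
)


def parse_donor_selection(line: str) -> Optional[Dict[str, str]]:
    """Return joiner/donor details from a donor-selection log line, else None."""
    match = _SELECTED.search(line)
    return match.groupdict() if match else None
```

Applied to the IST excerpt's selection line, it returns joiner `my_node2`, requested donor `*any*`, and donor `my_node1` in state `SYNCED`; any other log line returns `None`.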