Tag Archives: Cluster

Understanding split brain in a Galera cluster

Split brain is a condition when a cluster gets partitioned and each part is operating independently. This is an unwanted situation that one would always want to avoid. So, how is it handled in a MariaDB Galera cluster? In order to understand this, let’s first start by looking into the logs of a node from a split-brain(ed) cluster.

In the event of a network partition, some nodes of the cluster may no longer be reachable from the other nodes. They try to reconnect to these suspecting nodes and later move them to partitioned list by marking them as inactive when no response is received. A voting for quorum is then taken on each node to see if they belong to the majority partition (Primary Component) using the following formula :

where,

  • : members of the last seen primary component,
  • : members that are known to have left gracefully,
  • : current components members, and
  • : member’s weight

In a Galera cluster, nodes outside the primary component are not allowed to process queries. It is mainly done preserve data consistency.

MariaDB [test]> select 1;
ERROR 1047 (08S01): WSREP has not yet prepared node for application use

Now, as shown in the logs above, when the cluster gets split into two partitions of equal size, (i.e. both the partitions get equal weight, split-brain), the quorum algorithm fails find the the primary component. As a result, the cluster has no primary component and can no longer process any queries. This can be resolved by finding the node with most recent updates and bootstraping the cluster using that node.

Reference: http://galeracluster.com/documentation-webpages/weightedquorum.html

New Information schema tables for Galera membership and status

MariaDB Galera server logs all the cluster related information like node status, cluster status, membership, etc. in the error log. MariaDB 10.1.2 introduces a new INFORMATION SCHEMA plugin WSREP_INFO that enables querying these information via INFORMATION SCHEMA tables. The WSREP_INFO plugin adds two new tables to the Information Schema, WSREP_MEMBERSHIP and WSREP_STATUS. The plugin is not enabled by default, so in order to use it, it needs to be installed first :

MariaDB [test]> INSTALL SONAME 'wsrep_status';
Query OK, 0 rows affected (0.04 sec)

MariaDB [test]> SHOW PLUGINS;
+-----------------------------+----------+--------------------+-----------------+---------+
| Name                        | Status   | Type               | Library         | License |
+-----------------------------+----------+--------------------+-----------------+---------+
...
| WSREP_MEMBERSHIP            | ACTIVE   | INFORMATION SCHEMA | wsrep_status.so | GPL     |
| WSREP_STATUS                | ACTIVE   | INFORMATION SCHEMA | wsrep_status.so | GPL     |
+-----------------------------+----------+--------------------+-----------------+---------+
52 rows in set (0.01 sec)

MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_MEMBERSHIP\G
*************************** 1. row ***************************
       Table: WSREP_MEMBERSHIP
Create Table: CREATE TEMPORARY TABLE `WSREP_MEMBERSHIP` (
  `INDEX` int(11) NOT NULL DEFAULT '0',
  `UUID` varchar(36) NOT NULL DEFAULT '',
  `NAME` varchar(32) NOT NULL DEFAULT '',
  `ADDRESS` varchar(256) NOT NULL DEFAULT ''
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_STATUS\G
*************************** 1. row ***************************
       Table: WSREP_STATUS
Create Table: CREATE TEMPORARY TABLE `WSREP_STATUS` (
  `NODE_INDEX` int(11) NOT NULL DEFAULT '0',
  `NODE_STATUS` varchar(16) NOT NULL DEFAULT '',
  `CLUSTER_STATUS` varchar(16) NOT NULL DEFAULT '',
  `CLUSTER_SIZE` int(11) NOT NULL DEFAULT '0',
  `CLUSTER_STATE_UUID` varchar(36) NOT NULL DEFAULT '',
  `CLUSTER_STATE_SEQNO` bigint(21) NOT NULL DEFAULT '0',
  `CLUSTER_CONF_ID` bigint(21) NOT NULL DEFAULT '0',
  `GAP` varchar(10) NOT NULL DEFAULT '',
  `PROTOCOL_VERSION` int(11) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

MariaDB [test]> SELECT @@wsrep_provider;
+----------------------------------+
| @@wsrep_provider                 |
+----------------------------------+
| /usr/lib/galera/libgalera_smm.so |
+----------------------------------+
1 row in set (0.00 sec)

Now that WSREP_INFO plugin is installed, lets look into the contents of these tables on a 3-node cluster.

MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_MEMBERSHIP;
+-------+--------------------------------------+----------+-----------------+
| INDEX | UUID                                 | NAME     | ADDRESS         |
+-------+--------------------------------------+----------+-----------------+
|     0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 |
|     1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 |
|     2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 |
+-------+--------------------------------------+----------+-----------------+
3 rows in set (0.00 sec)

MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_STATUS\G
*************************** 1. row ***************************
         NODE_INDEX: 0
        NODE_STATUS: Synced
     CLUSTER_STATUS: Primary
       CLUSTER_SIZE: 3
 CLUSTER_STATE_UUID: 190604d7-8940-11e4-a084-ebee5211c190
CLUSTER_STATE_SEQNO: 2
    CLUSTER_CONF_ID: 3
                GAP: NO
   PROTOCOL_VERSION: 3
1 row in set (0.00 sec)

As seen above, WSREP_MEMBERSHIP table shows information about current members in the cluster which includes node’s name and incoming address. WSREP_STATUS table, on the other hand, shows status information about the node and cluster as a whole.

SHOW command can also be used to query these tables. Its quick and reduces the number of columns for WSREP_STATUS to fit to the screen.

MariaDB [test]> SHOW WSREP_MEMBERSHIP;
+-------+--------------------------------------+----------+-----------------+
| Index | Uuid                                 | Name     | Address         |
+-------+--------------------------------------+----------+-----------------+
|     0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 |
|     1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 |
|     2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 |
+-------+--------------------------------------+----------+-----------------+
3 rows in set (0.00 sec)

MariaDB [test]> SHOW WSREP_STATUS;
+------------+-------------+----------------+--------------+
| Node_Index | Node_Status | Cluster_Status | Cluster_Size |
+------------+-------------+----------------+--------------+
|          0 | Synced      | Primary        |            3 |
+------------+-------------+----------------+--------------+
1 row in set (0.00 sec)

Configuring SELinux for Galera cluster

Setting up a MariaDB Galera cluster can easily become tedious if its being setup on a Linux system with SELinux enabled.

140805 16:16:20 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer ''
140805 16:16:20 [ERROR] WSREP: Permission denied
140805 16:16:20 [ERROR] WSREP: failed to open gcomm backend connection: 13: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Permission denied': 13 (Permission denied)
         at gcomm/src/asio_tcp.cpp:listen():814
140805 16:16:20 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -13 (Permission denied)
140805 16:16:20 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://': -13 (Permission denied)
140805 16:16:20 [ERROR] WSREP: gcs connect failed: Permission denied
140805 16:16:20 [ERROR] WSREP: wsrep::connect() failed: 7
140805 16:16:20 [ERROR] Aborting

In a test environment one can disable/enable SELinux for mysqld using the following commands (Thanks to Daniel Black for the tip!) :

$ sudo semanage permissive -a mysqld_t
$ sudo semanage permissive -d mysqld_t

Disabling SELinux for mysql works, but its certainly not the best solution. So, I tried to configure SELinux for a 2-node MariaDB Galera cluster on CentOS by using some tools provided to manage SELinux policies. The basic idea is to let the MariaDB Galera nodes run under permissive mode in order to get all possible operations (which SELinux would have otherwise denied) logged into the audit log and then create a policy module using allow2audit tool after carefully analyzing the “denials”. The resulting module can then be installed before enabling (enforcing) SELinux for mysqld again.

Prepare the hosts
* Install MariaDB Galera server packages ($ sudo yum install MariaDB-Galera-server)
* Setup MariaDB configuration options ($sudo vi /etc/my.cnf.d/server.cnf)
* Install SELinux policy management tools ($ sudo yum install policycoreutils-python)
* Firewall settings (see resources below)

Generate the policy module
* Disable SELinux for mysqld on both the hosts. With SELinux in permissive mode, it logs all the denial operations as warnings instead of enforcing them.

$ sudo semanage permissive -a mysqld_t
  • Once mysqld on both the hosts are in permissive mode, our goal is to trigger all sorts of events that can happen on a node in a MariaDB Galera cluster (the more extensive, the better!) like, starting the node as donor/joiner with different snapshot state transfer (SST) methods and incremental state transfer (IST). The idea is to let all possible denials get logged into the audit log, which we later use to generate the policy module.
  • Carefully analyze (sealert Messages) all the “denials” logged in the audit log (/var/log/audit/audit.log). If the denials are expected, create a local policy module using allow2audit.
  $ sudo grep mysql audit.log | audit2allow -M mariadb-galera
  • Install the policy module.
  $ sudo semodule -i mariadb-galera.pp
  • Put mysqld back to enforcing mode.
$ sudo semanage permissive -d mysqld_t

Resources

Galera: Unknown command

Here is the scenario: You connect to one of the nodes in a MariaDB/MySQL Galera cluster, execute a valid command and the node responds with “Unknown command” error. Strange! After all, you executed a valid SQL command. What’s going on? I will try to demystify it in this article.

ERROR 1047 (08S01): Unknown command

In a “split-brain” situation, due to network partitioning for instance, there is a higher risk of data becoming inconsistent across various nodes in the cluster. So, in order to avoid this, Galera nodes in minority start rejecting incoming commands** until this issue has been resolved – and thus, it returns “Unknown command” error to the client. Its a reasonable decision that helps preserve data consistency in the cluster. Besides split-brain, there can be other situations when a node is not ready or fully prepared and thus would reject commands with the same error.

** SHOW and SET commands are not rejected.

EDIT: The error message has been corrected recently in MariaDB Galera Cluster (MDEV-6171) :

ERROR 1047 (08S01): WSREP has not yet prepared node for application use

A note on Galera versioning

The knowledge of Galera versioning comes in handy when you are working with MariaDB Galera cluster. Galera libraries are versioned using the following format : xx.yy.zz (e.g. 25.3.2), where :

xx : WSREP Interface Version
yy : Galera library major version
zz : Galera library minor version

However, the library itself is often referred by its major and minor version only (galera-3.2, for instance). While the last two numbers (yy.zz) describe the set of features/bug fixes, the first one (xx) gives the WSREP interface version of the Galera library.

A MariaDB Galera cluster node consists of two main components :

  1. MariaDB Galera server
  2. Galera library

The Galera library (aka the WSREP provider) plugs into MariaDB Galera server to provide support for writeset replication. In order to successfully plug in, the WSREP interface version of galera library must match with that provided by MariaDB Galera Server. The WSREP interface version of MariaDB Galera Server can be found using version_comment system variable.

MariaDB [test]> select @@version_comment;
+---------------------------------------+
| @@version_comment                     |
+---------------------------------------+
| Source distribution, wsrep_25.9.r3961 |
+---------------------------------------+

So, what happens when interface versions of MariaDB Galera Server and Galera library are different? The Galera plugin would fail to load :

140311 14:14:02 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:02 [Note] WSREP: wsrep_load(): loading provider library '/home/packages/galera-23.2.7-src/libgalera_smm.so'
140311 14:14:03 [ERROR] WSREP: provider interface version mismatch: need '25', found '23'
140311 14:14:03 [ERROR] WSREP: wsrep_load(): interface version mismatch: my version 25, provider version 23
140311 14:14:03 [ERROR] WSREP: wsrep_load(/home/packages/galera-23.2.7-src/libgalera_smm.so) failed: Invalid argument (22). Reverting to no provider.
140311 14:14:03 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:03 [Note] WSREP: wsrep_load(): loading provider library 'none'
140311 14:14:03 [ERROR] Aborting

Lastly, it is interesting to note that a Galera library can be loaded into the MariaDB Galera Server regardless of its version (2.XX or 3.XX) as long as their WSREP interface versions are same.

Galera: Connection timed out scenarios

While joining an existing Galera cluster, if a node fails to “reach” any of the nodes in the cluster, it generally – aborts – with the following error :

140505 10:01:46 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '127.0.0.1:4567'
140505 10:01:49 [Warning] WSREP: no nodes coming from prim view, prim not possible
140505 10:01:49 [Note] WSREP: view(view_id(NON_PRIM,cb63893e-d45d-11e3-9dee-0a8c1484c284,1) memb {
        cb63893e-d45d-11e3-9dee-0a8c1484c284,0
} joined {
} left {
} partitioned {
})
140505 10:01:49 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.51674S), skipping check
140505 10:02:19 [Note] WSREP: view((empty))
140505 10:02:19 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():141
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://127.0.0.1:4567': -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs connect failed: Connection timed out
140505 10:02:19 [ERROR] WSREP: wsrep::connect() failed: 7
140505 10:02:19 [ERROR] Aborting

Assuming that there are no network issues, this error may occur due to any of the following scenarios:

Scenario #1: A node tries join a non-existing cluster

This issue and the possible resolution has been discussed here.

Scenario #2: A node with Galera 3 (25.3.xx) wsrep provider tries to join an existing cluster consisting of Galera 2 (25.2.xx) nodes

This results in error because, by default, Galera-2 and Galera-3 uses different checksum algorithms on network packets. While, for Galera-2 its plain CRC32 (socket.checksum=1), Galera-3 uses CRC32-C hw-accelerated (socket.checksum=2). Checksum algorithms can be controlled by socket.checksum galera parameter.

So, the solution to this problem would be to start the Galera-3 node with wsrep_provider_options=’socket.checksum=1′ option to make sure it uses the same checksum algorithm which other nodes of the cluster (Galera-2) are using.

References: http://galeracluster.com/documentation-webpages/galeraparameters.html

Scenario #3: telnet: Unable to connect to remote host: No route to host

Solution: Configure firewall settings :

(a) Flush all the firewall rules: Quick, but advisable for test setups only.
$ sudo iptables -F ## Thanks to Qian Joe for the suggestion!

(b) Allow only specific hosts/ports: Its important to note that by flushing (-F), one would essentially delete all the firewall rules. This is undesirable. Only specific ports & hosts should be allowed instead. (see firewall settings)

EDIT: Do let me know (via comment) if the given resolutions did not work for you and I will be happy to add the failure scenario & resolution here.

Auto increments in Galera cluster

Lets start by considering a scenario where records are being inserted in a single auto-increment table via different nodes of a multi-master cluster. One issue that might arise is ‘collision’ of generated auto-increment values on different nodes, which is precisely the subject of this article.

As the cluster is multi-master, it allows writes on all master nodes. As a result of which a table might get same auto-incremented values on different nodes on INSERTs. This issue is discovered only after the writeset is replicated and that’s a problem!

Galera cluster suffers with the similar problem.

Lets try to emulate this on a 2-node Galera cluster :

1) On node #1:

MariaDB [test]> CREATE TABLE t1(c1 INT AUTO_INCREMENT PRIMARY KEY, c2 INT)ENGINE=InnoDB;
Query OK, 0 rows affected (0.07 sec)

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES (1);
Query OK, 1 row affected (0.05 sec)

2) On node #2:

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES(2);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> COMMIT;
Query OK, 0 rows affected (0.05 sec)

3) On node #1

MariaDB [test]> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

MariaDB [test]> SELECT * FROM t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

As expected, the second commit could not succeed because of the collision.

So, how do we handle this issue? Enter @@auto_increment_increment and @@auto_increment_offset! Using these two system variables one can control the sequence of auto-generated values on a MySQL/MariaDB server. The trick is to set them in such a way that every node in the cluster generates a sequence of non-colliding numbers.

For instance, lets discuss this for a 3-node cluster (n=3):
Node 1: @@auto_increment_increment=3, @@auto_increment_offset=1 => Sequence : 1, 4, 7, 10, ...
Node 2: @@auto_increment_increment=3, @@auto_increment_offset=2 => Sequence : 2, 5, 8, 11, ...
Node 3: @@auto_increment_increment=3, @@auto_increment_offset=3 => Sequence : 3, 6, 9, 12, ...

As you can see, by setting each node’s auto_increment_increment to the total number of nodes (n) in the cluster and auto_increment_offset to a number between [1,n], we can assure that auto-increment values, thus generated, would be unique across the cluster, thus, would avoid any conflict or collision.

In Galera cluster this is already taken care of by default. As and when a node joins the cluster, the two auto-increment variables are adjusted automatically to avoid collision. However, this capability can be controlled by using wsrep_auto_increment_control variable.

Node #1:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #2:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #3:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

With this setting the last COMMIT in the above example would succeed.

What is Galera Arbitrator?

Galera Arbitrator (garbd) is a stateless daemon that can act like a node in a Galera cluster. It is normally used to avoid split-brain situation which mostly occurs because of hardware/link failure, as a result of which the cluster gets divided into two parts and each part remains operational thinking they are in majority (primary component). This may lead to inconsistent data sets. Garbd should be installed on a separate machine. However, it can share the machine running load-balancer. It is interesting to note that as garbd joins the cluster, it makes a request for SST.

Let us now try to add galera arbitrator (garbd) to an existing 2-node MariaDB Galera cluster. Check out this post for steps to setup a MariaDB Galera cluster. We start by connecting to one of the nodes to get the size and name of the cluster.

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> show variables like 'wsrep_cluster_name';
+--------------------+------------------+
| Variable_name      | Value            |
+--------------------+------------------+
| wsrep_cluster_name | my_wsrep_cluster |
+--------------------+------------------+
1 row in set (0.00 sec)

Now, lets start the garbd as a daemon:

$ cd galera-23.2.7-src/garb

$ ./garbd
    --address='gcomm://127.0.0.1:4567'
    --options='gmcast.listen_addr=tcp://127.0.0.1:4569'
    --group='my_wsrep_cluster'
    --log=garbd.err
    --daemon

That’s it, garbd should now be running and connected to the cluster. Let’s verify this by checking wsrep_cluster_size again.

MariaDB [test]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

Great! So, the new garbd is now part of the cluster.

Setting up MariaDB Galera Cluster on Ubuntu

MariaDB Galera Cluster is a multi-master synchronous replication system. In this article, I would be setting up a 3-node cluster on a single machine running Ubuntu. However, in a production scenario it is advised to run each cluster node on a separate box in a WAN.

Requirements

  1. MariaDB Galera Cluster
    • Download it from the official site : https://downloads.mariadb.org/mariadb-galera/, OR
    • Install it using Advanced packaging tool (APT), steps can be found in erkules’ blog post. OR
    • Build it from source : lp:~maria-captains/maria/maria-5.5-galera
      Note : Build would require additional cmake options : WITH_WSREP=ON and WITH_INNODB_DISALLOW_WRITES=1
  2. Galera wsrep provider (libgalera_smm.so)
  3. Some extra Ubuntu packages (in case you choose to build Galera from source!)
    • scons (Build utility, replacement for make)
    • check (Unit test framework for C)
    • libboost-dev
    • libboost-program-options-dev
    • libboost-system-dev (for 23.2.7)
    • libssl-dev

Setup

Now that we have all requirements in place, lets bring up the cluster nodes.

  1. Node#1 : Start 1st node (at port 4001 for instance) with empty cluster address (–wsrep_cluster_address=’gcomm://’).
     $ mysqld
        --no-defaults 
        --basedir=.
        --datadir=./data
        --port=4001
        --socket=/tmp/mysql_4001.sock
        --binlog_format=ROW
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so
        --wsrep_cluster_address='gcomm://'

    There are certain points to note before we proceed :

    • wsrep_provider option points to the galera wsrep provider, i.e. libgalera_smm.so library.
    • wsrep_cluster_address contains the address of existing cluster members. But in this case, as it is the very first node, the address must be empty.

      Important! In case you are planning to put these options in a config file (my.cnf) – once the cluster is up and running, make sure to change this option to hold a valid address. Failure to do so would disable the node’s capability to auto-join the cluster in case it restarts after a shutdown or crash.

    • From the server’s error log make a note of the base_host & base_port (default 4567) of wsrep. This information would be required to start the subsequent nodes.
  2. Node#2 : Start 2nd node at a different port (4002).
     $ mysqld
        --no-defaults
        --basedir=.
        --datadir=./data
        --port=4002
        --socket=/tmp/mysql_4002.sock
        --binlog_format=ROW
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so
        --wsrep_cluster_address='gcomm://127.0.0.1:4567'
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4568'

    Here, we have to specify the address of 1st node via wsrep_cluser_address option. It consists of base_host & base_port of the 1st node that we noted earlier in step 1 (i.e. gcomm://127.0.0.1:4567). Also, as we are starting this 2nd node on the same machine, in order to avoid port conflict, we must provide a different port for wsrep provider to listen to via gmcast.listen_addr wsrep provider option (tcp://127.0.0.1:4568).

  3. Node#3 : As with 2nd node, 3rd (and all subsequent nodes) can be started by same set of options with appropriate selection of available ports.
     $ mysqld
        --no-defaults
        --basedir=.
        --datadir=./data
        --port=4003
        --socket=/tmp/mysql_4003.sock
        --binlog_format=ROW
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so
        --wsrep_cluster_address='gcomm://127.0.0.1:4567'
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4569'

The cluster should now be up and running with 3 nodes. This can easily be verified and monitored further by inspecting server’s status & system variables :

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.06 sec)

MariaDB [(none)]> show variables like 'wsrep%';
...
MariaDB [(none)]> show status like 'wsrep%';
...

Before I close, let me list out some important options which were omitted from this article for brevity :

  • default_storage_engine=INNODB
  • innodb_autoinc_lock_mode=2
  • innodb_locks_unsafe_for_binlog=1
  • wsrep_sst_auth=”user:pass” : to be used by SST (Snapshot state transfer) script

That’s all for now!