Tag Archives: MariaDB

mariadb-galera-server deb dependency graph

The following commands can be used to generate a dependency graph for a Debian package (the debtree and graphviz packages provide the tools used below):

# Get relationship between packages (.dot)
$ debtree --no-conflicts --no-provides --with-suggests --max-depth=1 mariadb-galera-server-5.5 > mariadb-galera-server-5.5.dot
# Generate the image
$ dot -T png -o mariadb-galera-server-5.5.png mariadb-galera-server-5.5.dot

Reference: http://askubuntu.com/a/261808

Galera: runtime adjustment of applier threads

In a MariaDB Galera node, writesets received from other node(s) can be applied in parallel through multiple applier threads. The number of slave applier threads is controlled by the server's wsrep_slave_threads system variable. It is a dynamic variable, so the number of slave applier threads can be adjusted at runtime. The current number of slave applier threads can be checked through “SHOW PROCESSLIST” or the wsrep_thread_count status variable (MDEV-6206).

One interesting point to note here is that when @@global.wsrep_slave_threads is increased at runtime, the additional applier threads get spawned immediately. However, when the number is decreased, the effect cannot be noticed immediately. What happens internally is that when the number is decreased, the extra applier threads are not killed right away. The process is deferred, and each extra thread exits gracefully only after receiving and applying one last writeset (transaction). So, one will not notice the number of applier threads decreasing on an idle node. The thread count will decrease only after the node starts receiving writesets to apply.
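As an illustration, here is a minimal session sketch of that behaviour (the counts are indicative; wsrep_thread_count includes one rollbacker thread in addition to the appliers):

MariaDB [(none)]> SET GLOBAL wsrep_slave_threads = 4;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_thread_count';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_thread_count | 5     |
+--------------------+-------+

MariaDB [(none)]> SET GLOBAL wsrep_slave_threads = 1;
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_thread_count';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_thread_count | 5     |
+--------------------+-------+

On an idle node the count stays at 5; it drops to 2 (1 applier + 1 rollbacker) only after the three extra appliers each receive and apply one more writeset.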

Here are some snippets to help you understand this further:

1. Calculate the change.

  /* Accumulated delta between the requested and the current number of
     applier threads: positive means spawn more, negative means retire. */
  wsrep_slave_count_change += (var->value->val_int() - wsrep_slave_threads);

2a: If it's positive, spawn new applier threads.

  if (wsrep_slave_count_change > 0)
  {
    wsrep_create_appliers(wsrep_slave_count_change);
    wsrep_slave_count_change = 0;
  }

2b: Else, mark an applier thread as “done” after it has applied (committed or rolled back) its last writeset.

wsrep_cb_status_t wsrep_commit_cb(void*         const     ctx,
                                  uint32_t      const     flags,
                                  const wsrep_trx_meta_t* meta,
                                  wsrep_bool_t* const     exit,
                                  bool          const     commit)
{
...
  if (commit)
    rcode = wsrep_commit(thd, meta->gtid.seqno);
  else
    rcode = wsrep_rollback(thd, meta->gtid.seqno);

...

  if (wsrep_slave_count_change < 0 && commit && WSREP_CB_SUCCESS == rcode)
  {
    mysql_mutex_lock(&LOCK_wsrep_slave_threads);
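    /* Re-check under LOCK_wsrep_slave_threads: the unlocked check
       above is racy, so the decision to retire this applier is
       confirmed while holding the mutex. */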
    if (wsrep_slave_count_change < 0)
    {
      wsrep_slave_count_change++;
      *exit = true;
    }
...

And the thread exits:

static void wsrep_replication_process(THD *thd)
{
...
  rcode = wsrep->recv(wsrep, (void *)thd);
  DBUG_PRINT("wsrep",("wsrep_repl returned: %d", rcode));

  WSREP_INFO("applier thread exiting (code:%d)", rcode);
...

Go projects in MySQL/MariaDB ecosystem

Please let me know if you come across a MySQL/MariaDB Go project not listed here.

Galera: Unknown command

Here is the scenario: you connect to one of the nodes in a MariaDB/MySQL Galera cluster, execute a valid command, and the node responds with an “Unknown command” error. Strange! After all, you executed a valid SQL command. What's going on? I will try to demystify it in this article.

ERROR 1047 (08S01): Unknown command

In a “split-brain” situation, caused by network partitioning for instance, there is a higher risk of data becoming inconsistent across the nodes in the cluster. To avoid this, Galera nodes in the minority component start rejecting incoming commands** until the issue has been resolved, returning the “Unknown command” error to the client. It is a reasonable decision that helps preserve data consistency in the cluster. Besides split-brain, there are other situations in which a node is not ready or fully prepared and thus rejects commands with the same error.

** SHOW and SET commands are not rejected.

EDIT: The error message has been corrected recently in MariaDB Galera Cluster (MDEV-6171):

ERROR 1047 (08S01): WSREP has not yet prepared node for application use
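Since SHOW commands are still served in this state, a quick way to confirm what is going on is to inspect the wsrep status variables. On a node stuck in the minority component, the output would look something like this:

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_ready';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| wsrep_ready   | OFF   |
+---------------+-------+

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_cluster_status';
+----------------------+-------------+
| Variable_name        | Value       |
+----------------------+-------------+
| wsrep_cluster_status | non-Primary |
+----------------------+-------------+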

A note on Galera versioning

The knowledge of Galera versioning comes in handy when you are working with a MariaDB Galera cluster. Galera libraries are versioned using the following format: xx.yy.zz (e.g. 25.3.2), where:

xx : WSREP Interface Version
yy : Galera library major version
zz : Galera library minor version

However, the library itself is often referred to by its major and minor version only (galera-3.2, for instance). While the last two numbers (yy.zz) describe the set of features/bug fixes, the first one (xx) gives the WSREP interface version of the Galera library.
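On a running node, the full library version can be read from the wsrep_provider_version status variable (the build tag in parentheses below is just a placeholder):

MariaDB [(none)]> SHOW STATUS LIKE 'wsrep_provider_version';
+------------------------+--------------+
| Variable_name          | Value        |
+------------------------+--------------+
| wsrep_provider_version | 25.3.2(rXXX) |
+------------------------+--------------+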

A MariaDB Galera cluster node consists of two main components:

  1. MariaDB Galera server
  2. Galera library

The Galera library (aka the WSREP provider) plugs into the MariaDB Galera server to provide support for writeset replication. In order to plug in successfully, the WSREP interface version of the Galera library must match the one provided by the MariaDB Galera server. The WSREP interface version of the MariaDB Galera server can be found using the version_comment system variable.

MariaDB [test]> select @@version_comment;
+---------------------------------------+
| @@version_comment                     |
+---------------------------------------+
| Source distribution, wsrep_25.9.r3961 |
+---------------------------------------+

So, what happens when the interface versions of the MariaDB Galera server and the Galera library differ? The Galera plugin fails to load:

140311 14:14:02 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:02 [Note] WSREP: wsrep_load(): loading provider library '/home/packages/galera-23.2.7-src/libgalera_smm.so'
140311 14:14:03 [ERROR] WSREP: provider interface version mismatch: need '25', found '23'
140311 14:14:03 [ERROR] WSREP: wsrep_load(): interface version mismatch: my version 25, provider version 23
140311 14:14:03 [ERROR] WSREP: wsrep_load(/home/packages/galera-23.2.7-src/libgalera_smm.so) failed: Invalid argument (22). Reverting to no provider.
140311 14:14:03 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:03 [Note] WSREP: wsrep_load(): loading provider library 'none'
140311 14:14:03 [ERROR] Aborting

Lastly, it is interesting to note that a Galera library can be loaded into the MariaDB Galera server regardless of its major version (2.xx or 3.xx) as long as the WSREP interface versions are the same.

Galera: Connection timed out scenarios

While joining an existing Galera cluster, if a node fails to “reach” any of the nodes in the cluster, it generally aborts with the following error:

140505 10:01:46 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '127.0.0.1:4567'
140505 10:01:49 [Warning] WSREP: no nodes coming from prim view, prim not possible
140505 10:01:49 [Note] WSREP: view(view_id(NON_PRIM,cb63893e-d45d-11e3-9dee-0a8c1484c284,1) memb {
        cb63893e-d45d-11e3-9dee-0a8c1484c284,0
} joined {
} left {
} partitioned {
})
140505 10:01:49 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.51674S), skipping check
140505 10:02:19 [Note] WSREP: view((empty))
140505 10:02:19 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():141
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://127.0.0.1:4567': -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs connect failed: Connection timed out
140505 10:02:19 [ERROR] WSREP: wsrep::connect() failed: 7
140505 10:02:19 [ERROR] Aborting

Assuming that there are no network issues, this error may occur due to any of the following scenarios:

Scenario #1: A node tries to join a non-existent cluster

This issue and its possible resolution have been discussed here.

Scenario #2: A node with Galera 3 (25.3.xx) wsrep provider tries to join an existing cluster consisting of Galera 2 (25.2.xx) nodes

This results in an error because, by default, Galera-2 and Galera-3 use different checksum algorithms on network packets: Galera-2 uses plain CRC32 (socket.checksum=1), while Galera-3 uses hardware-accelerated CRC32-C (socket.checksum=2). The checksum algorithm can be controlled with the socket.checksum Galera parameter.

So, the solution to this problem would be to start the Galera-3 node with the wsrep_provider_options='socket.checksum=1' option to make sure it uses the same checksum algorithm as the other (Galera-2) nodes of the cluster.
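To confirm which checksum algorithm a node is effectively using, the expanded provider options string can be inspected (output abridged here; the full string lists many more parameters):

MariaDB [(none)]> SHOW GLOBAL VARIABLES LIKE 'wsrep_provider_options'\G
*************************** 1. row ***************************
Variable_name: wsrep_provider_options
        Value: ...; socket.checksum = 2; ...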

Reference: http://galeracluster.com/documentation-webpages/galeraparameters.html

Scenario #3: telnet: Unable to connect to remote host: No route to host

Solution: Configure firewall settings:

(a) Flush all the firewall rules: Quick, but advisable for test setups only.
$ sudo iptables -F ## Thanks to Qian Joe for the suggestion!

(b) Allow only specific hosts/ports: It is important to note that flushing (-F) essentially deletes all the firewall rules, which is undesirable. Instead, only specific ports & hosts should be allowed (see firewall settings).

EDIT: Do let me know (via comment) if the given resolutions did not work for you and I will be happy to add the failure scenario & resolution here.

Auto increments in Galera cluster

Let's start by considering a scenario where records are being inserted into a single auto-increment table via different nodes of a multi-master cluster. One issue that might arise is a ‘collision’ of generated auto-increment values on different nodes, which is precisely the subject of this article.

As the cluster is multi-master, it allows writes on all master nodes. As a result, a table might get the same auto-increment values on different nodes on INSERTs. The conflict is discovered only after the writeset is replicated, and that's a problem!

Galera cluster suffers from the same problem.

Let's try to emulate this on a 2-node Galera cluster:

1) On node #1:

MariaDB [test]> CREATE TABLE t1(c1 INT AUTO_INCREMENT PRIMARY KEY, c2 INT)ENGINE=InnoDB;
Query OK, 0 rows affected (0.07 sec)

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES (1);
Query OK, 1 row affected (0.05 sec)

2) On node #2:

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES(2);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> COMMIT;
Query OK, 0 rows affected (0.05 sec)

3) On node #1:

MariaDB [test]> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

MariaDB [test]> SELECT * FROM t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

As expected, the later commit could not succeed because of the collision: both transactions had generated the same auto-increment value (c1 = 1).

So, how do we handle this issue? Enter @@auto_increment_increment and @@auto_increment_offset! Using these two system variables one can control the sequence of auto-generated values on a MySQL/MariaDB server. The trick is to set them in such a way that every node in the cluster generates a sequence of non-colliding numbers.

For instance, let's discuss this for a 3-node cluster (n=3):
Node 1: @@auto_increment_increment=3, @@auto_increment_offset=1 => Sequence : 1, 4, 7, 10, ...
Node 2: @@auto_increment_increment=3, @@auto_increment_offset=2 => Sequence : 2, 5, 8, 11, ...
Node 3: @@auto_increment_increment=3, @@auto_increment_offset=3 => Sequence : 3, 6, 9, 12, ...

As you can see, by setting each node's auto_increment_increment to the total number of nodes (n) in the cluster and its auto_increment_offset to a distinct number in [1,n], we can ensure that the generated auto-increment values are unique across the cluster, thus avoiding any conflict or collision.
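To see the interleaving in action, here is a sketch of setting the two variables manually in one session (the table name seq_demo is just for illustration; in a Galera cluster you would normally leave this to wsrep_auto_increment_control, described next):

MariaDB [test]> SET SESSION auto_increment_increment = 3, auto_increment_offset = 1;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> CREATE TABLE seq_demo(id INT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB;
Query OK, 0 rows affected (0.05 sec)

MariaDB [test]> INSERT INTO seq_demo VALUES (), (), ();
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

MariaDB [test]> SELECT id FROM seq_demo;
+----+
| id |
+----+
|  1 |
|  4 |
|  7 |
+----+
3 rows in set (0.00 sec)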

In a Galera cluster this is already taken care of by default. Whenever a node joins the cluster, the two auto-increment variables are adjusted automatically to avoid collisions. This behaviour can be controlled using the wsrep_auto_increment_control variable.

Node #1:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #2:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #3:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

With these settings, the last COMMIT in the above example would succeed.

What is Galera Arbitrator?

Galera Arbitrator (garbd) is a stateless daemon that can act like a node in a Galera cluster. It is normally used to avoid a split-brain situation, which mostly occurs because of a hardware/link failure that divides the cluster into two parts, each remaining operational in the belief that it is in the majority (primary component). This may lead to inconsistent data sets. Garbd should be installed on a separate machine; however, it can share the machine running the load balancer. It is interesting to note that as garbd joins the cluster, it makes a request for SST.

Let us now try to add Galera Arbitrator (garbd) to an existing 2-node MariaDB Galera cluster. Check out this post for the steps to set up a MariaDB Galera cluster. We start by connecting to one of the nodes to get the size and name of the cluster.

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> show variables like 'wsrep_cluster_name';
+--------------------+------------------+
| Variable_name      | Value            |
+--------------------+------------------+
| wsrep_cluster_name | my_wsrep_cluster |
+--------------------+------------------+
1 row in set (0.00 sec)

Now, let's start garbd as a daemon:

$ cd galera-23.2.7-src/garb

$ ./garbd \
    --address='gcomm://127.0.0.1:4567' \
    --options='gmcast.listen_addr=tcp://127.0.0.1:4569' \
    --group='my_wsrep_cluster' \
    --log=garbd.err \
    --daemon

That’s it, garbd should now be running and connected to the cluster. Let’s verify this by checking wsrep_cluster_size again.

MariaDB [test]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

Great! So, garbd is now part of the cluster.

Setting up MariaDB Galera Cluster on Ubuntu

MariaDB Galera Cluster is a multi-master synchronous replication system. In this article, I will set up a 3-node cluster on a single machine running Ubuntu. However, in a production scenario, it is advisable to run each cluster node on a separate box in a WAN.

Requirements

  1. MariaDB Galera Cluster
    • Download it from the official site : https://downloads.mariadb.org/mariadb-galera/, OR
    • Install it using the Advanced Packaging Tool (APT); steps can be found in erkules' blog post, OR
    • Build it from source : lp:~maria-captains/maria/maria-5.5-galera
      Note: the build requires additional cmake options: WITH_WSREP=ON and WITH_INNODB_DISALLOW_WRITES=1
  2. Galera wsrep provider (libgalera_smm.so)
  3. Some extra Ubuntu packages (in case you choose to build Galera from source!)
    • scons (Build utility, replacement for make)
    • check (Unit test framework for C)
    • libboost-dev
    • libboost-program-options-dev
    • libboost-system-dev (for 23.2.7)
    • libssl-dev

Setup

Now that we have all the requirements in place, let's bring up the cluster nodes.

  1. Node#1: Start the 1st node (at port 4001, for instance) with an empty cluster address (--wsrep_cluster_address='gcomm://').
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4001 \
        --socket=/tmp/mysql_4001.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://'

    There are certain points to note before we proceed:

    • The wsrep_provider option points to the Galera wsrep provider, i.e. the libgalera_smm.so library.
    • wsrep_cluster_address contains the address of existing cluster members. But in this case, as it is the very first node, the address must be empty.

      Important! In case you are planning to put these options in a config file (my.cnf), make sure to change this option to hold a valid cluster address once the cluster is up and running. Failure to do so would disable the node's ability to auto-rejoin the cluster if it restarts after a shutdown or crash.

    • From the server's error log, make a note of wsrep's base_host & base_port (default 4567). This information will be required to start the subsequent nodes.
  2. Node#2: Start the 2nd node at a different port (4002).
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4002 \
        --socket=/tmp/mysql_4002.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://127.0.0.1:4567' \
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4568'

    Here, we have to specify the address of the 1st node via the wsrep_cluster_address option. It consists of the base_host & base_port of the 1st node, which we noted in step 1 (i.e. gcomm://127.0.0.1:4567). Also, as we are starting this 2nd node on the same machine, we must give the wsrep provider a different port to listen on via the gmcast.listen_addr provider option (tcp://127.0.0.1:4568) in order to avoid a port conflict.

  3. Node#3: As with the 2nd node, the 3rd (and all subsequent) nodes can be started with the same set of options and an appropriate selection of available ports.
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4003 \
        --socket=/tmp/mysql_4003.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://127.0.0.1:4567' \
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4569'

The cluster should now be up and running with 3 nodes. This can easily be verified and monitored further by inspecting the server's status & system variables:

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.06 sec)

MariaDB [(none)]> show variables like 'wsrep%';
...
MariaDB [(none)]> show status like 'wsrep%';
...

Before I close, let me list some important options that were omitted from this article for brevity (a quick way to verify them on a running node is shown after the list):

  • default_storage_engine=INNODB
  • innodb_autoinc_lock_mode=2
  • innodb_locks_unsafe_for_binlog=1
  • wsrep_sst_auth="user:pass": to be used by the SST (State Snapshot Transfer) script
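A minimal sketch of verifying these settings on a running node (expected values shown; innodb_locks_unsafe_for_binlog=1 is reported as ON):

MariaDB [(none)]> SHOW VARIABLES WHERE Variable_name IN
    -> ('default_storage_engine', 'innodb_autoinc_lock_mode',
    -> 'innodb_locks_unsafe_for_binlog');
+--------------------------------+--------+
| Variable_name                  | Value  |
+--------------------------------+--------+
| default_storage_engine         | InnoDB |
| innodb_autoinc_lock_mode       | 2      |
| innodb_locks_unsafe_for_binlog | ON     |
+--------------------------------+--------+
3 rows in set (0.00 sec)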

That’s all for now!