New Information schema tables for Galera membership and status

MariaDB Galera server logs all the cluster related information like node status, cluster status, membership, etc. in the error log. MariaDB 10.1.2 introduces a new INFORMATION SCHEMA plugin WSREP_INFO that enables querying these information via INFORMATION SCHEMA tables. The WSREP_INFO plugin adds two new tables to the Information Schema, WSREP_MEMBERSHIP and WSREP_STATUS. The plugin is not enabled by default, so in order to use it, it needs to be installed first :

MariaDB [test]> INSTALL SONAME 'wsrep_status';
Query OK, 0 rows affected (0.04 sec)

MariaDB [test]> SHOW PLUGINS;
+-----------------------------+----------+--------------------+-----------------+---------+
| Name                        | Status   | Type               | Library         | License |
+-----------------------------+----------+--------------------+-----------------+---------+
...
| WSREP_MEMBERSHIP            | ACTIVE   | INFORMATION SCHEMA | wsrep_status.so | GPL     |
| WSREP_STATUS                | ACTIVE   | INFORMATION SCHEMA | wsrep_status.so | GPL     |
+-----------------------------+----------+--------------------+-----------------+---------+
52 rows in set (0.01 sec)

MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_MEMBERSHIP\G
*************************** 1. row ***************************
       Table: WSREP_MEMBERSHIP
Create Table: CREATE TEMPORARY TABLE `WSREP_MEMBERSHIP` (
  `INDEX` int(11) NOT NULL DEFAULT '0',
  `UUID` varchar(36) NOT NULL DEFAULT '',
  `NAME` varchar(32) NOT NULL DEFAULT '',
  `ADDRESS` varchar(256) NOT NULL DEFAULT ''
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_STATUS\G
*************************** 1. row ***************************
       Table: WSREP_STATUS
Create Table: CREATE TEMPORARY TABLE `WSREP_STATUS` (
  `NODE_INDEX` int(11) NOT NULL DEFAULT '0',
  `NODE_STATUS` varchar(16) NOT NULL DEFAULT '',
  `CLUSTER_STATUS` varchar(16) NOT NULL DEFAULT '',
  `CLUSTER_SIZE` int(11) NOT NULL DEFAULT '0',
  `CLUSTER_STATE_UUID` varchar(36) NOT NULL DEFAULT '',
  `CLUSTER_STATE_SEQNO` bigint(21) NOT NULL DEFAULT '0',
  `CLUSTER_CONF_ID` bigint(21) NOT NULL DEFAULT '0',
  `GAP` varchar(10) NOT NULL DEFAULT '',
  `PROTOCOL_VERSION` int(11) NOT NULL DEFAULT '0'
) ENGINE=MEMORY DEFAULT CHARSET=utf8
1 row in set (0.00 sec)

MariaDB [test]> SELECT @@wsrep_provider;
+----------------------------------+
| @@wsrep_provider                 |
+----------------------------------+
| /usr/lib/galera/libgalera_smm.so |
+----------------------------------+
1 row in set (0.00 sec)

Now that WSREP_INFO plugin is installed, lets look into the contents of I_S tables on a 3-node cluster.

MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_MEMBERSHIP;
+-------+--------------------------------------+----------+-----------------+
| INDEX | UUID                                 | NAME     | ADDRESS         |
+-------+--------------------------------------+----------+-----------------+
|     0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 |
|     1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 |
|     2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 |
+-------+--------------------------------------+----------+-----------------+
3 rows in set (0.00 sec)

MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_STATUS\G
*************************** 1. row ***************************
         NODE_INDEX: 0
        NODE_STATUS: Synced
     CLUSTER_STATUS: Primary
       CLUSTER_SIZE: 3
 CLUSTER_STATE_UUID: 190604d7-8940-11e4-a084-ebee5211c190
CLUSTER_STATE_SEQNO: 2
    CLUSTER_CONF_ID: 3
                GAP: NO
   PROTOCOL_VERSION: 3
1 row in set (0.00 sec)

As seen above, WSREP_MEMBERSHIP table shows information about current members in the cluster which includes node’s name and incoming address. WSREP_STATUS table, on the other hand, shows status information about the node and cluster as a whole.

SHOW command can also be used to query these tables. Its quick and reduces the number of columns for WSREP_STATUS to fit to the screen.

MariaDB [test]> SHOW WSREP_MEMBERSHIP;
+-------+--------------------------------------+----------+-----------------+
| Index | Uuid                                 | Name     | Address         |
+-------+--------------------------------------+----------+-----------------+
|     0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 |
|     1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 |
|     2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 |
+-------+--------------------------------------+----------+-----------------+
3 rows in set (0.00 sec)

MariaDB [test]> SHOW WSREP_STATUS;
+------------+-------------+----------------+--------------+
| Node_Index | Node_Status | Cluster_Status | Cluster_Size |
+------------+-------------+----------------+--------------+
|          0 | Synced      | Primary        |            3 |
+------------+-------------+----------------+--------------+
1 row in set (0.00 sec)

The curious case of lost data

There is no doubt that Galera makes a fine clustering solution. The ease at which one can setup a cluster is incredible. Recently, a query from a user on #mariadb@freenode, however, revealed a caveat which I feel is quite important and worth sharing.

Sep 10 14:16:09 <____> nirbhay: executive summary, my cluster was out of sync, and when i brought it back into sync the DB is now 3 weeks old. it updated the new node with the old node’s contents.
Sep 10 14:16:23 <____> nirbhay: galera + maria 5.5

Let me first try to explain the problem. The user has a MariaDB Galera cluster – up and running – but with stale data (3 weeks old to be precise) and another MariaDB Galera node running outside the cluster storing latest updates. Now, the user takes this node and adds to the cluster, thinking that out-of-sync cluster nodes would receive latest changes from this joining node. Unfortunately, it turned out that opposite happened – and this joiner node instead became stale (losing 3 weeks worth of updates!) when into synced with the cluster.

Actually, this is expected. When a node joins an existing Galera cluster, it receives data from one of the nodes (donor) of the cluster in order to get in sync with the cluster. During this process (aka snapshot state transfer) the node itself loses it own data (if any). This is precisely the reason why the joiner node (with latest updates) lost the recent changes in the above mentioned scenario.

So, how to handle such situations? In cases like this, one should discard the existing stale cluster and bootstrap a new cluster using the node having latest updates.

Snapshot State Transfer (SST) methods in MariaDB Galera Cluster

In a Galera cluster, when a node (joiner) joins the cluster, it receives a copy of entire data from one of the nodes (donor) in the cluster. This process is called Snapshot state transfer (SST). MariaDB Galera Cluster provides various methods for snapshot state transfer which can be configured using wsrep_sst_method option. MariaDB Galera cluster distribution currently includes the following SST methods :

  • rsync
  • It is the default method which uses rsync to transfer data files across the cluster. The donor node itself becomes READ-ONLY during the transfer by executing FLUSH TABLES WITH READ LOCK. It is important to note that even though the data transfer is fast, this method requires no authentication (wsrep_sst_auth).

    rsync sst

  • mysqldump
  • This method uses mysqldump tool to get a dump of all the databases on donor node which is then played/executed on the joiner node. This method requires wsrep_sst_auth to be set with credentials to connect to the donor as well as joiner node.

    mysqldump sst

  • xtrabackup/xtrabackup-v2
  • This method uses Percona XtraBackup tool to take a backup (snapshot) of donor’s data directory which is then restored on the joiner node. This method requires wsrep_sst_auth to be set with credentials to connect to the donor node.

    xtrabackup-v2 sst

Configuring SELinux for Galera cluster

Setting up a MariaDB Galera cluster can easily become tedious if its being setup on a Linux system with SELinux enabled.

140805 16:16:20 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer ''
140805 16:16:20 [ERROR] WSREP: Permission denied
140805 16:16:20 [ERROR] WSREP: failed to open gcomm backend connection: 13: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Permission denied': 13 (Permission denied)
         at gcomm/src/asio_tcp.cpp:listen():814
140805 16:16:20 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -13 (Permission denied)
140805 16:16:20 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://': -13 (Permission denied)
140805 16:16:20 [ERROR] WSREP: gcs connect failed: Permission denied
140805 16:16:20 [ERROR] WSREP: wsrep::connect() failed: 7
140805 16:16:20 [ERROR] Aborting

In a test environment one can bypass this by simply disabling SELinux :

$ sudo setenforce 0
$ getenforce
   permissive

But, this certainly is not the right way especially when we are dealing with a cluster in production.

http://stopdisablingselinux.com/

So, I tried to configure SELinux for a 2-node MariaDB Galera cluster on CentOS by using some useful tools provided to manage SELinux policies. The basic idea is to let the MariaDB Galera nodes run under permissive mode in order to get all possible operations (which SELinux would have otherwise denied) logged into the audit log and then create a policy module using allow2audit tool after carefully analyzing the “denials”. This module can then be installed before enabling (enforcing) SELinux.

Prepare hosts 1 & 2
* Install MariaDB Galera server packages ($ sudo yum install MariaDB-Galera-server)
* Setup MariaDB configuration options ($sudo vi /etc/my.cnf.d/server.cnf)
* Install SELinux policy management tools ($ sudo yum install policycoreutils-python)
* Firewall settings (see resources below)

On host 1:
* Put SELinux in permissive mode. ($ sudo setenforce 0) : Once SELinux is in permissive mode, it basically logs all the denial operations as warnings instead of enforcing them.
* Start the server (the donor node) ($ sudo /etc/init.d/mysql start)
* Wait for the joiner node (node 2) to start and join the cluster post-SST (snapshot state transfer).
* Stop the node, edit the configuration to make it join the already running node 2 and start it back again to initiate SST this time in reverse direction.
* Once the node has successfully started and joined the cluster, carefully analyze (sealert Messages) all the “denials” logged in the audit log (/var/log/audit/audit.log). If the denials are expected, create a local policy module using allow2audit.

  $ sudo grep mysql audit.log | audit2allow -M galera
  $ sudo semodule -i galera.pp
  • Put SELinux back to enforcing mode ($sudo setenforce 1)

On host 2:
* Follow the steps performed for node 1. Note: the node will be started as the “joiner node” and would later become the “donor”.

Resources

Galera: runtime adjustment of applier threads

In a MariaDB Galera node, writesets received from other node(s) can be applied parallely through multiple applier threads. The number of slave applier threads is controlled by server’s wsrep_slave_threads system variable. Its a dynamic variable and thus the number of slave applier threads can be adjusted in runtime. The current number of slave applier threads can be checked through “SHOW PROCESSLIST” or wsrep_thread_count status variable (MDEV-6206).

One interesting point to note here is that when the number of @@global.wsrep_slave_threads is increased at runtime, the additional requested applier threads gets spawned immediately. However, when the number is decreased, the effect can not be noticed immediately. What happens internally is that when the number is decreased, the extra applier threads are not killed right away. The process is deferred, and the extra threads exit gracefully only after each apply one last writeset (transaction) after receiving it. So, one will not notice the number of applier threads decreasing on an idle node. The thread count will decrease only after the node starts receiving writesets to apply.

Here are some snippets to help you understand this further :

1. Calculate the change.

  wsrep_slave_count_change += (var->value->val_int() - wsrep_slave_threads);

2a: If its positive, spawn new applier threads.

  if (wsrep_slave_count_change > 0)
  {
    wsrep_create_appliers(wsrep_slave_count_change);
    wsrep_slave_count_change = 0;
  }

2b: else mark the thread as “done” after it has applied (commits or rollbacks) the given last writeset.

wsrep_cb_status_t wsrep_commit_cb(void*         const     ctx,
                                  uint32_t      const     flags,
                                  const wsrep_trx_meta_t* meta,
                                  wsrep_bool_t* const     exit,
                                  bool          const     commit)
{
...
  if (commit)
    rcode = wsrep_commit(thd, meta->gtid.seqno);
  else
    rcode = wsrep_rollback(thd, meta->gtid.seqno);

...

  if (wsrep_slave_count_change < 0 && commit && WSREP_CB_SUCCESS == rcode)
  {
    mysql_mutex_lock(&LOCK_wsrep_slave_threads);
    if (wsrep_slave_count_change < 0)
    {
      wsrep_slave_count_change++;
      *exit = true;
    }
...

And the thread exits :

static void wsrep_replication_process(THD *thd)
{
...
  rcode = wsrep->recv(wsrep, (void *)thd);
  DBUG_PRINT("wsrep",("wsrep_repl returned: %d", rcode));

  WSREP_INFO("applier thread exiting (code:%d)", rcode);
...

APT: How to prefer local packages over remote?

I recently had to test some of my locally generated MariaDB debian packages. So, created a local repository of generated packages and added it to sources.list file. However, since these packages required some other packages stored on a mirror, I had to add the mirror to the sources.list file as well. Since, this mirror also contained the packages that I intended to test, now when I try to install the packages, APT would always pick/prefer the ones stored on the mirror. How to solve this? How to make APT prefer the local packages instead? Lets start by taking a peek into the sources.list file:

$cat /etc/apt/sources.list
..
# remote repo
deb http://mirror.jmu.edu/pub/mariadb/repo/5.5/ubuntu precise main

# local repo
deb file:///home/nirbhay/project/repo/maria/testing/debian/5.5/debs binary/
..

$ sudo apt-cache policy mariadb-galera-server
mariadb-galera-server:
  Installed: (none)
  Candidate: 5.5.37+maria-1~precise
  Version table:
     5.5.37+maria-1~precise 0
        500 file:/home/nirbhay/project/repo/maria/testing/debian/5.5/debs/ binary/ Packages
        500 http://mirror.jmu.edu/pub/mariadb/repo/5.5/ubuntu/ precise/main i386 Packages

The following tips can help you fix this problem :

  1. If remote and local packages have the same version (as in my case), place one that you want to be preferred higher in sources.list file.

  2. APT prefers authenticated repository over unauthenticated. So, as against the above case, even if the local repository is placed over the remote, APT will prefer remote one if its authenticated and the local repository is not. In that case, –allow-unauthenticated can be used to make local packages take precedence.

  3. In case local and remote packages have different versions, APT would always prefer the package with higher version. However, this rule can be eased by apt-pinning, where a higher priority is assigned to a particular repository. For example, local repository can be pinned with higher priority (or precedence in APT’s context) by adding a file under /etc/apt/preferences.d with .pref extension and the following content :

    Package: *
    Pin: origin ""
    Pin-Priority: 1000
    

    This has been explained really well in this thread.

Lastly, do not forget to run “apt-get update” for changes to take effect.

Generating SSH key pair

SSH key pair is a set of private/public keys normally used in securing network communication. These keys are normally required for passwordless SSH login to a remote host running SSH daemon (sshd). Here is how you would generate a pair RSA keys:

$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/nirbhay/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /home/nirbhay/.ssh/id_rsa.
Your public key has been saved in /home/nirbhay/.ssh/id_rsa.pub.
The key fingerprint is:
5f:1a:b5:50:a8:b6:d6:2b:48:1b:b6:df:4c:54:a2:28 nirbhay@nirbhay-VirtualBox
The key's randomart image is:
+--[ RSA 2048]----+
|           ..    |
|          ..     |
|         .o o    |
...
|       .o. = .   |
|       ...o      |
+-----------------+
$ ls ~/.ssh/
id_rsa  id_rsa.pub

Now that we have the private/public key files, all you need to do is copy/append the public key (id_rsa.pub) contents to the remote machine’s ~/.ssh/authorized_keys (600) file. DO NOT share the “private key”.

Note: On debian-based distributions, ssh-keygen is provided by openssh-client package.

Generating SSL certificates

** The (self-signed) SSL certificate generated by the procedure mentioned in this article should be used for testing purpose only **

Generating an SSL certificate is very simple. All you need is openssl package installed on your system. A key point to note here is that SSL certificates contain public key, which is always generated in pair with a private key. Here is the step-by-step procedure to generate one:

  1. Private key
    Lets first generate a 2048-bit RSA private key.

    openssl genrsa -out privkey.pem 2048

    $ openssl genrsa -out privkey.pem 2048
    Generating RSA private key, 2048 bit long modulus
    ........................+++
    ................+++
    unable to write 'random state'
    e is 65537 (0x10001)
    $ ls
    privkey.pem
    

    So, we have the private key in place. This will be used to generate the certificate.

    If “unable to write ‘random state’” bothers you, then check this out for a possible solution: http://stackoverflow.com/a/94458

  2. The self-signed certificate
    A certificate can now be generated using the following command.

    openssl req -new -x509 -key privkey.pem -out cacert.pem -days 1095

    $ openssl req -new -x509 -key privkey.pem -out cacert.pem -days 1095
    You are about to be asked to enter information that will be incorporated
    into your certificate request.
    What you are about to enter is what is called a Distinguished Name or a DN.
    There are quite a few fields but you can leave some blank
    For some fields there will be a default value,
    If you enter '.', the field will be left blank.
    -----
    Country Name (2 letter code) [AU]:XX
    State or Province Name (full name) [Some-State]:XX
    Locality Name (eg, city) []:XX
    Organization Name (eg, company) [Internet Widgits Pty Ltd]:XX
    Organizational Unit Name (eg, section) []:XX
    Common Name (e.g. server FQDN or YOUR name) []:XX
    Email Address []:XX
    
    $ ls
    cacert.pem  privkey.pem
    
    

These files can easily be tested by starting a test SSL/TLS server (s_server(1) and connecting to it using a client (s_client(1)).

Server:
$ openssl s_server -port <port> -cert /path/to/cacert.pem -key /path/to/privkey.pem

Client:
$ openssl s_client -host <server-host> -port <server-port> -key /path/to/privkey.pem

Reference : https://www.openssl.org/docs/HOWTO/
Thats all!