# Systemd: Managing multiple MariaDB instances

Many Linux distributions now come pre-installed with Systemd. MariaDB added support for Systemd in version 10.1. This support mostly came as a contribution from MariaDB community. In this post, I will discuss how to manage multiple instances of MariaDB server using Systemd. Once MariaDB packages are installed, the Systemd unit files can be located under /lib/systemd/system/.

As we are going to deal with multiple instances, we are interested in the last unit file. With the current implementation, there are two ways to handle multiple instances.

1. Individual configuration files for each instance

This is the default method for which you do not have to make any changes to the unit file. All you need is create separate configuration files for each MariaDB instance under the standard MariaDB configuration directory /etc/mysql/conf.d/ named with distinct suffixes (aka instance name) but common prefix “my”.

For instance, in my test setup, I have the following two option files:

That’s it. You can now start MariaDB instances using the following command:

So, why should configuration files have common prefix “my” but different suffixes? If you look into the unit file, the command used to start a MariaDB instance uses “%I” in –defaults-file option. This gets replaced by the instance name – the suffix used in the above command. Thus, MariaDB gets started with the corresponding configuration file.

2. Using separate group suffixes in single configuration file

This method makes use of MariaDB’s –defaults-group-suffix option. So, you can add options for all the instances in a single configuration file, such that options with values unique to a particular instance are placed in a group with a different suffix. The instructions related to this method are commented by default in the multi-instance unit file. So in order to use this method, the multi-instance unit file (/lib/systemd/system/mariadb@.service) needs to be modified.

• Comment – ConditionPathExists=/etc/mysql/conf.d/my%I.cnf
• Comment all ExecStartPre and ExecStart commands with –defaults-file
• Uncomment all ExecStartPre and ExecStart commands with –defaults-group-suffix

As you might have guessed by now, in this method the instance name “%I” is used in the –defaults-group-suffix (unlike the first method).

Thus, each instance is started with options from the group with specified suffix along with the option listed under common group(s) like [mariadb], [mysqld], [server] or [galera].

For example, the configuration file in my test setup has the following groups :

That’s all. Now, the same set of commands can be used to start/stop/restart/status MariaDB instances.

A word of caution: DO NOT use period in group suffix, as that will cause set-environment to fail.

Recent bug fixes:

• MDEV-10004: Galera’s pc.recovery process fails in 10.1 with systemd
• MDEV-10145: Systemd fails to start mysqld in multi-instance mode

While looking into a replication-related issue, I came across an interesting problem where I wanted to get the name of the largest event in a very large binary log file. So, I thought of putting Vaquita’s binary log API to a test. Here’s a simple program to read a binary log file and print the largest event and its size.

In this case, having RawEvents is sufficient as we are not interested in the actual content of the events.
Wiki : https://github.com/vaquita/mysql/wiki/Replication-Binary-Logs

Edit: Updated the above program to account for a recent change done to RawEvent(), which now returns error.

# Securing MariaDB Galera Cluster using SSL

In MariaDB Galera cluster, the data (writeset) is replicated across all the nodes over network. This data traffic can be secured by enabling SSL connection between the nodes. In order to achieve this, one needs to have SSL certificate/key pair (Here’s how to generate one). The certificate and key can then be copied to all the nodes. Once done, instruct the nodes to establish SSL connection on startup by simply pointing to the certificate/key files using wsrep_provider_options system variable.

wsrep_provider_options='socket.ssl_cert=/tmp/cert.pem;socket.ssl_key=/tmp/key.pem'

Following is the list of SSL options that Galera supports (details : Galera parameters:

• socket.ssl : Enable/disable SSL, explicitly
• socket.ssl_ca : SSL CA file
• socket.ssl_cert : SSL certificate file
• socket.ssl_cipher : SSL cipher list
• socket.ssl_compression : Enable/disable SSL compression
• socket.ssl_key : SSL key file
• socket.ssl_password_file : SSL password file, in case the key is encrypted

At the time of writing, there is no way to check if galera connection is encrypted using SQL (issue#165). The only way is to look into the node’s error log for the following :

150516 14:22:03 [Note] WSREP: SSL handshake successful, remote endpoint ssl://127.0.0.1:46661 local endpoint ssl://127.0.0.1:4000 cipher: AES128-SHA compression:
150516 14:22:03 [Note] WSREP: (6f49f928, 'ssl://0.0.0.0:4000') turning message relay requesting on, nonlive peers:
150516 14:22:03 [Note] WSREP: declaring 73263ae6 at ssl://127.0.0.1:4010 stable

Lastly, it is important to note that SST (snapshot state transfer) traffic is not affected, whatsoever, by the use of galera SSL options.

# MariaDB 10.1 : A new version of GRA_X_X.log file

When a MariaDB Galera cluster node fails to apply a binary log (or writeset, as we call it), the node dumps it into a file (GRA_X_X.log) under the data directory for further investigation. This process has been explained fairly well in this Percona blog). Since the dumped log file is header-less, one has to first prepend a binary log header to it in order to open it using mysqlbinlog tool.

Starting MariaDB 10.1.4, the node will automatically prepend the binlog header to the writeset before dumping it into the GRA_ log file. In order to differentiate it from the older log files, the file has been renamed to GRA_X_X_v2.log.

$./bin/mysqlbinlog data2/GRA_1_1_v2.log /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/; /*!40019 SET @@session.max_insert_delayed_threads=0*/; /*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/; DELIMITER /*!*/; # at 4 #150414 17:50:20 server id 0 end_log_pos 248 Start: binlog v 4, server v 10.1.4-MariaDB-wsrep-debug created 150414 17:50:20 at startup ROLLBACK/*!*/; BINLOG ' nIstVQ8AAAAA9AAAAPgAAAAAAAQAMTAuMS40LU1hcmlhREItd3NyZXAtZGVidWcAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAACciy1VEzgNAAgAEgAEBAQEEgAA3AAEGggAAAAICAgCAAAACgoKAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA AAAAAAAAAAAEEwQAZxODxw== '/*!*/; # at 248 #150414 17:50:20 server id 0 end_log_pos 76 Query thread_id=4 exec_time=0 error_code=0 use test/*!*/; SET TIMESTAMP=1429048220/*!*/; SET @@session.pseudo_thread_id=4/*!*/; SET @@session.foreign_key_checks=1, @@session.sql_auto_is_null=0, @@session.unique_checks=1, @@session.autocommit=1/*!*/; SET @@session.sql_mode=0/*!*/; SET @@session.auto_increment_increment=1, @@session.auto_increment_offset=1/*!*/; /*!\C utf8 *//*!*/; SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/; SET @@session.lc_time_names=0/*!*/; SET @@session.collation_database=DEFAULT/*!*/; drop table t1 /*!*/; DELIMITER ; # End of log file ROLLBACK /* added by mysqlbinlog */; /*!50003 SET COMPLETION_TYPE=@OLD_COMPLETION_TYPE*/; /*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=0*/; # Password validation plugins in MariaDB Let me start off with a phrase : “A chain is only as strong as its weakest link”. So, how to ensure that all the links (=passwords) are strong enough to keep the system secure? One of the key attributes to consider here is password strength. MariaDB 10.1.2 added support for password validation by introducing a password validation plugin API and two password validation plugins. These plugins can be used to ensure that the password used for the user accounts adhere to some required security standards. 1. simple_password_check 2. cracklib_password_check (requires libcrack2) 3. So, what passwords are now forbidden? Lets check : Password validation plugins only validate plain-text password (for obvious reasons!). So commands that contain password hashes are not validated. In order to reject such commands strict_password_validation system variable can be used. Lastly, it important to note that multiple password validation plugins can be loaded at the same time and the password must pass on all the plugins. # Understanding split brain in a Galera cluster Split brain is a condition when a cluster gets partitioned and each part is operating independently. This is an unwanted situation that one would always want to avoid. So, how is it handled in a MariaDB Galera cluster? In order to understand this, let’s first start by looking into the logs of a node from a split-brain(ed) cluster. In the event of a network partition, some nodes of the cluster may no longer be reachable from the other nodes. They try to reconnect to these suspecting nodes and later move them to partitioned list by marking them as inactive when no response is received. A voting for quorum is then taken on each node to see if they belong to the majority partition (Primary Component) using the following formula : $\frac{\sum&space;\left&space;(&space;p_{i}&space;\times&space;w_{i}&space;\right&space;)&space;-&space;\sum&space;\left&space;(&space;l_{j}&space;\times&space;w_{j}&space;\right&space;)}{2}&space;<&space;\sum&space;\left&space;(&space;m_{k}&space;\times&space;w_{k}&space;\right&space;)$ where, In a Galera cluster, nodes outside the primary component are not allowed to process queries. It is mainly done preserve data consistency. MariaDB [test]> select 1; ERROR 1047 (08S01): WSREP has not yet prepared node for application use Now, as shown in the logs above, when the cluster gets split into two partitions of equal size, (i.e. both the partitions get equal weight, split-brain), the quorum algorithm fails find the the primary component. As a result, the cluster has no primary component and can no longer process any queries. This can be resolved by finding the node with most recent updates and bootstraping the cluster using that node. # New Information schema tables for Galera membership and status MariaDB Galera server logs all the cluster related information like node status, cluster status, membership, etc. in the error log. MariaDB 10.1.2 introduces a new INFORMATION SCHEMA plugin WSREP_INFO that enables querying these information via INFORMATION SCHEMA tables. The WSREP_INFO plugin adds two new tables to the Information Schema, WSREP_MEMBERSHIP and WSREP_STATUS. The plugin is not enabled by default, so in order to use it, it needs to be installed first : MariaDB [test]> INSTALL SONAME 'wsrep_status'; Query OK, 0 rows affected (0.04 sec) MariaDB [test]> SHOW PLUGINS; +-----------------------------+----------+--------------------+-----------------+---------+ | Name | Status | Type | Library | License | +-----------------------------+----------+--------------------+-----------------+---------+ ... | WSREP_MEMBERSHIP | ACTIVE | INFORMATION SCHEMA | wsrep_status.so | GPL | | WSREP_STATUS | ACTIVE | INFORMATION SCHEMA | wsrep_status.so | GPL | +-----------------------------+----------+--------------------+-----------------+---------+ 52 rows in set (0.01 sec) MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_MEMBERSHIP\G *************************** 1. row *************************** Table: WSREP_MEMBERSHIP Create Table: CREATE TEMPORARY TABLE WSREP_MEMBERSHIP ( INDEX int(11) NOT NULL DEFAULT '0', UUID varchar(36) NOT NULL DEFAULT '', NAME varchar(32) NOT NULL DEFAULT '', ADDRESS varchar(256) NOT NULL DEFAULT '' ) ENGINE=MEMORY DEFAULT CHARSET=utf8 1 row in set (0.00 sec) MariaDB [test]> SHOW CREATE TABLE INFORMATION_SCHEMA.WSREP_STATUS\G *************************** 1. row *************************** Table: WSREP_STATUS Create Table: CREATE TEMPORARY TABLE WSREP_STATUS ( NODE_INDEX int(11) NOT NULL DEFAULT '0', NODE_STATUS varchar(16) NOT NULL DEFAULT '', CLUSTER_STATUS varchar(16) NOT NULL DEFAULT '', CLUSTER_SIZE int(11) NOT NULL DEFAULT '0', CLUSTER_STATE_UUID varchar(36) NOT NULL DEFAULT '', CLUSTER_STATE_SEQNO bigint(21) NOT NULL DEFAULT '0', CLUSTER_CONF_ID bigint(21) NOT NULL DEFAULT '0', GAP varchar(10) NOT NULL DEFAULT '', PROTOCOL_VERSION int(11) NOT NULL DEFAULT '0' ) ENGINE=MEMORY DEFAULT CHARSET=utf8 1 row in set (0.00 sec) MariaDB [test]> SELECT @@wsrep_provider; +----------------------------------+ | @@wsrep_provider | +----------------------------------+ | /usr/lib/galera/libgalera_smm.so | +----------------------------------+ 1 row in set (0.00 sec) Now that WSREP_INFO plugin is installed, lets look into the contents of these tables on a 3-node cluster. MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_MEMBERSHIP; +-------+--------------------------------------+----------+-----------------+ | INDEX | UUID | NAME | ADDRESS | +-------+--------------------------------------+----------+-----------------+ | 0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 | | 1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 | | 2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 | +-------+--------------------------------------+----------+-----------------+ 3 rows in set (0.00 sec) MariaDB [test]> SELECT * FROM INFORMATION_SCHEMA.WSREP_STATUS\G *************************** 1. row *************************** NODE_INDEX: 0 NODE_STATUS: Synced CLUSTER_STATUS: Primary CLUSTER_SIZE: 3 CLUSTER_STATE_UUID: 190604d7-8940-11e4-a084-ebee5211c190 CLUSTER_STATE_SEQNO: 2 CLUSTER_CONF_ID: 3 GAP: NO PROTOCOL_VERSION: 3 1 row in set (0.00 sec) As seen above, WSREP_MEMBERSHIP table shows information about current members in the cluster which includes node’s name and incoming address. WSREP_STATUS table, on the other hand, shows status information about the node and cluster as a whole. SHOW command can also be used to query these tables. Its quick and reduces the number of columns for WSREP_STATUS to fit to the screen. MariaDB [test]> SHOW WSREP_MEMBERSHIP; +-------+--------------------------------------+----------+-----------------+ | Index | Uuid | Name | Address | +-------+--------------------------------------+----------+-----------------+ | 0 | 19058073-8940-11e4-8570-16af7bf8fced | my_node1 | 10.0.2.15:16001 | | 1 | 19f2b0e0-8942-11e4-9cb8-b39e8ee0b5dd | my_node3 | 10.0.2.15:16003 | | 2 | d85e62db-8941-11e4-b1ef-4bc9980e476d | my_node2 | 10.0.2.15:16002 | +-------+--------------------------------------+----------+-----------------+ 3 rows in set (0.00 sec) MariaDB [test]> SHOW WSREP_STATUS; +------------+-------------+----------------+--------------+ | Node_Index | Node_Status | Cluster_Status | Cluster_Size | +------------+-------------+----------------+--------------+ | 0 | Synced | Primary | 3 | +------------+-------------+----------------+--------------+ 1 row in set (0.00 sec) # The curious case of lost data There is no doubt that Galera makes a fine clustering solution. The ease at which one can setup a cluster is incredible. Recently, a query from a user on #mariadb@freenode, however, revealed a caveat which I feel is quite important and worth sharing. Sep 10 14:16:09 <____> nirbhay: executive summary, my cluster was out of sync, and when i brought it back into sync the DB is now 3 weeks old. it updated the new node with the old node’s contents. Sep 10 14:16:23 <____> nirbhay: galera + maria 5.5 Let me first try to explain the problem. The user has a MariaDB Galera cluster – up and running – but with stale data (3 weeks old to be precise) and another MariaDB Galera node running outside the cluster storing latest updates. Now, the user takes this node and adds to the cluster, thinking that out-of-sync cluster nodes would receive latest changes from this joining node. Unfortunately, it turned out that opposite happened – and this joiner node instead became stale (losing 3 weeks worth of updates!) when into synced with the cluster. Actually, this is expected. When a node joins an existing Galera cluster, it receives data from one of the nodes (donor) of the cluster in order to get in sync with the cluster. During this process (aka snapshot state transfer) the node itself loses it own data (if any). This is precisely the reason why the joiner node (with latest updates) lost the recent changes in the above mentioned scenario. So, how to handle such situations? In cases like this, one should discard the existing stale cluster and bootstrap a new cluster using the node having latest updates. # Snapshot State Transfer (SST) methods in MariaDB Galera Cluster In a Galera cluster, when a node (joiner) joins the cluster, it receives a copy of entire data from one of the nodes (donor) in the cluster. This process is called Snapshot state transfer (SST). MariaDB Galera Cluster provides various methods for snapshot state transfer which can be configured using wsrep_sst_method option. MariaDB Galera cluster distribution currently includes the following SST methods : • rsync • It is the default method which uses rsync to transfer data files across the cluster. The donor node itself becomes READ-ONLY during the transfer by executing FLUSH TABLES WITH READ LOCK. It is important to note that even though the data transfer is fast, this method requires no authentication (wsrep_sst_auth). • mysqldump • This method uses mysqldump tool to get a dump of all the databases on donor node which is then played/executed on the joiner node. This method requires wsrep_sst_auth to be set with credentials to connect to the donor as well as joiner node. • xtrabackup/xtrabackup-v2 • This method uses Percona XtraBackup tool to take a backup (snapshot) of donor’s data directory which is then restored on the joiner node. This method requires wsrep_sst_auth to be set with credentials to connect to the donor node. # Configuring SELinux for Galera cluster Setting up a MariaDB Galera cluster can easily become tedious if its being setup on a Linux system with SELinux enabled. 140805 16:16:20 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '' 140805 16:16:20 [ERROR] WSREP: Permission denied 140805 16:16:20 [ERROR] WSREP: failed to open gcomm backend connection: 13: error while trying to listen 'tcp://0.0.0.0:4567?socket.non_blocking=1', asio error 'Permission denied': 13 (Permission denied) at gcomm/src/asio_tcp.cpp:listen():814 140805 16:16:20 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -13 (Permission denied) 140805 16:16:20 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://': -13 (Permission denied) 140805 16:16:20 [ERROR] WSREP: gcs connect failed: Permission denied 140805 16:16:20 [ERROR] WSREP: wsrep::connect() failed: 7 140805 16:16:20 [ERROR] Aborting In a test environment one can disable/enable SELinux for mysqld using the following commands (Thanks to Daniel Black for the tip!) :$ sudo semanage permissive -a mysqld_t
$sudo semanage permissive -d mysqld_t Disabling SELinux for mysql works, but its certainly not the best solution. So, I tried to configure SELinux for a 2-node MariaDB Galera cluster on CentOS by using some tools provided to manage SELinux policies. The basic idea is to let the MariaDB Galera nodes run under permissive mode in order to get all possible operations (which SELinux would have otherwise denied) logged into the audit log and then create a policy module using allow2audit tool after carefully analyzing the “denials”. The resulting module can then be installed before enabling (enforcing) SELinux for mysqld again. Prepare the hosts * Install MariaDB Galera server packages ($ sudo yum install MariaDB-Galera-server)
* Setup MariaDB configuration options ($sudo vi /etc/my.cnf.d/server.cnf) * Install SELinux policy management tools ($ sudo yum install policycoreutils-python)
* Firewall settings (see resources below)

Generate the policy module
* Disable SELinux for mysqld on both the hosts. With SELinux in permissive mode, it logs all the denial operations as warnings instead of enforcing them.

$sudo semanage permissive -a mysqld_t • Once mysqld on both the hosts are in permissive mode, our goal is to trigger all sorts of events that can happen on a node in a MariaDB Galera cluster (the more extensive, the better!) like, starting the node as donor/joiner with different snapshot state transfer (SST) methods and incremental state transfer (IST). The idea is to let all possible denials get logged into the audit log, which we later use to generate the policy module. • Carefully analyze (sealert Messages) all the “denials” logged in the audit log (/var/log/audit/audit.log). If the denials are expected, create a local policy module using allow2audit.$ sudo grep mysql audit.log | audit2allow -M mariadb-galera
• Install the policy module.
$sudo semodule -i mariadb-galera.pp • Put mysqld back to enforcing mode.$ sudo semanage permissive -d mysqld_t

Resources