A note on Galera versioning

Understanding Galera versioning matters when you work with MariaDB Galera Cluster. Galera libraries are versioned in the format xx.yy.zz (e.g. 25.3.2), where:

xx : WSREP Interface Version
yy : Galera library major version
zz : Galera library minor version

While the last two numbers (yy.zz) describe the set of features/bug fixes, the first one (xx) gives the WSREP interface version of the Galera library.

A MariaDB Galera cluster node consists of two main components :

  1. MariaDB Galera server
  2. Galera library

The Galera library (also known as the WSREP provider) plugs into MariaDB Galera Server to provide support for writeset replication. To plug in, the WSREP interface version of the Galera library must match the one provided by MariaDB Galera Server. The WSREP interface version of MariaDB Galera Server can be found in the version_comment system variable:

MariaDB [test]> select @@version_comment;
+---------------------------------------+
| @@version_comment                     |
+---------------------------------------+
| Source distribution, wsrep_25.9.r3961 |
+---------------------------------------+

So, what happens when the WSREP interface versions of MariaDB Galera Server and the Galera library do not match? The Galera provider fails to load:

140311 14:14:02 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:02 [Note] WSREP: wsrep_load(): loading provider library '/home/packages/galera-23.2.7-src/libgalera_smm.so'
140311 14:14:03 [ERROR] WSREP: provider interface version mismatch: need '25', found '23'
140311 14:14:03 [ERROR] WSREP: wsrep_load(): interface version mismatch: my version 25, provider version 23
140311 14:14:03 [ERROR] WSREP: wsrep_load(/home/packages/galera-23.2.7-src/libgalera_smm.so) failed: Invalid argument (22). Reverting to no provider.
140311 14:14:03 [Note] WSREP: Read nil XID from storage engines, skipping position init
140311 14:14:03 [Note] WSREP: wsrep_load(): loading provider library 'none'
140311 14:14:03 [ERROR] Aborting

Lastly, it is interesting to note that a Galera library can be loaded into MariaDB Galera Server regardless of its own version (2.xx or 3.xx), as long as the WSREP interface versions are the same.
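The versioning rule described above can be sketched in a few lines of C++. This is only an illustration of the xx.yy.zz format and the "interface versions must match" check; the names GaleraVersion, parse_galera_version and provider_can_load are hypothetical and are not part of Galera or MariaDB.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper type, for illustration only.
struct GaleraVersion {
  int wsrep_interface;  // xx : WSREP interface version
  int major;            // yy : Galera library major version
  int minor;            // zz : Galera library minor version
};

// Parses a string of the form "xx.yy.zz" (e.g. "25.3.2").
// Returns false if the text does not have exactly that shape.
bool parse_galera_version(const std::string &text, GaleraVersion &out)
{
  std::istringstream in(text);
  char dot1 = 0, dot2 = 0;
  if (!(in >> out.wsrep_interface >> dot1 >> out.major >> dot2 >> out.minor))
    return false;
  return dot1 == '.' && dot2 == '.' && in.eof();
}

// A provider can be loaded only if its WSREP interface version equals
// the one the server was built against; yy and zz do not matter here.
bool provider_can_load(const GaleraVersion &provider, int server_interface)
{
  assert(server_interface > 0);
  return provider.wsrep_interface == server_interface;
}
```

With a server at interface version 25, both 25.2.xx and 25.3.xx providers pass this check, while 23.2.7 (as in the error log above) does not.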

Galera: Connection timed out

When a node with the Galera 3 (25.3.xx) wsrep provider tries to join an existing cluster of Galera 2 (25.2.xx) nodes, it might fail to join with the following error:

140228 10:46:28 [Note] WSREP: view((empty))
140228 10:46:28 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():141
140228 10:46:28 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():196: Failed to open backend connection: -110 (Connection timed out)
140228 10:46:28 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://127.0.0.1:4567': -110 (Connection timed out)
140228 10:46:28 [ERROR] WSREP: gcs connect failed: Connection timed out
140228 10:46:28 [ERROR] WSREP: wsrep::connect() failed: 7
140228 10:46:28 [ERROR] Aborting

This happens because, by default, Galera 2 and Galera 3 use different checksum algorithms on network packets: Galera 2 uses plain CRC32 (socket.checksum=1), while Galera 3 uses hardware-accelerated CRC32-C (socket.checksum=2). The algorithm is controlled by the socket.checksum Galera parameter.

So, the solution to this problem is to start the Galera 3 node with the wsrep_provider_options='socket.checksum=1' option, so that it uses the same checksum algorithm as the other (Galera 2) nodes of the cluster.
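To see why the two defaults cannot talk to each other, here is a small sketch that computes both checksums over the same bytes. It is a plain bitwise implementation written for illustration, not Galera's code; the function names are made up, and the only facts relied upon are the standard reflected polynomials of CRC32 (0xEDB88320) and CRC32-C (0x82F63B78).

```cpp
#include <cstdint>
#include <string>

// Generic bitwise, reflected CRC with the usual initial value and final XOR.
std::uint32_t crc_reflected(const std::string &data, std::uint32_t poly)
{
  std::uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; ++bit)
      crc = (crc >> 1) ^ ((crc & 1u) ? poly : 0u);
  }
  return crc ^ 0xFFFFFFFFu;
}

// Plain CRC32, the algorithm selected by socket.checksum=1.
std::uint32_t crc32_plain(const std::string &data)
{
  return crc_reflected(data, 0xEDB88320u);
}

// CRC32-C (Castagnoli), the algorithm selected by socket.checksum=2.
std::uint32_t crc32_castagnoli(const std::string &data)
{
  return crc_reflected(data, 0x82F63B78u);
}

// A receiver that verifies with a different algorithm than the sender used
// sees a checksum mismatch for the very same payload.
bool checksums_agree(const std::string &payload)
{
  return crc32_plain(payload) == crc32_castagnoli(payload);
}
```

For the conventional test string "123456789" the two functions return 0xCBF43926 and 0xE3069283 respectively, so a packet checksummed by one side is rejected by the other.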

References: http://www.codership.com/wiki/doku.php?id=galera_parameters

Auto increments in Galera cluster

Let’s start by considering a scenario where records are being inserted into a single auto-increment table via different nodes of a multi-master cluster. One issue that might arise is a ‘collision’ of generated auto-increment values on different nodes, which is precisely the subject of this article.

As the cluster is multi-master, it allows writes on all master nodes. As a result, INSERTs on different nodes might generate the same auto-increment value for a table. The conflict is discovered only after the writeset is replicated, and that’s a problem!

Galera Cluster suffers from the same problem.

Let’s try to emulate this on a 2-node Galera cluster:

1) On node #1:

MariaDB [test]> CREATE TABLE t1(c1 INT AUTO_INCREMENT PRIMARY KEY, c2 INT)ENGINE=InnoDB;
Query OK, 0 rows affected (0.07 sec)

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES (1);
Query OK, 1 row affected (0.05 sec)

2) On node #2:

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES(2);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> COMMIT;
Query OK, 0 rows affected (0.05 sec)

3) On node #1:

MariaDB [test]> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

MariaDB [test]> SELECT * FROM t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

As expected, the second commit could not succeed because of the collision.

So, how do we handle this issue? Enter @@auto_increment_increment and @@auto_increment_offset! Using these two system variables one can control the sequence of auto-generated values on a MySQL/MariaDB server. The trick is to set them in such a way that every node in the cluster generates a sequence of non-colliding numbers.

For instance, let’s work this out for a 3-node cluster (n=3):
Node 1: @@auto_increment_increment=3, @@auto_increment_offset=1 => Sequence : 1, 4, 7, 10, ...
Node 2: @@auto_increment_increment=3, @@auto_increment_offset=2 => Sequence : 2, 5, 8, 11, ...
Node 3: @@auto_increment_increment=3, @@auto_increment_offset=3 => Sequence : 3, 6, 9, 12, ...

As you can see, by setting each node’s auto_increment_increment to the total number of nodes (n) in the cluster and auto_increment_offset to a distinct value in the range [1,n], we can ensure that the generated auto-increment values are unique across the cluster and thus avoid any conflict or collision.
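The arithmetic behind these sequences can be checked with a short sketch. It models a node's k-th generated value as offset + k * increment, which is a simplification that ignores existing rows and other server details; the function names are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <vector>

// First `count` values generated by a node with the given settings,
// under the simplified model value = offset + k * increment.
std::vector<long> auto_increment_values(long increment, long offset, int count)
{
  assert(increment >= 1 && offset >= 1 && offset <= increment);
  std::vector<long> values;
  for (int k = 0; k < count; ++k)
    values.push_back(offset + k * increment);
  return values;
}

// True if n nodes (increment = n, offsets 1..n) produce no common value
// among the first `count` values of each node.
bool sequences_are_disjoint(int nodes, int count)
{
  std::set<long> seen;
  for (int node = 1; node <= nodes; ++node)
    for (long value : auto_increment_values(nodes, node, count))
      if (!seen.insert(value).second)
        return false;  // this value was already generated by another node
  return true;
}
```

For example, auto_increment_values(3, 2, 4) yields 2, 5, 8, 11, matching the sequence listed for Node 2 above.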

In Galera Cluster this is already taken care of by default: whenever a node joins the cluster, the two auto-increment variables are adjusted automatically to avoid collisions. This behavior can be controlled with the wsrep_auto_increment_control variable.

Node #1:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #2:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #3:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

With this setting the last COMMIT in the above example would succeed.

Oracle VM VirtualBox: Installing Guest Additions

Environment:

  • Host : Windows 8
  • Guest : Ubuntu-12.04
  • Oracle VM VirtualBox & Extension Pack v4.3.4 (available here)

Instructions:

  1. Install Ubuntu as guest OS and boot into it.
  2. Install necessary packages (required for building VirtualBox Guest Additions kernel modules).

    • build-essential
    • dkms
    • linux-headers-generic
    $ sudo apt-get install build-essential dkms linux-headers-generic
    
  3. Install Guest Additions:
    Guest OS Menu -> Devices -> Install Guest Additions
  4. Open a terminal in the guest OS, cd into the mounted VBOXADDITIONS_4.3.4_XXXXX CD drive and run VBoxLinuxAdditions.run script as root.

    $ cd /media/VBOXADDITIONS_4.3.4_XXXXX
    $ sudo ./VBoxLinuxAdditions.run
    
  5. Reboot!
  6. Optionally, bidirectional copying can be enabled by checking “Bidirectional” under Devices -> Drag'n'Drop.

References:
(i) https://forums.virtualbox.org/viewtopic.php?f=3&t=15679

What is Galera Arbitrator?

Galera Arbitrator (garbd) is a stateless daemon that acts as a node in a Galera cluster. It is normally used to avoid split-brain situations, which mostly occur because of a hardware or link failure: the cluster gets divided into two parts, and each part remains operational, believing that it is the majority (primary component). This may lead to inconsistent data sets. Garbd should be installed on a separate machine; it can, however, share the machine running the load balancer. It is interesting to note that when garbd joins the cluster, it makes a request for SST.
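Why a third voting member helps can be shown with a deliberately simplified majority rule. The sketch below assumes every member has equal weight and that a partition stays primary only if it can reach strictly more than half of the previous membership; Galera's real quorum calculation has further refinements (such as node weights), and has_quorum is a hypothetical name.

```cpp
#include <cassert>

// Simplified majority check: a partition keeps quorum only if it still
// reaches strictly more than half of the members seen before the split.
bool has_quorum(int reachable_members, int previous_cluster_size)
{
  assert(reachable_members >= 0 && reachable_members <= previous_cluster_size);
  return 2 * reachable_members > previous_cluster_size;
}
```

In a 2-node cluster a link failure leaves each side with 1 of 2 members, so neither has a majority. With garbd as a third member, the side that can still reach garbd has 2 of 3 and stays primary, while the isolated node (1 of 3) does not.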

Let us now add Galera Arbitrator (garbd) to an existing 2-node MariaDB Galera cluster. Check out this post for the steps to set up a MariaDB Galera cluster. We start by connecting to one of the nodes to get the size and name of the cluster.

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 2     |
+--------------------+-------+
1 row in set (0.00 sec)

MariaDB [(none)]> show variables like 'wsrep_cluster_name';
+--------------------+------------------+
| Variable_name      | Value            |
+--------------------+------------------+
| wsrep_cluster_name | my_wsrep_cluster |
+--------------------+------------------+
1 row in set (0.00 sec)

Now, let’s start garbd as a daemon:

$ cd galera-23.2.7-src/garb

$ ./garbd \
    --address='gcomm://127.0.0.1:4567' \
    --options='gmcast.listen_addr=tcp://127.0.0.1:4569' \
    --group='my_wsrep_cluster' \
    --log=garbd.err \
    --daemon

That’s it, garbd should now be running and connected to the cluster. Let’s verify this by checking wsrep_cluster_size again.

MariaDB [test]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.00 sec)

Great! So, the new garbd is now part of the cluster.

Setting up MariaDB Galera Cluster on Ubuntu

MariaDB Galera Cluster is a multi-master synchronous replication system. In this article, I will set up a 3-node cluster on a single machine running Ubuntu. In a production scenario, however, it is advisable to run each cluster node on a separate box in a WAN.

Requirements

  1. MariaDB Galera Cluster
    • Download it from the official site : https://downloads.mariadb.org/mariadb-galera/, OR
    • Install it using Advanced packaging tool (APT), steps can be found in erkules’ blog post. OR
    • Build it from source : lp:~maria-captains/maria/maria-5.5-galera
      Note : Build would require additional cmake options : WITH_WSREP=ON and WITH_INNODB_DISALLOW_WRITES=1
  2. Galera wsrep provider (libgalera_smm.so)
  3. Some extra Ubuntu packages (in case you choose to build Galera from source!)
    • scons (Build utility, replacement for make)
    • check (Unit test framework for C)
    • libboost-dev
    • libboost-program-options-dev
    • libboost-system-dev (for 23.2.7)
    • libssl-dev

Setup

Now that we have all requirements in place, let’s bring up the cluster nodes.

  1. Node#1 : Start the 1st node (at port 4001 for instance) with an empty cluster address (--wsrep_cluster_address='gcomm://').
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4001 \
        --socket=/tmp/mysql_4001.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://'

    There are certain points to note before we proceed :

    • wsrep_provider option points to the galera wsrep provider, i.e. libgalera_smm.so library.
    • wsrep_cluster_address contains the address of existing cluster members. But in this case, as it is the very first node, the address must be empty.

      Important! In case you are planning to put these options in a config file (my.cnf) – once the cluster is up and running, make sure to change this option to hold a valid address. Failure to do so would disable the node’s capability to auto-join the cluster in case it restarts after a shutdown or crash.

    • From the server’s error log make a note of the base_host & base_port (default 4567) of wsrep. This information would be required to start the subsequent nodes.
  2. Node#2 : Start 2nd node at a different port (4002).
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4002 \
        --socket=/tmp/mysql_4002.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://127.0.0.1:4567' \
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4568'

    Here, we have to specify the address of the 1st node via the wsrep_cluster_address option. It consists of the base_host & base_port of the 1st node that we noted earlier in step 1 (i.e. gcomm://127.0.0.1:4567). Also, as we are starting this 2nd node on the same machine, we must avoid a port conflict by giving the wsrep provider a different port to listen on via the gmcast.listen_addr wsrep provider option (tcp://127.0.0.1:4568).

  3. Node#3 : As with the 2nd node, the 3rd (and all subsequent nodes) can be started with the same set of options and an appropriate selection of available ports.
     $ mysqld \
        --no-defaults \
        --basedir=. \
        --datadir=./data \
        --port=4003 \
        --socket=/tmp/mysql_4003.sock \
        --binlog_format=ROW \
        --wsrep_provider=/path-to-galera-provider/libgalera_smm.so \
        --wsrep_cluster_address='gcomm://127.0.0.1:4567' \
        --wsrep_provider_options='gmcast.listen_addr=tcp://127.0.0.1:4569'

The cluster should now be up and running with 3 nodes. This can easily be verified and monitored further by inspecting server’s status & system variables :

MariaDB [(none)]> show status like 'wsrep_cluster_size';
+--------------------+-------+
| Variable_name      | Value |
+--------------------+-------+
| wsrep_cluster_size | 3     |
+--------------------+-------+
1 row in set (0.06 sec)

MariaDB [(none)]> show variables like 'wsrep%';
...
MariaDB [(none)]> show status like 'wsrep%';
...

Before I close, let me list out some important options which were omitted from this article for brevity :

  • default_storage_engine=INNODB
  • innodb_autoinc_lock_mode=2
  • innodb_locks_unsafe_for_binlog=1
  • wsrep_sst_auth="user:pass" : to be used by the SST (State Snapshot Transfer) script

That’s all for now!

Building Boost library from source on Ubuntu

On Ubuntu, building the Boost library from source can be tricky at times, so I thought of putting an article together to help anyone trying to figure it out. Once you have downloaded the Boost source files (here) on your Ubuntu machine, you can use the following steps to build it (I have picked boost_1_54_0.tar.gz for demonstration):

  1. Unzip and change into the boost source directory.
    tar -xf boost_1_54_0.tar.gz && cd boost_1_54_0
  2. Run bootstrap.sh; this will generate the necessary config files and "b2" (the build tool).
    ./bootstrap.sh
  3. Run b2 to build & install all (or desired) Boost libraries (refer to "./b2 --help" for an explanation of the individual options used):
    ./b2 --build-type=complete --layout=versioned --prefix=./install -q install

That’s all! The build (build-type=complete) takes a while to finish, so be patient. Also, the standard Ubuntu installation might not have some of the headers required for the complete build. In my case, installing the following 2 packages was sufficient:

  • libbz2-dev
  • python-dev

A more detailed set of instructions can be found here.

How to detect unclosed file handles in C/C++

Sometimes we forget to close file handles that the program no longer needs. This can eventually eat up useful resources such as the open file count, memory, etc. Valgrind is very good at detecting a variety of such leaks. All you need to do is run the program under Valgrind with the --track-fds=yes option. Here is the output of a simple C program, executed under Valgrind, which opens a file handle but doesn’t close it.

$ gcc -Wall -g -o fopen fopen.c
$ valgrind --tool=memcheck --leak-check=full --track-fds=yes ./fopen
==2771== Memcheck, a memory error detector
==2771== Copyright (C) 2002-2011, and GNU GPL'd, by Julian Seward et al.
==2771== Using Valgrind-3.7.0 and LibVEX; rerun with -h for copyright info
==2771== Command: ./fopen
==2771== 
File opened successfully
==2771== 
==2771== FILE DESCRIPTORS: 4 open at exit.
==2771== Open file descriptor 3: /tmp/fl
==2771==    at 0x4125F73: __open_nocancel (syscall-template.S:82)
==2771==    by 0x40B9ACB: _IO_file_open (fileops.c:233)
==2771==    by 0x40B9C87: _IO_file_fopen@@GLIBC_2.1 (fileops.c:338)
==2771==    by 0x40ADD86: __fopen_internal (iofopen.c:93)
==2771==    by 0x80484F7: main (fopen.c:23)
==2771== 
==2771== Open file descriptor 2: /dev/pts/0
==2771==    
==2771== 
==2771== Open file descriptor 1: /dev/pts/0
==2771==    
==2771== 
==2771== Open file descriptor 0: /dev/pts/0
==2771==    
==2771== 
==2771== 
==2771== HEAP SUMMARY:
==2771==     in use at exit: 352 bytes in 1 blocks
==2771==   total heap usage: 1 allocs, 0 frees, 352 bytes allocated
==2771== 
==2771== LEAK SUMMARY:
==2771==    definitely lost: 0 bytes in 0 blocks
==2771==    indirectly lost: 0 bytes in 0 blocks
==2771==      possibly lost: 0 bytes in 0 blocks
==2771==    still reachable: 352 bytes in 1 blocks
==2771==         suppressed: 0 bytes in 0 blocks
==2771== Reachable blocks (those to which a pointer was found) are not shown.
==2771== To see them, rerun with: --leak-check=full --show-reachable=yes
==2771== 
==2771== For counts of detected and suppressed errors, rerun with: -v
==2771== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

As you can see, at the end Valgrind prints a list of all open file descriptors, along with a stack trace showing where the file was opened.
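The fopen.c used for the run above is not shown, so the following is only a guess at the kind of code that produces such a report: a function that opens a file and returns without ever calling fclose(). The function name and the idea of passing the path as a parameter are my own assumptions.

```cpp
#include <cassert>
#include <cstdio>

// Opens `path` for writing and deliberately never calls fclose().
// Returns true if the file could be opened.
bool open_and_forget(const char *path)
{
  assert(path != nullptr);
  std::FILE *fp = std::fopen(path, "w");
  if (fp == nullptr)
    return false;
  std::puts("File opened successfully");
  // Missing: std::fclose(fp). The pointer is lost when the function returns,
  // but the underlying descriptor stays open until the process exits.
  return true;
}
```

Calling such a function from main() and running the binary with --track-fds=yes should list the leaked descriptor next to the three standard ones, as in the report above. Adding the missing fclose() call makes that extra entry go away.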

Apache2: php file getting downloaded instead?

After installing all the necessary packages on my fresh Ubuntu and while testing different configurations, I came across a problem: whenever I tried to open a PHP script through the browser, the file was offered for download instead of being executed. If you are a LAMP fan (like me), you might come across this problem in the future, or perhaps you are facing it now and have reached here.. ;) The problem is straightforward:

“the apache server is not able to execute the requested php script (even though the core Apache & PHP packages are installed) and hence offering it for download”

Fortunately, in my case, the resolution turned out to be simple. While installing PHP and Apache, I had missed the PHP module for Apache, which is what enables Apache to handle PHP scripts. So, I installed it, restarted the Apache server, and the problem was resolved!

$ sudo apt-get install libapache2-mod-php5

$ sudo /etc/init.d/apache2 restart

$ find /etc/apache2/ | grep php
/etc/apache2/mods-enabled/php5.load
/etc/apache2/mods-enabled/php5.conf
/etc/apache2/mods-available/php5.load
/etc/apache2/mods-available/php5.conf

C++: Why prefer member initialization list over assignment?

The answer is simple – the latter could be a costly choice. I will try to explain this using a simple example. In the following program we have a class Bar that has a private data member of type Foo (another class). Now, let’s say we want to create an object of type Bar, initializing its data member with an object of type Foo, something like Bar obj2(obj1); The following program defines the necessary functions to give an idea of what happens during the creation of such objects.

#include <iostream>

using namespace std;

class Foo {
public:
  Foo() { cout << "Foo: default ctor called." << endl; }
  Foo(const Foo &obj) { cout << "Foo: copy ctor called." << endl; }

  Foo & operator =(const Foo &obj)
  {
    cout << "Foo: overloaded assignment operator called." << endl;
    return *this;
  }

  ~Foo() { cout << "Foo: default dtor called." << endl; }
};

class Bar {
public:
  Bar() { cout << "Bar: default ctor called." << endl; }

  /* Parameterized constructor (without member initialization list) */
  Bar(Foo &foo)
  {
    cout << "Bar: ctor taking reference to Foo as param.." << endl;
    m_foo= foo;
  }

  ~Bar() { cout << "Bar: default dtor called." << endl; }

private:
  Foo m_foo;
};

int main()
{
  Foo obj1;
  Bar obj2(obj1);
  return 0;
}

Let’s focus on Bar’s parameterized constructor & how it affects the output.

..snip..

  /* Parameterized constructor (without member initialization list) */
  Bar(Foo &foo)
  {
    cout << "Bar: ctor taking reference to Foo as param.." << endl;
    m_foo= foo;                             /* Use assignment operator */
  }

..snip..

Output:
Foo: default ctor called.
Foo: default ctor called.
Bar: ctor taking reference to Foo as param..
Foo: overloaded assignment operator called.

Bar: default dtor called.
Foo: default dtor called.
Foo: default dtor called.

As we can see, while Bar’s object is being constructed,

  1. Foo’s default constructor is first invoked to construct the Foo’s part of Bar, and then
  2. Foo’s assignment operator is invoked to assign it the value of the object supplied as a parameter to this constructor.

That said, let us now see how this can be improved by using a member initialization list in the definition of Bar’s parameterized constructor.

..snip..

  /* Parameterized constructor (with member initialization list) */
  Bar(Foo &foo) : m_foo(foo)
  {
    cout << "Bar: ctor taking reference to Foo as param.." << endl;
  }

..snip..

Output:
Foo: default ctor called.
Foo: copy ctor called.
Bar: ctor taking reference to Foo as param..

Bar: default dtor called.
Foo: default dtor called.
Foo: default dtor called.

Looking at the above output, we can see that during the construction of Bar’s object, Foo is being constructed & initialized directly by Foo’s copy constructor, instead of using the default constructor to create an uninitialized object followed by the assignment operator.

Hence, by using a member initialization list, one can avoid the extra default construction followed by a call to the assignment operator, and thus save some precious CPU cycles.
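The difference can also be checked by counting calls instead of reading the printed output. The sketch below is a variation on the program above, not the original: Counts, g_counts, BarAssign, BarInit and cost_of_constructing are names invented for this illustration, and the constructors take a const reference.

```cpp
// Tally of the special member functions invoked on Foo.
struct Counts {
  int default_ctor = 0;
  int copy_ctor = 0;
  int assignment = 0;
};

static Counts g_counts;  // global tally, for illustration only

class Foo {
public:
  Foo() { ++g_counts.default_ctor; }
  Foo(const Foo &) { ++g_counts.copy_ctor; }
  Foo &operator=(const Foo &) { ++g_counts.assignment; return *this; }
};

// Assigns the member inside the constructor body.
class BarAssign {
public:
  explicit BarAssign(const Foo &foo) { m_foo = foo; }
private:
  Foo m_foo;
};

// Initializes the member in the member initialization list.
class BarInit {
public:
  explicit BarInit(const Foo &foo) : m_foo(foo) {}
private:
  Foo m_foo;
};

// Constructs one T from an existing Foo and reports what that cost.
template <typename T>
Counts cost_of_constructing()
{
  Foo source;
  g_counts = Counts();  // reset after `source` has been built
  T object(source);
  (void) object;
  return g_counts;
}
```

Here cost_of_constructing<BarAssign>() reports one default construction plus one assignment, while cost_of_constructing<BarInit>() reports a single copy construction, which matches the two outputs shown above.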