Galera: Connection timed out scenarios

While joining an existing Galera cluster, if a node fails to “reach” any of the nodes in the cluster, it generally – aborts – with the following error :

140505 10:01:46 [Note] WSREP: gcomm: connecting to group 'my_wsrep_cluster', peer '127.0.0.1:4567'
140505 10:01:49 [Warning] WSREP: no nodes coming from prim view, prim not possible
140505 10:01:49 [Note] WSREP: view(view_id(NON_PRIM,cb63893e-d45d-11e3-9dee-0a8c1484c284,1) memb {
        cb63893e-d45d-11e3-9dee-0a8c1484c284,0
} joined {
} left {
} partitioned {
})
140505 10:01:49 [Warning] WSREP: last inactive check more than PT1.5S ago (PT3.51674S), skipping check
140505 10:02:19 [Note] WSREP: view((empty))
140505 10:02:19 [ERROR] WSREP: failed to open gcomm backend connection: 110: failed to reach primary view: 110 (Connection timed out)
         at gcomm/src/pc.cpp:connect():141
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs_core.c:gcs_core_open():202: Failed to open backend connection: -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs/src/gcs.c:gcs_open():1291: Failed to open channel 'my_wsrep_cluster' at 'gcomm://127.0.0.1:4567': -110 (Connection timed out)
140505 10:02:19 [ERROR] WSREP: gcs connect failed: Connection timed out
140505 10:02:19 [ERROR] WSREP: wsrep::connect() failed: 7
140505 10:02:19 [ERROR] Aborting

Assuming that there are no network issues, this error may occur due to any of the following scenarios:

Scenario #1: A node tries join a non-existing cluster

This issue and the possible resolution has been discussed here.

Scenario #2: A node with Galera 3 (25.3.xx) wsrep provider tries to join an existing cluster consisting of Galera 2 (25.2.xx) nodes

This results in error because, by default, Galera-2 and Galera-3 uses different checksum algorithms on network packets. While, for Galera-2 its plain CRC32 (socket.checksum=1), Galera-3 uses CRC32-C hw-accelerated (socket.checksum=2). Checksum algorithms can be controlled by socket.checksum galera parameter.

So, the solution to this problem would be to start the Galera-3 node with wsrep_provider_options=’socket.checksum=1′ option to make sure it uses the same checksum algorithm which other nodes of the cluster (Galera-2) are using.

References: http://galeracluster.com/documentation-webpages/galeraparameters.html

Scenario #3: telnet: Unable to connect to remote host: No route to host

Solution: Configure firewall settings :

(a) Flush all the firewall rules: Quick, but advisable for test setups only.
$ sudo iptables -F ## Thanks to Qian Joe for the suggestion!

(b) Allow only specific hosts/ports: Its important to note that by flushing (-F), one would essentially delete all the firewall rules. This is undesirable. Only specific ports & hosts should be allowed instead. (see firewall settings)

EDIT: Do let me know (via comment) if the given resolutions did not work for you and I will be happy to add the failure scenario & resolution here.

Auto increments in Galera cluster

Lets start by considering a scenario where records are being inserted in a single auto-increment table via different nodes of a multi-master cluster. One issue that might arise is ‘collision’ of generated auto-increment values on different nodes, which is precisely the subject of this article.

As the cluster is multi-master, it allows writes on all master nodes. As a result of which a table might get same auto-incremented values on different nodes on INSERTs. This issue is discovered only after the writeset is replicated and that’s a problem!

Galera cluster suffers with the similar problem.

Lets try to emulate this on a 2-node Galera cluster :

1) On node #1:

MariaDB [test]> CREATE TABLE t1(c1 INT AUTO_INCREMENT PRIMARY KEY, c2 INT)ENGINE=InnoDB;
Query OK, 0 rows affected (0.07 sec)

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES (1);
Query OK, 1 row affected (0.05 sec)

2) On node #2:

MariaDB [test]> START TRANSACTION;
Query OK, 0 rows affected (0.00 sec)

MariaDB [test]> INSERT INTO t1(c2) VALUES(2);
Query OK, 1 row affected (0.00 sec)

MariaDB [test]> COMMIT;
Query OK, 0 rows affected (0.05 sec)

3) On node #1

MariaDB [test]> COMMIT;
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

MariaDB [test]> SELECT * FROM t1;
+----+------+
| c1 | c2   |
+----+------+
|  1 |    2 |
+----+------+
1 row in set (0.00 sec)

As expected, the second commit could not succeed because of the collision.

So, how do we handle this issue? Enter @@auto_increment_increment and @@auto_increment_offset! Using these two system variables one can control the sequence of auto-generated values on a MySQL/MariaDB server. The trick is to set them in such a way that every node in the cluster generates a sequence of non-colliding numbers.

For instance, lets discuss this for a 3-node cluster (n=3):
Node 1: @@auto_increment_increment=3, @@auto_increment_offset=1 => Sequence : 1, 4, 7, 10, ...
Node 2: @@auto_increment_increment=3, @@auto_increment_offset=2 => Sequence : 2, 5, 8, 11, ...
Node 3: @@auto_increment_increment=3, @@auto_increment_offset=3 => Sequence : 3, 6, 9, 12, ...

As you can see, by setting each node’s auto_increment_increment to the total number of nodes (n) in the cluster and auto_increment_offset to a number between [1,n], we can assure that auto-increment values, thus generated, would be unique across the cluster, thus, would avoid any conflict or collision.

In Galera cluster this is already taken care of by default. As and when a node joins the cluster, the two auto-increment variables are adjusted automatically to avoid collision. However, this capability can be controlled by using wsrep_auto_increment_control variable.

Node #1:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 1     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #2:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 2     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

Node #3:

MariaDB [test]> show variables like '%auto_increment%';
+------------------------------+-------+
| Variable_name                | Value |
+------------------------------+-------+
| auto_increment_increment     | 3     |
| auto_increment_offset        | 3     |
| wsrep_auto_increment_control | ON    |
+------------------------------+-------+
3 rows in set (0.00 sec)

With this setting the last COMMIT in the above example would succeed.