HowTo use MySQL JDBC loadbalancer with Galera multi-master clusters

Some time ago I finally had the chance to test the built-in load balancing feature in MySQL's JDBC driver together with a 3 node Galera cluster. I have used this feature at a MySQL Cluster customer many years ago, so I knew it worked and I knew it was great, but I didn't know if it would work with Galera. Galera sometimes returns some error states that are different from what MySQL Cluster does and the main point of the test was to see how the loadbalancing in the JDBC driver reacts to that.

Unfortunately this really cool feature is still rather unknown to the world. Of course, until now the only multi-master cluster you could use it with was MySQL Cluster so the addressable market was much smaller than what is the case with Galera, which is based on InnoDB. I've previously written about this on an architectural level in "The Ultimate MySQL High Availability Solution" and how to actually use it at "How to use JDBC Connector with MySQL Cluster". Beyond those blog posts this feature isn't documented much at all, the needed JDBC connection string is mentioned in passing deep on this page in the MySQL manual, but you'd have to be a wizard to guess what it really means. It's like a secret club, almost!

Ok, so let's see if it works with Galera. The test case is to setup a 3 node cluster, kill one node and then let it join back. I used mysqldump SST for joining nodes. Using xtrabackup SST is non-blocking, so it would be less interesting for this test since the donor node will just work, whereas with mysqldump both the joiner and the donor will be blocked.

So let's create a simple Java test program:

import java.sql.*;

public class MysqlTest
{
    public static void main (String[] args)
    {
        Connection conn = null;

        while(true)
        {

            // Connect to MySQL
            try
            {
                String userName = "root";
                String password = "password";
                String hosts = "node125,node126,node127";
                String url = "jdbc:mysql:loadbalance://" + hosts + "/test";
                Class.forName ("com.mysql.jdbc.Driver").newInstance ();
                conn = DriverManager.getConnection (url, userName, password);
                System.out.println ("Database connection established");
            }
            catch (Exception e)
            {
                System.err.println ("Cannot connect to database server: " + e);
            }

            try
            {
                Statement s = conn.createStatement ();
                int count;
                count = s.executeUpdate("INSERT INTO jdbctests (t, h, d) VALUES ('Inserted from Java code.', @@hostname, now());");
                s.close ();
                System.out.println (count + " rows were inserted");
            }
            catch (SQLException e)
            {
                System.err.println ("Error when executing SQL statement: " + e);
            }

            // Close connection
            if (conn != null)
            {
                try
                {
                    conn.close ();
                    System.out.println ("Database connection terminated");
                }
                catch (Exception e)
                {
                    System.err.println( "Error when closing database connection: " + e );
                }
            }
            try { Thread.sleep(1000); } catch( Exception e ) { }
        }
    }
}

I run this with: java -cp mysql-connector-java-5.1.20-bin.jar:. MysqlTest

When all 3 nodes are up, this results in following entries in the table:

| 31 | Inserted from Java code. | node125 | 2012-05-22 10:36:45 |
| 34 | Inserted from Java code. | node125 | 2012-05-22 10:36:46 |
| 37 | Inserted from Java code. | node125 | 2012-05-22 10:36:47 |
| 39 | Inserted from Java code. | node126 | 2012-05-22 10:36:50 |
| 43 | Inserted from Java code. | node125 | 2012-05-22 10:36:50 |
| 44 | Inserted from Java code. | node127 | 2012-05-22 10:36:52 |
| 45 | Inserted from Java code. | node126 | 2012-05-22 10:36:53 |
| 49 | Inserted from Java code. | node125 | 2012-05-22 10:36:53 |
| 51 | Inserted from Java code. | node126 | 2012-05-22 10:36:55 |
| 55 | Inserted from Java code. | node125 | 2012-05-22 10:36:55 |
| 56 | Inserted from Java code. | node127 | 2012-05-22 10:36:57 |
| 57 | Inserted from Java code. | node126 | 2012-05-22 10:36:58 |

You can see that each query was randomly routed to a different MySQL node.

Ok, but can it survive failure? In the below, I'm stopping MySQL on node127.

| 151 | Inserted from Java code. | node125 | 2012-05-22 10:37:37 |
| 152 | Inserted from Java code. | node126 | 2012-05-22 10:37:40 |
| 155 | Inserted from Java code. | node125 | 2012-05-22 10:37:39 |
| 157 | Inserted from Java code. | node125 | 2012-05-22 10:37:40 |
| 159 | Inserted from Java code. | node125 | 2012-05-22 10:37:41 |
| 162 | Inserted from Java code. | node126 | 2012-05-22 10:37:44 |
| 165 | Inserted from Java code. | node126 | 2012-05-22 10:37:45 |

WIN!

However, when starting the node and re-joining the cluster, it actually didn't work. This is a bug in Galera, not the JDBC driver. I think this may have been introduced in 2.0 version of Galera when xtrabackup non-blocking SST became supported. It seems Galera now also behaves as if mysqldump was non-blocking. I'm pretty sure we've tested this prior to 2.0 and didn't see this. Anyway, you can read from bug 1002714 what happened.

The bug should be easy to fix. The problem is that Galera accepts an incoming query to the node that is in donor state and the query then hangs for the time of the SST. (This can be hours on a large database.) The JDBC driver is good at avoiding nodes that are not responding or that are returning errors, but in this case the node doesn't generate any error, so the JDBC driver is just stuck waiting. The fix of course is that a Galera node in donor state needs to return an error and not accept incoming queries. (They use the MySQL "Unknown error" for this purpose.) Otoh for xtrabackup SST of course incoming queries can be accepted. I sense a new configuration option wsrep_sst_method_is_nonblocking...

UPDATE: To force a full state snapshot (SST) I delete the /var/lib/mysql/grastate.dat file after shutting down the MySQL server. If you don't, then Galera will just do an incremental state transfer from the position that is saved in that file.

For IST the donor node is not blocked (since it is just a copy of galera logs, nothing from the database) so load balancing worked just fine.

I hope the Galera bug will be fixed soon and was very happy to see that the JDBC part of this setup worked perfectly. Of course, in practice we will want to use xtrabackup for non-blocking SST, so this isn't really an issue, but a bug nevertheless.

I have yet to test the PHP MySQLnd driver that has similar loadbalancing capability since last year. (Thanks to Ulf and the PHP connector team!)

Hi Henrik,

You're probably right, I also sense this option. However we'd like be more certain about it. Nowadays you can't be too careful with options: once it is added, it can't be taken back. And for one, 'wsrep_sst_method_is_nonblocking' sounds horribly long.

But donor node allowing client connections has always been a "feature" of Galera cluster - the reasoning behind that was that the user normally wants to control which node is the donor and will do necessary preparations. And for intermittent failures there is incremental state transfer which is non-blocking. (BTW, you seem to forgot to mention the removal of grastate.txt file: the curious reader may fail to reproduce your findings ;) )

Regards,
Alex

I don't see why I should care about which node is the donor? Unless of course I have one "reference node" that is not part of the load balancing to begin with. If I have three identical nodes in the same data center, there is no reason for me to manage which one Galera will pick for donor, and I'd like my load balancing to just work and not have to care (since it is possible to have that).

Of course, in practice we will start using xtrabackup for the sst, so it will be non-blocking and this won't be such a big issue after all. (I added this to the post now.)

Thanks for pointing out the grastate.dat, I added too.

I spotted your 2012 posts on this subject but they are too low level. People need to know about the connection string, that's all. They don't need to write their own plugins.

I wonder if I have missed your post in 2010, of course I was on paternity leave then but I've kept an eye on Planet MySQL even then :-) Anyway, I hope you get a simple howto on this into the manual soon, because people really deserve to know about this feature.

About the bookAbout this siteAcademicAccordAmazonAppleBeginnersBooksBuildBotBusiness modelsbzrCassandraCloudcloud computingclsCommunitycommunityleadershipsummitConsistencycoodiaryCopyrightCreative CommonscssDatabasesdataminingDatastaxDevOpsDistributed ConsensusDrizzleDrupalEconomyelectronEthicsEurovisionFacebookFrosconFunnyGaleraGISgithubGnomeGovernanceHandlerSocketHigh AvailabilityimpressionistimpressjsInkscapeInternetJavaScriptjsonKDEKubuntuLicensingLinuxMaidanMaker cultureMariaDBmarkdownMEAN stackMepSQLMicrosoftMobileMongoDBMontyProgramMusicMySQLMySQL ClusterNerdsNodeNoSQLNyrkiöodbaOpen ContentOpen SourceOpenSQLCampOracleOSConPAMPParkinsonPatentsPerconaperformancePersonalPhilosophyPHPPiratesPlanetDrupalPoliticsPostgreSQLPresalespresentationsPress releasesProgrammingRed HatReplicationSeveralninesSillySkySQLSolonStartupsSunSybaseSymbiansysbenchtalksTechnicalTechnologyThe making ofTransactionsTungstenTwitterUbuntuvolcanoWeb2.0WikipediaWork from HomexmlYouTube