The state of MySQL forks: co-operating without co-operating
Giuseppe "The Data Charmer" Maxia recently posted his take on the MySQL forks. I had been pondering whether to do the same, and since what I planned to write nicely complements Giuseppe's article, I was inspired to follow him into the same topic. Last spring I created a Map of MySQL forks in preparation for Monty's keynote at the MySQL User Conference, so let's see how things have evolved. I'll look at the MySQL ecosystem as a whole and at each of the forks separately.
The post is long, but the key takeaway is this: despite the challenges, the combined development seen in the MySQL ecosystem is probably stronger than ever. The current situation is hard for an outsider to grasp but manageable, and if a few more obstacles can be overcome, we are looking at a very bright future indeed.
Oracle's MySQL
Negatives:
- My personal prediction about Oracle taking over MySQL has been that Oracle would not kill it outright, but would reposition it into a nice sandbox where it is no longer a threat to Oracle's main database product. These moves can be seen in changes like the slogan. MySQL AB used to say "#1 open source database" or "#1 online database". Oracle has downgraded this to "World's most popular open source database for the Web", which suggests that MySQL is mainly a web database and, even there, leaves the door open to a certain proprietary alternative. A practical consequence is that MySQL Account Managers can no longer approach enterprise IT customers with an open source database strategy to replace their existing Oracle installations (this was rather expected). They must instead focus mostly on the web segment, where MySQL is already the leader anyway.
- As we have seen with most other Sun project communities, Oracle did try to have its way with the MySQL community too. The main target of attack was the annual MySQL User Conference (and Expo). Luckily for us, the conference was formally an O'Reilly conference. O'Reilly Media and Tim O'Reilly personally stepped up to keep it going in 2010, with some help from Monty Program and Percona employees to fill the void left by MySQL AB's role (not to forget all the speakers, exhibitors and people such as Brian Aker organizing his traditional summit...). The 2010 conference was only marginally smaller than the 2009 one with Sun, which can be counted as a victory. In 2011 Oracle is boycotting the conference, and Oracle employees have not been able to submit a single talk. By now, however, we have enough people outside of Oracle that I have no doubt the conference will again be a great success and fuel the community into another prosperous year of open source database development and adoption.
- Significant employee turnover. Of the 400 MySQL AB employees acquired into Sun less than 3 years ago, well over half have now left, and several a month are still resigning. This was also quite expected: I personally knew of about 50 people wanting to leave Oracle, and the real number seems to be more like 150 (add to that 50+ who resigned or were RIFfed during the Sun era). To emphasize the challenge Oracle is facing here, consider last summer's small storm in a teapot, when Oracle employees blogged that everything was fine, they loved Oracle and nobody was leaving. Today, 2 of the 3 most vocal of them have also resigned. The problem is that when everyone else is leaving, it's no fun to stay as the last one. The good news is that even in the face of such a challenge, the MySQL engineering team remains quite productive; this topic continues in the list of positives. The other positive is that most (though not nearly all) of those resigning have found their place within the MySQL ecosystem. Anders Karlsson, for example, seems able to contribute more now than he could in his previously hectic Sales Engineer role. Yoshinori's HandlerSocket is in my opinion the greatest MySQL innovation since the addition of InnoDB, and both were developed outside of MySQL. Update: HandlerSocket was not developed by Yoshinori Matsunobu. Although it was first announced on his blog, the development was done by Akira Higuchi.
- There are constant rumors from sources at Oracle that developers are being moved to work on closed source features. This is not yet reflected in MySQL 5.5, which is pretty much the fruit of work done at Sun, so it remains to be seen over the next couple of years what the reality behind these rumors is. In any case, it is fair to note that going increasingly closed source was on MySQL AB's agenda anyway. The only negative here is that Oracle has more balls to go through with such plans than Sun had.
Positives:
- MySQL 5.1 was already a better release than those before it: you didn't have to wait a year after GA for the bugs to be ironed out. With 5.5 the gap between major releases is also shrinking, which is very welcome.
- The biggest miss of 5.1 was the total lack of focus on performance. There was no performance enhancement over 5.0, and MySQL was lagging behind PostgreSQL in this respect. Since then a lot has changed: in 5.5, scalability is the main enhancement. Despite MySQL's poor management of its community, the "first aid" here actually came from work done by Google and Percona, without which MySQL would have been in big trouble. The upcoming 5.5 release also contains original work from the MySQL and InnoDB teams themselves. (It is notable that Percona has yet to publish a comparative benchmark of MySQL 5.5 and Percona Server. MariaDB openly admits that it doesn't focus on competing with the MySQL team on multi-core scalability, since it can include that work anyway thanks to the GPL.)
- The MySQL engineering team, despite the high number of resignations, seems to be relatively productive. I personally think Tomas Ulin is the best VP of Engineering MySQL has ever had. Processes such as automated QA, code reviews, cleaning up compiler warnings and the "train model" of producing releases mean that, paradoxically, MySQL engineering seems more productive now despite the hit it has taken in headcount and experience.
- The scalability improvements seem set to remain GPL in the future too, and will thus benefit all MySQL users regardless of what happens to other features.
Summary: Oracle's MySQL is in pretty good shape, considering the circumstances. Chances are the community will be happier with the 5.5 release than with anything we've seen for a very long time: the focus is on scalability, finally! The community has been able to patch over the problems caused by Oracle. It has kept the MySQL conference alive, retained talent within the ecosystem, and served enterprise customers through new 3rd party support providers, not least SkySQL.
Fork 1: Percona Server with XtraDB
In my map of MySQL forks last spring I predicted that XtraDB and the Percona patches would fold into MariaDB. They did make it into MariaDB, but Percona has also productized its own fork of MySQL 5.1. Percona's approach is to provide "boutique" performance enhancements on top of a MySQL 5.1 base. They are very focused on customer needs: most enhancements are sponsored directly by customers.
Positives:
- Percona's work now has a track record of a few years and is running in production at many of their customers. (I wonder if they have any estimate of usage by non-customers; it would be really interesting to know.) Continuing with such a product is a wise and welcome move.
- Focused on performance: more performance, or at least better monitoring facilities.
- Percona is the oldest 3rd party MySQL support provider, with a proven capability to also provide product maintenance (fixing bugs and developing new features). More than any other company, it has provided relief to anybody who had doubts about Oracle acquiring MySQL.
- XtraBackup is the only open source tool to provide online backups of a large MySQL database.
Negatives:
- I agree with Giuseppe that Percona Server is not a sustainable product on its own, but depends on work coming from MySQL to evolve. This question is relevant not because Oracle is likely to kill MySQL any time soon, but because depending on one's major competitor is still a yardstick of credibility and viability. On the other hand, if anything were to happen to MySQL, Percona could use MariaDB as a base instead and continue with its model unaffected. Hence Percona Server is not viable as an independent product on its own, but it is not dependent on Oracle either.
- Being very focused on Percona's customers and their demands is a good thing, but it also means the rest of the world is served less well. A stray 3rd party patch addressing something other than web scale is unlikely ever to get the attention of the Percona engineers.
- While no comparative benchmark has been published, it seems obvious that MySQL 5.5 is more scalable than the 5.1-based Percona Server. This is not a big problem: it will not take Percona many weeks of work to rebase its product on top of 5.5.
Fork 2: MariaDB
MariaDB is run by the creator of MySQL and an impressive group of senior MySQL architects and engineers. MariaDB 5.2 is based on MySQL 5.1 and pulls together a lot of 3rd party work, such as the XtraDB, PBXT, Sphinx and OQGRAPH engines, as well as new features. The core team itself is focused on finishing the subquery optimizations from the dropped MySQL 6.0 release, which are due in a later release. It is also working on other very relevant enhancements, such as fixing the broken group commit and a new pluggable replication API.
Positives:
- Focus on integrating a lot of exciting work from the MySQL ecosystem that was previously left without attention. This includes both features and a significant MyISAM performance improvement.
- I disagree with Giuseppe here: I think MariaDB is a viable fork and would survive in the hypothetical scenario that Oracle pulled the plug on MySQL. This assumes continued friendly co-operation from the XtraDB team at Percona, but still. Obviously MariaDB is not fixing bugs today at the rate of the much larger MySQL team, but it could grow into that if needed.
- When new code from MySQL releases is merged, the MariaDB team at least partially reviews it. On occasion they have been able to exclude patches that introduced a new bug. On the other hand, the same is true the other way too: MySQL never included the pool-of-threads patch that is now in MariaDB 5.2, as it is said to have scalability issues.
- Lots of new features.
- My earlier criticism of some decisions aside, this is still the most open and community-oriented of the three alternatives.
Negatives:
- Does not yet have much adoption; in particular, there are only a few production / paying customer installations so far.
- Bad luck: its own release schedule matches poorly with MySQL 5.5, so it will probably be a long time before we see a stable MariaDB release based on MySQL 5.5.
- Very much a single-vendor project, with only a few active outside contributors. Most of the "community contributions" in MariaDB 5.2 were originally contributed to MySQL (which neglected them) and then "pulled" into MariaDB by Monty Program. They do not yet constitute an active developer community around MariaDB itself, though they are proof of the potential inherent in the MySQL community.
As for the other forks mentioned in my map from last spring:
- OurDelta has now wound down its activity, and its developers contribute to MariaDB. In fact they are probably among the most active contributors outside of Monty Program.
- XAMPP seems to continue as before.
- Many storage engines found their way into MariaDB, which makes it easier for users to try them. Some are still missing; in particular, the Spider engine is not there yet. MariaDB also did good work integrating some "Abandoned patches", yet many remain.
- Drizzle is not a backward compatible fork so it is not discussed here, but I might post something about Drizzle and PostgreSQL separately.
MySQL ecosystem as a whole
The most important message is that the MySQL ecosystem as a whole is surviving. Despite all the current challenges, it is perhaps stronger and more productive than before.
It is a bit worrying, and definitely confusing to many, that the ecosystem is still disintegrating rather than unifying. It is difficult to follow which fork has which features. The situation is a bit like ten years ago, when Linux distributions would come and go and users would ask which one to try. The answer is the same here: if you want a specific feature, choose the fork that has it; otherwise try whichever you want. In practice, most MySQL users still remain on MySQL 5.1 and are likely to upgrade to MySQL 5.5. Uptake of the forks will be slow, but their existence still serves as an important safeguard.
The "Abandoned patches" practice also continues. A Facebook patch, a Tivo patch, HandlerSocket, Anders' SQL statement monitor plugin and more keep turning up in their own source code repositories, often not even using Launchpad to publish them. However, both MySQL and MariaDB are now paying more attention to this work, and Percona is too, to the extent that it serves their focus (web customers). The patches are therefore perhaps less abandoned than a year ago.
Positives:
- The MySQL ecosystem as a whole is more productive today than ever before. (It is just difficult to see.) It seems plausible that the scalability/performance of MySQL 5.5 is better than that of any other open source alternative (Drizzle, PostgreSQL...). Mark Callaghan recently compared MySQL against MongoDB, with MySQL coming out the winner in all tests. And then we have things like HandlerSocket (NoSQL for MySQL, with 7x better performance) in the queue. All of this just proves that the open source project with the most momentum and mass will often come out as the winner, no matter what challenges it may seemingly be facing.
- To put it another way: I don't have an exact number, but there are in any case way more than 100 engineers working full time on the MySQL code base (including developers, QA, build engineers...). This development effort is an order of magnitude larger than that of other open source databases I'm aware of, in particular PostgreSQL and Drizzle.
- It is unfortunate that the different actors have not been able to come together and focus their work on a unified project. In practice, however, code seems to flow rather painlessly (considering the circumstances) between the main forks. This is largely thanks to distributed version control.
- The same can be said of IRC channels and mailing lists: it would probably be better if everyone worked on one communication channel. On the other hand, in MySQL AB times almost all discussion was company internal, so the current situation is still better than before.
- Despite the strained relations between official Oracle and many of those on the outside, the developers have good contact with each other. They help each other out by answering questions and sharing ideas. For instance, Monty Program developers don't generally assign copyright to Oracle, but they have done so on occasion to get bugs fixed in areas of the code they felt responsible for.
Negatives / challenges yet to overcome:
- The inability of the forks to unify into a single community plays into MySQL's hands. When we want to talk about the ecosystem as a whole, it will be known as the "MySQL ecosystem", since no alternative community exists or is in sight. As the owner of MySQL (and the mysql.com domain), Oracle is the main beneficiary of this state of affairs.
- Once the forks get more usage, they will increasingly suffer from the lack of a freely licensed MySQL manual. Writing one from scratch will be a lot of work. I hope to contribute to that effort one day.
- Duplicate work: MariaDB and Percona publish their original work under the GPL, but Oracle wants copyright assignment for anything going into MySQL, so the forks' code cannot go into MySQL without that assignment. For instance, a lot of duplicate work has gone into cleaning up compiler warnings in the code. This was done first by Drizzle, then by MariaDB, then again separately by MySQL.
- We seem to be doing OK for now, but it cannot be healthy in the long term to just work from separate trees. For instance, the new replication API and group commit developed at MariaDB are significant changes, and there is no way Oracle will be able to independently reverse engineer a compatible API. Sooner or later, such incompatibilities in key areas of the server will lead to divergence.
- Both MySQL and MariaDB are now paying more attention to new patches coming from the wider ecosystem, yet they still seem to struggle to keep up. At any given time there are a handful of interesting plugins or patches that one needs to download separately. (This is another topic I intend to come back to at a later date.)
- The year and all the "restructuring" have taken their toll on the community, though. For instance, it seems to me that posts on Planet MySQL are down almost 50% from what they used to be. Interestingly, this is also the case for the independent consultants who used to top the list of most frequent bloggers. So it's not just that there are fewer bloggers; the active ones also blog less.
The picture of the MySQL ecosystem drawn here is one where vendor interests dominate. To match that, contributors who are not vendors also publish their new code in isolation from others. Despite this, significant progress is being made, and eventually new code "diffuses" into all the different forks/distributions.
We are all familiar with governance models for open source like the Benevolent Dictator model (Linus and others) or the Apache model (core team with +1/-1 votes). The MySQL ecosystem is none of these, yet it seems to be quite productive anyway. I'm thinking of calling this mode of development the "Federalist model" of open source development. In it, different actors co-operate without co-operating. Put more precisely, they exchange and merge patches in a peer-to-peer fashion, communicating with each other but without seeing any one party as the leader of the others.
Thanks to bzr and the GPL, this seems to work surprisingly well: MySQL has never seen as much interesting progress as it does today.