Observations on Drizzle and PostgreSQL (followup on state of MySQL forks)
My recent account of The State of MySQL forks seems to have gotten quite a lot of attention. I promised to follow up with a separate piece about Drizzle and also PostgreSQL, as the other major open source database, so I'd better keep that promise now.
I should say that I know much less about these 2 database projects than I know about the MySQL compatible variants, which I have been working with personally. This post isn't nearly as detailed as its parent; consider it an epilogue if you will, and feel free to correct any errors or omissions (or to disagree) in the comments below. Also, the perspective is that I'm comparing these to the state of MySQL (and each other), even if both Drizzle and PostgreSQL would probably want to be reviewed on their own merits.
Update Sun 2010-12-19: LinuxJedi was kind enough to provide additional details on Drizzle and the article has been updated with that information.
Drizzle was forked by Brian Aker in 2008, after MySQL had joined Sun. Sun sponsored a core developer team working outside the MySQL chain of command. The mission for Drizzle was and is:
- A database optimized for Cloud infrastructure and Web applications
- Design for massive concurrency on modern multi-cpu architecture
- Optimize memory for increased performance and parallelism
- Open source, open community, open design
In addition to the above, I would like to add the significant work that has gone into refactoring and modularizing the MySQL code base: making it more object oriented, removing non-core features completely, and replacing internal code with generic libraries (like Boost and Google protobuf). It is commonly agreed that this work has made it easier for new developers to join the project.
- Community development the way it should be. In 2008 Drizzle immediately showed a vibrant community on mailing lists, IRC and yes, even code contributions! This highlighted the poor job MySQL had been doing in this area and unleashed the untapped potential in the MySQL community.
- While Rackspace is the biggest contributor, employing a team of 8 engineers, it is clear that even today Drizzle is the only MySQL fork with the vibrant and diverse community one is accustomed to seeing in open source projects. Brian recently posted some commit statistics on this topic, concluding that Drizzle has had 96 contributors to date. (But afaik fewer than 10 work on Drizzle full time.) Drizzle has also been very successful with their Google Summer of Code students, etc.
- Rackspace backing gives it credibility even if Sun is now gone. If Drizzle is to succeed, getting a foothold in the hosting, then cloud, then web market is the likely path. Especially the multi-tenant feature targets the hosting market.
- Has now finally reached Beta, good time to try it out.
Negatives (or things I'd need to know more about):
There are in particular 4 things I miss from Drizzle:
- Will we have a release before the window of opportunity passes? While engineers often say "we should rewrite everything from scratch", it is not always a good idea to let them do that! KDE4 is an example of things gone wrong. On the other hand, Firefox is an example of things gone right. In this case, MySQL (with all the forks) has been able to catch up in performance, which was the major problem in 2008, so there is a risk of Drizzle becoming a MySQL fork with a cleaner code base, but no better performance and fewer features.
- Benchmarks. The goal was to scale well on multi-core machines with lots of RAM. But now MySQL does that too. I haven't seen a single benchmark that would actually tell me about how Drizzle performs.
- Degree of compatibility with MySQL. Of course, we know that Drizzle is still a somewhat compatible fork; for instance, the MySQL client libraries can connect to a Drizzle server. But can I run WordPress, Drupal or my legacy PHP app on Drizzle? How much work is it to port it? I know Ronald Bradford was looking into this, but I haven't seen results. Is that a bad sign, or were you just busy with customer work?
- User adoption. Having one user story will encourage the second user to try it too. It's beta, now is the time to hear about people trying it out, then run it in production.
There is one more negative thing with Drizzle in that there could be more collaboration and cross-pollination between Drizzle and the other MySQL forks. While the core code has diverged so much it can be considered separate, there's still untapped opportunity for collaboration in common areas such as InnoDB/XtraDB, the missing manual, XtraBackup, client connectors, the new Boots client, automated QA and builds... I should note that this isn't inherently a Drizzle negative, the finger is pointing as much (or more) at the other forks.
My conclusion: The Drizzle story is a perfect vision for where I'd want to upgrade my MySQL installations - the weaknesses it set out to correct in MySQL are exactly the right ones. However, before upgrading anything at all, I'd need to see answers to the questions in the above list, and I'm a bit skeptical. (Sorry guys, I'm cheering for you, I'm just skeptical.)
PostgreSQL is the other major open source database. If one was looking for an alternative to MySQL, it is probably the first challenger. It is truly community developed, with corporate sponsors having come and gone during more than 2 decades of development.
- Long history, stable community, no drama as we have in the MySQL world right now.
- Track record of new release every 12 months, with new interesting features every year.
- Lists 70+ active and 44 inactive contributors. Unlike in the MySQL world, most of these develop PostgreSQL code only part time, alongside a consulting job or other job, or in the case of EnterpriseDB staff, alongside developing closed source EnterpriseDB addons.
- Is most certainly being run in production, for critical workloads, even has a respectable market share for web use. (Skype is a known PostgreSQL user.) Consulting and support is available, though this topic appears also in the negatives.
- Nowadays also supports Windows, in addition to Unix and Linux.
- Marketed as the "most advanced open source database" as it used to have more features (and more complex features) than MySQL. The architecture and features are sometimes seen to be clones of how Oracle does things, for better or worse. Even so, which features are important or advanced is somewhat subjective - this topic also re-appears in the list of negatives.
Trivia challenge: Which highly current and controversial person is a past PostgreSQL contributor? Answers in comments below.
- Despite being well known and respected in open source circles, PostgreSQL has surprisingly poor adoption in traditional enterprise usage. In my few years of selling MySQL, I only came across 2 companies using it rather than MySQL, and both of these had in-house expertise to support it. (This doesn't count companies I know from public sources, such as Skype.) Given that PostgreSQL is commonly touted as more suitable for enterprise use, I was very surprised when I saw the study done by the EU Commission related to the Oracle Sun merger, showing that in Oracle accounts, PostgreSQL isn't even on the radar as a competitor in Oracle's CRM.[1] (To compare: MySQL was equal to or had passed Sybase, which conversely was better than I expected.)
- A likely explanation for weak adoption in the traditional enterprise is the lack of a well known 24/7 support provider. (Interestingly, even if Sun was also providing PostgreSQL support at one time, nobody seemed to know about that either :-) While it is certainly possible to buy support for PostgreSQL, most people simply do not know these companies, which are often quite small. A positive development is that EnterpriseDB is now developing into a globally known player in this field, helped by their new communications strategy that focuses more on the PostgreSQL brand than on their own EnterpriseDB brand.
- Historically, PostgreSQL hasn't always provided the features needed for mass adoption. I personally consider the late addition of Windows support as a primary reason MySQL got more popular, since until recently almost all developers would use Windows. PostgreSQL wasn't that easy to use, and it's almost impossible to understand how Postgres users could live so long without replication! Today, all of these features are somewhat addressed, but they do imho explain the small market share to a high degree.
Conclusion: PostgreSQL is a viable alternative for an open source database, and it has certainly proven its stability and credibility as an open source project. Even so, its low market share means that a PostgreSQL DBA is harder to find, and even the companies providing consulting and support are not as widely known as, say, those in the MySQL sphere. As a function of lower market share, PostgreSQL also receives fewer total development man-hours than MySQL.
Personally I always like to favor whichever technology has the most mass (while still meeting other requirements too, of course, like being open source and such). I believe having a huge community and even brand recognition is a valuable asset for any open source project. For me this is the main reason I once ended up using MySQL and still do: "everyone else" was doing it too. (Until then I actually was a PostgreSQL user, a satisfied one even.)
Comparing Drizzle and PostgreSQL community activity
An interesting observation is that the developer communities are roughly the same order of magnitude in size, both having fewer than a hundred recently active developers. For PostgreSQL, most developers are volunteers or work on PostgreSQL part time alongside another job. Drizzle has a core team of 5-10 working full time or part time, whereas the rest are volunteers. (Both are in contrast to all MySQL forks, where the overwhelming majority are full time developers and there are more full time developers than total developers in Drizzle or PostgreSQL.)
Also the OHLOH graph confirms this:
[Figure: OHLOH graph of the number of contributors (y-axis) per month (x-axis) for Drizzle (red) and PostgreSQL (green)]
This goes against the perceived size of each community, where one would expect the PostgreSQL community to be clearly larger.
- 1. http://ec.europa.eu/competition/mergers/cases/decisions/m5529_20100121_20682_en.pdf
See page 69. The publicly available version has exact percentages obfuscated; however, PostgreSQL isn't even listed, so we do know it is equal to 0%. (...as much as I hate to see that.)