My recent account of The State of MySQL forks seems to have gotten quite a lot of attention. I promised to follow up with a separate piece about Drizzle and also PostgreSQL, as the other major open source database, so I'd better keep that promise now.
I should say that I know much less about these 2 database projects than I know about the MySQL compatible variants which I have been working with personally. This post isn't nearly as detailed as its parent; consider it an epilogue if you will, and feel free to correct any errors or omissions (or disagreements) in the comments below. Also the perspective is that I'm comparing these to the state of MySQL (and each other), even if both Drizzle and PostgreSQL probably would want to be reviewed on their own merits.
Update Sun 2010-12-19: LinuxJedi was kind enough to provide additional details on Drizzle and the article has been updated with that information.
Drizzle
Drizzle was forked by Brian Aker in 2008, when MySQL had joined Sun. Sun sponsored a core developer team working outside the MySQL chain of command. The mission for Drizzle was and is:
- A database optimized for Cloud infrastructure and Web applications
- Design for massive concurrency on modern multi-cpu architecture
- Optimize memory for increased performance and parallelism
- Open source, open community, open design
In addition to the above, I would like to add the significant work that has gone into re-factoring and modularizing the MySQL code base, making it more object oriented, removing non-core features completely, and replacing internal code with generic libraries (like Boost, Google protobuf) instead. It is commonly agreed that this work has made it easier for new developers to join the project.
Positives:
- Community development the way it should be. In 2008 Drizzle immediately showed a vibrant community on mailing lists, IRC and yes, even code contributions! This highlighted the poor job MySQL had been doing in this area and unleashed the untapped potential in the MySQL community.
- While Rackspace is the biggest contributor, employing a team of 8 engineers, it is clear that even today Drizzle is the only MySQL fork with the vibrant and diverse community one is accustomed to seeing in open source projects. Brian recently posted some commit statistics on this topic, concluding that Drizzle has had 96 contributors to date. (But afaik fewer than 10 are working full time on Drizzle.) Drizzle has also been very successful with their Google Summer of Code students, etc.
- Rackspace backing gives it credibility even if Sun is now gone. If Drizzle is to succeed, getting a foothold in the hosting, then cloud, then web market is the likely path. Especially the multi-tenant feature targets the hosting market.
- Has now finally reached Beta, good time to try it out.
Negatives (or things I'd need to know more about):
There are in particular 4 things I miss from Drizzle:
- Will we have a release before the window of opportunity passes? While engineers often say "we should rewrite everything from scratch", it is not always a good idea to let them do that! KDE4 is an example of things gone wrong. On the other hand Firefox is an example of things gone right. In this case, MySQL (with all the forks) has been able to catch up in performance, which was the major problem in 2008, so there is a risk of Drizzle becoming a MySQL fork with a cleaner code base, but no better performance and fewer features.
- Benchmarks. The goal was to scale well on multi-core machines with lots of RAM. But now MySQL does that too. I haven't seen a single benchmark that would actually tell me about how Drizzle performs.
- Degree of compatibility with MySQL. Of course, we know that Drizzle is still a somewhat compatible fork, for instance the MySQL client libraries can connect to a Drizzle server. But can I run Wordpress, Drupal or my legacy PHP app on Drizzle? How much work is it to port it? I know Ronald Bradford was looking into this, but I haven't seen results. Is that a bad sign, or were you just busy with customer work?
- User adoption. Having one user story will encourage the second user to try it too. It's beta, now is the time to hear about people trying it out, then run it in production.
There is one more negative thing with Drizzle in that there could be more collaboration and cross-pollination between Drizzle and the other MySQL forks. While the core code has diverged so much it can be considered separate, there's still untapped opportunity for collaboration in common areas such as InnoDB/XtraDB, the missing manual, XtraBackup, client connectors, the new Boots client, automated QA and builds... I should note that this isn't inherently a Drizzle negative, the finger is pointing as much (or more) at the other forks.
My conclusion: The Drizzle story is a perfect vision for where I'd want to upgrade my MySQL installations - the weaknesses it set out to correct in MySQL are exactly the right ones. However, before upgrading anything at all, I'd need to see answers to the questions in the above list, and I'm a bit skeptical. (Sorry guys, I'm cheering for you, I'm just skeptical.)
PostgreSQL
PostgreSQL is the other major open source database. If one was looking for an alternative to MySQL, it is probably the first challenger. It is truly community developed, with corporate sponsors having come and gone during more than 2 decades of development.
Positives:
- Long history, stable community, no drama as we have in the MySQL world right now.
- Track record of new release every 12 months, with new interesting features every year.
- Lists 70+ active and 44 inactive contributors. Unlike in the MySQL world, most of these are only part-time developing PostgreSQL code on the side of a consulting job or other job, or in the case of EnterpriseDB staff, on the side of also developing closed source EnterpriseDB addons.
- Is most certainly being run in production, for critical workloads, even has a respectable market share for web use. (Skype is a known PostgreSQL user.) Consulting and support is available, though this topic appears also in the negatives.
- Nowadays also supports Windows, in addition to Unix and Linux.
- Marketed as the "most advanced open source database" as it used to have more features (and more complex features) than MySQL. The architecture and features are sometimes seen to be clones of how Oracle does things, for better or worse. Even so, which features are important or advanced is somewhat subjective - this topic also re-appears in the list of negatives.
Trivia challenge: Which highly current and controversial person is a past PostgreSQL contributor? Answers in comments below.
Negatives:
- Despite being well known and respected in open source circles, PostgreSQL has surprisingly poor adoption in traditional enterprise usage. In my few years of selling MySQL, I only came across 2 companies using it rather than MySQL - both of these had in-house expertise to support it. (This doesn't count companies I know from public sources, such as Skype.) Given that PostgreSQL is commonly touted as more suitable for enterprise use, I was very surprised when I saw the study done by the EU Commission related to the Oracle Sun merger, showing that in Oracle accounts, PostgreSQL isn't even on the radar as a competitor in Oracle's CRM. [1] (To compare: MySQL was equal to or had passed Sybase, which conversely was better than I expected.)
- A likely explanation to weak adoption in the traditional enterprise is the lack of a well known 24/7 support provider. (Interestingly, even if Sun was also providing PostgreSQL support at a time, nobody seemed to know about that either :-) While it is certainly possible to buy support for PostgreSQL, most people simply do not know these companies, which are often quite small. A positive development is that EnterpriseDB is now somewhat developing into a globally known player in this field, including their new communications strategy that focuses more on the PostgreSQL brand than their own EnterpriseDB brand.
- Historically, PostgreSQL hasn't always provided the features needed for mass adoption. I personally consider the late addition of Windows support as a primary reason MySQL got more popular, since until recently almost all developers would use Windows. PostgreSQL wasn't that easy to use, and it's almost impossible to understand how Postgres users could live so long without replication! Today, all of these features are somewhat addressed, but they do imho explain the small market share to a high degree.
Conclusion: PostgreSQL is a viable alternative for an open source database; it has certainly proven its stability and credibility as an open source project. Even so, its low market share means that a PostgreSQL DBA is harder to find, and even the companies providing consulting and support are not as widely known as, say, those in the MySQL sphere. As a function of lower market share, PostgreSQL also receives fewer total development man-hours than MySQL.
Personally I always like to favor whichever technology has the most mass (while still meeting other requirements too, of course, like being open source and such). I believe having a huge community and even brand recognition is a valuable asset for any open source project. For me this is the main reason I once ended up using MySQL and still do: "everyone else" was doing it too. (Until then I actually was a PostgreSQL user, a satisfied one even.)
Comparing Drizzle and PostgreSQL community activity
An interesting observation is that the developer communities are roughly the same order of magnitude in size, both having fewer than a hundred recently active developers. For PostgreSQL most developers are volunteers or work on PostgreSQL part time on the side of another job. Drizzle has a core team of 5-10 working full time or part time, whereas the rest are volunteers. (Both are in contrast to all MySQL forks, where the overwhelming majority are full time developers and there are more full time developers than total developers in Drizzle or PostgreSQL.)
Also the OHLOH graph confirms this:
OHLOH graph for nr of contributors (y-axis) per month (x-axis) for Drizzle (RED) and PostgreSQL (GREEN)
This goes against the perceived size of each community, where one would expect the PostgreSQL community to be clearly larger.
Note, my web host, HostGator, seems to insist on running some fancy firewall checks between the website and the database, and this results in some errors when comments are posted on articles like this. If this happens to you, please let me know at henrik.ingo [at] avoinelama.fi and you may also email the comment itself if it is lost, so that I can post it here myself.
[1] https://ec.europa.eu/competition/mergers/cases/decisions/m5529_20100121…
See page 69. The publicly available version has exact percentages obfuscated; however, PostgreSQL isn't even listed, so we do know it is equal to 0%. (...as much as I hate to see that.)
Drizzle
Hi Henrik,
Thanks for this, if anything it has shown where we need to communicate more on the Drizzle project. Some things that may help:
* We have roughly 8 people working on Drizzle at Rackspace (staff and contractors) although that includes QA, docs, etc...
* We have a target date for GA. I'm not sure what I am allowed to say there, I hope others can comment here.
* I don't think it would be fair to post benchmarks just yet but the last lot I saw were pre-InnoDB 1.1 and we scaled better than MySQL in some areas and MySQL scaled better than us in others.
* Compatibility with drizzle? Out of the box you can connect to it on port 3306, 4427 or unix socket with the MySQL client libraries. Data types are different (but drizzledump can convert from MySQL to Drizzle on-the-fly). I can (and do) run Wordpress on Drizzle just by running drizzledump to convert the tables, I think there has been some work on Drupal too but I don't know much about that. PHP's MySQL plugin connects straight to Drizzle.
* As for user adoption. Brian may have more info on that than me (I know people are trying it). I personally use it for a Wordpress blog, a SMF forum and my exim and dovecot setup.
* I think Stewart is putting some fixes back into InnoDB where he finds them, but currently this isn't much different to the one in the MySQL 5.5 GA release. Where we find Drizzle bugs that affect MySQL we file them and suggest fixes.
Please feel free to ask us any questions and let us know any positive or negative feedback (we need to know where we are going wrong as well as right! :)
re: Drizzle
Thanks Andrew! I integrated some of this info back into the article post itself. Other comments:
* We have a target date for GA. I'm not sure what I am allowed to say there, I hope others can comment here.
Having a plan and a deadline is of course a good start. Meeting it is another :-)
But yes, I think it is reasonable to expect Drizzle to be closer to the deadline than MySQL 5.1, otoh it is only human to slip a little bit.
* I don't think it would be fair to post benchmarks just yet but the last lot I saw were pre-InnoDB 1.1 and we scaled better than MySQL in some areas and MySQL scaled better than us in others.
Rest assured the blogosphere will be interested to hear more.
* Compatibility with drizzle? Out of the box you can connect to it on port 3306, 4427 or unix socket with the MySQL client libraries. Data types are different (but drizzledump can convert from MySQL to Drizzle on-the-fly). I can (and do) run Wordpress on Drizzle just by running drizzledump to convert the tables, I think there has been some work on Drupal too but I don't know much about that. PHP's MySQL plugin connects straight to Drizzle.
Right. So I obviously know about the client-server protocol being compatible. I'm more thinking of missing features or other SQL syntax incompatibility.
But based on what you are saying, it seems one could develop a "compatibility plugin", that would target especially compatibility for DDL so that if you have TINYINT or MEDIUMBLOB columns they'd be silently converted to INT and BLOB for Drizzle.
As it is now it seems I would have to install Wordpress on a MySQL server and then migrate the schema to Drizzle. Not good in the long run.
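To make the "compatibility plugin" idea concrete, here is a minimal sketch of the kind of DDL rewriting it would have to do. It is not Drizzle code and uses no Drizzle API: the function name `rewrite_ddl` and the `TYPE_MAP` table are hypothetical, and the mapping covers only the two conversions mentioned above (TINYINT to INT, MEDIUMBLOB to BLOB). A plain regular expression is also too naive for real SQL, since it would rewrite a matching word inside a quoted identifier or string literal as well.

```python
import re

# Hypothetical mapping, taken only from the two examples in the discussion
# (TINYINT -> INT, MEDIUMBLOB -> BLOB). A real filter would need a fuller table.
TYPE_MAP = {
    "TINYINT": "INT",
    "MEDIUMBLOB": "BLOB",
}

# Match a mapped type name as a whole word, plus an optional display width
# such as the "(1)" in TINYINT(1).
_TYPE_RE = re.compile(
    r"\b(" + "|".join(TYPE_MAP) + r")\b(\s*\(\s*\d+\s*\))?",
    re.IGNORECASE,
)


def rewrite_ddl(ddl: str) -> str:
    """Replace MySQL-only column types with their assumed Drizzle equivalents."""

    def _swap(match: re.Match) -> str:
        # The display width (if any) is dropped along with the old type name.
        return TYPE_MAP[match.group(1).upper()]

    return _TYPE_RE.sub(_swap, ddl)


if __name__ == "__main__":
    print(rewrite_ddl("CREATE TABLE t (a TINYINT(1) NOT NULL, b MEDIUMBLOB)"))
```

With something like this sitting in front of the server, an application installer could send its unmodified MySQL schema, which is the part that currently requires a detour through a MySQL server and drizzledump.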
* As for user adoption. Brian may have more info on that than me (I know people are trying it). I personally use it for a Wordpress blog, a SMF forum and my exim and dovecot setup.
Good start, more than I expected! (In terms of apps running, not the fact that one user exists :-)
re: Drizzle
I totally agree about the deadline, so far I feel we are on target, but time will tell.
I would certainly be interested in benchmarks since we moved to InnoDB 1.1.3. I'll see if it is possible to sort some out.
In theory you could create a compatibility plugin as a query filter (we don't have plugins which touch the parser yet, it is planned for a future version). There are several complications with this, mostly around DATETIME based types. For example, in MySQL 0000-00-00 is a valid date whereas in Drizzle the lowest possible date is 0001-01-01. Drizzledump actually converts invalid dates to NULL here.
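The date rule described here can be illustrated with a short sketch. This is not drizzledump's actual code; `convert_mysql_date` is a hypothetical helper that only mirrors the behaviour stated above (dates that are not valid calendar dates become NULL). It leans on the fact that Python's `datetime.date` happens to share the same lower bound, 0001-01-01, and rejects a zero year, month or day.

```python
from datetime import date
from typing import Optional


def convert_mysql_date(value: str) -> Optional[str]:
    """Return the date string unchanged if it is a real calendar date,
    otherwise None (standing in for SQL NULL)."""
    try:
        year, month, day = (int(part) for part in value.split("-"))
        # date() raises ValueError for year 0, month 0, day 0, Feb 30, etc.
        date(year, month, day)
    except ValueError:
        return None
    return value


if __name__ == "__main__":
    for sample in ("0000-00-00", "0001-01-01", "2010-12-19"):
        print(sample, "->", convert_mysql_date(sample))
```

The awkward part for a transparent compatibility filter is that this conversion changes data, not just syntax: an application that later compares a column against '0000-00-00' would no longer find the rows it wrote.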
Any idea why the Drizzle line on that graph drops off in December?
graph
Doh! I mean mid-2010, not December :)
re: Drizzle
Yes. It seems random people register these projects at OHLOH, and point OHLOH to look at whatever bzr branch (or other version control) they decided is the correct one. So my guess is that either Drizzle has changed the main trunk in mid-2010, or OHLOH is looking at a branch that never was the main trunk in the first place, and the branch has gone stale.
The same phenomenon can be observed looking at the OHLOH graph for MySQL, where prior to 2010 it shows 25 monthly developers (which is way too little!) and starting 2010 no activity at all. (Which despite some fears, uncertainty and doubt we may all share at the moment, is certainly incorrect :-)
So in summary, don't take the OHLOH graphs for real. They're fun to look at, but for real science you'd need to do your own study.
Graph missing legend and axis names
Interesting overview! but....could you please add axis titles and a legend to the graph? I presume x is year, and y is number of developers, but what is green and what is red?
re: Graph missing legend and axis names
Yep, thanks for pointing out those weren't actually part of the picture!
Assange :)
Assange :)
We have a winner!
Ding ding ding!!!
Assange is correct :-)
Hey, Christmas is coming up and everything, so I want to give you a prize. Send your postal address to henrik.ingo@avoinelama.fi and I'll send you a free copy of my book Open Life: The Philosophy of Open Source.
Decent assessment of the
Decent assessment of the "PostgreSQL problem"; doomed to always be the "other guys", even though on a feature-per-feature analysis they come out ahead in almost every area. I question a couple of your comments though:
- ...as it used to have more features (and more complex features) than MySQL (emphasis mine). I don't know when you last used PostgreSQL, but it has only gotten more featureful over the past couple years, in fact increasing in feature capability at a far faster rate than MySQL.
- As a function of lower market share, PostgreSQL also receives less total development man-hours than MySQL. I wonder how on earth you actually know this to be a fact. Everything I see tells me PostgreSQL gets more high-quality developer attention than just about any open-source project barring the major operating systems (Linux and the *BSDs).
Anyway, while the popularity argument has merit, especially if you are developing web applications using standard PHP/Ruby libraries or relying heavily on standard open source web apps, I suggest there are times when that argument should be relegated to a lower priority. PostgreSQL was written to attack far different problems than MySQL. It all depends on the trade-off between ease of use VS. correctness and integrity:
Online content management, dynamic websites, non-critical systems: MySQL.
Online banking, scientific data, complex analysis, engineering support: PostgreSQL.
Ease vs Integrity
No, there is no trade-off between ease of use and correctness/integrity in MySQL. Neither has there been for many years.
To get all the correctness/integrity of PostgreSQL simply add these two lines to your my.cnf.
default-storage-engine=InnoDB
sql_mode=TRADITIONAL
If you're going to advocate for PostgreSQL please focus on its superior features, like user definable data types, more stored procedures languages, GIS functions and IPv6 support; not an old tired lie about MySQL's consistency.
re: Decent assessment
Decent assessment of the "PostgreSQL problem"; doomed to always be the "other guys", even though on a feature-per-feature analysis they come out ahead in almost every area.
Thanks. This is probably a perfect summary of what I tried to say too!
- ...as it used to have more features (and more complex features) than MySQL (emphasis mine). I don't know when you last used PostgreSQL, but it has only gotten more featureful over the past couple years, in fact increasing in feature capability at a far faster rate than MySQL.
Partially I agree. PostgreSQL has for many years demonstrated solid performance in crunching out a new release each year, with non-trivial improvements in each release. But, when comparing to MySQL I'd say this is not such a big advantage anymore: During the MySQL 5.0 and 5.1 cycles, yes, PostgreSQL absolutely progressed ahead of MySQL, which could only demonstrate minor enhancements over a 3 year cycle. I argue that this is not the case anymore because 1) MySQL (integrated over all the forks) actually makes very nice progress at the moment and 2) MySQL nowadays covers all of the relevant parts of the ANSI SQL standard.
For instance, prior to 5.0 (6 years ago!) MySQL didn't have stored procedures, which was a strong argument for anyone in the enterprise world to use Postgres. Today, perhaps Postgres has some advanced features not in MySQL procedures, but most people are happy with what MySQL provides, so for getting more users this is not an advantage for Postgres anymore.
The other perspective is that PostgreSQL has also historically been lacking some important features: Windows support is a feature. Replication and clustering are pretty important features for any database today. So just saying that Postgres is more advanced is actually subjective. (Ok, so now you do have these things covered.)
- As a function of lower market share, PostgreSQL also receives less total development man-hours than MySQL. I wonder how on earth you actually know this to be a fact. Everything I see tells me PostgreSQL gets more high-quality developer attention than just about any open-source project barring the major operating systems (Linux and the *BSDs).
If counting man-hours, man-months, whatever, there is no doubt that for at least the past 5 years MySQL has gotten more "engineering investment" than PostgreSQL. I know MySQL AB employed over a hundred full time engineers, several hundred at Sun time. And I know from a few sources (including www.postgresql.org) that PostgreSQL doesn't have that many contributors, and most contributors aren't full time.
You could perhaps argue that those man-hours weren't always spent wisely or productively on the MySQL side, such as having a large team working on Falcon engine for many years (many many millions of dollars in investment) without producing anything.
Today the MySQL situation is much better and significant features are being produced, more than MySQL has ever seen. The only problem today is that this work is spread across 3-4 different forks, so the total of this work doesn't immediately benefit the one and the same product. (But given time, the code does diffuse across the forks.)
All this being said, I totally agree that PostgreSQL, with at least close to a hundred active developers and 20+ years of history, certainly has earned its place in the "Hall of Fame" of significant open source projects. It would very much deserve more adoption than it has, if you ask me!
PostgreSQL was written to attack far different problems than MySQL. It all depends on the trade-off between ease of use VS. correctness and integrity:
Online content management, dynamic websites, non-critical systems: MySQL.
Online banking, scientific data, complex analysis, engineering support: PostgreSQL.
Except that in practice, MySQL is used (a lot!) in:
Online banking: Shinsei bank. Hypovereinsbank. Deutsche Börse. Börse go. NGM. (3 last ones are stock markets, I hope it still counts.) Paggo...
Scientific data:
See Large Synoptic Survey Telescope at Stanford Linear Accelerator Center
http://www.mysqlconf.com/mysql2008/public/schedule/detail/849
Complex analysis:
This depends on your definition of complex. If complex == the SQL queries Postgres can do and MySQL can't, then yes, obviously. In practice, people do complex things on MySQL, like personalization and other data mining activities, and there are at least a couple storage engine vendors that are making a business out of Business Intelligence applications.
Engineering support:
Maybe you have something specific in mind, but certainly techies use MySQL a lot.
All of this said, yes, Postgres is an excellent choice for all of those applications. But in practice many will gravitate around the product with the largest mass. (And this will eventually also be Drizzle's main challenge.)
You are talking about actual
You are talking about actual practice, while I am talking about intent and appropriateness. Sure lots of techies use MySQL, but usually when I walk them through PostgreSQL's feature set I get one of two responses: "WOW", or "WOW but I can't be bothered to learn how I can use any of this." I can fully respect either answer, as every technology choice is a risk/benefit judgement call, but that doesn't alter the facts of what the capabilities are. Along with extremely robust implementation of SQL standards, there are all sorts of techie goodies like complex datatypes, array types, user-defined aggregate functions, spatial types and functions, etc... (Anyone using MySQL over PostgreSQL for GIS ought to have their head examined)
The feature gulf is still far wider than you make it out to be. I have to use MySQL 5.1 for a medium-size web project after spending a few years enjoying the capabilities of PostgreSQL, so I have been forced to painstakingly re-discover MySQL's limitations. Yes, it now has stored procedures, views, subqueries, etc... but each of these has showstopper-level deficiencies that make it hard to accomplish what would be easy in Oracle or PostgreSQL. Set-returning stored procedures cannot be used in subqueries, for a major example of a showstopper. Referential transparency is one of the cornerstones of the relational model and you don't really have it with MySQL.
As for PostgreSQL's limitations in relation to MySQL, they mostly have boiled down to replication (Which has its own list of showstoppers), but now PostgreSQL has built-in replication, so one is forced to search a little harder and come up with some of MySQL's nice shortcuts like the REPLACE INTO statement or its converse, the ON DUPLICATE KEY syntax.
Other than that, it is as you say in your article: largely a question of ubiquity* and low barrier to entry. I just want to urge developers to explore a little before making up their mind based on breezy 'market-driven' observations. Knowledge is power.
*Many companies using PostgreSQL tend not to advertise that fact, partly because the general public doesn't care, but also because it tends to be built into embedded systems or other complex software that is not really self-evident as a database application. (Cisco, for example, uses it in some of their routers: http://www.cisco.com/en/US/products/sw/voicesw/ps4371/products_user_gui… )
Postgresql advantages -- docs and admin
I'm a little late on the draw here, but I wanted to chime in with a few of my own postgresql observations.
I came to Postgresql almost 10 years ago as my first database system, and I've been incrementally learning its details ever since. Over this time, I've found the documentation to be extraordinarily helpful. It's concise, precise, and readable. It's consistently maintained from version to version. In a sense, I *learned* SQL and databases from the postgresql docs, and that has made me very fond of the project in and of itself.
A second strength relative to Oracle that I've heard from other DB admins is its relative ease of administration. I'm not a professional tech, and I've been able to figure out install and admin from the docs with no problems. I've heard a number of complaints about very painful Oracle installs and/or optimizations.
On a final note, I've always worked from within a debian linux system. I personally don't see Windows install as a major plus for enterprise -- linux/apache has a majority share of web server space for a number of very good reasons, none of which are related to cost. It's nice that folks can now try postgresql on windows - while they're young, for example.
Overall, I think that postgresql has benefited from years of development out of the limelight, leading to a stable base without any untoward push by a large and diverse user audience towards feature bloat. With replication and hot standby (and a ridiculously rich feature list - i just discovered conditional triggers this week!), postgresql seems to be about ready for prime time. With Oracle stomping around in MySQL land, it's nice to know that there's a flourishing OSS *database* ecosystem.
100% agree with the docs. I
100% agree with the docs. I learned SQL by reading PostgreSQL docs, even when I wasn't using Pg I read the docs. I especially liked that they document what feature is according to the standard and what is Pg specific, or compatible with some other database yet not in the standard.
Only later did I learn how to google for a file named sql1992.txt :-)