I often wonder what's behind the growing trend toward Hadoop and other NoSQL technologies. I realize that if you're Yahoo, such technology makes sense. I don't get why everyone else wants to use it.
Reading Stephen O'Grady's self-review of his predictions for 2010 gave me, for the first time, some insight into how such people think:
Democratization of Big Data
Consider that RedMonk, a four person analyst shop, has the technical wherewithal to attack datasets ranging from gigabytes to terabytes in size. Unless you’re making institutional money, budgets historically have not permitted this. The tools of Big Data have never been more accessible than they are today.
My recent account of The State of MySQL forks seems to have gotten quite a lot of attention. I promised to follow up with a separate piece about Drizzle and also PostgreSQL, as the other major open source database, so I'd better keep that promise now.
Just a short note again on HandlerSocket developments: PHP bindings to the HandlerSocket client library have been spotted in the wild: http://code.google.com/p/php-handlersocket/ (I'd like to credit someone, but I have no idea who user "avuenta" is on Google Code. Is it you, Dathan?)
I found this on Dathan Vance Pattishall's blog. I haven't personally had time to try it out yet, as I'm still deep into my current project (which I hope to blog about in January).
"...php-handlersocket which is a PECL type version (C driver with exposure to PHP). It does the job but needs some work that I'm doing now."
Just wanted to highlight that Percona Server has now added HandlerSocket to its most recent release, making it the first "MySQL fork/distribution" to ship it in easy-to-consume binary downloads.
HandlerSocket brings NoSQL to MySQL, and does so with a vengeance! It was developed at DeNA by Akira Higuchi and is already used in production on their MySQL servers. The announcement on my former colleague Yoshinori Matsunobu's blog touts a 7x performance improvement over the standard SQL interface in MySQL. The most astonishing part is that their MySQL is now faster than Memcached, even though the latter doesn't store anything to disk, so with this NoSQL-for-MySQL solution it makes sense to remove the caching layer completely!
Giuseppe "The Data Charmer" Maxia recently posted his take on the MySQL forks. I had been pondering whether to do the same, and seeing that what I planned to write will nicely complement Giuseppe's article, I was inspired to follow him into the same topic. Note that last spring I created a Map of MySQL forks in preparation for Monty's keynote at the MySQL user conference. So let's see how things have evolved. I'll look into the MySQL ecosystem as a whole and the forks separately.
The post is long, but the key takeaway is this: despite the challenges, the combined development seen in the MySQL ecosystem is probably stronger than ever. The current situation is hard for an outsider to grasp but manageable, and if a few more obstacles can be overcome, we are looking at a very bright future indeed. There are more than 100 engineers (how many more?) working full time on the MySQL code base (including developers, QA, build engineers...). This development effort is an order of magnitude larger than that of other open source databases I'm aware of, in particular PostgreSQL and Drizzle. Often the open source project with the most momentum and mass will come out as the winner, no matter what challenges it may seemingly be facing, and this is the case with MySQL too.
Some time ago I was asked to do a study of our most popular open source projects to assess 1) what governance models are out there and 2) whether the governance model has any effect on the project's success (such as the size of its developer community) on the one hand, and on the business of the related vendor(s) on the other. Some of the results are quite remarkable and have general applicability, so I wanted to share them here:
(Small updates done on 2011-07-14. OpenJDK size clarified on 2012-05-21.)
The 451 Group's annual report on the state of the open source business world is out. Already the title, Control and Community, suggests they are once again on top of what has been going on this year. Analyzing about 300 open source related businesses, they not only "get it right", but were actually able to uncover some facts even I was unaware of, and this impressed me a lot. If an analyst can dig up statistics to back up something that I already "intuitively" know in my heart, that is a useful service. But if they can make me go "ah, I didn't know that" on a topic I consider myself quite an expert in, then I'm impressed!
This is an analyst report, available for a price that would be completely unreasonable for a private person. I was pondering whether I should go begging for a free copy to satisfy my curiosity on the topic. But that wasn't necessary, as the next day I was offered a copy by Matthew Aslett himself:
Last week I announced internally that after my paternity leave ends next year, I will not be returning to Monty Program.
When I joined the company over a year ago, I was immediately involved in drafting a project plan for the Open Database Alliance and its relation to MariaDB. We wanted to imitate the model of the Linux Foundation and the Linux project: the MariaDB project would be hosted by a non-profit organization in which multiple vendors would collaborate and contribute. We wanted MariaDB to be a true community project, as most successful open source projects are, including all other parts of the LAMP stack.