Making rpm builds a first class citizen: Why?
Last weekend I released rpm files for the latest Drizzle Fremont beta (announcement). As part of that work I've also integrated the spec file and other files used by the rpmbuild into the main Drizzle bzr repository (but not yet merged into trunk). In this post I want to explain why I think this is a good thing, and in a follow up post I'll go into what I needed to do to make it work.
(And speaking of stuff you can download, phpMyAdmin 3.5.0-alpha1 now supports Drizzle!)
Why produce RPMs yourself?
The RPM package manager and its older brother the Debian package manager were originally created by the Linux distributions as tools to standardize and automate installation of all the software included in the distribution. Crucially, both systems also include dependency tracking, which makes sure all of the pieces of software you install will work nicely together.
Due to this history there is a division of labor in how software is brought to the end user: The typical open source project would historically focus on making a source code release. The distributions would then pick up this source release and build it for the hardware architectures they support, with the optimizations, directory hierarchies and other configurations used in the distribution. (Nowadays those tend to be very homogeneous though.) And they would link against the library versions included in the distribution, hence the need to carefully track dependencies.
In today's world though, the projects themselves increasingly need to take responsibility for producing user-installable binaries. A new project, for instance, isn't included in any distribution yet, so the best way to make sure users can download and test its software is to provide them with friendly RPM and DEB binaries they can install. This is also true for established projects: Users may want to keep up with your newest releases even if an older release is included in all the Linux distributions. A perfect example of this is that even if it is common for users to still run CentOS 5.x, there is absolutely no reason you should still use the included but antiquated MySQL 5.0. In fact, distributions also support this activity with services like the openSUSE Build Service and Ubuntu PPA repositories.
Some people still think that the normal way to install software is to build it from source. As far as they know everyone does it that way. Unless you are a Gentoo user, where that is, in fact, the normal way, and you belong to this group of people... you really need to get out more.
Once you delve into the world of RPMs and DEBs you'll discover what I would label as anti-patterns that arise from this historical division of labor.
For example, building Drizzle from source is quite easy. The build system is in good shape and you can just do something like
sudo apt-get build-dep drizzle
...and you have drizzle installed. Compared to this, building the RPM packages from our latest Drizzle release was much more work. I based it on the spec file from the GA release last April, which had been kept somewhat up to date by BJ Dierkes. Even from such a good starting point, it took me roughly 3 weeks (of evenings here and there, not work time) to get my first RPMs to actually build. Why?
Because RPM building has historically been decoupled from the rest of development, things break fairly fast. In the Fremont releases Brian and Mark made a change to no longer install version 2 of the libdrizzle library, only version 1. The RPM spec file still expected to build a libdrizzle-2 package, yet there was nothing there to put into the package. So things break. Almost any file-level change in the built release will cause such breakage. Then it is left to the packager to ask developers by email why things have changed (but they won't respond to emails), or browse changelogs to find out about the change (but changelogs are too high level to answer all these questions), or read commit logs (in Drizzle these often contain barely enough information, but might not do so for your pet project). It is a lonely feeling, I can tell you. But since my starting point was pretty good, and I'm fairly up to date with what goes on in Drizzle development, I survived this exercise. It was still quite annoying.
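To make this kind of breakage concrete, here is a minimal sketch of a check a developer could run before the packager ever sees the release. It is not part of Drizzle or of rpmbuild: the function name check_manifest and the manifest file (a plain list of paths the packages expect, one per line) are hypothetical, and it assumes the build can be installed into a staging directory first.

```shell
#!/bin/sh
# check_manifest ROOT MANIFEST  (hypothetical helper, for illustration only)
# Prints every path listed in MANIFEST that does not exist under ROOT,
# and returns non-zero if any path is missing.
check_manifest() {
    root=$1
    manifest=$2
    missing=0
    while IFS= read -r path; do
        # Skip blank lines and comment lines in the manifest.
        case $path in
            ''|'#'*) continue ;;
        esac
        if [ ! -e "$root$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done < "$manifest"
    return "$missing"
}
```

With a staged install (for example make install DESTDIR=/tmp/stage, assuming the build honours DESTDIR), running check_manifest /tmp/stage expected-files.txt would flag a library that a patch stopped installing. A real spec file can also use glob patterns in its file lists, which this sketch does not handle.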
The reason this happens is that it is considered ok for developers to not care about RPMs or DEBs. I bet most Drizzle devs don't even know where the scripts to build the official Drizzle RPMs are kept. It took me a while to track them down for sure. Developers just produce source code, which in the best case will pass make test, but that's it. For instance in Drizzle if a patch doesn't pass the test suite, it will not be merged. But if a patch breaks the rpmbuild, nobody will notice, and nobody will care.
Clearly a better solution is to at least make the rpm build process a part of your continuous integration tests, so that if someone makes a change that breaks rpm packaging, they will also have to correct it (or get help in correcting it). And to make patch management easier, it is probably necessary to keep your rpm spec file in the same repository as the rest of the source code. Otherwise you end up in chicken-and-egg discussions like "This patch works if you test it against this other patch against our build scripts here, and commit them here and here... And you also need to first merge these other uncommitted changes here and here..." Source code repositories exist to solve this problem, so put your stuff in the same repository!
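As a rough sketch of what such a CI gate could look like (not the actual Drizzle setup): the script below treats the package build as just another step that must pass, right after the test suite. The function names and the SPEC_FILE default are hypothetical, and it assumes the source tarball the spec file refers to is already where rpmbuild looks for it.

```shell
#!/bin/sh
# Sketch of a CI gate: a patch is rejected if either the test suite
# or the rpm build fails. Names and paths here are illustrative only.

# run_step LABEL COMMAND [ARGS...]
# Runs the command, reports the outcome, and returns non-zero on failure.
run_step() {
    label=$1
    shift
    echo "== $label"
    if "$@"; then
        echo "== $label: ok"
    else
        echo "== $label: FAILED"
        return 1
    fi
}

# ci_gate stops at the first failing step.
ci_gate() {
    run_step "test suite" make test || return 1
    # SPEC_FILE is a hypothetical location for the spec file in the repository.
    run_step "rpm build" rpmbuild -ba "${SPEC_FILE:-drizzle.spec}" || return 1
}

# A CI job would finish by calling: ci_gate
```

The point of the design is only that the rpm build sits in the same pass/fail chain as make test, so a patch that breaks packaging is caught at merge time rather than weeks later by the packager.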
The other anti-pattern I've seen both with MySQL packages and Drizzle packages is that the packagers end up accumulating code in their scripts that really should be in the upstream repository. For instance, the Debian MySQL package applies from 5 to 10 patches on top of official MySQL. One wonders why these are not fixed in MySQL itself. In Drizzle the packagers do a reasonably good job of also pushing the patches upstream, so there's a chance that the DEBs will need zero patches!
Sometimes the use of these packaging-originated patches is pure abuse of the package system: for instance Debian includes the innotop utility as a patch to MySQL. (It is not part of MySQL as it wasn't developed by MySQL/Sun/Oracle.) I'm pretty sure the correct thing to do would be to package innotop as its own package, and then use dependencies as appropriate to have it installed alongside MySQL.
The last bit usually added in the packaging phase is a default configuration file. Granted, the example configuration files shipped with MySQL are laughable, with 1GB of RAM considered huge. Drizzle currently doesn't offer any such configuration file. So the distributions try to provide their users with a more usable default configuration. But still, do Debian and Red Hat users really need to have different configurations to do the same thing?
The oddest result of this I've seen was in the DEBs of MariaDB 5.1 and 5.2 releases. You see, MariaDB got their DEBs done by Arjen and Peter from the OurDelta packaging project. As a result, MariaDB configuration, as downloaded from Monty Program, actually shipped with this introduction:
# MariaDB database server configuration file.
# Base configuration courtesy of Open Query (http://openquery.com/)
# For production use, case-specific preparation is still required.
# This is *not* an optimised config, merely a more sane baseline:
# - InnoDB default (e.g., ACID out-of-the-box, same as on Windows)
# - strict mode (for proper input checks, same as on Windows)
# - various other useful settings
# - make use of MariaDB/Percona/OurDelta enhancements/extensions
# For tuning assistance, please see http://openquery.com/services
A good example of why developers should care more about packaging. I often wondered how much business this MariaDB advertisement generated for Open Query and at what point Monty became aware of it. (But actually it's a pretty good configuration file. I totally think that Open Query deserves the credit for authoring it. I just assume it was not intended by the MariaDB devs to actually be there.)
In the next post I will explain how to integrate the RPM build files into your main code repository and routine CI efforts.