State of the MySQL forks

It's been some time since I last wrote an overview of the state of the MySQL forks, but the last few weeks have been eventful enough that it is a good time to again see how the competing variants are positioned against each other.

I have written on this topic 1-2 times a year. Here are links to the previous overviews:

As is the case with this post, many of those previous ones coincide with some events or discussion at the time they were written, and the posts contain links to other bloggers commenting on whatever was current then.

We discussed something like this in the corridors of Percona Live UK 2011, and I predicted that MariaDB 5.5 will probably be the last MySQL release that MariaDB is able to merge from its parent. Already that effort took them well over a year. MySQL 5.6 will contain Oracle-developed code for features that MariaDB already has, like sub-query optimization and better GIS support. Unless MariaDB developers want to toss out their own, already released code, this is another reason why they simply can't continue to merge MySQL wholesale as an upstream source.

So do I have an opinion on this? Of course I do...

Monty always said that MariaDB is not a fork, it is a branch. Well, now it is a fork. It is no longer a subset or superset of any specific MySQL version, it is different.

One of the marketing promises of MariaDB is to be fully backwards compatible. That is true for MySQL 5.1. It is now true for 5.5, but for a long time MariaDB did not actually contain MySQL 5.5 features at all. And finally, it won't be true for MySQL 5.6. It will be interesting to see in a few years how much these two forks continue to diverge.

MariaDB was from the beginning a bold effort. The MariaDB team knows they are capable of understanding all of the codebase and developing it further, and they don't need any help from Oracle to do so. At the same time, they don't have the resources that Oracle does. Merging MySQL 5.5 took long enough. From 5.6 onwards we can bet that MySQL will have some features that MariaDB doesn't (yet) offer.

To me, the decision to "fork for real" seems a bit bold. There is no question the MariaDB developers can pull it off technically, and they are well funded to have the long term resources needed. But do they have the momentum needed? According to the 451 Group study, MariaDB is only now starting to compete with Percona for second place in the MySQL forks ecosystem in terms of adoption. When other former Sun projects have forked, be it LibreOffice, Jenkins or IllumOS, they have been able to move a significant majority of the developer mindshare with the fork and most crucially, have had the support from Linux distributions with them. MariaDB does not yet have this. Instead, most people continue to use stock MySQL, in particular as it is shipped with their favorite Linux distribution.

With the installed base still solidly following MySQL then, instead of being a competitive advantage, the fact that MariaDB is becoming more different from MySQL might instead turn into a barrier to migration. For instance, I used to cite the lack of semi-synchronous replication as a showstopper for using MariaDB. Now that it comes with MariaDB 5.5, I'm of course pointing to the lack of Galera support (e.g. what Percona XtraDB Cluster does). Forking when you are still catching up with the competition certainly has its risks.

Unless of course Oracle would do something - a mistake, say - that would actively push people away from MySQL...

Following the previously implemented policy of no longer using a public bug tracker, MySQL management had apparently realized that the community developers (including, but not restricted to, competitors MariaDB and Percona) had no trouble figuring out bug fixes from the source code commits and test cases.

When trying to understand what follows, it is possibly timely to pause for a moment to understand the pressure Oracle faces from its competitors, in particular Percona (mostly in the US) and SkySQL/MariaDB (in Europe). In the past two years, Oracle has lost tens of millions of MySQL revenue to this competition. (Note: The new revenue seen by SkySQL and Percona is of course less, as they compete with price.) There must be a lot of pressure to do something!

The natural fix? Oracle has now stopped using the public launchpad branches completely, only publishing source tarballs together with each new MySQL release. (Update: The launchpad branches have now been updated, see more info and a link further below.)

In addition they have completely stopped publishing new regression tests. (Which is still true.) Essentially MySQL is now doing the same as was implemented for Solaris from day one: Development is now as closed as it can be without actually abandoning the GPL completely. Source code is "thrown over the wall" together with each release, but that's all you get. (The return of bzr repositories with commit history improves the situation somewhat.)

Ok, so will missing test cases matter to most end users? There will of course always be those that will react adversely to increased amount of closedness, and will be looking towards MariaDB and Percona now to support a more open source alternative. But no, most people will not care. If anything, end users will be more driven by actual features being closed source (and only for paying customers), like the plugins we discussed in my last installment of this series that are available as open source implementations in MariaDB and Percona.

Hence it is easy to predict that one of these moves will be the straw that breaks the camel's back, and we will see Linux distributions switching to another MySQL fork in the coming 12 months. And this is what will matter. Most of the end users will continue not to care, and they will continue to use whatever is automatically installed for them - by the Linux distro.

There is a lot more that could be said about Oracle MySQL. They are about to release MySQL 5.6 which will be (close to) on par with MariaDB features like sub-queries, microsecond data types and GIS, plus a slew of new features in replication (global transaction id for the win) and a NoSQL memcache API for 2x to 7x performance increases! When you read about people proclaiming how good a steward Oracle has been for MySQL, it is of course to some extent also diplomacy, but it is also true that measured purely in engineering efficiency, MySQL of today is way more productive than it ever was before Oracle times. Which is not to say this is necessarily thanks to Oracle, but it is still a fact.

Anyway, screwing your community and in particular the Linux distro community is today more newsworthy than the great engineering work, so I will have to write more about those some other time.

Update:

Conveniently, Baron has today blogged about MySQL 5.6 features. If you want to know more about those, his post has it all.

Mark Leith otoh informs us that the out of date Launchpad trees are now refreshed again and that Oracle has made no decision to stop publishing code on Launchpad. Based on Stewart's post it seems at least the 5.1 tree was not updated since May. On the other hand, it appears that 5.1.64 was never released, which might explain why there was no update for so many months. In other words, I'm inclined to believe Mark's assertion that this was yet again a case of just general neglect by the Oracle team, rather than active withholding of code commits. Of course, that's exactly what easily happens when development is internal by default, and the publication of code is additional overhead.

So far there has been no comment on the missing test cases though.

Percona

In all of the above Percona Server has continued to tightly follow Oracle MySQL as an upstream, adding their own, mostly performance related, enhancements on top. This strategy has allowed them to release a MySQL 5.5 based release only months after Oracle, and again now they already have a 5.6 beta release out before MySQL 5.6 has even been released.

With their current strategy Percona is somewhat dependent on progress in their upstream MySQL. Here they have turned out to be lucky. Since Oracle is doing well in keeping up with MariaDB, it means Percona too is benefiting from this. Expect Percona to have sub-query optimizations and improved GIS later this year in a 5.6 GA release.

While the Percona Server strategy is relatively focused, and I wouldn't expect them to do large refactoring of the optimizer code base like MariaDB, they have actually often been able to be first to market with significant improvements. The XtraDB engine itself of course saved MySQL from a near-death experience when InnoDB didn't scale on multiple cores, and Xtrabackup is now the de facto backup tool for MySQL (regardless of your choice of server fork). Percona was also the first to integrate a NoSQL API via HandlerSocket. This year Percona has released Percona XtraDB Cluster, which is Percona Server integrated with Galera clustering. This is yet again a game changing introduction of new technology, which I'm certain will wipe out a whole category of also rans in the category of high-availability solutions.

Interestingly then, with a strategy that seems quite minimalistic at first sight, Percona in fact often comes away as over-achieving. If you would count only the amount of new features, or complexity of them, then certainly MariaDB is a more ambitious effort than Percona Server. But in practice Percona Server offers a combination of keeping tight with the MySQL upstream releases, while also delivering significant enhancements of their own. In practice then I have found that if anyone is to be considered a superset of everything good the MySQL ecosystem has to offer, surprisingly, it often turns out to be Percona Server.

To be continued...

In the end, all of the above are still good choices for a MySQL database. All of them are much more advanced than MySQL 5.1 was in its time. With the introduction of advanced synchronous clustering techniques like Galera, coupled with superior InnoDB performance, MySQL competes well with any of your favorite NoSQL (key-value) solutions.

Still, to follow the competitive situation between the 3 MySQL compatible forks has perhaps introduced some cognitive overhead into the game. I hope to make some concluding remarks on all 3 of these in a follow up blog post.

Even if I'm myself guilty of also mostly using planet mysql as the main mysql communication channel, and hence had not read this thread on mysql-internals, I want to applaud Tomas for actually using the mailing list here. It's what normal open source projects do too, and I often find myself wishing MySQL had been like that (ever).

Still, it seems both Stewart and Sergei tried to reach out to Oracle and got no responses. Letting Launchpad lag behind for 3 months, coupled with radio silence, is perhaps not the best community relations strategy. Oracle could do itself a favor by at least responding to such requests, even if the response is along the lines of "we've gone to the beach and will update them in August".

Thanks for this summary Henrik. The interesting point in the development trajectory is when MySQL forks cease to be backwards compatible with each other at the level of APIs (there are now minor differences between the SQL offered by MariaDB and the other forks), client/server protocols, and storage formats. So far as I know, data at the storage level including binlogs has not diverged except as necessary to add new features such as global transactions in MySQL 5.6. If data format divergences were to occur it would mark the beginning of a fundamental split in the marketplace.

This is a great question which we can use to entertain a discussion for many nights to come, at MySQL/FOSS conferences around the world :-) For sure, the way we speak about MySQL forks is technically incorrect, for instance Percona Server is not a fork as I understand the word.

When speaking of MariaDB, I'm mostly thinking about the app developer perspective. Since there is relatively little divergence on the InnoDB level, I actually don't foresee people having problems there for a long time to come. Even MariaDB includes vanilla InnoDB and Percona's XtraDB - they don't do anything there on their own. This is perhaps a good thing to point out!

"One of the marketing promises of MariaDB is to be fully backwards compatible." But still, the clients (command line clients, GUI clients, Web clients) are compatible, and so is the protocol as such, which is what matters most. But I am afraid that that might change one day too.

Drizzle, the fork that has diverged the most, still claims to have protocol-compatible (and GPL unencumbered as a bonus) client libraries. So it looks like maintaining protocol compatibility is not a very big issue and won't be, at least short term.

Henrik, thanks for the run down. I think the lack of test cases is just more of Oracle applying their ridiculous "no talking about security problems" policy. A test case for a security bug is, in fact, an exploit, so it stands to reason that if you're going to do non-disclosure, you can't be releasing exploit code. Of course, we all know this is ridiculous.

Also you failed to mention Drizzle. I know it's not 100% compatible, but it is interesting in the same conversation. By being the first to cut itself off completely from MySQL, it has the advantage of already being in a position to innovate around Oracle into new markets rather than cling for dear life to the thin GPL life-line trailing behind Oracle's crazy train.

As you know I'm much involved also with Drizzle, so cutting it from this article has nothing to do with my personal interests! Mainly the post was already long, plus I had a meeting coming up, so I had to end it. I will possibly address Drizzle in the followup.

But still, I don't think Drizzle is that relevant - unfortunately. It has been a great platform for hacking, and last year I had plenty of dead time to do that. But sometimes I feel it is not quite there yet for the average DBA: there still aren't any RPMs or tar-binaries and there are many other places where it seems to be waiting for a little polish. Thanks much to your efforts, the situation is better on DEB based distros! Anyway, for the geek audience that enjoys compiling their software from source code, Drizzle is a nice community project that will make you feel good - that's for sure.

But it's not drop-in compatible with MySQL like these three guys are with each other.

About "no talking about security problems"... I could actually understand if that was the case. I mean, it would still not be the open source way, but I could understand that Oracle had such a policy even if it makes life difficult for you. But according to Sergei Golubchik's blog *all* new test cases are withheld. Yet not all bugs are security problems. There is no good excuse for that.

As far as I can tell test cases are not to be published for security exploits. MySQL has suffered from few exploits that bypass authentication/authorization. But it appears that some crashing bugs are also considered to be exploits so we will not get test cases for some crashing bugs. Looking at past release notes a lot of crashing bugs are fixed in each release so many test cases might not be published. But I assume other test cases will be published.

This discussion has suffered a bit because we on the outside are trying to figure out what has changed and the people who know have not stepped up to explain it.

I actually understand that crashing / locking bugs need to be considered security bugs. This is because MySQL is used in shared hosting, so you need to assume authenticated users may be hostile to each other. Some security scanning tool (name of which I can't remember) classified crashing bugs as severe even when they could only be triggered by a connected, authenticated user, because they were considered DoS attacks. Then I had to sit through a 60 minute conf call just to explain to my dear project manager that if I have a hostile user already authenticated in my DB, he can do much worse things regardless of such bugs...

MariaDB team explained in a blog post why the next release will be numbered 10.0. The summary of it all is that they have now diverged so much

Did you notice that InnoDB [Plugin] also has its own version numbers? Having your own version numbers is not an indication that something has diverged. Own version numbers are typically an indication that one has its own major features. Own major features may, or may not cause one to diverge.

that you can no longer directly say whether the next release will be like MySQL 5.5, MySQL 5.6 or something else completely.

What is that "something else completely"? Drizzle? :-) I think we have communicated fairly clearly that the next release is 5.5 plus extra features, some of which are alternative implementations of 5.6.

I predicted that MariaDB 5.5 will probably be the last MySQL release that MariaDB is able to merge from its parent. Already that effort took them well over a year.

For most of that period, merging has not been a company-wide effort.

Monty always said that MariaDB is not a fork, it is a branch. Well, now it is a fork. It is no longer a subset or superset of any specific MySQL version, it is different.

I get the impression that a big part of this post is dedicated to the discovery that "MARIADB IS DIFFERENT FROM MYSQL!!!". Of course it is different, that's why we call ourselves a "branch" and not "Oracle's download mirror".

One of the marketing promises of MariaDB is to be fully backwards compatible.

This still holds true. On-disk data formats, network protocol, and SQL dialect are compatible. If you have any specific concerns about this part, I'd like to hear them.

It seems you are taking a defensive tone. For my part it has not been my intention to say something negative about MariaDB. For instance, I'm not saying it is a bad thing to fork from MySQL. Yet, I think it is significant to observe that the level of divergence is now increased.

Did you notice that InnoDB [Plugin] also has its own version numbers?

But there is a clear 1-to-1 binding between given MySQL version and the InnoDB version it includes. Rasmus' point explains that there will be no such binding between MariaDB 10.0 and any given MySQL version. (Except if you take the position that all your future versions are based on MySQL 5.5, but then this is precisely the definition of a fork!)

This still holds true. On-disk data formats, network protocol, and SQL dialect are compatible. If you have any specific concerns about this part, I'd like to hear them

The next release is 5.5 plus extra features, but it seems to me there will never be a "MySQL 5.6 plus extra features". The point is not that there is some specific problem that will prevent people from switching to MariaDB. Instead, if I have developed an application against MySQL 5.6, then there is a list of features I need to be concerned about if I want to migrate to a given MariaDB version. Yet, the problem is not the features on that list, the problem is that most people have no clue what is on the list. So checking all the differences is what adds uncertainty and extra work to the migration process.

In practice the challenges will surely be minor things. For instance Holyfoot explains to me that MariaDB uses precise decimal arithmetic in your new GIS implementation, but MySQL uses floating point. However subtle, this might cause different results on GIS operations. Now, anyone of course understands that the MariaDB implementation is "better" in this respect, yet if the application today is running on MySQL, then the MariaDB result is "different" and that may be a problem. Even if in theory the MariaDB result would be "correct", what people expect from "compatible" is not "correct", but "the same".
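To make the "correct" versus "the same" distinction concrete, here is a minimal Python sketch. It is not code from either server and says nothing about how MySQL or MariaDB actually implement GIS; the `within_box` helper and the coordinates are made up purely for illustration. It only shows that the same containment test can give a different answer depending on whether the coordinate is computed in binary floating point or in exact decimal arithmetic.

```python
from decimal import Decimal


def within_box(x, x_min, x_max):
    """Hypothetical one-dimensional 'is the point inside the box' predicate."""
    return x_min <= x <= x_max


# The same coordinate, computed two ways: 0.1 + 0.2 should land exactly
# on the box boundary at 0.3.
x_float = 0.1 + 0.2                          # binary floating point
x_decimal = Decimal("0.1") + Decimal("0.2")  # exact decimal arithmetic

inside_float = within_box(x_float, 0.0, 0.3)
inside_decimal = within_box(x_decimal, Decimal("0"), Decimal("0.3"))

# Floating point overshoots the boundary slightly, so the two answers differ.
print("float:  ", x_float, inside_float)
print("decimal:", x_decimal, inside_decimal)
```

The decimal answer is the mathematically expected one, but an application that was written and tested against the floating point behaviour would still see a changed result after migrating.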

In the future the divergence may be bigger. For instance if your 10.0 version is based on MySQL 5.5, then it might be a year or two before you really include everything that is released in MySQL 5.6. (And by that time we may already see MySQL 5.7?) Then in practice, for users of such a feature, MariaDB will be considered a downgrade. For MariaDB 5.3 this was certainly the situation wrt semi-sync replication.

The reason behind this tone was that, from my reading, the blog post left the impression that 1) the difference is HUUUGE, much bigger than it actually is, and 2) there is an implicit assumption that the only effect of the differences is to break queries that used to work.

The next release is 5.5 plus extra features, but it seems to me there will never be a "MySQL 5.6 plus extra features".

From the query optimizer point of view, the distance to "5.6 plus extra features" is not that great. The biggest parts of MySQL 5.6 are subquery optimizations and MultiRangeRead+BatchedKeyAccess. For these features, MySQL 5.6 code is not more than cleaned-up MySQL 6.0 code, without anything new.

MariaDB has already backported all that, and made substantial improvements to it.

Things that are in MySQL 5.6 and not in MariaDB:

filesort + ORDER BY ... LIMIT optimization. Small feature, we're already backporting it and are close to finishing.

EXPLAIN for UPDATE/DELETE. Small feature, will be easy to backport. We'll need to adjust the code so that the query plan is actually saved somewhere, and not splattered over various variables on the stack, though.

EXPLAIN FORMAT=JSON. I doubt whether this feature is going to be useful*. We also don't want its code, it looks like an IOCCC entry. I've never thought that the code to print out data structures could be that complicated.

(*) There are blog posts about how great this feature is, but I think it will go the way of mk-visual-explain: we don't ever see anybody posting output of mk-visual-explain other than to show what its output looks like, for some reason. If I am proved wrong, we will re-implement FORMAT=JSON. After I've done the SHOW EXPLAIN feature, printing out data structures should be easy.

Optimizer trace. This also hasn't seen much development since the initial idea of 2009. I have the same concerns as with FORMAT=JSON. The feature is difficult to *merge*, because its printouts are all over the place (Conflicts!). It won't be particularly difficult to backport, though: how hard is it to add some print-out statements? Especially when a lot of what needs to be printed was already being printed into the DBUG trace anyway?

To sum up: we're missing 4 minor features, of which one has been already backported, and two others cause scepticism about their utility.

I am not an expert in replication code. But if you look at GA releases, we have some of 5.6 features in MariaDB already.

Ok, fair enough. Rasmus' justification for why you want to use version number 10.0 gives the impression that the difference is bigger. If the difference is small, then I don't see the point in such an odd version number.