Godfather of open source databases facilitates a merger to speed development and growth.

Michael "Monty" Widenius' relationship with MySQL, the database named after his oldest daughter, has been difficult since his company (MySQL AB) was acquired by Sun Microsystems in 2008. It only fractured further when Oracle acquired Sun. But now Widenius is bringing much of his old team back together as he plots a fully open-source fork of MySQL. He hopes the new initiative will both keep core developers paid and grow the community for MariaDB—a fork of MySQL 5.1 named after Widenius' younger daughter—as it begins to diverge even more from the functionality of Oracle's MySQL.

Widenius' Monty Program has merged with SkySQL, the company that provides commercial support for MariaDB and other MySQL variants. SkySQL was founded by former employees of MySQL AB who left after the Sun acquisition. The organization offers enterprise support subscriptions for MySQL and its variants—competing directly with Oracle's MySQL Enterprise licensing. The merged company will support all versions of MySQL, develop MySQL and MariaDB tools, and fund further development of MariaDB.

In a release, Widenius said, "With this merger, I'm ensuring that the MariaDB project will remain 'open source forever,' while knowing that enterprise and community users of both the MySQL and MariaDB databases will benefit from best-in-breed products, services, and support provided by SkySQL. And who doesn't want the best for their children?"

All of this comes as big Web players are taking a hard look at their database investments. The buzz around the "NoSQL" database movement has begun to see a backlash. Late last year, Google announced its own efforts to shift back toward SQL database technology with its Spanner distributed SQL database research. And many read-heavy websites (such as Ars, for instance) have moved back to MySQL and other open-source SQL databases after dalliances with NoSQL.

The merger announcement comes on the heels of Wikipedia's adoption of MariaDB as a replacement for another fork of MySQL maintained by Facebook. Wikimedia Foundation Site Architect Asher Feldman announced the adoption of MariaDB in a post yesterday, saying that the move was in part because of "a preference for (open source) projects without bifurcated code bases between differently licensed free and enterprise editions." (That's a reference to how Oracle has moved to keep enterprise functionality of its "official" MySQL code line proprietary.)

As a result of the merger, Widenius himself is stepping aside from the business of running the company. In December, Widenius moved to split the roles of Monty Program between business and non-profit operations by forming the MariaDB Foundation—a nonprofit organization intended to serve as the center of the MariaDB community and the holder of its GNU General Public License. Widenius took the role of foundation CTO. Earlier this month, Simon Phipps was brought on as the foundation's secretary and interim CEO. He's the president of the Open Source Initiative and former head of Sun's open source program.

Phipps told Ars that the MariaDB Foundation was modeled on the Eclipse Foundation, the entity that maintains the Eclipse open-source integrated software development environment. "We've recruited (Eclipse Foundation executive director) Mike Milinkovich as a board advisor," Phipps said. "We're just about to add one of the core committers on MariaDB to the board, and then we'll have six board members." The foundation will have "representative governance," Phipps added, with funding and support from corporate members of the MariaDB community.

Basically, as major websites like Facebook and Twitter emerged, they found traditional SQL databases to be too much of a performance bottleneck and turned to databases designed to scale better. These more scalable databases don't adhere to the rigorous theory of relational databases (which you query with SQL), so they provide lower data integrity and/or less functionality. It became trendy to switch to these databases, probably in no small part because a lot of developers don't understand or like SQL or how it integrates with object-oriented code. But in the meantime, SQL databases improved and the hype bubble surrounding NoSQL deflated, so some companies have been moving back to traditional databases.

It really depends on use case. I use Redis (not technically NoSQL) for simple tasks (think storing sessions); the key is that the data can be lost at any time without any ill effects. I use MariaDB for more traditional database tasks. Redis is lightning fast, but I like my database engine to enforce its integrity itself.
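
A minimal sketch of that Redis pattern, assuming the redis-py client and a local server (the key prefix and helper names are illustrative): the TTL means abandoned sessions evict themselves, and since losing one is harmless, no persistence tuning is needed.

```python
import redis

r = redis.Redis()  # assumes a Redis server on localhost:6379

def save_session(session_id: str, data: bytes, ttl: int = 1800) -> None:
    # SETEX stores the value with an expiry, so stale sessions
    # clean themselves up; losing one just means a re-login.
    r.setex(f"session:{session_id}", ttl, data)

def load_session(session_id: str):
    # Returns the raw bytes, or None if the session expired or never existed.
    return r.get(f"session:{session_id}")
```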

Right, if you don't need the rigor provided by traditional databases, you can take advantage of the speed and simplicity of NoSQL. And different databases provide different feature sets, so it's good that there's some variety to cater to a wide range of situations. But the moment you catch yourself implementing transaction isolation or a declarative query language or similar in your application, you need to step back and ask yourself if you chose the right database. I think a lot of people jumped on the NoSQL ship because it's cool and their favorite new framework, Scooby-On-Scales, supports it out of the box, not because they actually thought through the pluses and minuses.

What this guy did with his company is hilariously awesome. He basically sold his company, quit, and then continued developing the product that he previously sold off.

The wonders of open source...

To be fair, he broke off when things stopped going the way they promised. Having been part of a large company acquisition, there are a lot of promises about the future of your software that don't end up coming true.

So yeah, if he wanted MySQL to continue and remain a force, this was his only real choice. And we, the end users, are the ones who benefit from it (as evidenced by the continued development of MySQL by Oracle), just as the GPL intended.

It really depends on use-case. I use Redis (not technically NoSQL) for simple tasks (think storing sessions), the key is that the data can be lost at any time without any ill effects.

For storing sessions, I just use the filesystem. Use the session cookie as the filename, simple Unix filesystem locking while performing a write, a cron job to delete old sessions (use the atime value of the file), and you're done. Thirty or 40 minutes of work and you've got a fast and bulletproof session storage system.
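
A rough sketch of that recipe in Python, assuming a POSIX system (the directory path and helper names are made up, and a real version should also sanitize the session ID against path traversal):

```python
import fcntl
import os
import time

SESSION_DIR = "/var/lib/myapp/sessions"  # hypothetical location
MAX_IDLE = 30 * 60  # seconds; mirrors the cron-based expiry described above

def save_session(session_id: str, data: bytes) -> None:
    # The session cookie value becomes the filename; hold a simple Unix
    # advisory lock while writing so concurrent requests don't clobber it.
    path = os.path.join(SESSION_DIR, session_id)
    with open(path, "wb") as f:
        fcntl.flock(f, fcntl.LOCK_EX)
        f.write(data)
        fcntl.flock(f, fcntl.LOCK_UN)

def reap_sessions() -> None:
    # The cron job's role: delete any session file whose atime shows
    # it hasn't been touched recently.
    now = time.time()
    for name in os.listdir(SESSION_DIR):
        path = os.path.join(SESSION_DIR, name)
        if now - os.stat(path).st_atime > MAX_IDLE:
            os.unlink(path)
```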

There's no need to use a database (sql or no-sql) for something like that unless you need to distribute it across multiple physical machines.

I've yet to find a _common_ situation where either a relational DB or just the filesystem cannot function perfectly, if well designed. And if you can't design your relational databases properly... why not spend time learning that instead of learning how to use a completely different database?

MariaDB looks good. So far I haven't been able to convince the boss to let us experiment with it... MySQL is perfectly fine, so why switch?

That may work for low-intensity tasks, but really? Files on disk? Everyone tries to avoid exactly that and uses RAM or a RAM cache for a reason, and databases evolved beyond that to be faster (indexing, etc.).

There was too much hype around NoSQL, just like there is now around the cloud. SQL databases got better, as any technology does, and virtualization, which is more or less what the cloud is, is also getting better. I think we're going to see exactly the same thing with the cloud, people moving back to traditional servers or virtualized systems, just like most are moving back to SQL from NoSQL, because these days the advantages are more or less minimal versus the drawbacks.

It seems the tech world loves to invent new stuff and new terms, which are always based on things that worked before; then everyone moves back to the good old stuff, which got just as good or even better.

As for why to switch? Not sure; maybe some don't like where Oracle is going with MySQL, not to mention that its enterprise features are paid, whereas with MariaDB and these forks you get extra enterprise features for free. Some things are implemented that don't exist in MySQL either, as with Percona Server: http://www.percona.com/software/percona-server

A couple of really high-profile organizations moving "back" to SQL is not indicative of an industry-wide shift in anything. The needs and circumstances of super huge organizations like Google and Facebook are unique among the millions of applications and companies out there.

We started working with NoSQL because of its simplicity and increased productivity. It had nothing to do with performance.

On the other hand, the productivity gains realized by having a flexible schema and not needing to worry about the object/relational impedance mismatch are pretty hard to deny (depending on the specifics of a given problem domain).

For us it was a big risk, considering it was unknown territory. However, the risk has paid off in spades in terms of how quickly we have been able to come to market with new products. Moving to NoSQL was one of the best technical decisions our business has ever made. But again, performance had nothing to do with it. Once the learning curve was over and we made the mental switch to stop thinking about what we were doing in relational DB terms, we experienced a very quantifiable order-of-magnitude increase in developer productivity.

Same goes with the cloud by the way. Switching to Amazon/Azure has been game changing. Our core business has nothing to do with data centers and servers. They are simply a necessary evil for us. I'd much rather let someone else (who is way better at it) deal with those types of things. I just want a platform to deliver connected applications, and Amazon/Azure allow us to focus on building the apps and not worrying about anything else.

Doesn't mean it's right for everyone. It was for us though. Use the right tool for the job?

"My" is a somewhat uncommon but not at all strange female name in Sweden(and as someone pointed out, a Finnish name as well). In fact I have a relative named My. And given the Swedish sounding names of the entire family I'm going to guess they belong to the non trivial Swedish-speaking minority living in Finland much like Linus Torvalds. So there you go with regards to the name.

Meh, I don't know about RDBMS being so restrictive. There are also some very big gains. For one thing, you have a high degree of data integrity: you can't accidentally drop bad dates into an SQL database, for instance, whereas most non-schema-based databases have at best weak typing. The ability to have guaranteed relations is big too; trying to guarantee referential integrity or cascade deletes of records in NoSQL is NoFun. Actually, what increased our productivity a TON was getting rid of all the bullshit ORM crap. THAT improved productivity a zillion percent. Whether you use one or the other type of database, you will still have to map your object attributes. At least in our applications this is really rather a detail, not something to worry about. Schemas tend to remain quite stable over the years here, but your mileage may vary.
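
To make the referential-integrity point concrete, here's a small sketch using SQLite through Python's standard library (the tables are made up): a declared foreign key makes orphaned rows impossible and cascades deletes automatically, which is exactly the machinery a NoSQL store leaves you to reimplement by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE posts (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE
)""")
conn.execute("INSERT INTO users (id) VALUES (1)")
conn.execute("INSERT INTO posts (id, user_id) VALUES (10, 1)")

# Orphans are rejected outright...
try:
    conn.execute("INSERT INTO posts (id, user_id) VALUES (11, 999)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)

# ...and deleting the parent cascades to its children.
conn.execute("DELETE FROM users WHERE id = 1")
print(conn.execute("SELECT COUNT(*) FROM posts").fetchone()[0])  # prints 0
```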

I agree, though, AWS is huge. People can call it a fad if they want, but then they clearly haven't tried to build out a larger-scale, data-intensive business. I could easily sink a million dollars in capital into hardware to reproduce what I can rent from Amazon with no up-front cost at all. EVENTUALLY, when you get big enough, you'll probably find that it's cheaper to build a data center, but I'll be damned if it's cheaper than giving away half my company to raise money to buy hardware that will be scrap in five years, or cheaper than hiring a team of people to do capacity planning and babysit that hardware, and paying for all the things needed to make it a five-nines facility.

Neither RDBMS nor NoSQL is suitable for all database tasks. Which is better is determined by your needs, not by hype from either side. In reality you have three areas:

1. Situations where ACID compliance and transactional integrity are critical are properly the domain of RDBMS databases.

2. Situations where the data is naturally semi-structured, once entered mostly remains unchanged or changes very slowly, and ACID compliance and transactional integrity are not critical. Probably a very good application for a NoSQL database.

3. Then there is a domain where neither is perfect as a backend, so the choice comes down to other factors such as how you obtain the raw data, the overall size of the database, scaling issues, what your staff knows, what you currently have, etc.

The problem is when people make technical decisions based on marketing hype, not on technical requirements.

There are two main MySQL forks: MariaDB and Percona. (Yes, there are others, but they diverge REALLY widely.)

MariaDB diverges more from Oracle's MySQL, adding new incompatible storage engines and features.

Percona focuses on complete byte-level compatibility with Oracle's original MySQL while greatly improving performance and carefully curating features. They're a small shop, and they offer amazing support, with no-joke, seriously world-class guys answering your support tickets.

I've supported both forks, and my feeling is that Percona is better suited to enterprise use at this time. It's what I recommend to our clients. MariaDB is a more forward-looking product, but it's not fully baked yet.

Oracle doesn't want to support an open-source platform. Oracle is in the process of slowly killing MySQL, in such a way that (they hope, if they succeed) people will RECOIL from it in disgust in the future. Whatever you do, don't buy Oracle's flavor of MySQL or pay Oracle for support.

There are a lot of strengths to NoSQL that traditional RDBMSs can't replicate.

When you get to the point where having only one or two database servers is a bottleneck, NoSQL databases start to shine in terms of scalability, especially if you need to ramp up on short notice. Scaling an RDBMS across multiple servers is doable, but hard. And it gets harder with every server you add.

RDBMSes are designed for relational data. The moment you start storing non-relational data like "documents," you need to start thinking about NoSQL. The database is typically the biggest bottleneck in any application; if performance is a requirement, you don't want to be forcing square pegs into round holes at this layer.
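
To illustrate the "documents" point, here's a minimal sketch using pymongo against a hypothetical local MongoDB instance (database, collection, and fields are all made up): each record carries its own nested structure, with no fixed columns to squeeze it into.

```python
from pymongo import MongoClient

db = MongoClient().appdata  # assumes MongoDB on localhost:27017

# A document embeds its related data directly; two documents in the
# same collection don't even have to share a schema.
db.articles.insert_one({
    "title": "MariaDB merger",
    "tags": ["mysql", "mariadb"],
    "comments": [{"user": "monty", "text": "open source forever"}],
})

# Queries match inside nested structures, e.g. array membership.
doc = db.articles.find_one({"tags": "mysql"})
```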

NoSQL isn't going to replace the RDBMS any time soon, but it also isn't a "fad" that's going away. It has its strengths, as does an RDBMS, and you need to evaluate those strengths against your needs. Use the right tools for the job, not just what's in vogue. Even within NoSQL, there are different database types for different purposes (document, graph, multi-dimensional, object, key-value, etc.). But if performance and scalability aren't issues for you, just go with what your developers are most comfortable with. You'll save money through productivity alone.

Also, NoSQL doesn't mean you can't use SQL statements to access it. The NoSQL community refers to it as "not only SQL" these days.

When you get to the point where having only one or two database servers is a bottleneck, NoSQL databases start to shine in terms of scalability, especially if you need to ramp up on short notice. Scaling an RDBMS across multiple servers is doable, but hard. And it gets harder with every server you add.

Scaling RDBMS is something of a specialty for me, so please let me comment on this.

The difficulty of scaling is directly related to one thing - data consistency. If a little data inconsistency doesn't matter (web sessions, calculation results, or other easily re-created data), then scaling is dirt simple: just pile on slaves. The same is true for NoSQL services (most of which also use some form of asynchronous replication) - just keep adding slaves.
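
A toy sketch of that "pile on slaves" approach (the connection objects are assumed to expose an execute() method; this is illustrative, not any particular driver's API):

```python
import random

class ReplicatedPool:
    """Routes writes to the primary and fans reads out across replicas.

    Under asynchronous replication the replicas lag, so a read issued
    right after a write may see stale data -- the "data drift" tradeoff
    discussed below.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas

    def write(self, sql, params=()):
        # All mutations go to the single writer node.
        return self.primary.execute(sql, params)

    def read(self, sql, params=(), read_your_writes=False):
        # A read that must reflect a just-committed write stays on the
        # primary; everything else spreads across the slaves.
        conn = self.primary if read_your_writes else random.choice(self.replicas)
        return conn.execute(sql, params)
```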

Now then, when the integrity of your data matters, things get harder (no matter the choice between RDBMS and NoSQL). You have to start talking in tradeoffs: performance and setup complexity versus data-loss tolerances. You have to start caring that your data is synced to disk before a transaction completes. If you have multiple servers, you have to either accept potential data drift due to asynchronous replication or come up with some sort of two-phase distributed commit. You have to somehow ensure that data isn't duplicated or overwritten by changes made on multiple writer nodes.

Both NoSQL and RDBMS offer the spectrum of choice when it comes to scaling; it all depends on how much your data matters when compared to performance. When someone tells you that it's "easy", they're not telling you the whole truth. Scaling properly (no matter the software) is hard; it only becomes easy when you make serious compromises somewhere along the way. Anyone who is telling you otherwise has some prime real estate on Florida swamplands they want to sell you.

I wouldn't say "compromises", exactly. Scaling IS easy if you architect for it from day one, in the same kind of way that surviving a jump from an airplane is easy when you have a parachute. It's when you try to scale a monolithic legacy application that you run into challenges.

Redis: Single master, master/slave asynchronous replication. Does not by default write data to disk on transaction commit. Data set must fit in memory - with overhead enough to hold a duplicate of the dataset if you're enabling replication.

Mongo: Single master, master/slave asynchronous replication. Automated failover. Does not by default ensure data is written to disk before allowing a transaction to commit. May shut down completely on automated failover failure.

Couchbase: Cluster based asynchronous replication. Active dataset must fit in memory; all clients may be stopped while data is asynchronously read/written to disk. Does not by default ensure data is written to disk before allowing a transaction to commit.

MySQL 5.5 using the default engine: Single or dual master, master/slave asynchronous replication. Does ensure data is written to disk prior to transaction commit. LRU dataset stored in memory; only querying clients are affected by disk reads. Replication SQL apply is single-threaded.

Percona Cluster: Cluster based synchronous replication, with data consistency checks between active nodes. Does ensure data is written to disk prior to transaction commit. Commit time hampered by network latency to slowest node. Data inconsistencies result in entire dataset being written to node from another node; major performance impairment.

Yes, yes, I worked as a DBA too. Everything has its own special set of limitations; if you plan your architecture around those limitations, you don't need to compromise. Even something as simple as sharded MySQL on commodity hardware/blades/VMs can scale massively if your application is built around it.

I could easily sink a million dollars in capital into hardware to reproduce what I can rent from Amazon with no up-front cost at all. EVENTUALLY, when you get big enough, you'll probably find that it's cheaper to build a data center, but I'll be damned if it's cheaper than giving away half my company to raise money to buy hardware that will be scrap in five years, or cheaper than hiring a team of people to do capacity planning and babysit that hardware, and paying for all the things needed to make it a five-nines facility.

AWS isn't five nines unless you build your application to tolerate high degrees of independent component failure. The secret is in knowing that it's possible, practical, and sensible to spend the same money and time making your application tolerate high degrees of independent component failure regardless of whether you run that application in a public cloud or a private cloud (formerly known as a data center).

Sean Gallagher / Sean is Ars Technica's IT Editor. A former Navy officer, systems administrator, and network systems integrator with 20 years of IT journalism experience, he lives and works in Baltimore, Maryland.