Posted by Soulskill on Wednesday November 28, 2012 @05:51PM
from the flip-a-coin-or-flip-a-table dept.

DoofusOfDeath writes "I've done a good bit of SQL development / tuning in the past. After being away from the database world for a while to finish grad school, I'm about ready to get back in the game. I want to start contributing to some OSS database project, both for fun and perhaps to help my employment prospects in western Europe. My problem is choosing which OSS DB to help with. MySQL is the most popular, so getting involved with it would be most helpful to my employment prospects. But its list of fundamental design flaws (video) seems so severe that I can't respect it as a database. I'm attracted to the robust correctness requirements of PostgreSQL, but there don't seem to be many prospective employers using it. So while I'd enjoy working on it, I don't think it would be very helpful to my employment prospects. Any suggestions?"

I've used Postgres commercially for years, with a number of employers. It's a great DB and having dealt with MySQL, SQL Server, Oracle, et al I'd never go back - though the softies tell me that SQL Server is much better these days.

I'd be surprised if you can't find plenty of work using Postgres. Maybe it's one of those things people don't feel comfortable talking about - like Delphi in the 90s. Plenty of people used it, but few would own up to what made up their "secret sauce".

Ditto. Delphi was the last time I actually loved programming, back in the mid/late 90s. Then I got a job using VB6 & raw C - bleh! Progress wasn't a bad 4GL/RDBMS back in the day - wonder if it's still going...


Feature-wise I like Postgres better than SQL Server: intervals, arrays and being able to do all sorts of DDL in transactions are excellent. What seems to be the killer in the places I've seen it is Integration Services - being able to create easy workflows to get data into, out of and between databases. If all I wanted was a database to run an application against, I'd have no hesitation with Postgres. SQL Server has mainly been fighting Oracle and that's like Outlook fighting Lotus Notes, you don't have to b
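For readers who haven't used transactional DDL: it means a schema change rolls back together with the data changes around it. Here is a minimal sketch using Python's built-in sqlite3 module, with SQLite standing in for PostgreSQL (an assumption made so the example runs anywhere; both engines run CREATE TABLE inside a transaction). The `invoices` table is hypothetical.

```python
import sqlite3


def ddl_rolls_back() -> bool:
    """Create a table inside a transaction, roll back, and report whether it vanished."""
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so the explicit BEGIN/ROLLBACK below are the only transaction boundaries.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    try:
        conn.execute("BEGIN")
        conn.execute("CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount INTEGER)")
        conn.execute("INSERT INTO invoices (amount) VALUES (100)")
        conn.execute("ROLLBACK")  # undoes the INSERT and the CREATE TABLE together
        row = conn.execute(
            "SELECT count(*) FROM sqlite_master WHERE type = 'table' AND name = 'invoices'"
        ).fetchone()
        return row[0] == 0
    finally:
        conn.close()


if __name__ == "__main__":
    print(ddl_rolls_back())  # True
```

A migration script that fails halfway can therefore leave the schema exactly as it found it, instead of half-altered.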

You can't push Oracle as far as you can push Postgres. OK, there are a lot of entities out there managing a huge amount of data on Oracle. But when you look at the extreme cases, the only relational DBMS you'll find is Postgres.

Of course, if you push hard enough no relational DBMS will handle the load.

Oracle is indeed king of the heap with regards to power and features. It also costs an insane amount of money. Unless you want to support Larry Ellison's habit of picking up women he wants to shag by offering them Ferraris (or so the stories go), then Postgresql is probably your next best bet.

With SQL Server, you set the transaction isolation level you care about and then you begin a transaction - SQL Server will guarantee consistency in that transaction even if you're just doing multiple selects. And SQL Server will not let you do a 'select for update'.

Interesting comment. MS SQL Server Locking is painful compared to Postgres when you have a huge number of transactions going on (per-page rather than per-row in general). Also, internationalization support in Postgresql is better (in my experience) - which matters if a single server is to provide data for users all over the world. MS SQL Server is fast, but this is for a reason, it cuts corners in some areas. Speed alone is a good enough reason to use SQL Server, but for most uses Postgresql is actually su

I'm a big Postgres fan and no Softie, but you, sir, preach ignorance as strength.

Managing concurrency is the single biggest impediment to scaling we and most outfits face.

If you can't figure out how to perform the same function using an optimistic concurrency model I don't want you anywhere near our systems. If you think that makes me ignorant then so be it.

There simply is no correct answer which involves laying a read lock. Oracle rightfully does not support ANY of these concepts or isolation modes. This only exists in SQL Server because certain children with trivial problems did

After reading the title, I opened the thread to recommend either PostgreSQL or SQLite. The two are different enough that after a quick glance at the projects, DoofusOfDeath should know which option he likes most.

Both are the most important free DBMSs out there (again, different enough that they don't compete), and even if few people are using them now (which I also doubt), they are poised to get most of the usage share (as the term market share doesn't really apply here) in the near future.

For truly install-and-forget embedded uses, Firebird fails because it has nowhere near the kind of testing that SQLite gets. SQLite is tested to ensure data integrity that lets it be used in avionics, for example. It's the only open source project out there that I know of that could even begin to claim that. In recent years, if there's a problem with SQLite database corruption, I treat it like a failed memtest run: it means the hardware needs to be replaced.

It's seeing a constant rise in usage. Also, many projects (Spacewalk!) have it as the only viable alternative to Oracle. Small companies with small to mid-sized applications use it (see Jira or FishEye, at Atlassian) as their main development platform. Also, you shouldn't use your USA-ish perspective and only do something because it will benefit your job or future employer. OSS is about sharing, fun, knowledge and getting better. Getting better at your job is a welcome side effect.

When you start dealing with anything other than very basic website apps, MySQL's many significant deficiencies start becoming obvious. And the problems in the development seem to be institutionalized, it was this way even before Oracle took over. MySQL should be consigned to the dustbin of history.

Absolute rubbish. PostgreSQL handles massive data sets and can scale down to run on modest hardware (with a multitude of choices in *free as in beer* operating systems). So I wonder what evidence you have to support your counter-factual statements?

If you are an active member committing to a major database's code, then it will help your employment prospects no matter what. If you're committing to PostgreSQL regularly, that's strong evidence you are good at what you do.

Hierarchical DBs have been making a comeback recently, often reclothed as "NoSQL" databases specializing in "big data" analysis. There seem to be many opportunities to make these databases more applicable to current problems or just easier for relational DBAs to understand and implement.

NoSQL DBs suffer pretty badly from the inner-platform effect, where the users end up implementing their own classic SQL-RDBMS on top of the NoSQL store. "I don't have joins... well, I'll write one in Ruby." You could probably do the community a huge service by PROPERLY reimplementing at least an API-compatible MySQL system on top of a variety of NoSQL services. That way devs could be buzzword compliant, while not actually having to change anything (well, the sysadmins will throw fits at the change for sake of
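To make the "I'll write one in Ruby" point concrete, here is roughly what a hand-rolled join looks like (in Python rather than Ruby, over hypothetical `users` and `orders` collections standing in for NoSQL documents): a basic build-and-probe hash join. It works for this toy case, but choosing the build side, spilling to disk when the data doesn't fit in memory, and keeping the two reads consistent are now the application's problem instead of the database's.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner-join two lists of dicts where left[left_key] == right[right_key]."""
    # Build phase: index the right-hand side by its join key.
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    # Probe phase: emit one merged row per match. On a field-name clash
    # the right-hand value wins, which a real planner would never do silently.
    joined = []
    for lrow in left:
        for rrow in index.get(lrow[left_key], []):
            joined.append({**lrow, **rrow})
    return joined


if __name__ == "__main__":
    users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
    orders = [
        {"order_id": 10, "user_id": 1, "total": 30},
        {"order_id": 11, "user_id": 1, "total": 5},
        {"order_id": 12, "user_id": 3, "total": 7},  # orphan: no such user
    ]
    for row in hash_join(users, orders, "user_id", "user_id"):
        print(row)
```

Note that the orphaned order passes through silently; in an RDBMS a foreign key would have refused it at insert time.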

I actually love MySQL, but FWIW, someone noted a while back that Salesforce.com has announced intent to hire about 50 top gun PostgreSQL guys in the coming year. It seems obvious that they are preparing to unhook the money siphon leading to Oracle.
Assuming Salesforce follows through, all the herd-following executives in the U.S. will want to do the same. So I predict that demand for PostgreSQL talent will be pretty good for many years.

I'm happily surprised to see that all the early comments are in support of PGSQL, with brief anecdotes to back it up. I 100% agree - even if your future hopeful employer uses MySQL (which has actually matured a great deal), everything you learn in PGSQL will teach you the underlying theory of WHY good databases are good. You can apply that to any roughly SQL database. Further, PGSQL is even closer to zero-cost than MySQL, in spirit. So, if you have to go up against beancounters advocating for your software,

I see your point, but the reason I asked /. is because I'm hoping it's not an either/or proposition. So I was looking to find out why MySQL is less problematic than I see it as, and/or why PostgreSQL is more used than I realized, or if there's some other DB I should be thinking about.

Oracle, the database and connection software is quite respectable. The problem with Oracle is the organization which sells and services it. They like to "partner" with their customers, and comport themselves like a criminal enterprise. They send auditors to their customers' sites to ensure license compliance (meaning shake-down money for Larry Ellison). Training is expensive, but so are trained Oracle specialists. They're risking ruining Java with slow updates, and MySQL development seems to have slowed-- p

Stop being a prima donna and pick the one with the best employment prospects.

Or, as a developer with multiple prospects, he could pick the one he ultimately finds more enjoyment working with.

I'm sure there are good "employment prospects" working on things like the tangled mess of spaghetti and backward compatibility in WinSxS and the Windows source code, but I suspect that would be a PITA job to go to every morning.

I find working on better-designed systems tends to come with better job satisfaction.

The post is basically a troll for a video. The video is based on an old list of MySQL 4.x gotchas [sql-info.de], many of which were fixed in the 5.x series. Most of them involve things like the semantics of NULL in special cases, truncation of indexed strings with trailing spaces, and similar stuff that an application shouldn't be relying on. There's a comparable list of PostgreSQL gotchas [sql-info.de] from the same source.

MySQL has political problems, because Oracle owns it and would prefer users buy their commercial products. The future of the free version is uncertain. The problems in the video aren't the ones to worry about.

We've been looking at work, but not for 6 months or so due to another project taking priority. At the moment we use MySQL and a commercial database, and another part of the organisation uses MS SQL Server. That's clearly not optimal. The software using MySQL is the least-complicated st

Well, I don't know. It is true that MySQL has improved by leaps and bounds over the years. However, every time they get rid of a batch of gotchas, there are new ones. I remember being elated to work on a project with MySQL 5 because I felt it had finally grown up to be a real database, only to find that there were concurrent insert issues that caused a huge chunk of my updates to time out and/or be aborted (I don't remember which).

The problem isn't so much the specific issues as the philosophy that gives ri

If you need a database, use one that tries to be ACID compliant first, then efficient second. Not the other way around. As your database model gets more complicated, and your skill set along with it, you'll be thankful.

I beg to differ - it's all a matter of philosophy. Do you want something that's a stubborn mule of a server that fights your every attempt to get something done, or would you prefer something that is more forgiving and lets you later discover your own personal logic flaws with a "haha! whoops! There, fixed!" moment? The video for me only serves to further reinforce the reason I use MySQL. For one thing, I have never made any of the bone-headed programmer errors that the host illustrated, but I appreciate

http://en.wikipedia.org/wiki/Michael_Stonebraker [wikipedia.org]
Have a look at what he's done with Postgres, Vertica, VoltDB, and the other systems he's working on. You may find that contributing to this project aligns you with some great, very intelligent people -- that's opportunity for learning, opportunity for contributing, and opportunity for good networking.

You write: "I've done a good bit of SQL development / tuning in the past.... My problem is choosing which OSS DB to help with."

That's akin to saying, "I've done a good bit of SCCA track racing in the past. My problem is choosing which engine builder to intern with." Or even, "I'm an amazing, although out-of-practice chef. Should I work on enhancing Viking or Maytag gas ranges?"

Using a database (or any product) very effectively often has little or no translation into working on the guts of the product.

Not to take anything away from PostgreSQL and MySQL (and their forks), but these are mature systems with extensive communities and a very complex code base. If you want to learn the architecture of a new class of open source database systems, as well as to have the opportunity to make a significant contribution to a project, then you should consider joining a NoSQL project, such as Neo4j or MongoDB.

Try GT.M [wikipedia.org] for something different. It's a key-value database, dating back to the '60s, but still hitting hard in healthcare, vets and financial. Great performance. The underlying technology is called MUMPS [wikipedia.org]. Other MUMPS databases (such as InterSystems Caché, closed source) use MUMPS internally, then offer SQL, XML, object, etc. layers.

Pretty much all the test cases from that video fail on MySQL if the sql-mode is set to traditional. MySQL will throw an error when data would be truncated, throws an error when you try to insert a NULL value in a NOT NULL column, refuses to alter a table if the existing data would be truncated, throws an error on an invalid date, on select only returns a warning for division by 0 but throws an error on an insert of division by 0, throws an error if you try to insert a string into a numeric column and so on.

I understand, of course, that the strict modes aren't enabled by default, but they're easy enough to enable if you choose to: via my.cnf, on the command line when mysqld is started, or while connected to the MySQL server itself (for just that session, or globally for all sessions).

I didn't run through all their examples, but mostly because I got bored and all their examples that I did try were throwing errors (except the select 1/0 one, which issued a warning) with the sql-mode set to traditional on MySQL (postgresql is also a sql-mode option but I didn't play with that one since I've never used it before).
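The difference between the permissive default and the strict modes is easy to state as code. The following is a toy model in Python, not MySQL's implementation, of storing an over-long string in a hypothetical VARCHAR(n) column: the permissive path truncates and leaves only a warning behind, while the strict path refuses the write. The function and exception names are made up for illustration, and the real server has many more cases (NULLs, dates, division by zero) than this one.

```python
class DataTooLong(ValueError):
    """Raised by the strict model when a value does not fit the column."""


def store_varchar(value: str, max_len: int, strict: bool, warnings: list) -> str:
    """Toy model of writing `value` into a hypothetical VARCHAR(max_len) column."""
    if len(value) <= max_len:
        return value
    if strict:
        # Strict model: the statement fails and nothing is stored.
        raise DataTooLong(f"value of length {len(value)} exceeds VARCHAR({max_len})")
    # Permissive model: the write "succeeds", data is lost, and the only
    # trace is a warning the client may never look at.
    warnings.append(f"data truncated to {max_len} characters")
    return value[:max_len]


if __name__ == "__main__":
    w = []
    print(store_varchar("hello", 3, strict=False, warnings=w), w)
    try:
        store_varchar("hello", 3, strict=True, warnings=[])
    except DataTooLong as exc:
        print("rejected:", exc)
```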

First I should probably burn some karma and say "what a load of garbage". The headline asks what OSS database to HELP with, but the article summary might as well read "Which free SQL-compatible database to learn to use". And on top of that it contains the answer already, along with questionable dirt-showing on MySQL which makes it read like a guerilla-ad for PostgreSQL.

But in any case, it makes a major difference whether the question is "which database codebase to contribute improvements to" or "which free database to learn for the best employment chances". It sounds like it's the latter, and in that case a follow-up question is what kind of employment. The one correct answer is "whichever database your employer is using" - don't expect to be able to choose a job on the basis of what database engine one of its departments happens to be using at the time. The second-best answer is to go with both; and again it makes a huge difference whether it's for self-employed web-site design or financial analysis for a stock brokerage firm.

And if you actually went with MySQL, the next question is which storage engine. Huh, you ask? Well, you see, MySQL is not a single database engine; in actuality it's a front-end to pluggable storage engines. The stock release features at least MyISAM, InnoDB, Heap, BDB, NDB and Archive (and a few variations). In general it's a choice between MyISAM and InnoDB, which are a whole different story. When most people say "MySQL has such and such problem" they're actually talking about MyISAM, but MySQL has defaulted to the InnoDB engine for years.

But the third and best answer is "none of the above". In most cases everybody seeking employment in a relevant job will be fluent in SQL and have at least some experience with both MySQL and PostgreSQL, and it'll be rare for the employer to be at all interested in your ability to actually "hack" the database source. NoSQL databases offer ample opportunity to differentiate yourself, both on the job market and in business competitiveness, by improving the source code (and in most cases, as long as the binaries stay in-house, so can the source, which makes bosses happy, but consult your OSS license).

I love PostgreSQL in theory but hate it in practice. It's a pain in the ass to work with... not very productive. For a long time, I felt it was worth it to endure this for the superior design, feature set, and technical correctness.

But one day I realized that I needed to get things done, and switched to MySQL. The learning curve was small, but the main kicker was that things just worked and were easily reworked. There are risks, limitations, and problems. It's very imperfect, but I get things done now... and I never have to, or care to, think about the purist philosophies I used to love indulging in.

In the end, you have to give up perfection to go anywhere.. Otherwise, it's like having to get half-way there first, meaning you have to get half-way to half-way first, etc. recursively forever.. With MySQL I take a reasonable number of precautions for things that can go wrong, ensure there are good backups, and deal with the others as they come.

Now I think MySQL is superior for practical use by a long shot. And I think that's why it's been adopted so heavily.

The key ingredients to successful technologies are:

(1) You can do something obviously cool or useful with it.
(2) It's quick and easy to learn and use.

And that's it. This is why so many successful things are made by idiots. Look at HTML. It was made by Tim Berners-Lee back when he knew very little. But 12-year-olds were picking it up and making cool (at the time) web pages. Now he knows so much more and has tons of backing from heavyweight organizations and money, but cannot seem to even force the success of the Semantic Web. It's hard to learn and hard to work with even when you learn it. Furthermore, it's not obvious to most what cool or useful things you can do with it. Proponents keep saying it'll mature and will be easier when tools and libraries are available to make it easier... That misses the point. Even the tools mostly suck and are buggy because the basic tech is a pain in the ass to work with. There are philosophical visionaries galore but no substantial progress beyond what grants and job requirements force people to do... and there won't be.


So what things can you do easily in MySQL that you had trouble doing in PostgreSQL? Details, please.

I'm tired of hearing that "everyone uses..." No, they don't. MySQL is pretty popular with the open-source web-crowd but this is the same crowd that respects the engineering behind PHP. I've encountered plenty of people in that arena who would rather roll their own data-checks and treat the database as barely more than a key-value store than use the capabilities of the database and have to deal with handling exceptions. Bring up transactions, ACID compliance, data-integrity and the like at a PHP users group and you get blank-stares. The get-rich-quick-with-a-cute-kitten-website crowd cares not for such things (as an overgeneralization - there are plenty of high-traffic sites such as Instagram, hi5, Etsy and MyYearbook that run on PostgreSQL).

I've found the PostgreSQL community to be wonderful with opportunities to contribute at all levels. Answer questions on the mailing-lists, contribute to documentation, help at users-groups, give a talk at a conference. One always welcome contribution is doing testing and submitting results/patches during commitfests - and this gets you more involved with the code.

As to employment, it sounds like you prefer PostgreSQL. As such, PostgreSQL is by definition the most popular database among places you are interested in working. Do what you love.

The video shows a number of ways that MySQL seems to insert questionable data; ignoring NOT NULL, inserting default values when no default is specified, etc...

There are two databases that I have had to repair... Hypersonic and MySQL. MySQL I have to repair regularly in my MythTV box. Hypersonic states it should not be used in a production system. I have never had to repair Postgres, MSSQL, or Oracle.

I wonder if it's on a MyISAM or InnoDB table type, or if he has cheesy drive that is lying about write barriers, or if he's using a kernel that doesn't treat barriers correctly. You do lose a lot of write performance making mysql act safely.
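For context on why a drive that lies about write barriers defeats a database: on the application side, durability comes down to a sequence like the one below, where each step asks the layer underneath to persist data before continuing. This is a minimal POSIX-only Python sketch of a durable, atomic file replacement using a hypothetical `durable_write` helper; it is not how MySQL or any other server writes its log, and if the drive acknowledges a flush it hasn't actually performed, no sequence of calls at this level can compensate.

```python
import os
import tempfile


def durable_write(path: str, data: bytes) -> None:
    """Atomically replace `path` with `data`, asking the OS to persist it (POSIX only)."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()             # push Python's buffer to the kernel
            os.fsync(f.fileno())  # ask the kernel to push it to the device
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)
        raise
    # The rename lives in the directory entry, so sync the directory too.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Each `fsync` waits for the device to confirm, which is exactly where the write-performance cost of running a database "safely" comes from.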

I'm curious if those are still actually existent in >=5.0. I know I started avoiding MySQL in the bad old days, but from what I understand it's made a lot of strides in the conformance department.

I haven't bothered to look at it again since then, since Postgresql meets all of my needs, but I am curious. It can't still be that bad, can it? I can see all the bad old behavior being hidden behind default for legacy users, that's reasonable, but silent data corruption (and whether you're truncating strings or

Most of what's shitty about it is the MyISAM storage engine, which does approximately dick-all for enforcing integrity. It doesn't even have foreign key constraints. IIRC it can't do transactions either. The trade off is that it's slightly faster for some operations *eyeroll*

If MyISAM is good enough for your application then you may as well—no exaggeration—just use MongoDB or something.

InnoDB is much better. It's got some of the same not-confidence-inspiring quirks shown in the video but at least it supports transactions and foreign key constraints.

Biggest remaining differences off the top of my head are that Postgres supports a shitload more data types and data operations (many through plugins) like stuff related to geographic data and key-value stores (hey, you got NoSQL in my SQL!), and that Postgres has real separate databases, not just separate schema like MySQL, the difference there being strict separation of the data, so you can't, say, do a SELECT across two databases or even tell that there are other databases if you've only got a user account on one of them.

Lots of other under-the-hood stuff, I'm sure, but those are the main ones I can think of from a user's perspective.

Postgres is way, way more powerful, MySQL is (slightly) more widely supported and (IMO) the free tools, both command line and GUI, for working with it are easier to learn and generally friendlier.

MySQL's a completely miserable excuse for a relational database if you use MyISAM; it's only a mostly miserable excuse for a relational database with InnoDB.

The simplest way to say it is that MySQL is really more of a data store than a database. You can store stuff in it, and it'll get the data back reasonably efficiently, but in terms of actually operating as a proper compliant database for critical information it just isn't designed that way. It works great for storing the back end for your web server, but if you wanted to store complex data in it and needed it to be 100% accurate, transactional, and reliable, the product just doesn't fit the bill. For all that it's got a paid "enterprise" edition, it's really more in the space of something like SQLite or SQL CE than it is in the space of Oracle, and again it's not an issue of whether it can scale or whether it's buggy, it just simply isn't designed to be compliant to the required level. That's largely the reason it works so well as a LAMP back end and is so easy to administer, but it just isn't fit for purpose for much more.

If only they were actually edge cases (look carefully: they mentioned one was a common Ruby on Rails mistake). MySQL's habit of pretending everything is alright when it's not has burned more than one of my previous employers.

But they missed the real WTFs like mysqldump creating dumps that need to be hand edited before MySQL will restore them or my all time favorite: mysql user authentication simply does a "SELECT * from mysql.users" and if the fields get reordered by a new MySQL release then logins will simply fail. The best part is that the officially documented way to fix that is a mysqldump followed by a restore which... deletes the table and puts the fields in the wrong order again. The last major MySQL upgrade of my employer's systems involved me starting the new install from an empty DB, restoring everything except the mysql.users table and recreating the accounts using a script.

Please don't pretend it's not a crap database. Those of us who have to deal with it every day know better.
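The failure described above, positional reads from `SELECT *` breaking when a release reorders columns, is easy to reproduce in miniature. This sketch uses Python's sqlite3 with two hypothetical tables holding the same account row in different column orders, standing in for before and after an upgrade; it does not reproduce MySQL's actual authentication code.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Same data, two column orders: stands in for a release that reorders fields.
    conn.execute("CREATE TABLE users_v1 (username TEXT, host TEXT, pw_hash TEXT)")
    conn.execute("CREATE TABLE users_v2 (host TEXT, username TEXT, pw_hash TEXT)")
    conn.execute("INSERT INTO users_v1 VALUES ('alice', 'localhost', 'h1')")
    conn.execute("INSERT INTO users_v2 VALUES ('localhost', 'alice', 'h1')")
    return conn


def lookup_positional(conn, table, username):
    # Fragile: assumes column 0 is the user name and column 2 the hash.
    # (The table name is interpolated only because it is a fixed demo value.)
    for row in conn.execute(f"SELECT * FROM {table}"):
        if row[0] == username:
            return row[2]
    return None  # the "logins simply fail" case


def lookup_named(conn, table, username):
    # Robust: the query names the columns it wants, in the order it wants them.
    row = conn.execute(
        f"SELECT pw_hash FROM {table} WHERE username = ?", (username,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    for table in ("users_v1", "users_v2"):
        print(table, lookup_positional(conn, table, "alice"), lookup_named(conn, table, "alice"))
```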

How well mysqldump works depends on the data. It seems to have a problem with escaping some strings correctly. Right now we use both MySQL and PostgreSQL with the dream of moving everything to PostgreSQL.

I was looking for an easy way to automate character conversion from Latin-1 to UTF-8 for the forum software I use. I found out the hard way that the built-in MySQL recoder is completely broken, and will barf in different ways depending on which version number of MySQL you are using. No errors or warnings during the conversion for any version. You'll just find out later that all the field limits are wrong. You can only find out if it worked or not by inserting new rows and finding out if you get errors about data being too large to fit in the field, and whether it fails or not has nothing to do with the actual length of the data, but with whether you send 7-bit or 8-bit characters.

I gave up trying to get MySQL to do it, and wrote my own conversion tool.

And that's just for baby stuff for a web forum on a personal web site. I can only imagine what MySQL is like in an enterprise environment.
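The 7-bit versus 8-bit pattern has a simple cause: ASCII characters take one byte in both Latin-1 and UTF-8, but every other Latin-1 character takes two bytes in UTF-8. So any limit that ends up counted in bytes is exceeded only by rows containing accented text. The description doesn't say exactly which limit the built-in recoder got wrong, so the following is only a sketch of the core of a home-grown converter in Python, with hypothetical helper names and a hypothetical byte limit checked before anything is written back.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    # Decode with the old charset, then re-encode with the new one.
    return raw.decode("latin-1").encode("utf-8")


def find_overflows(rows, byte_limit):
    """Return (index, utf8_length) for each converted row longer than byte_limit bytes."""
    problems = []
    for i, raw in enumerate(rows):
        converted = latin1_to_utf8(raw)
        if len(converted) > byte_limit:
            problems.append((i, len(converted)))
    return problems


if __name__ == "__main__":
    rows = [b"cafe", "café".encode("latin-1")]  # both are 4 bytes in Latin-1
    print([len(latin1_to_utf8(r)) for r in rows])  # [4, 5]
    print(find_overflows(rows, byte_limit=4))      # [(1, 5)]
```

Running the overflow check first turns "find out later when inserts fail" into a list of offending rows up front.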

The problems with MySQL aren't bugs, they're decisions. Decisions that can't be reversed for the sake of backwards compatibility.

Agreed. Or at least, that was my impression. That's what led me to assume that my sensibilities would be seriously mismatched with whichever person(s) decide the design of MySQL.

According to my design and development sensibilities, a lot of things in that video would have been high-priority bugs. The fact that they're still present in whatever version of MySQL was shown in that video (a recent one, I assume) suggests to me that the leaders of that project have a different definition of "bug" than I do.

I recommend setting yourself about fixing some of that long list of fundamental flaws in MySQL.

Traditionally, especially in 2012, this amounts to listing stuff like "doesn't have transactions", which was fixed back in Bush the Second's first term. Shoveling through obsolete FUD to find the truth is a harder job than you'd think, which also shows "good little worker bee" stick-to-it-ive-ness.

If employment prospects are all that matter, stay away from the major SQL databases. They're mostly feature complete, have large established developer communities that are hard to break into (sometimes requiring employment at the sponsoring company) and often have a lot of legacy baggage that limits what you can accomplish.

Meanwhile, in the NoSQL world, people are busy re-inventing the wheel. You can take decades-old techniques and apply them to new features of these databases. For example, Redis doesn't ha

Actually I wasn't. I figured the /. crowd might have some knowledge about the relative acceptance and prevalence of the two databases in European business settings, and where things are moving.

For example, if the consensus was that PostgreSQL was so rarely used that it was a dead-end, then I'd suck it up and work on MySQL despite my misgivings.

But as long as PostgreSQL is showing some signs of life in a business setting, I'll perhaps try to pitch in on that.

I also figured that maybe there was some other up-and-coming database out there that I should take a look at. The /. community is good at bringing alternatives like this to light.

As far as flames, I should have been clearer about what I meant by "design flaws". I realize that it's somewhat subjective. What I should have said is that MySQL's behavior strikes me as a lot more surprising in some cases than PostgreSQL's, and I didn't think that was going to change. (Probably in a similar vein, I like strongly typed programming languages and compile-time correctness checks. I think it's a mindset kind of thing.)

Actually, I started looking at it when people mentioned it in this conversation.

After skimming their website, I'm still a bit sketchy on why MariaDB exists. It sounds like a fork, except that they seem to periodically re-sync with Oracle's MySQL releases. Is the idea to just keep on developing open-source versions of whichever features Oracle adds to MySQL?

It was unintentional. My apologies. It would have been fairer to write that I find MySQL's behavior surprising, and am attracted to working on databases that have behavior that's more in line with what I'd expect.

The video I linked to is what convinced me that MySQL behaved in ways I don't like. I prefer system designs that kick and scream when something bad is entered. MySQL seems geared towards letting some operations succeed and just using defaults, in cases where I would prefer a loud error report.

"Big" projects tend to limit the size and number of companies that are hiring. You may not actually like the sorts of companies that are in that group. "Big" projects can sound glamourous but often aren't all they're cracked up to be.

"Big Data" is horse shit.
It's code for "stuff it all in a pile and maybe someday we can get some of it out, but not in any order that makes sense, and not in any reliable fashion".

The problem with arguments like this is that it overlooks the fact that there is a lot of value in data that doesn't need ordering in advance, and where any missing gaps in data are not critical failures. Not all data can or should be stored in such systems, but for that which can, there are much better ways to store it than RDBMS. RDBMS isn't going away, but it's certainly been commoditised. In time, so will Big Data type storage, but until then it's a good thing to make good money on if you are startin

"Big Data" is, in essence, just a scalable flatfile with better searching. That turns out to be a very useful thing in the real world, where you have no idea what sorts of queries you might run when you store the data, and need to handle a lot of "only ever going to run once" queries.

If your "big flatfile", for example, resides entirely in memory on 1000 cheap servers, you can be remarkably fast with unindexed searches and poor structure. Or you might come along years later when you know what you want, an
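The "unindexed search over 1000 cheap servers" idea is a scatter-gather scan: split the records across shards, let every shard brute-force its own slice, and concatenate the hits. Here is a toy Python sketch with hypothetical names, where threads stand in for machines (under CPython's GIL they won't actually speed up this CPU-bound scan; the point is the shape of the query, not the timing).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(records, n_shards):
    # Round-robin placement: no index, no schema, just even spreading.
    shards = [[] for _ in range(n_shards)]
    for i, rec in enumerate(records):
        shards[i % n_shards].append(rec)
    return shards


def scan_shard(shard, predicate):
    # Brute-force scan of one shard's slice of the "flatfile".
    return [rec for rec in shard if predicate(rec)]


def scatter_gather(shards, predicate, max_workers=8):
    # Scatter the same predicate to every shard, then gather the hits.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        parts = pool.map(scan_shard, shards, [predicate] * len(shards))
        return [rec for part in parts for rec in part]


if __name__ == "__main__":
    records = [{"id": i, "status": "error" if i % 7 == 0 else "ok"} for i in range(100)]
    shards = partition(records, 10)
    hits = scatter_gather(shards, lambda r: r["status"] == "error")
    print(sorted(r["id"] for r in hits))
```

Because nothing is indexed, any "only ever going to run once" predicate can be answered the same way, at the cost of touching every record.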

Though I have to say I dislike the "let's go 'contribute' so I can fill up the old resume and get laid^Wa job"

That's not precisely where I'm coming from. It's more that I only have enough free time to contribute to one OSS project. But I also need to position myself for getting database work in Europe. I'm trying to find some OSS DB project that resides in a happy intersection of those two goals.