I am Anders Karlsson, and I have been working in the RDBMS industry for many, possibly too many, years. In this blog, I write about my thoughts on RDBMS technology, happenings and industry, and also on any wild ideas around that I might think up after a few beers.

Wednesday, February 15, 2012

Why I think SQL_MODE is useless...

I think the SQL_MODE setting is a sort-of good idea, but that the implementation of it is useless, more or less. What I would want from a feature like this is to allow me to control the quality of the data in the database, but that is not what SQL_MODE does, regrettably.

Rather, what SQL_MODE does is define a default quality checking level for an individual session. This is NOT the same thing as guaranteeing that data has certain qualities in the database.

To begin with, a specific session can change it. If you want to insert an invalid data, just set SQL_MODE=ALLOW_INVALID_DATES and go ahead and insert.

Secondly, setting this to a non-default value requires no specific privileges at all!

Thirdly, as anyone can bypass these checks, you would assume that there was a way to check afterwards that data follows a specific SQL_MODE, or at least a function that I could apply to a table or column to check it, but no, no such thing exists.

The different consistency checks provided are somewhat ambigious and difficult to understand. Some applicatioon that found certaian aspects of MySQL 4.0 to be incompatible with their applications sets SQL_MODE=MYSQL40, which implies a lot of things.

And last but not least: This variable controls data quality. If some user decides to set this to some value to get around the database quality checks, you expect it to be logged somewhere, but no. Well, you can see it if you look carefully in the binlog, after the fact (and there you don't see the actual SQL_MODE, only the numeric representation of it).

And having said all this, I know what many of you think: Why let the database chack your data, that is the job of the application anyway? Right? Well, sort of. In my case, when I write applications of build websites, SQL_MODE isn't often an issue (except that Joomla insists on setting SQLMODE=MYSQL40, which means that my nice FK constrained database gets screwed up). But in larger team, like at work, developers are in one part of the office, and I am playing with the database. My responsibility: Make sure data is there is is right. The developers responsibility: Build their applications and make them run.

Here at Recorded Future, this is a very, very small problem. We have a good team, which isn't that big, we communicate and we also use less and less of MySQL these days. But I have been in places where this is a very real problem, and if delevopers use stuff like Hibernate it gets even worse, as the developers are now even more distanced from the real data.

And note here: This is not to say that developers lack competence and knowledge of databases, or that they write bad database code something, it's just that their focus is different from someone like myself, who work with databases on a regular basis.

And look, if you were a consultant or a developer, wouldn't it be nice to look at a database and just KNOW what data goes in there. Will I find a date that is the 31st of April in a datetime field or not?

And yes, there are applications that just will not run with MySQL 5.x. Fine, allow an option to be set for this, but don't make it changeable on a session level! On a global level then? With multiple applications using different databases, this is also a bad idea. I'd say make it a per database attribute!

And then we come to the situation where you really NEED to disable constraints, SQL_MODE etc. OK, I can allow this, assuming:

A higher level privilege than normal, or even better, a special privilege, a special privilege, is needed.

There has to be a way to check it, like an ALTER DATABASE SQL_MODE=xxx, ALTER TABLE SQL_MODE=xxx etc. And also some means of column level checking (can you write a script for this checking? Sure. Will it be faster or more convenient than running it as part of the database process? Nope, far from it. And convenience is important here as we want to run this often, to make sure that our data is what it should be).

And by the way, it is OK to change SQL_MODE on a session level, but only change it UP (where this is applicable). I.e. if the global SQL_MODE is ALLOW_INVALID_DATES, then a session can turn this off, but not ON (and by the way, again, this is a weird name. Id's rather have all SQL_MODE options be names as a FEATURE that is a good things: STRICT_DATE_CHECKS with the default being ON would be a better option, in my mind).