On Databases, Temporal Datatypes and Trains

Introduction

The data type aspect of databases is a key feature as is it when it comes to programming languages. I would guess that all programming languages, with the possible exception of assembly, provides a set of predefined “built in” datatypes. Some programming languages are limited in this respect, like Forth, where others have a larger set of types. A data type determines what data can be stored, what operations are allowed and semantics.

One family of data types that is present in more or less all relational databases (I don’t say all here as I know someone will tell me about an arcane relational database systems developed in Burundi where this is not true) is the temporal types, i.e. datatypes that hold a time value. This in difference to most programming languages where the native datatypes are numeric and strings, and all other types are extensions using some kind of structure style.

So, databases have temporal datatypes, programming languages do not (OK, that is a generalization). In this blog post I will look at some aspects of temporal datatypes, and in a later blog post I will dig even deeper into this outrageously interesting subject.

Temporal Datatypes in Databases

Before we get into the aspect of trains, let’s spend some time with the temporal datatypes themselves. As already stated, the type of an item among other things determine the semantics of the type. Let’s start with a look at the temporal data types in MariaDB, they are DATETIME, TIMESTAMP, DATE, TIME and YEAR. If we for a second assume that you don’t know anything about how MariaDB looks at these, you might ask yourself what the difference is between TIMESTAMP and DATETIME, so let’s start with there.

Both DATETIME and TIMESTAMP store a date and a time, but there the similarities end. And by the way, I’m not saying that we should change the behavior of these datatypes, just that they are sometimes a bit odd.

The DATETIME datatype is more recent and is more in line with other relational databases, but on the other hand it takes up more space on disk. Both of these types also have microseconds support (since MariaDB 5.3). To enable this you add a precision, such as DATETIME(6). The reason a TIMESTAMP is more compact is that it can only hold a limited range of dates, from Jan 1 1970 up to 2038. Well, that should be enough for most purposes right? Yes, just as representing a year with just 2 digits and allowing for 640 K of RAM was “enough” a few years back! Those kinds of assumptions really made the world a better place for us all.

As for the DATE, TIME and YEAR datatypes, I will skip them for and now focus on DATETIME and TIMESTAMP.

One thing you do NOT want to use temporal types for

I had a customer, many years ago, possible way back during the Reagan administration or so, that had an issue. They were using TIMESTAMP, using millisecond precision, in data in their OLTP systems as a PRIMARY KEY. They had determined that there would not be more than 1 transaction per microsecond, and that this would work. It didn’t. For the simple reasons that:

On average, there was a lot less than 1 transaction per microsecond, but during high load times, it could well be more than this.

Computers tend to get faster over time and the load of popular services also increase which means this scheme was bound to break faster the better it was.

This was a stupid assumption.

Their solution was to have the transaction retry when they had a PRIMARY KEY violation, which was neither effective, nor performant or practical. Don’t do something even remotely similar to this!

Other relational databases support for temporal data types

Other relational databases also support temporal datatypes, and you might have sensed that I feel that the MariaDB temporal datatypes are a bit awkward. This is not so though, as all relation database temporal data types have quirks, to say the least, so it is appropriate to have a look at this too.

Let’s begin with Oracle where there is support for DATE and TIMESTAMP. Oracle also supports INTERVAL types. As for Oracle DATE that is the oldest Oracle temporal datatypes, and it is rather odd in a few ways. One such oddity is that although the type is called DATE and when querying it, by default you get a proper date back, it actually stores the TIME also, up to seconds. Which means that two field that looks like they have the same value using the default format, the comparison might still fail. Odd, to say the least.

As for SQL Server, things are messier still. Here a TIMESTAMP is actually a table attribute that works much like the way the first TIMESTAMP column in a MariaDB table works in that it keeps track of the last insert / update to the row. Then SQL Server has both DATETIME and a DATETIME2 datatypes, where the former has a limited date range. SQL Server also has a DATETIMEOFFSET which is pretty odd. I will not get into NULL handling with temporal data types in SQL Server and I will avoid giving you a headache. Also in SQL Server are DATE and TIME datatypes as well as a SMALLDATETIME, where the latter is a more compact DATETIME with no fractional seconds and again a limited range.

I have not gotten into the issue of how relational databases treat temporal datatypes with incorrect data and NULL values. Note that “incorrect” data when it comes to temporal data is a fuzzy subject. Handling leap years is no big deal, but maybe you haven’t heard about leap seconds? They are there to compensate for the earth slowing down it’s rotation and leap seconds are added now and then to compensate for this and we have one such coming up by the end of this year, the last minute of 2016 will have 61 seconds, so that 2016-12-31 23:59:60 is actually a valid time, something not recognized by either MariaDB, Oracle, SQL Server or for that matter Linux (at least where I tested it) whci all report this as an invalid time specification. If someone asks you “how many seconds are there in a day” your answer should be, in true engineering fashion, “it depends” and if you write code assuming there are always 86400 (24 * 60 * 60) seconds in a day, you might be making a mistake, depending on the situation.

Another situation is with Financial Services where domain-specific calendars are used in some cases, like a 360-day calendar where each year is considered to have 12 months of 30 days each. This is used for example with interest rate calculations, which is why every monthly mortgage payment of your house is the same amount, despite the fact that some months are shorter and other longer in your calendar (but not in the calendar used by your bank).

SQL Standard Temporal Types

In the SQL Standard, let’s assume SQL-99, there are three temporal data types: TIMESTAMP, DATE and TIME. The Oracle TIMESTAMP that was added in Oracle 9 are reasonably well in line with Standard SQL-99.

The SQL Standard TIME and TIMESTAMP datatypes has a number of attributes, namely a precision, in terms of fractional seconds, and if TIME ZONES are used or not. Which brings us to the issue of Time Zones, are an issue they are, but it would probably be even worse without them.

Time Zones and Trains

I guess you are wondering what trains have to do with all this and with relational databases in particular? Well trains are nice and fun and the same goes for relational databases, right? Or maybe not. No, there is an aspect of temporal data that is related to trains. Let’s go back a few years in time to when trains were all new and hot, which is around year 1996. No wait, that was Netscape. We are talking about is the mid 1800s when trains caused people to travel a lot more and to travel much longer distances. There were no time zones though, so noon was different for every station on the railroad, which was when the sun was in zenith at that particular station. This meant that if you traveled for 1 hour you might end up 53 minutes away from your origin.

In 1884, it was determined that we should all have one meridian (the one in Greenwich) and that we should have 24 time zones spread around the globe. And this would be a world standard and as such enforced, and you better follow this, or else…

As we all know standards are universally accepted more or less always and they are also backward compatible. And this is why the SQL standard works so that all SQL databases can talk to each other, all SCSI connectors (remember SCSI? I do, but then I’m an old fart) fit any other SCSI connector and for a more modern example of a truly successful standard, all HDMI cables and connectors works with all other HDMI cables, connectors, screens, players and what have you not.

This explains why the good intentions of having 24 time zones 1 hour apart around the globe isn’t really followed. In particular India and Australia screw things up with 15, 30 and 45 minute time zones.

MariaDB Temporal Datatypes and Time Zones In Practice

And what has this got to do with relational databases you ask? Well there is one difference between TIMESTAMP and DATETIME in MariaDB that we haven’t mentioned so far, and that is timezone support. The MariaDB TIMESTAMP data type supports time zones, which the DATETIME datatypes does not. What does this mean then? Well, it means that a TIMESTAMP data values is stored together with the timezone of the value. To explain what this means, let’s look at an example, first we need a table to insert some data into for our tests:

MariaDB> CREATE TABLE temporaltest (timestamp_t timestamp,

datetime_t datetime);

Before moving on from this, let’s look at how MariaDB interprets this table definition:

As you can see, MariaDB adds a few things to the simple TIMESTAMP column, like a NOT NULL clause that we didn’t specify and a default value that we didn’t ask for. This is for backward compatibility to align more recent MariaDB TIMESTAMP column semantics with how a TIMESTAMP used to work way back during the Harding administration. Before we move on, let’s set the timezone of your server. The default for MariaDB is to use the timezone as defined by the host operating system, but in many cases that is not a good idea in production use. To be able to set the timezone with MariaDB, we first have to import timezone information into MariaDB, and that is done by running this from the command line:

$ mysql_tzinfo_to_sql /usr/share/zoneinfo | mysql -u root mysql -p

This assumes that the zoneinfo file is at the default location, if you have installed MariaDB from a tarball it will be somewhere in that path. With that in place let’s set the timezone:

MariaDB> SHOW VARIABLES LIKE 'time_zone';

+---------------+--------+

| Variable_name | Value |

+---------------+--------+

| time_zone | SYSTEM |

+---------------+--------+

1 row in set (0.00 sec)

MariaDB> SET time_zone = 'PST8PDT';

Query OK, 0 rows affected (0.00 sec)

So now we are running with the PST timezone instead of the one defined by the operating system. So far so good. Let’s then insert some data into the table we created above:

OK, that looks fine, right? Now, let’s assume that we move this server to the east coast and set the timezone as appropriate for that and select the above data again:

MariaDB> SET time_zone = 'EST';

Query OK, 0 rows affected (0.00 sec)

MariaDB> SELECT * FROM temporaltest;

+---------------------+---------------------+

| timestamp_t | datetime_t |

+---------------------+---------------------+

| 2015-03-07 21:05:00 | 2015-03-08 02:05:00 |

+---------------------+---------------------+

1 row in set (0.00 sec)

As you can see, with a different timezone we get data back adjusted to the timezone different for the TIMESTAMP column, but not for the DATETIME column. That does make a difference, right? If you run with MariaDB clients in different time zones, all those clients may well insert data using different time zones! If the server and the client are in different time zones, the data is converted to the timezone of the server and when retrieving data it is converted back to that of the client. Seems fair, right.

Maybe this thing with time zones wasn’t such a bad and difficult thing after all? Yes, it can be handled, but that was before Germany was about to run out of electricity and in an attempt to fix that caused years of suffering to cows and programmers across the globe. That story will be told in the next part of this series of blogs though, so don’t touch that dial, I’ll be right back.

Timezone mess

Thanks for the explanation of these data types. There is one thing you may have missed here: the timezone the server is configured in is only initialized on startup of mysql/mariadb, so changing that will not affect it immediately. We found this out when we had a problem that got dumped on our plate at my previous employer. Some sysop switched the timezones on all servers from Amsterdam/Europe to UTC, and as there was no restart necessary nothing happened. That meant that the database was in a different timezone than the server. So when we did a failover for maintenance (on a freshly rebooted server), this changed the behavior of the database to read already stored datetime values differently (as they are stored with UTC offset) than the timestamp values (as they are stored "as is"). This made the data in the database unreliable. We would have been able to alter the data with the failover switch as the cut-off point, however the schema and application itself were a timezone mess as well: timestamps and datetime used alongside each other in the same tables or joined tables, and then used for calculations. For input/output the application relied on the php date and gmdate functions (e.g. whenever a developer had clue he/she used gmdate, otherwise just date) and the application did not have any timezone selection for the input. Since the timezone change was applied to all servers, also php had landed in a different timezone. This meant that making date/time calculations were unreliable and fixing the data next to impossible, as we would not determine what the date/time entered was supposed to be in UTC.
What did we do in the end? I tried to explain the developer we should change all timestamp columns to datetime, and start using gmdate in the whole application. The developer agreed, but the only change he did was the runtime timezone of php from UTC to Amsterdam/Europe, because he believed that would solve all his problems.

I don't think MariaDB implementation is all that wrong

I haven't looked at Postgres, but maybe I should. That said, I don't think either MariaDB or Oracle gets it that wrong, rather my view is that dealing with timezones is reasonable once you understand it (disregarding the 30 and 45 minute times zones). But I'll have a look at Postgres and see it it adds something that I miss with the MariaDB implementation, something that we might create a MariaDB feature request from. I have a follow up blog post coming up soon that gets into the issue of Daylight Savings Time, where MariaDB also get's it right I think, but it needs to be understood and it's not MariaDB that I have an issue with but DST.