Info on ISO 8601, the date and time representation standard

This document gives a short description of
ISO 8601, the date and time representation standard.
It also presents some arguments why it should be applied,
especially in Web authoring.
Sample code is given for printing the date and time in ISO 8601
format in some programming languages.
Links to more detailed technical resources are given.
This document
recommends the following simple format
for dates:
1998-05-12 (year-month-day)
and the following format for combined date and time
in international contexts:
1998-05-12T10:20Z
though it may improve readability to replace the letter T by a space.

Background: some problems

There are different date and time formats in use
in different parts of the world and in different contexts.
There are two major practical problems:

A date designation such as 5/6/98 is
ambiguous: which of the first two numbers is the month and which
is the day?
It is interpreted as the 5th of June, 1998, in most countries.
However, in the United States it is generally interpreted
as the 6th of May, 1998. This has caused a lot of confusion.

Date designations that do not explicitly specify the
century
can cause serious problems in the 21st century. This
"millennium problem" or "year 2000 problem" or "Y2K"
is definitely not over yet. On the contrary, now that we are
in the 21st century, we face the problem in everyday life.

These problems are combined in notations like
01/02/03.
Does it mean 1st of February, 2003, or 2nd of January, 2003,
or 2nd of March, 2001, or what?
(In some notations, the year precedes the date.)
If a product has the text "Use before 03/06/09" without explanation,
what do you do?
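The ambiguity can be made concrete with a small Python sketch. It is only an illustration: the function name interpretations and the table READINGS are hypothetical, and the three readings listed are just the ones mentioned above. Each reading is tried in turn, and every one that yields a valid date is reported in year-month-day form.

```python
from datetime import datetime

# Hypothetical table of candidate readings (not an exhaustive list).
READINGS = {
    "day/month/year": "%d/%m/%y",
    "month/day/year": "%m/%d/%y",
    "year/day/month": "%y/%d/%m",
}

def interpretations(text, formats=READINGS):
    """Return every date that `text` could denote under the given readings."""
    results = {}
    for label, fmt in formats.items():
        try:
            results[label] = datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            # The string is not a valid date under this reading.
            continue
    return results

for sample in ("5/6/98", "01/02/03"):
    print(sample, interpretations(sample))
```

Note that the two-digit year is itself resolved by a convention built into strptime, which is one more hidden assumption that a four-digit year avoids.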

Practical problems are also caused by the ambiguity of
time designations:

Does 6:00 refer to six o'clock
in the morning or six o'clock in the evening, i.e. 18:00? Adding
"AM" or "PM" may help, but that would introduce a language-dependent
feature into a notation which is essentially numeric and therefore
language-neutral.

What is the frame of reference for time designations? Especially
on the Internet, people often refer to times without realizing
that there are different
time zones in use.

Although some of these problems could be rather easily
solved
by special solutions
in special cases, it is evident that a uniform and
universal date and time representation format is needed.
For example,
in a monolingual context, the first problem (ambiguity
of notations like 5/6/98) could be solved
by writing the month as a name, not with digits.
But this too would introduce an unnecessary language
dependency. For example, on a multilingual Web page,
one would certainly like to have the "last updated" date
expressed just once, in a language-neutral notation.

As an example, consider the following system message
which is bilingual for a good reason
(from real life, but abbreviated here, and with typos fixed):

So the time is expressed in three ways.
Users have been observed to get easily confused in such
situations. The more you try to explain things in different ways,
the more probable it is that the ways get mixed up.
Moreover, it all becomes too long for short
announcements, headlines, etc.
By switching to simple ISO 8601 notations
things could
be expressed briefly and uniquely:

Internet-yhteydet poikki, Internet connections off:
1998-11-23T17/20

During a transitional period we would, of course, need to
accompany such information with text (in "finer print" when
applicable) which expresses the time period in older,
language dependent notations.

Automatic processing of data is easier to program if
dates and times are expressed in a uniform, preferably fixed-length
notation. The format should allow simple comparison and sorting
of dates and times, which means that the notation should be
either fully descending (with the most significant part, such
as the year, expressed first, then the next most significant part, such
as the month, etc., down to seconds and fractions of a second) or fully
ascending (just the opposite).
It should be noted that such uniformity would be
most beneficial
for small, tool-like programs, typically created by private persons or
small companies. In a large project by a large software vendor,
the cost of code for handling a wide variety of date and time
formats is relatively small (although perhaps absolutely large).
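The point about comparison and sorting can be sketched in Python as follows. The sample dates and the helper name chronological are made up for this illustration; the day.month.year notation is used only as an example of a notation that is neither fully descending nor fully ascending.

```python
from datetime import datetime

def chronological(strings, fmt):
    """Sort date strings by the day they denote, parsing each with `fmt`."""
    return sorted(strings, key=lambda s: datetime.strptime(s, fmt))

iso_dates = ["1998-05-12", "1997-12-31", "1998-01-02", "1998-05-03"]
dmy_dates = ["12.05.1998", "31.12.1997", "02.01.1998", "03.05.1998"]

# Fully descending notation: plain string sorting is already chronological.
print(sorted(iso_dates) == chronological(iso_dates, "%Y-%m-%d"))   # True
# Mixed-order notation: string sorting puts 31.12.1997 last.
print(sorted(dmy_dates) == chronological(dmy_dates, "%d.%m.%Y"))   # False
```

With the fixed-length descending notation, a small program can sort or compare dates as plain strings without parsing them at all.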

On the Internet, the notation of times and
dates has always been problematic.
In particular,
the format of Internet E-mail messages, as defined on 1982-08-13
(with some later modifications) by RFC 822,
remained valid for a very long time.
It specified a relatively uniform notation for date and time,
with some variation allowed; the most common alternative was
something like Fri, 8 May 1998 15:57:33 +0300 (EET DST).
There was enough variation to make it difficult to write simple
programs for processing such data, too little variation to please
everyone. In 2001-04,
RFC 2822
was published as a successor to RFC 822. It restricted the
recommended date and time formats to the format exemplified above.
Note that
this format is hardly used
outside the Internet.

In addition, different programs use date and time formats differing
from the one specified in RFC 822 and RFC 2822.
To illustrate the diversity, let us take a look at the
Proposed Standard
RFC 2068;
in the discussion of time and date formats, it says:

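HTTP applications have historically allowed three different formats
for the representation of date/time stamps:

Sun, 06 Nov 1994 08:49:37 GMT  ; RFC 822, updated by RFC 1123
Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036
Sun Nov  6 08:49:37 1994       ; ANSI C's asctime() format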

The first format is preferred as an Internet standard and represents
a fixed-length subset of that defined by RFC 1123 (an update to RFC
822). The second format is in common use, but is based on the
obsolete RFC 850 date format and lacks a four-digit year.

How ISO 8601 can be used to address the problems

The ISO 8601 standard, or most officially
ISO 8601:2004
Data elements and interchange formats -- Information interchange -- Representation of dates and times, approved by
ISO in 1988, updated in 2000,
again in 2004,
defines a large number of alternative representations of dates,
times, and time intervals.
Thus, rather than the date and time standard, it is
just a general framework.
To achieve uniformity, we must select one or a few formats from
it and apply them consistently.

Luckily, it seems that people who know about ISO 8601 usually stick
to the same simple alternatives.
The following is an
attempt to describe "best current practice"
(in the informal sense of this phrase):

Date only format

Use a format like
1998-05-12, always expressing the year in full,
followed
by the month and then the day.
Thus, the example means the 12th of May in 1998.
Use exactly two digits for the
month and exactly two digits for the day, using leading
zeros when necessary.
Notice that there is no time zone indication, although dates too
are time zone dependent in principle; by default, times are relative
to some local time zone. If this is of some concern for dates
(i.e. you need to be very exact with them), you could express the
date in UTC and append a Z to the date designation to
indicate this. But in such cases, the combined date and time format
is probably preferable (see below).

Time of the day only format, local (national) use

Use a format like
14:15
or
14:15:00,
always expressing hours and minutes and seconds (if present)
each with exactly two digits.
Express the time as local time in the time zone implied by
the context.
But whenever there is any possibility of misunderstanding
what the time zone is, use the next option:

Time of the day only format, international use

Use a format like
14:15Z
or
14:15:00Z,
always expressing hours and minutes and seconds (if present)
each with exactly two digits.
Express the time as Coordinated Universal Time
(UTC, formerly
called Greenwich Mean Time, GMT); the appended Z
letter indicates that the time is represented in UTC.
Alternatively, use a local time with explicit zone designation
as explained in the next item.

Time of the day only format, explicit zone

Append a zone designation in one of the formats
+hh:mm, +hhmm, and +hh to a time denotation
to indicate that the local time zone used is
hh hours and mm minutes ahead of UTC
(a minus sign replaces the plus sign for zones behind UTC).
Examples: 12:15+02:00, 12:15+0200, 12:15+02.
Select one of the formats and stick to it within a document.
This format is suitable when the time zone may be relevant.
An alphabetic time zone designation might be added in parentheses, e.g.
12:15+02 (EET), but it is not sufficient alone. There is no
standard on such designations, and the same string is used for
different zones.
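As a rough illustration, the three zone designation formats could be produced from a numeric offset as in the following Python sketch. The function zone_designator and its style parameter are invented for this example; the handling of negative offsets (a minus sign for zones behind UTC) is an assumption of the sketch rather than something spelled out above.

```python
def zone_designator(offset_minutes, style="hh:mm"):
    """Format an offset from UTC, given in minutes, as +hh:mm, +hhmm or +hh."""
    sign = "+" if offset_minutes >= 0 else "-"
    hours, minutes = divmod(abs(offset_minutes), 60)
    if style == "hh:mm":
        return f"{sign}{hours:02d}:{minutes:02d}"
    if style == "hhmm":
        return f"{sign}{hours:02d}{minutes:02d}"
    if style == "hh":
        if minutes:
            # The hour-only form cannot express offsets such as +05:30.
            raise ValueError("the +hh form cannot express a partial-hour offset")
        return f"{sign}{hours:02d}"
    raise ValueError(f"unknown style: {style}")

# A zone two hours ahead of UTC, in each of the three formats.
for style in ("hh:mm", "hhmm", "hh"):
    print("12:15" + zone_designator(120, style))
```

The hour-only form is the shortest, but it fails for zones whose offset is not a whole number of hours, which is one reason to pick +hh:mm when a single format has to cover all cases.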

Combined date and time format

Use a format where the date designation is followed by the
letter T and the time of the day designation, e.g.
1998-05-12T14:15Z.
Note that the standard clearly requires the use of T
in this context. However, such a notation is often regarded as
odd-looking, and people who otherwise use ISO 8601 might deviate
from it here by using a space instead.
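A minimal Python sketch of the combined format follows. The function name to_iso_utc and its separator parameter are hypothetical; the sketch assumes minute precision and converts any zone-aware time to UTC before formatting, so that the trailing Z is truthful.

```python
from datetime import datetime, timedelta, timezone

def to_iso_utc(moment, separator="T"):
    """Format a zone-aware datetime in UTC with minute precision and a Z suffix."""
    if moment.tzinfo is None:
        raise ValueError("a time zone is required to convert to UTC")
    utc = moment.astimezone(timezone.utc)
    return utc.strftime(f"%Y-%m-%d{separator}%H:%MZ")

# 17:15 in a zone three hours ahead of UTC is 14:15 UTC.
local = datetime(1998, 5, 12, 17, 15, tzinfo=timezone(timedelta(hours=3)))
print(to_iso_utc(local))        # 1998-05-12T14:15Z
print(to_iso_utc(local, " "))   # 1998-05-12 14:15Z
```

The second call shows the common deviation of using a space instead of T; programs that read such data may or may not accept it.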

Period of time format

Use a format where an indication of the start of the period
is followed by the slash (solidus) character / and
an indication of the end of the period.
Of course, one of the formats mentioned above is used for the
start and end.
However, to allow reasonably short expressions,
higher order components of the end designation can be omitted,
in which case the
corresponding values from the start designation are used.
Examples:

1998-05-12T14:15Z/1998-05-13T16:00Z (time interval extending from one day to another)

1998-05-12T14:15Z/16:00Z (time interval within a day)

1998-05-12/15 (time interval from the 12th to the 15th of May, 1998).
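The rule for omitted higher order components can be sketched in Python as below. This is a simplified illustration, not a full ISO 8601 parser: the function name expand_interval is made up, and the sketch assumes that both halves use the same fixed-length notation, so that the missing leading part of the end designation can simply be copied from the start designation.

```python
def expand_interval(interval):
    """Split 'start/end' and restore components omitted from the end."""
    start, end = interval.split("/")
    if len(end) < len(start):
        # Copy the missing leading characters from the start designation.
        end = start[: len(start) - len(end)] + end
    return start, end

for text in ("1998-05-12T14:15Z/1998-05-13T16:00Z",
             "1998-05-12T14:15Z/16:00Z",
             "1998-05-12/15"):
    print(expand_interval(text))
```

For example, 1998-05-12/15 expands to the pair 1998-05-12 and 1998-05-15.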

Note: This is compatible with the
format described in the
Dates and times subsection of the
HTML 4.01 Specification
for use with certain HTML constructs.
However, the format specified there is stricter in the sense
that only the combined date and time format is allowed and it
must contain the seconds part, but more permissive in the sense
that it allows other time zones than UTC, too.

For periods of time, notations such as
1980-85 have often been used. Even if you use an en dash (–) instead
of a hyphen and/or surround that punctuation with spaces,
misunderstandings may arise. According to
ISO 8601, a notation like 2000-02 uniquely means the second
month of year 2000, so it is risky to use it, or any similar notation,
to denote years from 2000 to 2002. Using
2000/2002 would conform to the ISO 8601 standard,
but it could easily be misunderstood as meaning
"2000 or 2002". Thus, it is perhaps best to use a
horizontal ellipsis (or, as a replacement, three consecutive dots)
or the en dash, with the year written in four digits:
2000…2002 or 2000–2002.
Writing the year in full would probably remove the
possibility of misunderstanding when using the en dash or even
when using a hyphen as a replacement for an en dash
(2000-2002). But these notations do not conform to ISO 8601.
It specifies that the slash (solidus) "/" is used as the separator, with the
following somewhat vague note:
"In certain application areas a double hyphen is used as a separator instead of a solidus."
(Notations like 2000--2002 were promoted by previous versions of the standard.)

The ISO 8601 standard does not specify whether a date or time
(or date and time) designation refers to a singular point
in time or a time period.
In particular, a designation of a date can be used to refer to
a full 24-hour day or a specific moment of time within it, probably
by default the start of the day (00:00). Similarly, a time notation
like 09:00 could refer to nine o'clock absolutely sharp or
the period from 09:00 to 09:01 or anything else.
When necessary, a specific agreement or verbal indication of the
meaning can be given, or the most explicit notation with ISO 8601
could be used. For example, one could write 09:00:00 or
09:00:00/09:01:00 to distinguish between the two interpretations
mentioned above.

Within the European standardization organization
CEN, a so-called
CEN Workshop
Agreement (CWA)
on various notations has been prepared, and it
specifies:

For the date and time conventions, the following numeric forms are recommended to be used
in a language-independent, pan-European document.

Long date: 1996-04-28
Abbreviated date and time: 1996-04-28 17:22:06
Abbreviated long date: 1996-04-28
Numeric date: 1996-04-28
Time: 09:22:06

The 24 hour system is used in Europe. Thus the time of the day is given in the range from
00:00:00 to 23:59:59, and the possible leap second 23:59:60. No abbreviation is used for before
or after noon.

NOTE The abbreviated date and time is given as the combination of the date format
and the time format of ISO 8601; as opposed to the combined day-and-time format of
said standard, which includes a "T" between day and time.

There was a European standard, EN 28601, with the same content
as ISO 8601. It has, however, been withdrawn.
Members of CEN are thus no longer required to have national standards
on this issue.

In the modern approach to localization, data is internally stored and
processed in a neutral format as far as possible. If localization is
desired, such as the presentation of data in a particular language
or notation, it is performed as close to the user as possible.
This makes it possible to apply user-selected presentation principles.
The approach is described in quite some detail in the
Common Locale
Data Repository (CLDR) material.
Apparently, ISO 8601 is the suitable neutral format for
dates and times.

Notes on the separators

The basic separators used according to ISO 8601 are the
hyphen "-"
in a date and the colon ":"
in a time designation.

The ISO 8601 standard allows these separators
to be omitted (e.g., 19980512 for a date),
but expressions are much easier to read when separators
are used. The separators also make it more obvious that a date or time
is given; a string of digits could mean different things.

The separators can be omitted in internal data formats that are
never visible to users. Sometimes they need to be omitted due to
technical restrictions or special considerations.
For example, if you use file names that correspond to dates
(e.g. in news archives), a name like 19980512.html is probably
more convenient than 1998-05-12.html.
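Converting between the notation with separators and the one without is straightforward, as the following Python sketch suggests. The two function names are hypothetical; parsing with strptime is used here so that impossible dates are rejected instead of being silently reformatted.

```python
from datetime import datetime

def basic_to_extended(basic):
    """Turn a date like 19980512 into 1998-05-12."""
    return datetime.strptime(basic, "%Y%m%d").strftime("%Y-%m-%d")

def extended_to_basic(extended):
    """Turn a date like 1998-05-12 into 19980512."""
    return datetime.strptime(extended, "%Y-%m-%d").strftime("%Y%m%d")

# A file name for a dated archive page, as in the example above.
print(extended_to_basic("1998-05-12") + ".html")   # 19980512.html
print(basic_to_extended("19980512"))               # 1998-05-12
```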

The standard distinguishes the hyphen from the minus sign as well
as hyphen-minus, often called ASCII hyphen.
(These concepts are explained in
the document Dashes and hyphens.)
However, it mentions that both
hyphen and minus may be mapped to hyphen-minus when the character repertoire
is limited, and this is common practice. Moreover,
programs that interpret date notations might expect to see hyphen-minus.
In principle, however,
U+2010 HYPHEN is the most appropriate character for use in ISO 8601
dates,
when available
(e.g., in text processing when using a font that contains it).

When ISO 8601 date notations are used in text (or in tables), there might be a
risk of line break after a hyphen. Although that would not be strictly wrong,
it cannot be regarded as good presentation.
However, technically it would be incorrect (and often ineffective) to use
the non-breaking hyphen character.
Usually the problem needs to be handled at levels other than character level,
e.g. using markup
(see notes on preventing line breaks
on web pages).

Naturally, only the printf function call is affected
by the date format used. Notice the use of zero in the field designator
%02d to force the number to be written with exactly two
digits, using leading zero if needed.
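The same field designators are available in Python's printf-style % operator, so the effect of %02d can be tried out with a minimal sketch like the following. The function name format_date is purely illustrative, and %04d for the year is an assumption of this sketch, chosen to keep the year at four digits.

```python
import time

def format_date(year, month, day):
    """Write a date as year-month-day, padding with leading zeros."""
    return "%04d-%02d-%02d" % (year, month, day)

print(format_date(1998, 5, 2))   # 1998-05-02

# The current local date, built from the broken-down time fields.
now = time.localtime()
print(format_date(now.tm_year, now.tm_mon, now.tm_mday))
```

Without the zero in the designator, the first call would print 1998-5-2, which is not of fixed length and does not sort correctly as a string.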

As another example, here is
Perl code for getting the current date and time
and writing it in UTC:

In JavaScript programming,
we can expect currently used browsers to support the
toISOString method, which yields an ISO 8601 conformant
notation. If you just need the date part, you can pick up
a substring
consisting of the first ten
characters (because in ISO 8601, the date part is of fixed length):

The following information
about clumsier solutions is preserved here mostly
for historical reasons:
In JavaScript,
there are various advanced date functions,
such as getFullYear. Previously they were not
supported by all JavaScript implementations, so it was safest
to use just the basic date functions and "do it yourself" (performing
Y2K corrections too).
Although this is probably irrelevant nowadays, here is
code that constructs an ISO 8601
conformant date notation into the value of the variable
dateString using just a few old basic functions:

If you use the strftime function
(see Single UNIX®
Specification
for a description), the following format specification would be suitable:
"%Y-%m-%dT%H:%M:%SZ" to get both date and time.
This means that if you, as a Web author, use
Server Side Includes
(SSI), the following should cause the date and time
denotation
(corresponding to the moment when the server processes and sends
the document)
to be inserted in ISO 8601 format:
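Apart from SSI, the strftime format specification mentioned above can be tried out directly in Python, whose time.strftime function accepts the same specification. The following is a small sketch, with the hypothetical function name iso_utc; it formats the broken-down UTC time, since the format string ends in a literal Z.

```python
import time

ISO_UTC_FORMAT = "%Y-%m-%dT%H:%M:%SZ"

def iso_utc(seconds=None):
    """Format seconds since the epoch (default: now) as ISO 8601 in UTC."""
    # gmtime gives UTC fields, matching the literal Z in the format string.
    return time.strftime(ISO_UTC_FORMAT, time.gmtime(seconds))

print(iso_utc(0))   # 1970-01-01T00:00:00Z
print(iso_utc())    # the current moment in UTC
```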

Language dependent notations

It is probably unnecessary to apply these notations in running
monolingual text, where language-dependent traditional
expressions with the month expressed with a word like
"the 4th of July"
or
"4. heinäkuuta"
can be used without problems.
But separate date designations, such as date of issue or date of
last update, and tabulated dates,
are best presented using the variant of
ISO 8601 outlined above.

The CLDR
(Common Locale Data Repository) activity, coordinated by the
Unicode Consortium, has defined a general-purpose formalism (a markup language, LDML)
for specifying formats of date and time representation.
It has also
collected voluminous information about
date and time formats in different locales (languages and
language variants)
represented in that formalism.
The general idea is that internally, in data structures
and binary files, dates and times should be represented in ISO 8601 format,
but externally, when displaying data to users, they should be formatted according
to the language of the context and ultimately according to
each user's preferences. Of course, the user's preference could be
ISO 8601, too.
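The division of labour described above can be sketched in Python as follows. The style names and patterns in PRESENTATION are hypothetical placeholders invented for this example, not actual CLDR data; a real application would take its patterns from a locale library.

```python
from datetime import date

# Hypothetical presentation patterns (placeholders, not CLDR data).
PRESENTATION = {
    "iso": "%Y-%m-%d",
    "day-first": "%d.%m.%Y",
    "month-first": "%m/%d/%Y",
}

def present(stored, style):
    """Format an internally stored ISO 8601 date for display."""
    return date.fromisoformat(stored).strftime(PRESENTATION[style])

stored = "1998-05-12"   # neutral internal representation
for style in PRESENTATION:
    print(style, present(stored, style))
```

The stored value never changes; only the last step, closest to the user, depends on the chosen presentation.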