TIP 173: Internationalisation and Refactoring of the 'clock' Command

Abstract

The [clock] command provides Tcl's fundamental facilities for
computing with dates and times. It has served Tcl faithfully since
7.6, but the computing world has advanced significantly in the decade
that it has been in service. This TIP proposes a (nearly entirely
compatible) reimplementation of [clock] that will allow for fewer
ambiguities on input, improved localisation, more portability, and
less exposure of platform-dependent bugs. A significantly greater
fraction of [clock] shall be implemented in Tcl than is the case today,
and the code shall be refactored to use the ensemble mechanism
introduced for Tcl 8.5 (see [112]).

Rationale

There is an embarrassing number of open bugs and feature requests
against the [clock] command. As the maintainer of [clock], the
author of this TIP has also received a number of informal feature
requests that are not logged at SourceForge. Unfortunately, many of
the requested fixes and enhancements cannot be effectively addressed
with the current architecture of [clock].

Several users have requested additional input formats to [clock
scan], notably the full range of ISO8601 time formats (including
formats based on week number and day-of-week); year and
day-of-year; Apache "web log" dates and times; numeric dates
placing the month before the day; and localised names of months
and days of the week. Unfortunately, these formats simply cannot
be added in the current architecture of [clock scan]; in fact,
there are several outstanding bugs in [clock scan] (for example,
the parsing of numeric time zones east of Greenwich) that cannot
be fixed without breaking something else.

The fundamental issue is that [clock scan] is asked to process
input with too many ambiguities. An input token such as 2000,
for example, may be interpreted as a year, a time of day, or a
number ("now + 2000 seconds"). 1000 may (perhaps) not be a
year, but could be a time of day, a number, or a time zone.
Localisation would only make this problem worse. Without
additional guidance, there is, even in theory, no way to determine
whether 03-11-2004 represents the third of November or the
eleventh of March.

To solve this problem, a radical redesign of [clock scan] is
required; the programmer must be allowed to specify an expected
input format (or set of expected formats).
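
As a minimal sketch of what an explicit format buys, the following
assumes a Tcl release (8.5 or later) in which the -format option
proposed here is available. The variable names are illustrative only.

```tcl
# The same input string read under two explicit formats.
set s 03-11-2004

# Day first: the third of November.
set dmy [clock scan $s -format %d-%m-%Y -gmt 1]

# Month first: the eleventh of March.
set mdy [clock scan $s -format %m-%d-%Y -gmt 1]

puts [clock format $dmy -format %Y-%m-%d -gmt 1]   ;# 2004-11-03
puts [clock format $mdy -format %Y-%m-%d -gmt 1]   ;# 2004-03-11
```

The caller, not the parser, decides which reading is intended, so the
ambiguity never reaches [clock scan].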

A side effect of such a redesign would be improved ease of
maintenance. The current [clock scan] is a YACC-derived parser;
the build process, however, runs a script on the output of YACC to
modify its memory management and alter its external symbol names to
make it compatible with Tcl's conventions. This script is fragile;
at present, it is known to work only with the version of YACC
distributed with Solaris.

There are a number of other issues with [clock scan] that could
be addressed at the same time with such a redesign. For instance,
there is a known problem at present that an input string that
specifies time and time zone but not date can return a time that is
one day too early or late; this problem arises because the existing
parser presumes the current local date when parsing such a
string, rather than the current date in the given time zone. The
problem is difficult to address because of the left-to-right nature
of the LALR(1) parser.

A few enhancements have been requested to [clock format]; most
notably, proper localization on all platforms. In addition, the
documentation of [clock format] is at best approximate, because
it depends on the strftime function in the Standard C Library.
This function differs among platforms, because the C standard, the
Posix standard, and the Single Unix Specification have gone
through evolution over time, and few platforms support all the
features of the current generation of any of them.

In addition, the Year 2038 bug looms large on the horizon. On most
32-bit platforms, time_t (used in the C library functions) is a
32-bit count of seconds from 1 January 1970; dates beyond 2038
cannot be represented in this format.
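
The limit can be made concrete with a short sketch. It assumes a Tcl
release (8.5 or later) whose [clock format] works on wide integers, as
this TIP intends; the variable name is illustrative.

```tcl
# Largest value that a signed 32-bit time_t can hold.
set last [expr {2**31 - 1}]

# The final second representable in 32 bits.
puts [clock format $last -format {%Y-%m-%d %H:%M:%S} -gmt 1]
# 2038-01-19 03:14:07

# One second later no longer fits in 32 bits, but a wide integer holds it.
puts [clock format [expr {$last + 1}] -format {%Y-%m-%d %H:%M:%S} -gmt 1]
# 2038-01-19 03:14:08
```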

The dependence on a complex library function such as strftime
introduces obscure platform-dependent bugs. Several open bugs in
[clock format], for instance, fail only on HP-UX, or only on
Windows.

Date formats have been requested (specifically, the Japanese civil
calendar) that are beyond the capabilities of the Standard C
Library functions.

[clock format] does not honor user preferences for date/time
format on Windows.

All of these concerns seem to indicate that our current dependency
upon vendor-supplied date and time manipulation routines is ill
advised. A single implementation that we control will make the
behavior consistent among platforms, allow the localisation to
follow Tcl's conventions, and let us lead rather than follow the
vendor in fixing bugs.

Server applications frequently require support of multiple locales
and multiple time zones within a single process, because they need
to parse input and format output according to the client's
environment. The current [clock] facilities either do not
support localization at all, or else support a change to locale
only by changing environment variables. This technique, once
again, exposes bugs in the vendor libraries. It also introduces
difficulties with thread safety; Tcl does not have a single
mechanism whereby the TZ and LC_TIME environment variables
are protected.

The only mechanism for performing calculations like "one month
after the current date" is [clock scan]. While this works well
in practice, using a parser to perform arithmetic seems somewhat
perverse.

Specification

The [clock] command shall be reimplemented as an ensemble [112],
with most of the subcommands implemented in Tcl. A minimal set of the
existing C code shall be refactored and placed inside a
::tcl::clock namespace. The existing subcommands seconds and
clicks shall be exposed. The existing scan shall be hidden
inside the namespace. [clock scan] and [clock format] shall be
reimplemented in Tcl. In addition, a new [clock add] command shall be added.
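
For readers unfamiliar with ensembles, here is a toy sketch of the
mechanism, assuming Tcl 8.5 or later. The ::demo::clock namespace, the
::demo::myclock command and both procedures are hypothetical; they are
not the layout of the reference implementation.

```tcl
# A toy ensemble in a hypothetical ::demo::clock namespace.
namespace eval ::demo::clock {
    # Exported procedures become the subcommands of the ensemble.
    namespace export seconds add

    # Delegate to the real [clock seconds].
    proc seconds {} {
        return [::clock seconds]
    }

    # Hypothetical helper: add a plain number of seconds to a time.
    proc add {time count} {
        return [expr {$time + $count}]
    }

    # Create the command ::demo::myclock, which dispatches to the exports.
    namespace ensemble create -command ::demo::myclock
}

puts [::demo::myclock add 100 20]   ;# 120
```

Subcommands can then be written in Tcl or C independently, which is what
allows most of [clock] to move into Tcl while a few pieces stay in C.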

The syntax and semantics of the [clock clicks] and [clock seconds]
commands will remain unchanged.

clock scan

The [clock scan] command accepts a character string representing a date and time and returns
the time that the string represents, expressed as a count of seconds
from the Posix epoch (1 January 1970, 0000 UTC).

If a -format option is not supplied, the scan is a free format
scan. The existing YACC parser for clock scan will be used to
interpret the input string. This form of the command is explicitly
deprecated because of the inherent ambiguities in interpreting the
input string. The free-format version of [clock scan] does not
accept -locale or -timezone options, since the legacy code
does not support multiple locales or time zones.

If the -format option is supplied, it is interpreted as a
specification for the expected input form. If the given string
matches the input form, it is converted to a count of seconds and
returned; otherwise, an error is thrown. See FORMATS below for a
discussion of the available format groups and their interpretation.
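
A minimal sketch of both outcomes, assuming a Tcl release (8.5 or later)
that implements the -format option as described; the variable names are
illustrative.

```tcl
set fmt {%Y-%m-%d %H:%M:%S}

# A string that matches the format is converted to seconds from the epoch.
set t [clock scan {2004-11-03 12:00:00} -format $fmt -gmt 1]
puts $t   ;# 1099483200

# A string that does not match raises an error instead of being guessed at.
set failed [catch {clock scan {3 November 2004} -format $fmt -gmt 1} msg]
puts "failed=$failed"   ;# failed=1
```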

Extraction of the date from the input string is guided by what fields
are present in the format. The order of preference, from highest to
lowest, is:

{seconds from epoch}, {StarDate}: Date fields that specify both date
and time take highest precedence. If format groups for these
fields appear multiple times, the rightmost takes precedence.

{Julian Day Number}: The Julian Day Number uniquely specifies a
calendar date.

{century, year, month, day of month}, {century, year, day of year}, {century, year, week of year, day of week}, {locale era, locale year, month, day of month}:
Formats with complete year are
preferred to formats with a two-digit year. For a two-digit year,
the date range is constrained to lie between 1938 and 2037.

{year, month, day of month}, {year, day of year}, {year, week of year, day of week}, {year of locale era, month, day of month}:
Formats that specify the year are preferred to those that do not.

{month, day of month}, {day of year}, {week of year, day of week}:
Formats that specify a day within the year are preferred to those
that specify merely the day of week or day of month. Formats that
do not specify the year are presumed to designate the base year.

{day of month}, {day of week}: If none of the above rules apply, a
day of the month or day of the week standing alone is interpreted
as belonging to the base month or week.

None of the above: If no combination of fields that specifies a date
is found, the base date is used.
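
The following sketch exercises three of the field combinations listed
above. It assumes a Tcl release (8.5 or later) that implements these
rules; the helper procedure ymd is hypothetical.

```tcl
# Hypothetical helper: show a time as a calendar date in UTC.
proc ymd {seconds} {
    return [clock format $seconds -format %Y-%m-%d -gmt 1]
}

# Year and day of year: day 060 of the leap year 2004.
puts [ymd [clock scan 2004-060 -format %Y-%j -gmt 1]]        ;# 2004-02-29

# ISO8601 year, week and day of week: Monday of week 01 of 2004.
puts [ymd [clock scan 2004-W01-1 -format %G-W%V-%u -gmt 1]]  ;# 2003-12-29

# Two-digit years are placed in the window 1938 to 2037.
puts [ymd [clock scan 38-01-01 -format %y-%m-%d -gmt 1]]     ;# 1938-01-01
puts [ymd [clock scan 37-01-01 -format %y-%m-%d -gmt 1]]     ;# 2037-01-01
```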

The time of day returned by [clock scan] is determined by the
presence of fields in the format string, in the following order of
preference.

{seconds from epoch, StarDate}: If either of these fields is present,
it uniquely determines date and time.

{am/pm indicator, hour am/pm, minute, second}, {hour, minute, second}:
Time with seconds is preferred to time without seconds.

{am/pm indicator, hour am/pm, minute}, {hour, minute}: Time can be
interpreted without the seconds.

{am/pm indicator, hour am/pm}, {hour}: Time can be expressed as an
hour alone, e.g.,

clock scan "6 pm" -format "%I %p"

None of the above: If none of the above indicators is present,
00:00:00 (the start of the day) in the given time zone is used.

In all of the foregoing discussion, the 'base date', 'base month',
'base week', and 'base year' refer to the day, month, week or year
designated by the -base parameter, which is a count of seconds
from the Posix epoch. If no -base parameter is supplied, the
current date is used as the base date. The year, month, week and day
are obtained by interpreting the base date in the time zone specified
by the date/time string. If the given format does not include a time
zone, then the base time is interpreted in the default time zone; see
TIME ZONES below for the way that the default time zone is
determined, and the interpretation of the -timezone and -gmt
options.
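
A short sketch of how -base fills in missing fields, assuming a Tcl
release (8.5 or later) that implements the rules above. The base time
chosen here is arbitrary.

```tcl
set fmt {%Y-%m-%d %H:%M:%S}

# Arbitrary base time: 3 November 2004, 00:00:00 UTC.
set base [clock scan 2004-11-03 -format %Y-%m-%d -gmt 1]

# Only a time of day is given, so the date is taken from -base.
set t [clock scan 14:30 -format %H:%M -base $base -gmt 1]
puts [clock format $t -format $fmt -gmt 1]   ;# 2004-11-03 14:30:00

# Only a day of the month is given; month and year come from -base,
# and the time of day defaults to the start of the day.
set u [clock scan 17 -format %d -base $base -gmt 1]
puts [clock format $u -format $fmt -gmt 1]   ;# 2004-11-17 00:00:00
```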

The locale is used to determine the spelling of native language words
such as the names of months, names of weekdays, am/pm indicators, and
locale eras. It is also used in the interpretation of the format
groups, '%X', '%x', and '%c'. In addition, the locale determines the
date at which the calendar in use changes from the Julian calendar to
the Gregorian. If no -locale parameter is supplied, the default
is to use the root locale. See LOCALISATION below for more
information.

clock format

The [clock format] command accepts a time, expressed in seconds from the Posix epoch of 1
January 1970, 00:00 UTC, and formats it according to the given format
string. See FORMATS below for a discussion of the available
format codes. If no format string is supplied, a default format, {%a
%b %d %H:%M:%S %Z %Y} is used.

The -timezone, -gmt, and -locale options are interpreted
as for [clock scan]. See TIME ZONES and LOCALISATION below
for how these options work.
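
A minimal usage sketch, assuming a Tcl release (8.5 or later) that
implements [clock format] as described; the variable names are
illustrative.

```tcl
# The Posix epoch itself, formatted with an explicit format string.
set explicit [clock format 0 -format {%Y-%m-%dT%H:%M:%S} -gmt 1]
puts $explicit   ;# 1970-01-01T00:00:00

# With no -format option, the default format is used; the text produced
# for %Z depends on how the implementation names the zone.
set default [clock format 0 -gmt 1]
puts $default
```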

clock add

The [clock add] command accepts a time, expressed in seconds from the Posix epoch of 1
January 1970, 00:00 UTC, and adds or subtracts units of time from it
according to the alternating count and unit parameters. Each
count must be a wide integer; each unit is one of the
following:

years year months month
weeks week days day
hours hour minutes minute seconds second

The command works by converting the given time to a calendar day and
time of day in the given locale and time zone. To that day and time
of day, it adds or subtracts the given offsets in sequence. It
reconverts the resulting time to a count of seconds, again using the
given locale and time zone, and returns that count of seconds.

There are subtle differences in many cases between adding seemingly
similar offsets. For instance, on the day before Daylight Saving Time
goes into effect, adding 24 hours will give "the time 24 hours from
the base time, irrespective of any clock change", while adding 1 day
will give "the time it will be at the same time of day on the
following day." Similarly, adding 1 month on 30 January will give
either 28 or 29 February. There are equally strange effects when
performing date/time arithmetic across the change between the Julian
and Gregorian calendars.

The -timezone, -gmt, and -locale options are used to
control the interpretation of the count of seconds as a calendar day
and time. Refer to TIME ZONES and LOCALISATION below for a
fuller discussion.
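
The month-end behaviour can be sketched as follows, assuming a Tcl
release (8.5 or later) that provides [clock add] as specified here; the
variable names are illustrative.

```tcl
set fmt {%Y-%m-%d %H:%M:%S}

# 30 January in a leap year and in an ordinary year.
set leap  [clock scan {2004-01-30 12:00:00} -format $fmt -gmt 1]
set plain [clock scan {2005-01-30 12:00:00} -format $fmt -gmt 1]

# Adding one month lands on the last day of February in each case.
puts [clock format [clock add $leap  1 month -gmt 1] -format $fmt -gmt 1]
# 2004-02-29 12:00:00
puts [clock format [clock add $plain 1 month -gmt 1] -format $fmt -gmt 1]
# 2005-02-28 12:00:00

# Counts and units alternate and are applied in sequence;
# negative counts subtract.
set mixed [clock add $leap 1 year -2 days 6 hours -gmt 1]
puts [clock format $mixed -format $fmt -gmt 1]
# 2005-01-28 18:00:00
```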

Formats

The [clock scan] and [clock format] commands will be implemented
in Tcl, without depending on the local strftime and strptime
functions. For this reason, format groups will function identically
on all platforms. The format groups will be interpreted as follows.

%a: On output, receives the abbreviation for the day of the week in
the given locale. On input, matches the name of the day of the
week (in the given locale) in either abbreviated or full form,
and may be used to determine the calendar date.

%A: On output, receives the full name of the day of the week in the
given locale. On input, treated identically with %a.

%b: On output, receives the abbreviation for the name of the month in
the given locale. On input, matches the name of the month (in
the given locale) in either abbreviated or full form, and may be
used to determine the calendar date.

%B: On output, receives the full name of the month in the given
locale. On input, treated identically with %b.

%C: On output, receives the number of the century, in Indo-Arabic
numerals. On input, matches one or two digits, and accepts the
number of the century in Indo-Arabic numerals. May be used to
determine the calendar date.

%c: On output, produces a correct locale-dependent representation of
date and time of day. On input, matches whatever format %c
produces in the given locale, and may be used to determine
calendar date and time.

%d: On output, produces the number of the day of the month, in
Indo-Arabic numerals, with a leading zero. On input, matches one
or two digits, accepts the day of the month, and may be used to
determine calendar date.

%D: Synonymous with %m/%d/%Y. Should be used only in US locales.

%e: On output, produces the number of the day of the month, in
Indo-Arabic numerals, with no leading zero. On input, treated
identically with %d.

%Ec: On output, produces a locale-dependent representation of date
and time of day in the locale's alternative calendar. On input,
matches whatever %Ec produces, and may be used to determine
calendar date and time.

%EC: On output, produces the name of the current era in the locale's
alternative calendar. On input, accepts the name of the era in
the locale's alternative calendar, and may be used to determine
calendar date.

%Ex: On output, produces the calendar date in a locale-dependent
representation using the locale's alternative calendar and
alternative numerals. On input, accepts whatever %Ex produces
and may be used to determine calendar date.

%EX: On output, produces the time of day in the locale's alternative
representation. On input, accepts whatever %EX produces and may
be used to determine time of day.

%Ey: On output, produces the number of the current year relative to
the locale's current era %EC, expressed in the locale's
alternative numerals. On input, accepts the number of the year
relative to the current era in the locale's alternative
numerics, and may be used to determine calendar date.

%EY: On output, produces an unambiguous representation of the current
year in the locale's alternative calendar and alternative
numerals. This group is often synonymous with %EC%Ey. On
input, accepts whatever %EY produces and may be used to
determine calendar date.

%g: On output, produces the two-digit year number suitable for use
with the ISO8601 week number. On input, accepts a two-digit year
number, and may be used to determine calendar date if the %V
format group is also present.

%G: On output, produces the four-digit year number suitable for use
with the ISO8601 week number. On input, accepts a four-digit
year number, and may be used to determine calendar date if the %V
format group is also present.

%h: Synonymous with %b.

%H: On output, produces the two-digit hour of the day on a 24-hour
clock (00-23). On input, matches two digits, and may be used to
determine time of day.

%I: On output, produces the two-digit hour of the day on a 12-hour
clock (12-11). On input, matches two digits, and may be used to
determine time of day.

%j: On output, produces the three-digit number of the day of the
year. On input, matches three digits, and may be used to
determine the day of the year.

%J: On output, produces the number of the Julian Day Number beginning
at noon of the given date. The Julian Day Number is a
representation popular with astronomers; it is a count of days in
which Day 1 is 1 January, 4713 B.C.E., on the proleptic Julian
calendar; in this system, 1 January 2000 is Julian Day 2451545.
On input, matches any string of digits and interprets it as a
Julian Day; may be used to determine calendar date.

%k: On output, produces the number of the hour on a 24-hour clock
(0-23) without a leading zero. On input, matches one or two
digits and may be used to determine time of day.

%l: On output, produces the number of the hour on a 12-hour clock
(12-11) without a leading zero. On input, matches one or two
digits and may be used to determine time of day.

%m: On output, produces the number of the month (01-12), with exactly
two digits (using a leading zero if necessary). On input,
matches exactly two digits and may be used to determine calendar
date.

%M: On output, produces the number of the minute of the hour (00-59)
with exactly two digits (using a leading zero if necessary). On
input, matches exactly two digits and may be used to determine
time of day.

%N: On output, produces the number of the month, with no leading
zero. On input, matches one or two digits, and may be used to
determine calendar date.

%Od, %Oe, %OH, %OI, %Ok, %Ol, %Om, %OM, %OS, %Ou, %Ow, %Oy: All of
these format groups are synonymous with their counterparts
without the 'O', except that the string is produced and parsed in
the locale-dependent alternative numerals.

%p: On output, produces the indicator for 'a.m.', or 'p.m.'
appropriate for the given locale, converted to upper case. On
input, accepts whatever %p produces (in upper or lower case) and
may be used to determine time of day.

%P: On output, produces the indicator for 'a.m.', or 'p.m.'
appropriate for the given locale. On input, accepts whatever %p
produces (in upper or lower case) and may be used to determine
time of day.

%Q: On output, produces a StarDate. On input, accepts a StarDate and
may be used to determine calendar date and time of day.

%r: On output, produces a locale-dependent time of day representation
on a 12-hour clock. On input, accepts whatever %r produces and
may be used to determine time of day.

%R: On output, produces a locale-dependent time of day representation
on a 24-hour clock. On input, accepts whatever %R produces and
may be used to determine time of day.

%s: On output, produces a string of digits representing the count of
seconds since 1 January 1970, 00:00 UTC. On input, accepts a
string of digits and interprets it as such a count; may be used to
determine date and time of day.

%S: On output, produces a two-digit number of the second of the
minute (00-59). On input, accepts two digits. May be used to
determine time of day.

%t: On output, produces a TAB character. On input, matches a TAB
character.

%T: Synonymous with %H:%M:%S.

%u: On output, produces the number of the day of the week
(1-Monday,7-Sunday). On input, accepts a single digit. May be
used to determine calendar day.

%U: On output, produces the ordinal number of the week of the year
(00-53). The first Sunday of the year is the first day of week
01. On input, accepts two digits which are otherwise ignored.
This format group is never used in determining an input date.

%V: On output, produces the number of the ISO8601 week as a two digit
number (01-53). Week 01 is the week containing January 4; or the
first week of the year containing at least 4 days; or the week
containing the first Thursday of the year (the three statements
are equivalent). Each week begins on a Monday. On input, accepts
the ISO8601 week number, and may be used to determine the
calendar day.

%W: On output, produces a week number (00-53) within the year; week
01 begins on the first Monday of the year. On input, accepts two
digits, which are otherwise ignored. This format group is
never used in determining an input date.

%x: On output, produces the date in a locale-dependent
representation. On input, accepts whatever %x produces and may
be used to determine calendar date.

%X: On output, produces the time of day in a locale-dependent
representation. On input, accepts whatever %X produces and may
be used to determine time of day.

%y: On output, produces the two-digit year of the century. On input,
accepts two digits, and may be used to determine calendar date.
Note that %y does not yield a year appropriate for use with the
ISO8601 week number %V; programs should use %g for that purpose.

%Y: On output, produces the four-digit calendar year. On input,
accepts four digits and may be used to determine calendar date.
Note that %Y does not yield a year appropriate for use with the
ISO8601 week number %V; programs should use %G for that purpose.

%z: On output, produces the current time zone, expressed in hours and
minutes east (+hhmm) or west (-hhmm) of Greenwich. On input,
accepts a time zone specifier (see TIME ZONES below) that
will be used to determine the time zone.

%Z: On output, produces the current time zone's name, possibly
translated to the given locale. On input, accepts a time zone
specifier (see TIME ZONES below) that will be used to
determine the time zone. This option should, in general, be
used on input only when parsing RFC822 dates. Other uses are
fraught with ambiguity; for instance, the string BST may
represent British Summer Time or Brazilian Standard Time.
It is recommended that date/time strings for use by computers use
numeric time zones instead.
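
A few of the groups above are easiest to understand side by side. This
sketch assumes a Tcl release (8.5 or later) that implements them as
described; the sample dates are arbitrary.

```tcl
# Arbitrary sample time: Monday, 29 December 2003, 00:00:00 UTC.
set t [clock scan 2003-12-29 -format %Y-%m-%d -gmt 1]

# Calendar year and day of year for that instant.
puts [clock format $t -format {%Y %j} -gmt 1]      ;# 2003 363

# ISO8601 week-based year, week and day of week for the same instant.
# The week belongs to 2004, which is why %G rather than %Y accompanies %V.
puts [clock format $t -format {%G-W%V-%u} -gmt 1]  ;# 2004-W01-1

# Julian Day Number of 1 January 2000, and the reverse conversion.
set y2k [clock scan 2000-01-01 -format %Y-%m-%d -gmt 1]
puts [clock format $y2k -format %J -gmt 1]         ;# 2451545
set back [clock scan 2451545 -format %J -gmt 1]
puts [clock format $back -format %Y-%m-%d -gmt 1]  ;# 2000-01-01
```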

Time Zones

There are several ways that a time zone may be specified for use with
[clock scan], [clock format] and [clock add]. In order of preference:

The time zone may appear in the input string matched by a %z or %Z
format group in [clock scan]. These format groups match time
zones in the forms +hhmm, +hhmmss, -hhmm, -hhmmss, and alphanumeric
strings. The numeric representations are self-explanatory; an
alphanumeric string must be one of:

or a single letter other than J. Generally speaking, numeric time
zones should be preferred for communication among computers; the
alphanumeric time zones are provided primarily for the parsing of
legacy RFC822 time stamps.

The time zone may appear in the -timezone argument to the
[clock] command, or may be implied by the presence of -gmt 1.
It is an error to use -timezone and -gmt in the same
call. The -gmt 1 option may be regarded as an obsolete
synonym of -timezone :UTC.

The time zone may appear in the environment variable, TCL_TZ.

The time zone may appear in the environment variable, TZ.

Failing all of these, on Windows systems, the time zone will be
obtained from the Registry.

As a last resort, the time zone is set to ':localtime'.
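
The first two sources in the list above can be sketched as follows,
assuming a Tcl release (8.5 or later) that implements %z and -timezone
as described; the variable names are illustrative.

```tcl
set fmt {%Y-%m-%d %H:%M:%S}

# A numeric zone in the input string, matched by %z, decides the result,
# so the answer does not depend on the local zone of the machine.
set t [clock scan {2004-11-03 12:00:00 +0100} -format "$fmt %z"]
puts [clock format $t -format $fmt -gmt 1]   ;# 2004-11-03 11:00:00

# On output, -timezone selects the zone; here a fixed numeric offset.
puts [clock format 0 -format {%H:%M %z} -timezone +0530]   ;# 05:30 +0530

# -timezone :UTC is the non-obsolete spelling of -gmt 1.
puts [clock format 0 -format $fmt -timezone :UTC]   ;# 1970-01-01 00:00:00
```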

Once the time zone is obtained by one of these means, it is
interpreted as follows:

":localtime": This specifier requests that the C library functions
localtime() and mktime() be used whenever converting times
between local and Greenwich. It is generally used as a last resort
if the time zone can be determined in no other way.

"+hhmm", "+hhmmss", "-hhmm", "-hhmmss": These specifiers give the
time zone explicitly in terms of hours, minutes and seconds east
(+) or west (-) of Greenwich.

":filename": The given file name is interpreted as a path name
relative to [info library]/tzdata, and the specified file is
loaded as a Tcl script. The script is expected to set the
:filename element in the tzdata array to a list of
transitions. Each transition is a four-element list comprising:

* the time at which the transition takes place, expressed in
seconds from the Posix Epoch (1 January 1970, 00:00 UTC)

* the offset (in seconds east of Greenwich) to apply.

* an indicator (0=Standard Time, 1=Daylight Saving Time)

* the name to use when displaying the given time zone in the root
locale.

The first transition is expected to take place at time
-9223372036854775808, the smallest value of a wide integer.

Any other string is processed by prefixing a colon and attempting to
load the given file, as shown above.
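
To illustrate the shape of such a file, here is a sketch with an
invented zone. The zone name, the transition times, the offsets and the
helper procedure transitionAt are all hypothetical; a real
implementation would be free to search the list differently (for
instance, by binary search).

```tcl
# Invented transition list for a made-up zone ":Example/Zone".
# Each entry is {time offsetSeconds isDST name}.
set tzdata(:Example/Zone) {
    {-9223372036854775808 3600 0 EXT}
    {1000000000 7200 1 EXST}
    {1010000000 3600 0 EXT}
}

# Hypothetical helper: return the last transition at or before $seconds.
proc transitionAt {transitions seconds} {
    set current [lindex $transitions 0]
    foreach entry $transitions {
        if {[lindex $entry 0] > $seconds} {
            break
        }
        set current $entry
    }
    return $current
}

puts [transitionAt $tzdata(:Example/Zone) 1005000000]
# 1000000000 7200 1 EXST
```

Because the first transition sits at the smallest wide integer, every
possible time has an applicable entry and the lookup needs no special
case for early dates.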

Localisation

The [clock] command is localised by a set of message catalogs
located in [file join [info library] clock msgs] and loaded into
the namespace, ::tcl::clock. The possible strings to be translated
include:

AM: The string that identifies ante meridiem times when
expressing a time of day in the given locale. This string has
the value, {am} in the root locale.

BCE: The string that identifies dates before the Common Era in the
given locale. This string has the value, {B.C.E.} in the root
locale. Those localising this string should be aware that,
depending on local culture, a name such as "B.C." (before
Christ) may be offensive.

CE: The string that identifies dates of the Common Era in the given
locale. This string has the value, {C.E.} in the root locale.
Those localising this string should be aware that, depending on
local culture, a name such as "A.D." (Latin, anno Domini,
"in the year of Our Lord") may be offensive.

DATE_FORMAT: The format specifier for calendar dates in the given
locale. In the root locale, %m/%d/%Y is used for compatibility
with earlier versions of the [clock] command, even though
%Y-%m-%d would probably be preferable.

DATE_TIME_FORMAT: The format specifier for combined date and time in
the given locale. In the root locale, {%a %b %e %H:%M:%S %Y} is
used for compatibility with earlier versions of the [clock]
command, even though %Y-%m-%dT%H:%M:%S would be preferable.

DAYS_OF_WEEK_ABBREV: Abbreviations of the days of the week in the
given locale. In the root locale, this string has the value,
{Sun Mon Tue Wed Thu Fri Sat}. In any locale, this string
is expected to represent a valid Tcl list.

DAYS_OF_WEEK_FULL: Full names of the days of the week in the given
locale. In the root locale, this string has the value, {Sunday
Monday Tuesday Wednesday Thursday Friday Saturday}.
In any locale, this string is expected to represent a valid
Tcl list.

GREGORIAN_CHANGE_DATE: The date on which the change from the Julian
to the Gregorian calendar takes place, expressed as a Julian Day
Number. In the root locale, this string has the value,
{2299161}, corresponding to 15 October 1582 New Style. In the
'en' locale, this value is {2361222}, 14 September 1752 New
Style.

LOCALE_DATE_FORMAT: The format to use when formatting dates in the
locale's alternative calendar. In the root locale,
LOCALE_DATE_FORMAT is %x, which causes formatting without
alternative numerals.

LOCALE_DATE_TIME_FORMAT: The format to use when formatting date/time
strings in the locale's alternative calendar. In the root locale,
LOCALE_DATE_TIME_FORMAT is %Ex %EX, which causes concatenation
of the locale's format for date, a space character, and the
locale's format for time.

LOCALE_ERAS: In a locale where a calendar with multiple eras is in
use, gives a list of triples. The first element of each triple
is the time (in seconds from the Posix epoch of 1 January 1970,
00:00 UTC) at which the era begins; the second is the name of the
era, and the third is a constant offset to be subtracted from the
Gregorian year to give the year of the era.
In any locale, this string is expected to represent a valid
Tcl list.

LOCALE_NUMERALS: In a locale where alternative numerals may be used,
gives a list containing the numerals that represent the numbers
from zero to ninety-nine. Note that these numerals are the ones
typically used on calendars, not the ones that represent
currencies or quantities. For instance, in a Han locale, the
number twenty-one is represented by \u5eff\u4e00, not by
\u4e8c\u5341\u4e00.
In any locale, this string is expected to represent a valid
Tcl list.

LOCALE_TIME_FORMAT: The time format to use when formatting a time of
day using a locale's alternative numerals. In the root locale,
this string is %X, which causes formatting without alternative
numerals.

LOCALE_YEAR_FORMAT: The time format to use when formatting a year in
the locale's alternative calendar. In the root locale, this
string is %Y.

MONTHS_ABBREV: Abbreviated names of the months in the given locale.
In the root locale, consists of three-letter abbreviations for
the English months: Jan-Dec.
In any locale, this string is expected to represent a valid
Tcl list.

MONTHS_FULL: Full names of the months in the given locale. In the
root locale, consists of the names of the English months in order
from 'January' to 'December'.
In any locale, this string is expected to represent a valid
Tcl list.

PM: The string that identifies post meridiem times when
expressing a time of day in the given locale. This string has
the value, {pm} in the root locale.

TIME_FORMAT: String that specifies the default time format in the
given locale. In the root locale, this string is {%H:%M:%S}.

TIME_FORMAT_12: String that formats time on a 12-hour clock in the
given locale. In the root locale, this string is {%I:%M:%S %p}.

TIME_FORMAT_24: String that formats time on a 24-hour clock in the
given locale. In the root locale, this string is {%H:%M}.
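
A catalog entry of this kind might be defined and consulted as in the
sketch below, which uses the standard msgcat package. The
::demo::clockmsgs namespace and the lookup procedure are hypothetical
stand-ins, not the code of the reference implementation.

```tcl
package require msgcat

# Hypothetical namespace standing in for ::tcl::clock.
namespace eval ::demo::clockmsgs {
    # Catalog entries for German; list-valued strings are Tcl lists.
    ::msgcat::mcset de DAYS_OF_WEEK_ABBREV {So Mo Di Mi Do Fr Sa}
    ::msgcat::mcset de TIME_FORMAT {%H:%M:%S}

    # Hypothetical helper: look up a catalog string for a given locale,
    # restoring the previous msgcat locale afterwards.
    proc lookup {locale key} {
        set saved [::msgcat::mclocale]
        ::msgcat::mclocale $locale
        set value [::msgcat::mc $key]
        ::msgcat::mclocale $saved
        return $value
    }
}

# Thursday is element 4 when the list starts with Sunday.
puts [lindex [::demo::clockmsgs::lookup de DAYS_OF_WEEK_ABBREV] 4]   ;# Do
```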

There is a defined order for substitution of locale strings, which
constrains the format groups that can appear in the _FORMAT strings.
Specifically:

DATE_TIME_FORMAT and LOCALE_DATE_TIME_FORMAT may contain any
format groups other than %c and %Ec.

LOCALE_DATE_FORMAT and LOCALE_TIME_FORMAT may not contain
%c, %Ec, %Ex, or %EX.

DATE_FORMAT and TIME_FORMAT may not contain %c, %Ec,
%x, %Ex, %X, or %EX.

In addition to the standard locales, two special locales may appear on
the -locale parameter; current, which designates the result of
evaluating [mclocale], and system, which designates the current
"system" locale, which is determined by (in order of preference):

the date/time format settings on the Windows control panel

the environment variable LC_TIME

the current locale from [mclocale].

Build System

Several tools are provided for the use of maintainers:

loadICU.tcl:
Given a distribution of IBM's icu4c (http://oss.software.ibm.com/icu/index.html),
this program analyzes the source code of the message catalogs and
extracts appropriate Tcl-based messages for the date and time
formats in the supported locales.

loadtzif.tcl:
Given a time zone information file used by the Olson version of
'tzset' (for a description, see the latest 'tzcode' file in
[ftp://elsie.nci.nih.gov/pub/]), creates the corresponding Tcl
'tzdata' file.

makeTestCases.tcl:
Makes several thousand auto-generated test cases to exercise
the time conversion algorithms.

tclZIC.tcl:
Given the source code for the Olson time zone descriptions
(obtainable as the latest 'tzdata' file in
[ftp://elsie.nci.nih.gov/pub/]), creates the full set of Tcl
'tzdata' files.

Since these tools depend on third party source, they will not be
included in the usual build steps; instead, maintainers will be
expected to run them whenever changing files on which they depend. It
will be a good practice to update the ICU and Olson files just before
cutting a release.

Reference Implementation

The implementation of a refactored [clock] command is a work
in progress, and interested developers are urged to contact the
TIP author if they want to help with implementation, documentation,
or testing. The code is available in the same SourceForge
repository as the Tcl core, and Tcl maintainers can obtain it
with

cvs -d:ext:USER@cvs.sf.net:/cvsroot/tcl co newclock

Notes on the cost of implementation

Since Tcl code is typically 30-50 times slower than the equivalent C,
[clock scan], [clock format], and [clock add] are expected to slow
down by a factor in that range. [clock seconds] and
[clock clicks] will still be C code and are not expected to
suffer a measurable change in performance. (If they do, the
implementors plan to address the issue.)

The cost of the time zone data files and the message catalogs
is not trivial; they occupy about 1.6 megabytes exclusive of file
system fragmentation and may occupy multiple megabytes depending
on the minimum size of a file. The implementors assume (and are
working to ensure) that some sort of compressed virtual file system
will be available as core functionality in the 8.5 final release.
With zlib compression, the message catalogs and time zone data total
less than half a megabyte. It is worth noting that a distribution
that must run in the absolute minimum space may omit both message
catalogs and time zone data; if this is done, named time zones
(e.g., :America/New_York) will not be available on systems such
as Windows that lack 'zoneinfo', and will suffer from Y2038
bugs on systems such as Solaris and Linux that have 'zoneinfo'.
Without the message catalogs, the only supported locale will be the
root locale (and on Windows, the 'system' locale). This combination
provides functionality comparable to the [clock] command prior
to this TIP. The Tcl code that implements [clock] is less than
eighty kilobytes with comments and blank lines removed; this
amount of overhead is thought to be negligible.

Bugs

The reference implementation does not attempt any calendars not based
on the hybrid Julian/Gregorian calendar. This implementation is
adequate for the Western countries and for the Japanese civil
calendar, but does not address the Hijri, Hebraic, Thai, Chinese or
Korean calendars. (No Tcl user has requested these, to the best of the
knowledge of the author of this TIP.)

The Gregorian change date is not supplied in most locales.

Localisation in most locales was done by an American who is probably
excessively ignorant in such matters.

Copyright

Acknowledgments

The author of this TIP wishes to thank all the Tcl'ers who have
taken the time to read and comment on it, most notably Joe English,
Donal K. Fellows, Jeff Hobbs, Arjen Markus, Reinhard Max,
Christopher Nelson, Donald G. Porter,
Pascal Scheffers, and Peter da Silva.