std-unix@ut-sally.UUCP (Moderator, John Quarterman) (01/02/86)
Date: Wed, 1 Jan 86 18:39:26 est
From: seismo!cbpavo.cbosgd.ATT.UUCP!mark@sally.UTEXAS.EDU (Mark Horton)
The latest draft asked for input about time zones.  I'd like to
make a few comments.
There are two basic ways that time zones are done today.  There
is the V7 method (where the zone offset is in the kernel) and the
System III method (where the zone offset and name are in an environment
variable.)  4.2BSD descends from V7 (although it has a fancier "offset
to zone name" conversion algorithm that knows more about the world)
and System V descends from System III.
There are elegance considerations (e.g. should the kernel know or
care about time zones?) and efficiency considerations (e.g. is it
faster to look in the environment, do a system call, or read a file.)
But both of these are dwarfed by a much more important consideration:
does it work properly in all cases?  I claim that neither method is
correct all the time, but the V7 method is right more often than the
System III method.
In V7, when you configure your kernel, you tell it the zone offset,
in minutes, from GMT, and whether you observe Daylight time.  These
numbers are stored in the kernel, which doesn't do anything with them
except return them when an ftime system call asks for them.  So, in
effect, they are in the kernel for administrative convenience.  (You
don't have to open a file, and they aren't compiled into ctime, so it
isn't necessary to relink every program that calls ctime or localtime
when you install a new system.)  The smarts that generate the time zone
name and decide if Daylight time is in effect are in ctime in user code.
(By comparison, in TOPS 20, the equivalent of ctime is a system call,
with lots of options for the format.  This may seem inelegant, but it
results in only one copy of the code on the system, even without shared
libraries.)
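To make the division of labor concrete, here is a minimal sketch of the V7 arrangement. `struct v7_zone` only imitates the two zone fields V7's `ftime` reports (minutes west of Greenwich, and whether Daylight time is observed at all); the struct and `v7_local_seconds` are hypothetical names. Deciding whether Daylight time is in effect at a given moment is deliberately left out, since in V7 that logic lives in ctime.

```c
#include <time.h>

/* Hypothetical stand-in for the zone fields V7's ftime() reports.
   These values are fixed when the kernel is configured. */
struct v7_zone {
    int minuteswest;   /* minutes west of Greenwich, e.g. 300 for EST */
    int dstflag;       /* nonzero if Daylight time is observed here */
};

/* Convert a GMT timestamp to local standard time, in seconds.
   Whether Daylight time applies at this instant is decided by user
   code (ctime/localtime), not by the kernel, so it is not done here. */
time_t v7_local_seconds(time_t gmt, const struct v7_zone *z)
{
    return gmt - (time_t)z->minuteswest * 60;
}
```

Because the offset is in minutes, a zone like Newfoundland's (210 minutes west) fits without any change to the interface.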
In System III, the kernel doesn't know about time zones at all.  The
environment variable TZ stores both the zone names and the zone offset,
and ctime looks there.  This makes things very flexible - no assumptions
about three-character zone names, and it's easy for a person dialing in from
a different time zone to run in their own time zone.  Also, it's very
efficient to look in the environment - faster than a system call.
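For comparison, here is a sketch of the user-level parsing the TZ method relies on, assuming the familiar `EST5EDT` shape: a standard-time name, whole hours west of GMT, and an optional Daylight name. `struct tzspec` and `parse_tz` are hypothetical; this is not the System III ctime source.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical parsed form of a System III-style TZ value. */
struct tzspec {
    char std[16];     /* standard-time name, e.g. "EST" */
    int hours_west;   /* whole hours west of GMT, e.g. 5 */
    char dst[16];     /* Daylight name, or "" if none */
};

/* Copy a run of letters from *p into out (truncating to outsz-1 chars),
   advancing *p past the whole run. Returns the run length. */
static size_t take_name(const char **p, char *out, size_t outsz)
{
    size_t n = 0;
    while (isalpha((unsigned char)**p)) {
        if (n + 1 < outsz)
            out[n] = **p;
        n++;
        (*p)++;
    }
    out[n < outsz ? n : outsz - 1] = '\0';
    return n;
}

/* Parse a value such as "EST5EDT" or "PST8". Returns 0 on success,
   -1 if the name or the hour offset is missing. */
int parse_tz(const char *s, struct tzspec *out)
{
    char *end;
    long h;

    if (take_name(&s, out->std, sizeof out->std) == 0)
        return -1;
    h = strtol(s, &end, 10);
    if (end == s)
        return -1;           /* no digits: the offset is required */
    out->hours_west = (int)h;
    s = end;
    take_name(&s, out->dst, sizeof out->dst);
    return 0;
}
```

The whole zone description is just a string in each process's environment, which is both the flexibility and the weakness discussed below.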
However, there are some very serious problems with the System III method.
One problem is that, with an offset measured in hours, places like
Newfoundland, Central Australia, and Saudi Arabia, which don't have
even hour time zones, are in trouble.  But that's easy to fix in the standard.
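The easy fix might look like this: accept an optional `:MM` after the hours, so Newfoundland (3 1/2 hours west) becomes `3:30` and Central Australia (9 1/2 hours east) becomes `-9:30`. The syntax and the `parse_offset_minutes` name are assumptions for illustration, not anything the draft specifies.

```c
#include <ctype.h>

/* Parse an offset of the form [+|-]H[H][:MM] into minutes west of GMT.
   Returns 0 on success, -1 on malformed input. If endp is non-NULL it
   is set to the first unparsed character. The ":MM" part is a
   hypothetical extension, not part of the System III format. */
int parse_offset_minutes(const char *s, int *minutes, const char **endp)
{
    int sign = 1, h = 0, m = 0, digits = 0;

    if (*s == '+' || *s == '-') {
        if (*s == '-')
            sign = -1;       /* negative means east of GMT */
        s++;
    }
    while (isdigit((unsigned char)*s)) {
        h = h * 10 + (*s - '0');
        s++;
        digits++;
    }
    if (digits == 0)
        return -1;
    if (*s == ':') {
        s++;
        /* exactly two minute digits are required after the colon */
        if (!isdigit((unsigned char)s[0]) || !isdigit((unsigned char)s[1]))
            return -1;
        m = (s[0] - '0') * 10 + (s[1] - '0');
        if (m > 59)
            return -1;
        s += 2;
    }
    *minutes = sign * (h * 60 + m);
    if (endp)
        *endp = s;
    return 0;
}
```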
The time zone is configured into the system by modifying /etc/profile,
which is a shell script of startup commands run by the Bourne shell when
any user logs in, much like .profile or .login.  This means that we assume
everybody is using the Bourne shell.  This is a false assumption - one of
the documented features of UNIX ever since V6 is that the shell is an
ordinary program, and a user can have any shell they like.  In particular,
the Berkeley C shell does not read /etc/profile, so csh users don't get
a TZ variable set up for them on any System III or V UNIX I've ever used.
Well, after all, the Bourne shell is the standard shell, and everybody should
use the standard shell, right?  Besides, the new Korn shell reads and
understands /etc/profile.
Even if we believe the above statement and ignore the documented feature
of being able to put your favorite shell in the passwd file (and I don't)
we still get into trouble.  For example, uucp has a special shell: uucico.
It doesn't read /etc/profile.  And what about programs that don't get run
from a login shell?  For example, all the background daemons run out of
/etc/rc?  Or programs run from cron?  Or from at?  Or programs run while
single user?  None of these programs get a TZ.
Does it matter if a non-interactive program is in the wrong time zone?
After all, the files it touches are touched in GMT.  The answer is yes:
background processes generally keep logs, and the logs have time stamps.
For example, uucico gets started hourly out of crontab, and this means
that almost any uucico on the system (from crontab or from another system
logging in) will run in the wrong time zone.  Since L.sys has restrictions
on the times it can dial out, being in the wrong time zone can cause calls
to be placed during the day, even if this is supposedly forbidden in L.sys.
Also, of course, things like "date > /dev/console" every hour from crontab
will have problems.
It turns out that System III has a "default time zone" which is used by
localtime if there is no TZ variable.  On every version of System III or
V I've ever used, this is set to the time zone of the developer.  It's
EST in the traditional versions from AT&T.  It's PST in Xenix.  So the
developers of the system never see any problems - uucp logs are right,
for example, and csh users still get the right time.  Until they ship
the system out of their time zone.
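A sketch of the fallback just described, assuming a compiled-in default of EST5EDT as in the AT&T versions; `DEFAULT_TZ` and `effective_tz` are hypothetical names, not System III's. A daemon started from /etc/rc or cron with no TZ in its environment silently takes the developer's zone.

```c
#define _POSIX_C_SOURCE 200112L   /* request POSIX declarations (setenv, unsetenv) */
#include <stdlib.h>

/* Hypothetical compiled-in default: the developer's own zone, which
   System III-style localtime uses when TZ is absent. */
#define DEFAULT_TZ "EST5EDT"

/* The TZ value this process's localtime would act on: the environment's
   TZ if set, otherwise the built-in default. Daemons started from
   /etc/rc or cron usually end up with the default. */
const char *effective_tz(void)
{
    const char *tz = getenv("TZ");
    return tz != NULL ? tz : DEFAULT_TZ;
}
```

On the developer's own machine the default happens to be right, which is exactly why the problem stays invisible there.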
This problem isn't really that hard to fix.  You just have init open
a configuration file when it starts up, and set its own environment
from there.  (If you're willing to have init open files that early.)
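A sketch of the init-side fix, assuming a configuration file made of `TZ=value` lines; the file's location and format, and the `set_tz_from_config` name, are hypothetical. Because the value goes into init's own environment, the processes init starts inherit it, unless something later replaces or clears it.

```c
#define _POSIX_C_SOURCE 200112L   /* for setenv() */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Scan an already-opened configuration file for the first line of the
   form TZ=value and export that value into this process's environment.
   Returns 0 if TZ was set, -1 if no usable line was found or setenv failed. */
int set_tz_from_config(FILE *f)
{
    char line[256];

    while (fgets(line, sizeof line, f) != NULL) {
        line[strcspn(line, "\n")] = '\0';    /* drop trailing newline */
        if (strncmp(line, "TZ=", 3) == 0 && line[3] != '\0')
            return setenv("TZ", line + 3, 1) == 0 ? 0 : -1;
    }
    return -1;
}
```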
But it turns out there is an even more serious problem with the TZ
environment variable method.  It's a security problem.  Let's say the
system administrator has configured UUCP to only call out when the
phone rates are low, because the site is poor and can't afford daytime
rates, or to keep the machine load low during the day.  But some user
wants to push something through right away.  He sets TZ to indicate that
he's in, say, China.  Now, he starts up a uucico (or a cu.)  The localtime
routine believes the forged TZ and concludes it's within the allowed calling hours,
and an expensive phone call is placed.  The log will be made in the wrong
time zone too, so unless the SA is sharp, he won't notice until the phone
bill shows up.
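To see why the forged TZ matters, here is a sketch of a calling-hours check fed two different offsets for the same instant. The 23:00-08:00 window and all names are hypothetical; real L.sys time fields have their own syntax, which this does not attempt to reproduce.

```c
#include <time.h>

/* Hypothetical low-rate calling window, in local hours: calls are
   permitted from 23:00 up to (not including) 08:00. */
#define WINDOW_START 23
#define WINDOW_END    8

/* Local hour of day (0-23) for a GMT timestamp, given an offset in
   minutes west of GMT. Under the TZ method, that offset comes from the
   caller's own environment. */
static int local_hour(time_t gmt, int minutes_west)
{
    long long local = (long long)gmt - (long long)minutes_west * 60;
    long long secs_of_day = ((local % 86400) + 86400) % 86400;
    return (int)(secs_of_day / 3600);
}

/* Nonzero if a call placed at this instant falls inside the window. */
int call_allowed(time_t gmt, int minutes_west)
{
    int h = local_hour(gmt, minutes_west);
    return h >= WINDOW_START || h < WINDOW_END;
}
```

At 17:00 GMT, a process using EST (300 minutes west) computes noon and refuses the call; the same process with a TZ claiming China (8 hours east, or -480) computes 01:00 and dials.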
The fundamental difference between the two approaches is that the V7
method makes the timezone a per-system entity, while the Sys III method
makes the timezone a per-process entity.  While an argument can be made
that users dialing in from other time zones might want their processes
to be in their local time zone, this isn't a very strong argument.
(I've never seen anyone actually do it.)  [This is a symptom of a disease
that is affecting UNIX: a tendency to put things in the environment that
are not per-process.  David Yost pointed out, for example, that TERM is
not a per-process entity; it's a per-port entity.  Berkeley's V6 had a
file in /etc with per-port terminal codes, similar to /etc/utmp, but
we've actually taken a step backwards by putting it into the environment.
Take a look at tset and at people's .profile and .login files to see the
complexity this decision has cost us.]
So far, I've argued that the System III method is inadequate.
How about the V7 method?
The V7 method doesn't suffer from any of the weaknesses described above.
It does require a system call to get the time zone, which is a bit more
overhead than a getenv, and the kernel doesn't have any business knowing
about time zones.  (The same argument could be made that the kernel doesn't
have any business knowing the host name, too, but both System III and
4.2BSD have that information in the kernel, and it works awfully well.)
The weaknesses in the V7/4.2BSD method are in the time zone name
(since it's computed from a table and the offset value) and in the
rule for deciding when DST begins and ends.  The second problem is
also present in the Sys III method.
Suppose localtime finds out that the offset is +60, that is, we are
one hour east of GMT.  What time zone name do we use?  Well, if we're
in France, it might be MET (Middle European Time.)  If we're in Germany,
it might be MEZ (Mitteleuropäische Zeit.)  I probably have the specifics
wrong (I doubt that the French tell time in English) but you get the
idea.  An offset just specifies 1/24 of the world, and moving north/south
along that zone you can get a lot of countries, each with widely varying
languages and time zone names.  Even Alaska/Hawaii had trouble sharing
a time zone.  (The mapping in the other direction is ambiguous too: there
has been plenty of amusement and frustration from code that thought BST was
Bering Standard Time, when run in England in July and fed the local
abbreviation for British Summer Time.  This is why electronic mail and
news standards currently recommend that messages be stamped in UTC.
Or is that UT?  Or GMT?  Ick.)  So far we've survived because there tends
to be a dominant time zone name in each zone where UNIX has appeared, and
source has been present for the places that have to change it.  But as UNIX
becomes more popular in places like Africa, Eastern Europe, and Asia, this
will get a lot messier, especially with binary licenses.
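A sketch of the offset-to-name lookup, assuming a small built-in table; the entries and names are illustrative, not copied from the V7 or 4.2BSD source. Whatever country is asking, one hour east of GMT (-60 minutes west) gets the same name, and an offset missing from the table gets none.

```c
#include <stddef.h>

/* Hypothetical offset-to-name table. Offsets are minutes west of GMT
   (negative means east). Each offset carries exactly one pair of names. */
struct zone_name {
    int minutes_west;
    const char *std_name;
    const char *dst_name;
};

static const struct zone_name zone_names[] = {
    { 300, "EST", "EDT" },
    { 360, "CST", "CDT" },
    { 420, "MST", "MDT" },
    { 480, "PST", "PDT" },
    { -60, "MET", "MET DST" },   /* illustrative entry */
};

/* Return the name for an offset, or NULL if the table has no entry. */
const char *zone_name_for(int minutes_west, int dst)
{
    size_t i;
    for (i = 0; i < sizeof zone_names / sizeof zone_names[0]; i++)
        if (zone_names[i].minutes_west == minutes_west)
            return dst ? zone_names[i].dst_name : zone_names[i].std_name;
    return NULL;
}
```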
The decision about when daylight time begins and ends is more interesting.
If you read the manual page or the code, you'll discover that localtime
knows rules like "the 3rd Sunday in October" and has a table with special
hacks for 1974 and 1975, when the US Congress decided to change things.
This table "can be extended if necessary".  Of course, extending the table
means modifying the source and recompiling the world.  Might be hard on
all those binary systems, especially the ones without a programmer on staff.
This hasn't been a problem yet, but now Congress is making noises about
moving the dates around a bit.  It's a controversial proposal, and I wouldn't
be surprised if they try two or three rules before they pick one they like.
Given all the old releases still out there, and the development time for
a new release, we need about a 2 year warning of such an impending change.
We'll be lucky to get 6 months.
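Rules such as "the 3rd Sunday in October" or "the last Sunday in April" reduce to a day-of-week calculation. The sketch below uses Sakamoto's day-of-week method, a swapped-in technique rather than whatever localtime actually does, and `nth_weekday` is a hypothetical helper. It has no notion of which rule applies in which year, so special cases like 1974 and 1975 still need a table.

```c
/* Day of week for a Gregorian date, 0 = Sunday ... 6 = Saturday
   (Sakamoto's method). month is 1-12. */
static int day_of_week(int year, int month, int day)
{
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3)
        year -= 1;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}

static int days_in_month(int year, int month)
{
    static const int d[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    int leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : d[month - 1];
}

/* Day of month of the nth given weekday (n = 1..5), or of the last one
   when n is 0; e.g. n = 3, weekday = 0 is "the 3rd Sunday".
   Returns -1 if the month has no such day. */
int nth_weekday(int year, int month, int weekday, int n)
{
    int first = day_of_week(year, month, 1);
    int day = 1 + (weekday - first + 7) % 7;   /* first such weekday */
    int last = days_in_month(year, month);

    if (n == 0) {
        while (day + 7 <= last)
            day += 7;
        return day;
    }
    day += 7 * (n - 1);
    return day <= last ? day : -1;
}
```

For 1986 this gives October 19 as the 3rd Sunday in October and April 27 as the last Sunday in April.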
So what do we do about all this?  Well, I think the basic requirements are
that
 (1) The time zone should be a per-system entity.  There should only be
     one place on each system where the zone is specified to ensure that
     all programs on the system use the same notion of time zone.
 (2) The time zone offset, names, and daylight conventions should be
     easily configured by the system administrator.  We should have a
     standard that allows zone offsets in minutes (maybe even seconds,
     I don't know how much precision Saudi Arabia needs), zone names of arbitrary
     length (they are 7 characters in Australia, for example), whether
     we use daylight time at all, when daylight time begins, and when it ends.
     The latter two must be allowed to vary as a function of the year.
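Requirement (2) amounts to a table indexed by year. A minimal sketch, assuming each transition is written as "nth weekday of month" with 0 meaning "last"; the struct and function names are hypothetical, and the entries encode the US rules in force through 1986, including the 1974 and 1975 exceptions.

```c
#include <stddef.h>

/* A transition day written as "the nth weekday of month";
   nth 0 means the last one, weekday 0 is Sunday. */
struct when {
    int month, nth, weekday;
};

/* One Daylight-time rule, valid for years first..last inclusive. */
struct dst_rule {
    int first, last;
    struct when start, end;
};

/* US rules as of 1986; the open end year stands for "until changed". */
static const struct dst_rule us_rules[] = {
    { 1967, 1973, { 4, 0, 0 }, { 10, 0, 0 } },  /* last Sun Apr - last Sun Oct */
    { 1974, 1974, { 1, 1, 0 }, { 10, 0, 0 } },  /* first Sun Jan (Jan 6) */
    { 1975, 1975, { 2, 0, 0 }, { 10, 0, 0 } },  /* last Sun Feb (Feb 23) */
    { 1976, 9999, { 4, 0, 0 }, { 10, 0, 0 } },
};

/* Return the rule in force for a given year, or NULL if the table has
   none (here, any year before 1967). */
const struct dst_rule *dst_rule_for(int year)
{
    size_t i;
    for (i = 0; i < sizeof us_rules / sizeof us_rules[0]; i++)
        if (year >= us_rules[i].first && year <= us_rules[i].last)
            return &us_rules[i];
    return NULL;
}
```

A change by Congress then means closing the last entry's year range and adding a new line, with no new algorithm.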
The exact method for doing this isn't clear.  We certainly need a configuration
file.  But interpreting this file on each call to ctime isn't a good idea.
It would be nice to have /etc/rc look at the source file and plug a simple
interpretation of it into either a binary file or the kernel, but we have
to worry about what happens if the system is up (and processes are running)
when we convert over to or from daylight time.  Perhaps cron has to run a
program to update it every hour.  You could have cron only run this program
twice a year, but this would require putting the configuration information
into crontab (which doesn't understand things like "3rd Sunday in October")
and would lose if the system happened to be down at conversion time.  Also,
the algorithm has to work for dates that aren't this year, e.g. "ls -l" of
an old file.
How much of this does P1003 need to specify?  After all, if they are just
specifying sections 2 and 3, then the file format and the method for making
sure things are right should be up to the implementor.  Well, at a minimum,
we need ctime and localtime.  We also need a standard way to get the time zone
name and offset - there is a lot of ifdeffed code out there that either
looks for TZ or calls ftime - and whether daylight time applies (and by
how much) for a given time.
But there's more.  When the mood strikes Congress to change the rules and
gives us 2 months' notice, it ought to be possible to publish a new table
that everybody with a P1003 system can just type in.  It would be nice if
the location and format of this table were standardized.  (After all, the
US Congress doesn't set the rules for the rest of the world, and other
countries are just as subject to arbitrary bodies changing their rules.)
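One possible shape for such a typed-in table, purely as an assumption: one line per range of years, giving the start and end as "nth weekday of month" with 0 meaning "last". The line format, `struct rule_line`, and `parse_rule_line` are hypothetical; the point is that an administrator could add a line without touching source.

```c
#include <stdio.h>

/* Hypothetical plain-text rule line, one per range of years:
       first last  smon snth swday  emon enth ewday
   nth is 1-5, or 0 for "last"; wday is 0 (Sunday) to 6.
   Example:  1976 9999  4 0 0  10 0 0   (last Sun Apr - last Sun Oct) */
struct rule_line {
    int first, last;
    int smon, snth, swday;
    int emon, enth, ewday;
};

static int in_range(int v, int lo, int hi)
{
    return v >= lo && v <= hi;
}

/* Parse one line. Returns 1 for a valid rule, 0 for a blank or
   comment line, -1 for anything malformed (including trailing text). */
int parse_rule_line(const char *s, struct rule_line *r)
{
    char extra;

    while (*s == ' ' || *s == '\t')
        s++;
    if (*s == '\0' || *s == '\n' || *s == '#')
        return 0;
    if (sscanf(s, "%d %d %d %d %d %d %d %d %c",
               &r->first, &r->last,
               &r->smon, &r->snth, &r->swday,
               &r->emon, &r->enth, &r->ewday, &extra) != 8)
        return -1;
    if (r->first > r->last
        || !in_range(r->smon, 1, 12) || !in_range(r->emon, 1, 12)
        || !in_range(r->snth, 0, 5) || !in_range(r->enth, 0, 5)
        || !in_range(r->swday, 0, 6) || !in_range(r->ewday, 0, 6))
        return -1;
    return 1;
}
```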
Finally, there needs to be a requirement that the time zone always work.
Some discussion of these issues should be present.  Otherwise, some
implementor is going to think that the System III method works adequately.
	Mark Horton
Volume-Number: Volume 5, Number 3