std-unix@ut-sally.UUCP (Moderator, John Quarterman) (01/02/86)
Date: Wed, 1 Jan 86 18:39:26 est
From: seismo!cbpavo.cbosgd.ATT.UUCP!mark@sally.UTEXAS.EDU (Mark Horton)

The latest draft asked for input about time zones. I'd like to make a few comments.

There are two basic ways that time zones are done today. There is the V7 method (where the zone offset is in the kernel) and the System III method (where the zone offset and name is in an environment variable.) 4.2BSD descends from V7 (although it has a fancier "offset to zone name" conversion algorithm that knows more about the world) and System V descends from System III.

There are elegance considerations (e.g. should the kernel know or care about time zones?) and efficiency considerations (e.g. is it faster to look in the environment, do a system call, or read a file.) But both of these are dwarfed by a much more important consideration: does it work properly in all cases? I claim that neither method is correct all the time, but the V7 method is right more often than the System III method.

In V7, when you configure your kernel, you tell it the zone offset, in minutes, from GMT, and whether you observe Daylight time. These numbers are stored in the kernel, which doesn't do anything with them except return them when an ftime system call asks for them. So, in effect, they are in the kernel for administrative convenience. (You don't have to open a file, and they aren't compiled into ctime, so it isn't necessary to relink every program that calls ctime or localtime when you install a new system.) The smarts that generate the time zone name and decide if Daylight time is in effect are in ctime in user code. (By comparison, in TOPS 20, the equivalent of ctime is a system call, with lots of options for the format. This may seem inelegant, but it results in only one copy of the code on the system, even without shared libraries.)

In System III, the kernel doesn't know about time zones at all.
The environment variable TZ stores both the zone names and the zone offset, and ctime looks there. This makes things very flexible - no assumptions about 3 character zone names, and it's easy for a person dialing in from a different time zone to run in their own time zone. Also, it's very efficient to look in the environment - faster than a system call.

However, there are some very serious problems with the System III method. One problem is that, with an offset measured in hours, places like Newfoundland, Central Australia, and Saudi Arabia, which don't have even hour time zones, are in trouble. But that's easy to fix in the standard.

The time zone is configured into the system by modifying /etc/profile, which is a shell script of startup commands run by the Bourne shell when any user logs in, much like .profile or .login. This means that we assume everybody is using the Bourne shell. This is a false assumption - one of the documented features of UNIX ever since V6 is that the shell is an ordinary program, and a user can have any shell they like. In particular, the Berkeley C shell does not read /etc/profile, so all csh users don't get a TZ variable set up for them in every System III or V UNIX I've ever used.

Well, after all, the Bourne shell is the standard shell, and everybody should use the standard shell, right? After all, the new Korn shell reads and understands /etc/profile. Even if we believe the above statement and ignore the documented feature of being able to put your favorite shell in the passwd file (and I don't) we still get into trouble. For example, uucp has a special shell: uucico. It doesn't read /etc/profile. And what about programs that don't get run from a login shell? For example, all the background daemons run out of /etc/rc? Or programs run from cron? Or from at? Or programs run while single user? None of these programs get a TZ.

Does it matter if a non-interactive program is in the wrong time zone?
After all, the files it touches are touched in GMT. The answer is yes: background processes generally keep logs, and the logs have time stamps. For example, uucico gets started hourly out of crontab, and this means that almost any uucico on the system (from crontab or from another system logging in) will run in the wrong time zone. Since L.sys has restrictions on the times it can dial out, being in the wrong time zone can cause calls to be placed during the day, even if this is supposedly forbidden in L.sys. Also, of course, things like "date > /dev/console" every hour from crontab will have problems.

It turns out that System III has a "default time zone" which is used by localtime if there is no TZ variable. On every version of System III or V I've ever used, this is set to the time zone of the developer. It's EST in the traditional versions from AT&T. It's PST in Xenix. So the developers of the system never see any problems - uucp logs are right, for example, and csh users still get the right time. Until they ship the system out of their time zone.

This problem isn't really that hard to fix. You just have init open a configuration file when it starts up, and set its own environment from there. (If you're willing to have init open files that early.)

But it turns out there is an even more serious problem with the TZ environment variable method. It's a security problem. Let's say the system administrator has configured UUCP to only call out when the phone rates are low, because the site is poor and can't afford daytime rates, or to keep the machine load low during the day. But some user wants to push something through right away. He sets TZ to indicate that he's in, say, China. Now, he starts up a uucico (or a cu.) The localtime routine believes the forged TZ and thinks it's within the allowed time window, and an expensive phone call is placed. The log will be made in the wrong time zone too, so unless the SA is sharp, he won't notice until the phone bill shows up.
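
To make the forged-TZ scenario concrete, here is a minimal sketch in Python (Unix only, since it relies on time.tzset). It is not uucico's actual logic: the may_call window of 23:00 to 07:59 is a made-up stand-in for an L.sys time restriction, and the zone strings "EST5EDT" and "CST-8" are assumed values written in the POSIX style, where a positive offset means west of Greenwich. The point is only that the same instant yields a different local hour, and so a different answer from a time-of-day check, depending on what the caller put in TZ.

```python
import calendar
import os
import time


def local_hour(instant, tz):
    """Return the hour localtime() reports for `instant` when TZ is `tz`."""
    saved = os.environ.get("TZ")
    os.environ["TZ"] = tz
    time.tzset()  # re-read TZ, as a newly started process would
    try:
        return time.localtime(instant).tm_hour
    finally:
        # Put the caller's environment back the way it was.
        if saved is None:
            os.environ.pop("TZ", None)
        else:
            os.environ["TZ"] = saved
        time.tzset()


def may_call(hour, start=23, end=8):
    """Hypothetical night-only rule: dialing allowed from 23:00 to 07:59."""
    return hour >= start or hour < end


# One fixed instant: 1986-01-02 17:00 GMT.
instant = calendar.timegm((1986, 1, 2, 17, 0, 0))

honest = local_hour(instant, "EST5EDT")  # the site's real zone (assumed)
forged = local_hour(instant, "CST-8")    # user claims to be 8 hours east of GMT

print("honest TZ: hour", honest, "call allowed:", may_call(honest))
print("forged TZ: hour", forged, "call allowed:", may_call(forged))
```

Nothing in the check itself is wrong; it simply trusts a value that any user can set before starting the program. A per-system zone, read from somewhere the user cannot write, would not have this hole.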
The fundamental difference between the two approaches is that the V7 method makes the timezone a per-system entity, while the Sys III method makes the timezone a per-process entity. While an argument can be made that users dialing in from other time zones might want their processes to be in their local time zone, this isn't a very strong argument. (I've never seen anyone actually do it.)

[This is a symptom of a disease that is affecting UNIX: a tendency to put things in the environment that are not per-process. David Yost pointed out, for example, that TERM is not a per-process entity, it's a per-port entity. Berkeley's V6 had a file in /etc with per-port terminal codes, similar to /etc/utmp, but we've actually taken a step backwards by putting it into the environment. Take a look at tset and people's .profile's and .login's to see the complexity this decision has cost us.]

So anyway, so far I've argued that the System III method is inadequate. How about the V7 method?

The V7 method doesn't suffer from any of the weaknesses described above. It does require a system call to get the time zone, which is a bit more overhead than a getenv, and the kernel doesn't have any business knowing about time zones. (The same argument could be made that the kernel doesn't have any business knowing the host name, too, but both System III and 4.2BSD have that information in the kernel, and it works awfully well.)

The weaknesses in the V7/4.2BSD method are in the time zone name (since it's computed from a table and the offset value) and in the rule for deciding when DST begins and ends. The second problem is also present in the Sys III method.

Suppose localtime finds out that the offset is +60, that is, we are one hour east of GMT. What time zone name do we use? Well, if we're France, it might be MET (Middle European Time.) If we're in Germany, it might be MEZ (Mitten European Zeit.)
I probably have the specifics wrong (I doubt that the French tell time in English) but you get the idea. An offset just specifies 1/24 of the world, and moving north/south along that zone you can get a lot of countries, each with widely varying languages and time zone names. Even Alaska/Hawaii had trouble sharing a time zone. (The mapping in the other direction is ambiguous too: there has been lots of amusement and frustration from code that thought BST was Bering Standard Time, when run in England in July and fed the local abbreviation for British Summer Time. This is why electronic mail and news standards currently recommend that messages be stamped in UTC. Or is that UT? Or GMT? Ick.)

So far we've survived because there tends to be a dominant time zone name in each zone where UNIX has appeared, and source has been present for the places that have to change it. But as UNIX becomes more popular in places like Africa, Eastern Europe, and Asia, this will get a lot messier, especially with binary licenses.

The decision about when daylight time begins and ends is more interesting. If you read the manual page or the code, you'll discover that localtime knows rules like "the 3rd Sunday in October" and has a table with special hacks for 1974 and 1975, when the US Congress decided to change things. This table "can be extended if necessary". Of course, extending the table means modifying the source and recompiling the world. Might be hard on all those binary systems, especially the ones without a programmer on staff. This hasn't been a problem yet, but now Congress is making noises about moving the dates around a bit. It's a controversial proposal, and I wouldn't be surprised if they try two or three rules before they pick one they like. Given all the old releases still out there, and the development time for a new release, we need about a 2 year warning of such an impending change. We'll be lucky to get 6 months.

So what do we do about all this?
Well, I think the basic requirements are that

(1) The time zone should be a per-system entity. There should only be one place on each system where the zone is specified to ensure that all programs on the system use the same notion of time zone.

(2) The time zone offset, names, and daylight conventions should be easily configured by the system administrator.

We should have a standard that allows zone offsets in minutes (maybe even seconds, I don't know how precise Saudi Arabia needs), zone names of arbitrary length (they are 7 characters in Australia, for example), whether we use daylight time at all, when daylight time begins, and when it ends. The latter two must be allowed to vary as a function of the year.

The exact method for doing this isn't clear. We certainly need a configuration file. But interpreting this file on each call to ctime isn't a good idea. It would be nice to have /etc/rc look at the source file and plug a simple interpretation of it into either a binary file or the kernel, but we have to worry about what happens if the system is up (and processes are running) when we convert over to or from daylight time. Perhaps cron has to run a program to update it every hour. You could have cron only run this program twice a year, but this would require putting the configuration information into crontab (which doesn't understand things like "3rd Sunday in October") and would lose if the system happened to be down at conversion time. Also, the algorithm has to work for dates that aren't this year, e.g. "ls -l" of an old file.

How much of this does P1003 need to specify? After all, if they are just specifying sections 2 and 3, then the file format and the method for making sure things are right should be up to the implementor. Well, at a minimum, we need ctime and localtime.
We also need a standard way to get the time zone name and offset - there is a lot of ifdeffed code out there that either looks for TZ or calls ftime - and whether daylight time applies (and by how much) for a given time.

But there's more. When the mood strikes Congress to change the rules and gives us 2 months notice, it ought to be possible to publish a new table that everybody with a P1003 system can just type in. It would be nice if the location and format of this table were standardized. (After all, the US Congress doesn't set the rules for the rest of the world, and they are just as subject to having arbitrary bodies with different rules.)

Finally, there needs to be a requirement that the time zone always work. Some discussion of these issues should be present. Otherwise, some implementor is going to think that the System III method works adequately.

	Mark Horton

Volume-Number: Volume 5, Number 3