dupuy@westend.columbia.edu (Alexander Dupuy) (01/01/88)
Ever since the leap second (23:59:60 GMT Jan 1, 1988) the realtime clocks on
Sun-3s have been behaving strangely. When booting /vmunix, just after the
message about using nn buffers, the kernel prints out a little message like the
above. That's not too bothersome for us, since we use rdate and ntpd to keep
our Suns' clocks in synch anyhow.
What is bothersome is that the system clocks have started to slew wildly.
Using a little program I hacked up, I have found that there are spurious deltas
showing up in adjtime(2) on *ANY SUN-3* which has had its time set or adjusted
since the leap second. Running the following program a few times:
adjtime.c
---------
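#include <stdio.h>
#include <sys/time.h>

/* Passing a zero delta lets us read back, in olddelta, the correction
   the kernel still had pending. */
struct timeval delta = { 0, 0 },
               olddelta = { 0, 0 };

int main ()
{
    if (adjtime (&delta, &olddelta) == -1)
        perror ("adjtime");
    printf ("adjust %ld.%ld, oldslew %ld.%ld\n",
            (long) delta.tv_sec, (long) delta.tv_usec,
            (long) olddelta.tv_sec, (long) olddelta.tv_usec);
    return 0;
}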
I get results like this:
Script started on Fri Jan 1 02:14:18 1988
finest# alias adj '/src/local/local/netdate/adjtime; date'
finest# adj
adjust 0.0, oldslew -1490.-143408
Fri Jan 1 02:15:26 EST 1988
westend# adj
adjust 0.0, oldslew 0.0
Fri Jan 1 02:15:32 EST 1988
westend# adj
adjust 0.0, oldslew -1729.-961408
Fri Jan 1 02:15:46 EST 1988
westend#
westend# date 8712311650
Thu Dec 31 16:50:00 EST 1987
westend# adj
adjust 0.0, oldslew 0.0
Thu Dec 31 16:50:09 EST 1987
westend# adj
adjust 0.0, oldslew 0.0
Thu Dec 31 16:50:29 EST 1987
westend# adj
adjust 0.0, oldslew 0.0
Thu Dec 31 16:50:48 EST 1987
script done on Thu Dec 31 16:50:49 1987
As can be seen, every ten to fifteen seconds, some monstrous time adjustment
gets added in by the kernel. This is *not* being done by ntp or any other time
daemon - it even happens in single user mode. It can also be seen that after
the date is reset to 1987 (GMT) this behavior disappears, and time stabilizes.
The silly message when booting disappears as well.
So it looks like the guilty party is /sys/sundev/clock.c. But not having
source code, what can I do?
Other observations: Our Sun-2s (bless their little obsolete cpus) have not even
stuttered since the leap second went down. Their TOD clock code seems to be
just fine.
So will someone with access to Sun kernel sources please help me out? This is
a serious bug, and I imagine Sun will have a patched OBJ/clock.o for binary
sites eventually, but in the meantime, it is stretching the resources of ntpd
to even keep the machines within a *minute* or so of true time. The poor
machines which aren't running ntp are okay until they are rebooted, or someone
foolishly tries to set their time, but once that happens, their watch gears get
unsprung.
@alex
---
arpanet: dupuy@columbia.edu
uucp: ...!seismo!columbia!dupuy
dupuy@westend.columbia.edu (Alexander Dupuy) (01/01/88)
Forgot to give the versions for which this problem exists:

SunOS 3.2 @(#)clock.c 1.1 86/07/07
and
SunOS 3.4 @(#)clock.c 1.2 86/10/08

---
arpanet: dupuy@columbia.edu
uucp: ...!seismo!columbia!dupuy
mp@allegra.UUCP (Mark Plotnick) (01/04/88)
If anyone has a patch for sun4's, please send it along; we don't yet have
source code or much knowledge of the assembly language.

A short-term workaround, for both sun3's and sun4's, is to minimize reliance
on the tod clock: run rdate as soon as possible after booting, and patch the
kernel variable dosynctodr to be 0 so that the unix date is not periodically
copied (well, actually, it's adjtime'd) from the incorrect info in the todr.
This workaround may result in the unix date running a bit slow due to missed
clock interrupts.

Mark Plotnick
Department of Solar Calendars
allegra!mp
bzs%bu-cs.bu.edu@bu-it.BU.EDU (Barry Shein) (01/05/88)
Urgh, the nice thing about all these patches is one gets so many to
choose from. Stu Levy's looks better than mine (mine: to turn off
dosynctodr), if you applied mine undo it (trivial) and try his. I'm
copying the note so it nullifies my advice on Unix-wizards also.

-Barry Shein, Boston University

Date: Sat, 2 Jan 88 00:27:09 CST
From: slevy@uc.msc.umn.edu (Stuart Levy)
To: tcp-ip@sri-nic.arpa, westend!dupuy@columbia.edu
Subject: Re: WARNING: TOD clock not initialized -- CHECK AND RESET THE DATE!

Whew. I was pinging umd1.umd.edu at leap second time, hoping to catch it
in the act (wonder how many others were doing the same thing?), when suddenly
the time difference started hurtling into outer space. For a moment I
wondered if Dave Mills had added a leap minute instead of second, but no,
our SUNs had all gone mad. It was a great relief to hear that someone else
saw the same thing.

I believe I have a fix for this.. Probably the easiest way to distribute it
without annoying SUN too much is as a binary patch. Say:

# adb -w -k /vmunix /dev/mem
resettodr+0xca?X
	(It should contain 0x536efff4, a subqw #1,a6@(-0xc) instruction.)
	(Change it to NOP's in the /vmunix file with...)
.?W 4e714e71
	(and in the running kernel (this seems to be safe) with...)
./W 4e714e71
$q
#

For those who have source, the relevant module is sun3/clock.c.
The line in resettodr() reading

	t += MONTHSEC(--mon, year);

breaks, since MONTHSEC evaluates the --mon twice in leap years.
It could change to

	mon--;
	t += MONTHSEC(mon, year);

This appears to work on our SUNs running 3.3.

Stuart Levy, Minn. Supercomputer Center
slevy@uc.msc.umn.edu
mark@nova.usc.edu (Mark A. Brown) (01/05/88)
Here's a binary patch for the leap year bug that will work for Sun 4s.

# adb -w -k /vmunix /dev/mem
resettodr+0x110?X
	(It should contain 0xba276001, a sub %i5, 0x1, %i5 instruction.)
	(Change it to a nop in both /vmunix and kernel memory)
.?W 0x1000000
./W 0x1000000
	(To make yourself feel better, do the following)
.?i
./i
	(If they're nop's, things should now be better)
$q

We are running the SYS4 GAMMA release, but things should be the same for
SYS4 3.2. If not, here's the original GAMMA binary and you can go from there.

_resettodr+0xf0:	srl	%i5, 0x10, %i5
_resettodr+0xf4:	orcc	%g0, %i1, %g0
_resettodr+0xf8:	bne	_resettodr + 0x120
_resettodr+0xfc:	sub	%i5, 0x1, %i5
_resettodr+0x100:	sll	%i5, 0x10, %i5
_resettodr+0x104:	srl	%i5, 0x10, %i5
_resettodr+0x108:	cmp	%i5, 0x2
_resettodr+0x10c:	bne,a	_resettodr + 0x120
_resettodr+0x110:	sub	%i5, 0x1, %i5		<<< change to nop
_resettodr+0x114:	sethi	%hi(0x263800), %o5
_resettodr+0x118:	ba	_resettodr + 0x134
_resettodr+0x11c:	add	%o5, 0x380, %i3
_resettodr+0x120:	sll	%i5, 0x10, %i5
_resettodr+0x124:	srl	%i5, 0x10, %i5
_resettodr+0x128:	sub	%i5, 0x1, %i3
_resettodr+0x12c:	sll	%i3, 0x2, %i3
_resettodr+0x130:	ld	[%i3 + %l7], %i3
_resettodr+0x134:	add	%i4, %i3, %i4
_resettodr+0x138:	mov	%i4, %o0

Mark
dupuy@westend.columbia.edu (Alexander Dupuy) (01/05/88)
Just so that everyone installs the best patch for this problem - Robert Elz
posted a better binary patch than Stuart Levy's in that Elz's will also work in
non-leap years (like 1989, just in case you're still running 3.4 then...)

From: kre@munnari.oz (Robert Elz)
Newsgroups: comp.protocols.tcp-ip
Summary: An alternative binary patch (for SunOS 3.4), which will work forever
Message-ID: <1944@munnari.oz>
Date: 3 Jan 88 08:25:17 GMT

Here's an alternative (binary) patch that will work in both leap years,
and in boring old ordinary years.

# adb -w -k /vmunix /dev/mem
resettodr+0xca?X
	(It should contain 0x536efff4, a subqw #1,a6@(-0xc) instruction.
	 If you applied Stuart's patch it will contain 0x4e714e71, 2 nop's
	 so put back the subw in both the kernel a.out, and memory)
.?W 536efff4
./W 536efff4
	(next, apply a slightly better fix)
resettodr+0xc0?i
	(it should contain "bnes resettodr+0xca", which we will change
	 to be "bnes resettodr+0xce" and avoid the incorrect subw)
.?w 660c
	(now verify that it's correct)
.?i
	(and assuming it is "bnes resettodr+0xce", change the running kernel)
./w 660c
$q

I can't verify that this actually fixes the reported problem, but it
clearly does fix a bug, and should have the same effect this year as
Stuart's fix, while not hurting next year.

I used SunOS 3.4 to do this, in case other versions of SunOS deviate
(3.3 is apparently the same), here is the original section of binary ...

_resettodr+0xa6:	movw	a6@(-0x10),d0
_resettodr+0xaa:	moveq	#3,d1
_resettodr+0xac:	andw	d1,d0
_resettodr+0xae:	andl	#0xffff,d0
_resettodr+0xb4:	bnes	_resettodr+0xca
_resettodr+0xb6:	subqw	#1,a6@(-0xc)
_resettodr+0xba:	cmpw	#2,a6@(-0xc)
_resettodr+0xc0:	bnes	_resettodr+0xca		<<<< change this to
_resettodr+0xc2:	movl	#0x263b80,d0
_resettodr+0xc8:	bras	_resettodr+0xde
_resettodr+0xca:	subqw	#1,a6@(-0xc)
_resettodr+0xce:	moveq	#0,d0			<<<< branch to here
_resettodr+0xd0:	movw	a6@(-0xc),d0
_resettodr+0xd4:	lea	_monthsec:l,a0
_resettodr+0xda:	movl	a0@(-4,d0:l:4),d0
_resettodr+0xde:	addl	d0,d7

kre

---
arpanet: dupuy@columbia.edu
uucp: ...!seismo!columbia!dupuy
matt@oddjob.UChicago.EDU (Keeper of the Sacred Tablets) (01/06/88)
) From: kre@munnari.oz (Robert Elz)
)
) Here's an alternative (binary) patch that will work in both leap years,
) and in boring old ordinary years.
)
) # adb -w -k /vmunix /dev/mem
) resettodr+0xca?X

etc ...

"kpatch", a conceptual variant of Larry Wall's "patch" will
automatically search incoming news articles for binary kernel patches
and apply them to your running system.
________________________________________________________
Matt	     University		matt@oddjob.uchicago.edu
Crawford     of Chicago     {astrovax,ihnp4}!oddjob!matt
chuq@plaid.Sun.COM (Chuq Von Rospach) (01/06/88)
In article <5203@columbia.edu> dupuy@columbia.edu (Alexander Dupuy) writes:
>Ever since the leap second (23:59:60 GMT Jan 1, 1988) the realtime clocks on
>Sun-3s have been behaving strangely. When booting /vmunix, just after the
>message about using nn buffers, the kernel prints out a little message like the
>above.

Here's the official sun3 patch. The official sun4 patch will be posted as soon
as it gets released and I get approval.

Chuq
Sun Tech Support

============================================

There exists a problem for all Sun3 (68020) machines running SunOS
Releases 3.0-3.5, 4.0beta1, and all Sun4 (SPARC) machines running SunOS
Release Sys4-3.2 FCS and 4.0beta1. As of Jan 1 00:00 1988, the clock
routine in the kernel will put the clock chip into an uncertain state
if you attempt to set the date. The visible effects of this are to

1) cause the message

	WARNING: TOD clock not initialized -- CHECK AND RESET THE DATE!

   to appear while booting vmunix, and to

2) cause the system date to start to drift widely. Any attempts to
   actually *set* the date will have only a temporary effect (i.e., the
   date you set will be good for about 30 seconds).

In order to solve this problem, you must patch both the kernel and
system object files.

==============================================================================
Sun3 System Patch
Releases 3.2, 3.3, 3.4, 3.5

As root, run the following command:

	echo 'resettodr+c0?i' | adb /vmunix - | grep reset

You should see the following printed out:

	_resettodr+c0: bnes _resettodr+0xca

If you see instead:

	_resettodr+c0: bnes _resettodr+0xce

the patch has already been applied to this system. Proceed with the rest
of the patch procedure anyway!

If you do not see either of these messages, go no further with this patch,
and please contact Sun Microsystems Customer Service.

If you do see either of those messages, then run, as root, the following
commands:

	cp /sys/OBJ/clock.o /sys/OBJ/clock.o-
	echo 'resettodr+c0?w 660c' | adb -w /sys/OBJ/clock.o
	echo 'resettodr+c0?w 660c' | adb -w /vmunix

Reboot and then *set* the date. If you build kernels for your system,
also rebuild your kernel.

Chuq "Fixed in 4.0" Von Rospach		chuq@sun.COM		Delphi: CHUQ

What do you mean 'You don't really want to hurt her?'
I'm a Super-Villian! That's my Schtick!
jra@jc3b21.UUCP (Jay R. Ashworth) (01/07/88)
In article <37929@sun.uucp>, chuq@plaid.Sun.COM (Chuq Von Rospach)
[ Hi, Chuq! ] says:
>
> In article <5203@columbia.edu> dupuy@columbia.edu (Alexander Dupuy) writes:
>>Ever since the leap second (23:59:60 GMT Jan 1, 1988) the realtime clocks on
                                           ^^^ ^^ ^^^^

Whoops! Did I misunderstand something here? I thought the leap second
occurred at 23:59:60 UTC Dec 31, 1987. I have the same kind of esoteric
interest here as all you internet folk who kept poking the fuzzballs that
night, trying to get to see it. (You know who you are. :-)

-- jra
--
Jay R. Ashworth ---+-- The Great Ashworth & ------------+...!uunet!codas!pdn!
10974 111th St. N. | Petrillo Production Company | jc3b21!jra
Seminole FL 34648 +-- watch for BayLink Public Access -+- UNIX ----+---------
(813) 397-1859 ----+-- Tampa Bay's Smallest Video Production House -+ :-) !$