hilary@snll-arpagw.UUCP (Hilary Jones) (03/07/91)
I have a major problem for which I need a solution by 3/1/92. Or else! I am using my two MIPS machines as Kerberos servers, so I cannot tolerate very much clock drift. However, last Friday (3/1/91) and a year ago (3/1/90) the clock on my M/2000 started to gain time spontaneously, at a rate of about 3 minutes per hour. I had to reboot the system to "solve" the problem, but when I looked later I found that the clock had still gained a whole day -- exactly to the minute. If I were to guess, I would say that there is a problem with the hardware and/or operating system that doesn't handle the last day of February correctly. This problem occurs on my M/2000 running RISC/os 4.0, and on my RC3260 running RISC/os 4.51, but not on my RS2030, nor on my friend's Magnum 3000. Has anyone else seen this problem? Is there a fix?
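To put the reported drift in perspective, here is a rough sketch of how quickly a clock gaining about 3 minutes per hour would exceed an authentication skew limit. The 5-minute tolerance is an assumption for illustration (the post does not state the limit in use), and `minutes_until_skew_exceeded` is a hypothetical helper name.

```python
def minutes_until_skew_exceeded(gain_min_per_hour: float, tolerance_min: float) -> float:
    """Real elapsed minutes before the accumulated gain reaches the tolerance."""
    if gain_min_per_hour <= 0:
        raise ValueError("gain must be positive")
    # The clock gains gain_min_per_hour minutes for every 60 real minutes.
    return tolerance_min * 60.0 / gain_min_per_hour


REPORTED_GAIN = 3.0      # minutes gained per hour, as reported in the post
ASSUMED_TOLERANCE = 5.0  # minutes; assumed skew limit, NOT taken from the post

print(minutes_until_skew_exceeded(REPORTED_GAIN, ASSUMED_TOLERANCE))
```

Under those assumptions the server would be out of tolerance in well under two hours, which is why a reboot or a one-off "date" command does not help for long.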
datri@convex.com (Anthony A. Datri) (03/07/91)
>"solve" the problem, but when I looked later I found that the clock had
>still gained a whole day -- exactly to the minute.

I see exactly this on our 3xxx and our 6280 -- which is what prompted me to attempt to get ntp going.

--
In MDDT no one can hear you scream
jabarby@vlsi.waterloo.edu (J.A. Barby) (03/07/91)
In article <167@snll-arpagw.UUCP>, hilary@snll-arpagw.UUCP (Hilary Jones) writes:
> I have a major problem for which I need a solution by 3/1/92. Or else!
>
> I am using my two MIPS machines as Kerberos servers, so I cannot tolerate
> very much clock drift. However, last Friday (3/1/91) and a year ago
> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
> a rate of about 3 minutes per hour. ...
>
> Has anyone else seen this problem? Is there a fix?

Yes, we had the same problem on both our M/2000 and RC6280. We are also interested in the fix.

--
Jim Barby (U of Waterloo VLSI Group, Waterloo Ont.)
jabarby@vlsi.waterloo.{cdn,edu,bitnet}
jabarby@vlsi.UWaterloo.ca
at@cc.tut.fi (Toivo Veli) (03/08/91)
In article <167@snll-arpagw.UUCP> hilary@snll-arpagw.UUCP (Hilary Jones) writes:
> However, last Friday (3/1/91) and a year ago
> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
> a rate of about 3 minutes per hour. ...
> Has anyone else seen this problem? Is there a fix?

After others reported problems with Mips clocks, I decided to translate this text into English. It was originally written for a local newsgroup. The day this happened was - surprise - March 1st. The time zone used is EET unless another zone is mentioned or the context indicates otherwise. Oh yes, some of these time-stamps are completely outside any zone, but try to hold on...

The problems began on February 28th, when our machine (RC6280, RiscOs 4.51) crashed before midnight. It couldn't boot on its own, so it waited patiently and was reset in the morning. The boot was quite normal; fsck produced only a 6-inch list of complaints. Once we were up and running unix, we found the first console-log messages to be dated "Mar 2 08:40:44 lehtori unix: CPU: MIPS R6000 Processor Chip Revision: 3.1", which indicated that we had gained an extra day. Simple, just say "date 03010844" and get the situation under control.

It wasn't so easy... At about 11:55 we found that the clock had gained 10 minutes, so something was wrong. First we tried "date 03011156", but the clock still kept running too fast, so the problem didn't belong to the typing-error class. The timed process caught a kill -9 very fast indeed; the only reason it still existed was that nobody had had time to rip it out before. No change.

When no comprehensible software cause had been found, and the system had functioned perfectly, at least as far as the clock is concerned :-), our conclusion was to look for a hardware malfunction. After the machine was down (13:40) it was killed with the Big Black Switch on the faceplate. After a couple of minutes, the boot claimed the time to be 13:49 and the date March 2nd...
After we had found the right date we noticed an interesting feature: before the date command a minute consisted of 60 seconds, after it only 57 (measured with my wrist-watch). For a minute we thought about all the claims made for processing speed, but it didn't sound reasonable; even this one can't do an endless loop in 10 minutes.

The next phase was, of course, to shut down the machine and talk directly to the todc from the prom monitor. For informational purposes, here are some numbers from the time of writing this text:

    0x27D6071D - seconds, hex...
    668337949  - seconds, decimal (same number as above)
    07.03.91   - date
    11:25      - time

Here is what the conversation looked like:

...
>> pr_tod
tod = 0x281fabf8

[ Ok, we have the date in hex, shouldn't cause any problems, but... wait a minute... this is wrong. Seems to be 02.05.91, 09:04! ]

>> init_tod
Setting of TOD not supported in this bootmode

[ Hummm... I must RTFM, it could be... ]

>> setenv bootmode d
>> pr_tod
tod = 0x281fad19
>> init_tod
Setting tod to 0 seconds

[ So there! Now I have to set the time right, it seems to like it in hex, so... (had to consult a friend who used perl...) ]

>> init_tod 0x27ce4df1
Setting tod to -1357294851 seconds

[ It didn't want it in hex? Ok, let's try decimal... ]

>> init_tod 66783500
Setting tod to 66783500 seconds

[ This went right, except that one of the digits was missing. Later we figured this to mean 13.02.72, but not according to Mips, as we will see... ]

>> setenv bootmode c
>> boot dkip()unix initarg=s
...
WARNING: clock lost 112 days
WARNING: CHECK AND RESET THE DATE!
...

[ In single-user the machine tried to claim it was "Fri Nov 9 01:01:39 EET 1990". Not year -72, but -90. After a little command ("date 0301145691") we had the date right, and could go back to tests. Shut down the machine... ]

>> pr_tod
tod = 0x281faf57
>> boot dkip()unix initarg=s
...
lehtori # date
Sat Mar 2 14:59:47 EET 1991
...

[ At this stage I screwed it up. I didn't notice the wrong day, because the time was otherwise correct... So it was time to use "date 03011507". But, because the clock was again wrong all by itself, it was also time to shut down again from multiuser... ]

...
>> setenv bootmode d
>> init_tod
Setting tod to 0 seconds
>> pr_tod
tod = 0x4000003

[ Ok, 0 seconds is somewhere far away, but the counter wraps around or something else happens, because localtime() returns 16.02.72, which once again had nothing to do with the time unix uses, because... ]

>> setenv bootmode c
>> boot dkip() unix initarg=s
...
WARNING: clock lost 59 days
WARNING: CHECK AND RESET THE DATE!
...
lehtori # date
Tue Jan 1 02:01:43 EET 1991

[ Oh, yes... It's about time to try a new way: ]

lehtori # date 0101000070
lehtori # date
Thu Jan 1 00:00:00 EET 1970

[ ...and after that I was quite curious to see what the monitor had to say about this, but: ]

...
The system is down.
todc clock invalid secs=48 mins=37 hours=17 day=28 month=15 year=70
initializing tod clock
...
>> pr_tod
tod = 0x400000a
>> boot dkip()unix initarg=s
...
WARNING: preposterous time in file system
WARNING: CHECK AND RESET THE DATE!
...
lehtori # date
Thu Jan 1 00:01:53 EET 1970

[ Wow! We are in the right year! So it is quite accurate, this Mips machine, anyway... ;-) Best to check the todc, however... ]

>> pr_tod
tod = 0xc7ea2da

[ Really? This seems to be more like 23.08.76, so we are a little 'out of date'. ]

>> boot dkip()unix initarg=s
...
WARNING: clock lost 32 days
WARNING: CHECK AND RESET THE DATE!
...
lehtori # date
Sat Nov 29 23:14:30 EET 1969
    ^^^ ^^

[ Why does this always happen to me? After this we booted the machine to multiuser and let other people in... A boot after setting the date and shutting down changed the day to March 2nd again. ]

The solution for this problem was to fetch the date from other machines (the hard way - a cron job and an sh script to set the date...)
and after the following weekend, when this was switched off, it seemed to keep its clock very stable, so the problem might well occur only on Mar 1st.

- at
--
The processor usually has lowest priority because in general it can stop whatever it is doing without serious consequences.
- pdp 11 Processor Handbook, 1979
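The hex values printed by pr_tod in the post above can be cross-checked by treating them as seconds since 1 January 1970 UTC, which is consistent with the author's own reference numbers (0x27D6071D = 668337949 = 07.03.91, 11:25 EET). What follows is a minimal Python sketch under that assumption. The helper name `decode_tod` is hypothetical, and EET is modelled as a fixed UTC+2 offset with no summer time, so a value falling in May decodes one hour earlier than the 09:04 quoted in the post.

```python
from datetime import datetime, timedelta, timezone

# Assumption: EET modelled as a fixed UTC+2 offset (summer time ignored).
EET = timezone(timedelta(hours=2), "EET")


def decode_tod(tod: int, tz: timezone = EET) -> datetime:
    """Interpret a tod value as seconds since 1970-01-01 00:00:00 UTC."""
    return datetime.fromtimestamp(tod, tz)


# tod values quoted in the post, in the order they appear.
TOD_VALUES = [0x27D6071D, 0x281FABF8, 66783500, 0x4000003, 0xC7EA2DA]

for tod in TOD_VALUES:
    when = decode_tod(tod)
    print(f"{tod:#010x} {tod:>10d}  {when:%d.%m.%y %H:%M}")
```

The decoded dates agree with the ones worked out by hand in the post (13.02.72, 16.02.72, 23.08.76), which suggests the monitor's counter and the dates unix reported at boot were not using the same base.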
rogerk@mips.com (Roger B.A. Klorese) (03/09/91)
In article <1991Mar7.003809.6051@vlsi.waterloo.edu> jabarby@vlsi.waterloo.edu (J.A. Barby) writes:
>In article <167@snll-arpagw.UUCP>, hilary@snll-arpagw.UUCP (Hilary Jones) writes:
>> I have a major problem for which I need a solution by 3/1/92. Or else!
>>
>> I am using my two MIPS machines as Kerberos servers, so I cannot tolerate
>> very much clock drift. However, last Friday (3/1/91) and a year ago
>> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
>> a rate of about 3 minutes per hour. ...
>>
>> Has anyone else seen this problem? Is there a fix?
>
>Yes, we had the same problem on both our M/2000 and RC6280. We are also
>interested in the fix.

We have checked in a fix for this problem. It will be released in the next (post-4.52) operating system release. For now, correct the date with the "date" command.

The fix should be shipped long before 3/1/92. (Actually, 3/1/92 will not manifest the problem, as it occurs only on non-leap years.)

There is a related problem, fixed in the same code, which will cause the clock to go back one day on or after 1/1/92 under some circumstances. In any event, the fix will be shipped before it is needed.

--
ROGER B.A. KLORESE
MIPS Computer Systems, Inc.  MS 6-05  930 DeGuigne Dr.  Sunnyvale, CA 94088
+1 408 524-7421
rogerk@mips.COM  {ames,decwrl,pyramid}!mips!rogerk
"I'm the NLA"
"WAR: been there, done that... hated it." -- QueerPeace/DAGGER chant
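The thread does not say what the defect actually was. Purely as a hypothetical illustration of how a calendar conversion can gain exactly one day on March 1st in non-leap years only, the Python sketch below counts February as 29 days regardless of the year. The function names and the `february_bug` flag are invented for this example, it is not the RISC/os code, and it does not model the fast-running clock.

```python
import calendar
from datetime import date, timedelta

MONTH_LENGTHS = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_before(year: int, month: int, day: int, february_bug: bool = False) -> int:
    """Zero-based day index within the year for a calendar date.

    With february_bug=True, February is always counted as 29 days
    (a hypothetical defect, for illustration only).
    """
    lengths = MONTH_LENGTHS.copy()
    if february_bug or calendar.isleap(year):
        lengths[1] = 29
    return sum(lengths[: month - 1]) + (day - 1)


def resulting_date(year: int, month: int, day: int, february_bug: bool = False) -> date:
    """Date obtained by converting to a day index and back via the real calendar."""
    return date(year, 1, 1) + timedelta(days=days_before(year, month, day, february_bug))


for y, m, d in [(1991, 2, 28), (1991, 3, 1), (1992, 3, 1)]:
    print(date(y, m, d), "->", resulting_date(y, m, d, february_bug=True))
```

In this sketch the dates up to 28 February come back unchanged, 1 March 1991 comes back as 2 March, and 1 March 1992 is unaffected because 1992 is a leap year, which matches the pattern of symptoms described in the thread.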