[comp.sys.mips] Help. My clock runs fast on March 1.

hilary@snll-arpagw.UUCP (Hilary Jones) (03/07/91)

I have a major problem for which I need a solution by 3/1/92.  Or else!

I am using my two MIPS machines as Kerberos servers, so I cannot tolerate
very much clock drift.  However, last Friday (3/1/91) and a year ago
(3/1/90) the clock on my M/2000 started to gain time spontaneously, at
a rate of about 3 minutes per hour.  I had to reboot the system to
"solve" the problem, but when I looked later I found that the clock had 
still gained a whole day -- exactly to the minute.  If I were to guess, 
I would say that there is a problem with the hardware and/or operating
system that doesn't handle the last day of February correctly.

This problem occurs on my M/2000 running RISC/os 4.0, and on my RC3260 
running RISC/os 4.51, but not on my RS2030, nor on my friend's Magnum 3000.

Has anyone else seen this problem?  Is there a fix?

datri@convex.com (Anthony A. Datri) (03/07/91)

>"solve" the problem, but when I looked later I found that the clock had 
>still gained a whole day -- exactly to the minute.

I see exactly this on our 3xxx and our 6280 -- which is what prompted me
to attempt to get ntp going.
--
In MDDT no one can hear you scream

jabarby@vlsi.waterloo.edu (J.A. Barby) (03/07/91)

In article <167@snll-arpagw.UUCP>, hilary@snll-arpagw.UUCP (Hilary Jones) writes:
> I have a major problem for which I need a solution by 3/1/92.  Or else!
> 
> I am using my two MIPS machines as Kerberos servers, so I cannot tolerate
> very much clock drift.  However, last Friday (3/1/91) and a year ago
> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
> a rate of about 3 minutes per hour.  ...
> 
> Has anyone else seen this problem?  Is there a fix?

Yes, we had the same problem on both our M/2000 and RC6280.  We are also
interested in the fix.
-- 
	Jim Barby  (U of Waterloo VLSI Group, Waterloo Ont.)
	jabarby@vlsi.waterloo.{cdn,edu,bitnet}
	jabarby@vlsi.UWaterloo.ca

at@cc.tut.fi (Toivo Veli) (03/08/91)

In article <167@snll-arpagw.UUCP> hilary@snll-arpagw.UUCP (Hilary Jones) writes:
> However, last Friday (3/1/91) and a year ago
> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
> a rate of about 3 minutes per hour.
...
> Has anyone else seen this problem?  Is there a fix?

After others had reported problems with Mips clocks, I thought I would
translate this text into English. It was originally written for a
local newsgroup. The day this happened was - surprise - March 1st.

The timezone used is EET, unless another zone is mentioned or the
context indicates otherwise. Oh yes, some of these timestamps are
completely outside any zone, but try to hold on...

The problems began on February 28th, when our machine (RC6280, RISC/os
4.51) crashed before midnight. It couldn't boot on its own, so it
waited patiently and was reset in the morning.

The boot was quite normal; fsck produced only a six-inch list of
complaints.

Once we were up and running Unix, we found the first console log
messages dated "Mar  2 08:40:44 lehtori unix: CPU: MIPS R6000 Processor
Chip Revision: 3.1", which indicated that we had gained an extra day.
Simple: just say "date 03010844" and get the situation under control.
It wasn't so easy...

At about 11:55 we found that the clock had gained 10 minutes, so
something was wrong. First we tried "date 03011156", but the clock
kept running too fast, so the problem was not in the typing-error
class. The timed process got a kill -9 very quickly indeed; the only
reason it still existed was that nobody had had time to rip it out
before. No change.

When no comprehensible software cause had been found, and since the
system had otherwise functioned perfectly, at least as far as the
clock is concerned :-), we decided to look for a hardware malfunction.
Once the machine was down (13:40), it was powered off with the Big
Black Switch on the faceplate. After a couple of minutes, the boot
claimed the time was 13:49 and the date March 2nd...

Once we had the right date we found an interesting feature: before
the date command a minute consisted of 60 seconds; after it, only 57
(measured with my wristwatch). For a minute we thought about all the
claims of processing speed, but that didn't sound reasonable; even
this machine can't finish an endless loop in 10 minutes.
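
[ A rough cross-check of the figures above, added as a sketch rather
than taken from the original post. It assumes the wristwatch reading
means that each clock minute lasted 57 real seconds, and that the
clock was correct at 08:44 when the first date command was given. The
function name is made up for illustration. ]

```python
def gained_seconds(real_seconds, real_seconds_per_clock_minute=57.0):
    """Seconds gained by a clock whose 'minute' lasts only 57 real seconds."""
    clock_seconds = real_seconds * 60.0 / real_seconds_per_clock_minute
    return clock_seconds - real_seconds

# Minutes gained per real hour.
per_hour = gained_seconds(3600) / 60

# Real minutes elapsed between "date 03010844" and the check at about 11:55.
elapsed_minutes = (11 * 60 + 55) - (8 * 60 + 44)
morning = gained_seconds(elapsed_minutes * 60) / 60

print(f"{per_hour:.1f} min/hour, {morning:.1f} min from 08:44 to 11:55")
```

Under those assumptions the gain comes out at a little over 3 minutes
per hour and about 10 minutes for the morning, which agrees with both
the "3 minutes per hour" in the original article and the 10 minutes
seen here.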

The next phase was, of course, to shut down the machine and talk
directly to the todc from the PROM monitor. For reference, here are
some numbers from the time of writing this text:

0x27D6071D	- seconds, hex...
668337949	- seconds, decimal (same number as above)
07.03.91	- date
11:25		- time
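
[ The conversion between these numbers can be sketched as follows,
assuming the value counts seconds since 1970-01-01 00:00 UTC and
treating EET as a fixed UTC+2 offset with no summer time. The name
decode_tod is made up for illustration. ]

```python
from datetime import datetime, timedelta, timezone

# Assumption: EET as a fixed UTC+2 offset, ignoring summer time.
EET = timezone(timedelta(hours=2), "EET")

def decode_tod(seconds, tz=EET):
    """Interpret a tod value as seconds since 1970-01-01 00:00 UTC."""
    return datetime.fromtimestamp(seconds, tz)

stamp = decode_tod(0x27D6071D)
print(0x27D6071D)                        # 668337949
print(stamp.strftime("%d.%m.%y %H:%M"))  # 07.03.91 11:25
```

The same function reproduces the dates quoted for the other values in
this post (66783500, 0x4000003, 0xc7ea2da). For 0x281fabf8 the fixed
offset gives 08:04 rather than 09:04, presumably because that date
falls in summer time.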

Here is what the conversation looked like:

...
>> pr_tod
tod = 0x281fabf8

[ Ok, we have the date in hex, shouldn't cause any problems, but...
wait a minute... this is wrong. Seems to be 02.05.91, 09:04! ]

>> init_tod
Setting of TOD not supported in this bootmode

[ Hummm... I must RTFM, it could be... ]

>> setenv bootmode d
>> pr_tod
tod = 0x281fad19
>> init_tod
Setting tod to 0 seconds

[ So there! Now I have to set the time right, it seems to like it in
hex, so... (had to consult a friend who used perl...) ]

>> init_tod 0x27ce4df1
Setting tod to -1357294851 seconds

[ It didn't want it in hex? Ok, let's try decimal... ]
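
[ A guess, not confirmed anywhere in this thread: the odd negative
number can be reproduced if the monitor parsed the argument as
decimal, treated every character as a digit without checking it, and
let the result wrap in a signed 32-bit integer. The function below is
a hypothetical reconstruction of that behaviour, not the PROM code. ]

```python
def naive_decimal_parse(text):
    """Add (char - '0') for every character, with no validation,
    and wrap the running total into a signed 32-bit integer."""
    value = 0
    for ch in text:
        value = (value * 10 + (ord(ch) - ord("0"))) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit total as a signed value.
    return value - (1 << 32) if value & 0x80000000 else value

print(naive_decimal_parse("0x27ce4df1"))  # -1357294851
print(int("0x27ce4df1", 16))              # 667831793, the intended value
```

A plain decimal string passes through such a parser unchanged, which
would fit the decimal attempt being accepted.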

>> init_tod 66783500
Setting tod to  66783500 seconds

[ This went right, except that one of the digits was missing. Later we
figured this to mean 13.02.72, but not according to Mips, as we will
see... ]

>> setenv bootmode c
>> boot dkip()unix initarg=s

...
WARNING: clock lost 112 days

WARNING: CHECK AND RESET THE DATE!
...

[ In single-user mode the machine tried to claim it was "Fri Nov  9
01:01:39 EET 1990". Not year -72, but -90. After a little command
("date 0301145691") we had the date right, and could go back to tests.
Shut down the machine... ]
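
[ The "clock lost N days" figures are consistent with simple date
subtraction, assuming the kernel compares the date derived from the
todc with the last date it knew from the file system (March 1st,
1991 at this point). The thread does not say how the kernel computes
the figure, so this is only a sketch; days_lost is a made-up name. ]

```python
from datetime import date

def days_lost(clock_date, reference_date):
    """Whole days by which clock_date lags behind reference_date."""
    return (reference_date - clock_date).days

# Date claimed at this boot, against March 1st, 1991.
print(days_lost(date(1990, 11, 9), date(1991, 3, 1)))  # 112

# Date claimed at the later boot that warns about 59 days.
print(days_lost(date(1991, 1, 1), date(1991, 3, 1)))   # 59
```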

>> pr_tod
tod = 0x281faf57
>> boot dkip()unix initarg=s

...

lehtori # date
Sat Mar  2 14:59:47 EET 1991

...

[ At this stage I screwed up. I didn't notice the wrong day, because
the time was otherwise correct... So it was time to use "date 03011507".
But, because the clock had again gone wrong all by itself, it was also
time to shut down again from multiuser... ]

...
>> setenv bootmode d
>> init_tod
Setting tod to 0 seconds
>> pr_tod
tod = 0x4000003

[ Ok, 0 seconds is somewhere far away, but the counter wraps around or
something else happens, because localtime() returns 16.02.72, which
once again had nothing to do with the time unix uses, because... ]

>> setenv bootmode c
>> boot dkip()unix initarg=s

...

WARNING: clock lost 59 days

WARNING: CHECK AND RESET THE DATE!

...

lehtori # date
Tue Jan  1 02:01:43 EET 1991

[ Oh, yes... It's about time to try a new way: ]

lehtori # date 0101000070
lehtori # date
Thu Jan  1 00:00:00 EET 1970

[ ...and after that I was quite curious to see what the monitor had to
say about this, but: ]

...
The system is down.
todc clock invalid
secs=48 mins=37 hours=17 day=28 month=15 year=70
initializing tod clock

...

>> pr_tod
tod = 0x400000a
>> boot dkip()unix initarg=s

...

WARNING: preposterous time in file system

WARNING: CHECK AND RESET THE DATE!

...

lehtori # date
Thu Jan  1 00:01:53 EET 1970

[ Wow! We are in the right year! So it is quite accurate, this Mips
machine, anyway... ;-) Best to check the todc, however... ]

>> pr_tod
tod = 0xc7ea2da

[ Really? This seems to be more like 23.08.76, so we are a little 'out
of date'. ]

>> boot dkip()unix initarg=s

...

WARNING: clock lost 32 days

WARNING: CHECK AND RESET THE DATE!

...

lehtori # date
Sat Nov 29 23:14:30 EET 1969
    ^^^                   ^^
[ Why does this always happen to me? After this we booted the machine
to multiuser and let other people in... The boot, after setting the
date and shutting down, changed the day to March 2nd again. ]

The solution to this problem was to fetch the date from other machines
(the hard way: a cron job and an sh script to set the date...). After
the following weekend, when this was switched off, the machine seemed
to keep its clock very stable, so the problem might well occur only on
March 1st.
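
[ The cron job and script themselves are not shown here. As a
hypothetical sketch of one piece of such a script, this builds the
MMDDhhmmYY argument used in the date commands earlier in this post
from a reference time. How the reference time is obtained from the
other machine is left out, and date_arg is a made-up name. ]

```python
from datetime import datetime

def date_arg(reference):
    """Format a reference time as the MMDDhhmmYY argument for date."""
    return reference.strftime("%m%d%H%M%y")

# The reference time here is a fixed example, not a fetched value.
print("date " + date_arg(datetime(1991, 3, 1, 14, 56)))  # date 0301145691
```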

	- at
--
The processor usually has lowest priority because in general it can
stop whatever it is doing without serious consequences.
	- pdp 11 Processor Handbook, 1979

rogerk@mips.com (Roger B.A. Klorese) (03/09/91)

In article <1991Mar7.003809.6051@vlsi.waterloo.edu> jabarby@vlsi.waterloo.edu (J.A. Barby) writes:
>In article <167@snll-arpagw.UUCP>, hilary@snll-arpagw.UUCP (Hilary Jones) writes:
>> I have a major problem for which I need a solution by 3/1/92.  Or else!
>> 
>> I am using my two MIPS machines as Kerberos servers, so I cannot tolerate
>> very much clock drift.  However, last Friday (3/1/91) and a year ago
>> (3/1/90) the clock on my M/2000 started to gain time spontaneously, at
>> a rate of about 3 minutes per hour.  ...
>> 
>> Has anyone else seen this problem?  Is there a fix?
>
>Yes, we had the same problem on both our M/2000 and RC6280.  We are also
>interested in the fix.

We have checked in a fix for this problem.  It will be released in the next
(post-4.52) operating system release.  For now, correct the date with the
"date" command.

The fix should be shipped long before 3/1/92.  (Actually, 3/1/92 will not
manifest the problem, as it occurs only on non-leap years.)  There is a
related problem, fixed in the same code, which will cause the clock to
go back one day on or after 1/1/92 under some circumstances.  In any event,
the fix will be shipped before it is needed.
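
For anyone who wants to check which years are affected under that
rule, a small sketch (it only tests for leap years and assumes
nothing about the fix itself):

```python
import calendar

# The two failures reported in this thread were on 3/1/90 and 3/1/91.
for year in (1990, 1991, 1992):
    kind = "leap year" if calendar.isleap(year) else "not a leap year"
    print(year, kind)
```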
-- 
ROGER B.A. KLORESE                                  MIPS Computer Systems, Inc.
MS 6-05    930 DeGuigne Dr.   Sunnyvale, CA  94088              +1 408 524-7421
rogerk@mips.COM         {ames,decwrl,pyramid}!mips!rogerk         "I'm the NLA"
"WAR: been there, done that... hated it."  -- QueerPeace/DAGGER chant