[comp.sys.apollo] DON'T PUT SR10.3 ON DN2500 !!!!

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (01/26/91)

There is a severe bug in SR10.3 on DN2500's (only DN2500's according to the
Hotline) - the system clock goes crazy, gaining 1-10 minutes for every
minute of real time, with the gains being worse the heavier the load
is on the system. The system is semi-useable with DM/pads (except that
vi/page/more/rlogin/telnet only work once), but once
X/xterm/otherXclients are used, you can't even get a character to echo
in the xterm for 2-5 minutes. Moving the mouse causes the load average
to shoot over 15 (which is probably wrong since the clock is screwed),
and all manner of screen events happen - windows cycle automatically,
menus pop up / pull down automatically, areas of text are
marked/yanked/pasted at random.

The Hotline says the patch will be available in March. FANTASTIC!
I guess I'll just watch the xclock hands jump around the dial for
a month or so.

I found this problem within 30 seconds of booting SR10.3 using X
Windows - doesn't anybody at HP/Apollo test anything, not to mention
the beta testers !?!?! Sorry if you feel attacked/insulted, but this
is ridiculous.
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

kts@quintro.uucp (Kenneth T. Smelcer) (01/27/91)

In article <1991Jan25.160628.10897@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>There is a severe bug in SR10.3 on DN2500's (only DN2500's according to the
>Hotline) - the system clock goes crazy, gaining 1-10 minutes for every
>minute of real time, with the gains being worse the heavier the load
>is on the system. The system is semi-useable with DM/pads (except that
>vi/page/more/rlogin/telnet only work once), but once
>X/xterm/otherXclients are used, you can't even get a character to echo
>in the xterm for 2-5 minutes. Moving the mouse causes the load average
>to shoot over 15 (which is probably wrong since the clock is screwed),
>and all manner of screen events happen - windows cycle automatically,
>menus pop up / pull down automatically, areas of text are
>marked/yanked/pasted at random.

Well, I don't know about anyone else, but I've been running SR10.3
on my DN2500 (16MB, 200M disk) for about three weeks with very few
problems.  The system clock is fairly stable (it loses about 8 seconds
a day), and X windows works fine.  My standard window system is the 
MIT X11R4 server and xdm (no DM running), but I've also used the HP/Apollo 
supplied X11R3 server in shared mode.  I don't have any problems with strange 
window events happening, although I have had a couple instances where a 
window would disappear without any apparent reason.
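
To put the two reports on a common scale, here is a rough sketch that
converts each into parts per million of elapsed real time.  The helper
name drift_ppm is made up for illustration; the input figures are the
ones quoted in this thread (about 8 seconds lost per day versus 1-10
minutes gained per minute).

```python
SECONDS_PER_DAY = 24 * 60 * 60


def drift_ppm(error_seconds, elapsed_seconds):
    """Clock error in parts per million of elapsed real time (positive = fast)."""
    return error_seconds / elapsed_seconds * 1_000_000


# Reported above: loses about 8 seconds per day.
stable = drift_ppm(-8, SECONDS_PER_DAY)

# Reported in the original post: gains 1 to 10 minutes per minute of real time.
runaway_low = drift_ppm(1 * 60, 60)
runaway_high = drift_ppm(10 * 60, 60)

print(f"stable DN2500:  {stable:.1f} ppm")
print(f"runaway DN2500: {runaway_low:.0f} to {runaway_high:.0f} ppm")
```

On these figures the stable machine is off by roughly 93 ppm, while the
misbehaving one is off by 1,000,000 to 10,000,000 ppm, a difference of
about four to five orders of magnitude.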

Did the support line give you a reason for the problems on DN2500 machines?
Is it something that's configuration based or are they talking about kernel 
problems?

-- 
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
Ken Smelcer        Glenayre Corp.           quintro!kts@lll-winken 
                   Quincy,  IL              tiamat!quintro!kts@uunet

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (01/27/91)

In article <1991Jan26.181425.21685@quintro.uucp> kts@quintro.uucp (Kenneth T. Smelcer) writes:
>In article <1991Jan25.160628.10897@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>>There is a severe bug in SR10.3 on DN2500's (only DN2500's according to the
>>Hotline) - the system clock goes crazy, gaining 1-10 minutes for every
>>minute of real time, etc.
>
>     <DN2500 works at SR10.3 text deleted>
>
>Did the support line give you a reason for the problems on DN2500 machines?
>Is it something that's configuration based or are they talking about kernel 
>problems?

It is a kernel problem in the timer interrupt routine - a register is
not being saved properly.
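
No further detail on the bug is given here, so the following is only a
toy model of this general class of error, not Domain/OS code.  The
machine, the register name d0, the handler, and the counter are all
hypothetical; the point is just that a handler which uses a register
without saving and restoring it corrupts whatever the interrupted code
was keeping there.

```python
class ToyCPU:
    """A made-up machine with a single scratch register, d0."""

    def __init__(self):
        self.d0 = 0


def timer_interrupt(cpu, save_registers):
    """Hypothetical handler that uses d0 as scratch space."""
    saved = cpu.d0 if save_registers else None
    cpu.d0 = 1000              # the handler's own work overwrites d0
    if save_registers:
        cpu.d0 = saved         # restore before returning to interrupted code


def count_ticks(cpu, ticks, interrupt_at, save_registers):
    """Interrupted code that keeps a running count in d0."""
    cpu.d0 = 0
    for step in range(ticks):
        if step == interrupt_at:
            timer_interrupt(cpu, save_registers)
        cpu.d0 += 1
    return cpu.d0


good = count_ticks(ToyCPU(), ticks=10, interrupt_at=5, save_registers=True)
bad = count_ticks(ToyCPU(), ticks=10, interrupt_at=5, save_registers=False)
print(good, bad)
```

With the save/restore in place the count comes out as 10; without it
the interrupted code resumes with the handler's leftover value and ends
up at 1005.  In this toy model the damage happens once per interrupt,
so it grows with how often the interrupt fires.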
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

hanche@imf.unit.no (Harald Hanche-Olsen) (01/28/91)

In article <1991Jan26.181425.21685@quintro.uucp> kts@quintro.uucp (Kenneth T. Smelcer) writes:

   In article <1991Jan25.160628.10897@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
   >There is a severe bug in SR10.3 on DN2500's (only DN2500's according to the
   >Hotline) - the system clock goes crazy, gaining 1-10 minutes for every
   >minute of real time, with the gains being worse the heavier the load
   >is on the system.
[...]
   >Moving the mouse causes the load average
   >to shoot over 15 (which is probably wrong since the clock is screwed),
[...]

   Well, I don't know about anyone else, but I've been running SR10.3
   on my DN2500 (16MB, 200M disk) for about three weeks with very few
   problems.  The system clock is fairly stable (it loses about 8 seconds
   a day), and X windows works fine.

We have seen problems similar to those described by Mike.  The most
reproducible way to provoke this behaviour is as follows: Someone
logged in on a node other than the 2500 starts compiling a big file in
a directory which is on the 2500.  Once every minute or so, the load
jumps sky high, and the clock jumps forward by a couple of minutes.
Meanwhile, the poor guy who is trying to use the 2500 screen is stuck,
unable to do a thing.  We "solved" the problem by moving everybody's
home directory away from the node, after which the problem is still
present but not nearly so noticeable.  (The guy we hired to take care
of our computers was supposed to follow up on this, but he quit before
Christmas and apparently never got around to it, which means I will
have to do it (sigh)).  Anyway, some clock racing was still present,
but after we installed xntpd on all our machines the 2500's xntpd has
managed to keep the clock in line, more or less.
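
A rough way to confirm the symptom is to log the 2500's clock alongside
a trusted reference (another node, say) and look for intervals where
the local clock advanced much further than the reference did.  This is
only a sketch: the function name and the sample readings are made up,
and it is not a description of how xntpd works internally.

```python
def find_clock_jumps(samples, tolerance=1.0):
    """Return (reference_time, excess_seconds) for each interval in which
    the local clock advanced more than `tolerance` seconds beyond the
    reference clock.  `samples` is a list of (reference, local) readings
    in seconds, in the order they were taken."""
    jumps = []
    for (ref_prev, local_prev), (ref_now, local_now) in zip(samples, samples[1:]):
        excess = (local_now - local_prev) - (ref_now - ref_prev)
        if excess > tolerance:
            jumps.append((ref_now, excess))
    return jumps


# Made-up readings taken every 30 seconds; the local clock leaps ahead
# by two minutes during the third interval.
samples = [(0, 0), (30, 30), (60, 60), (90, 210), (120, 240)]
jumps = find_clock_jumps(samples)
print(jumps)
```

For the made-up readings above this reports a single jump of 120
seconds ending at reference time 90.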

The advice not to install SR10.3 on a 2500 before the patch is out is
probably good, although Ken's experience implies that not all
machines will be bitten by this bug.  That could explain why neither
HP nor the beta testers ever saw it.

- Harald Hanche-Olsen <hanche@imf.unit.no>
  Division of Mathematical Sciences
  The Norwegian Institute of Technology
  N-7034 Trondheim, NORWAY

krowitz@RICHTER.MIT.EDU (David Krowitz) (01/28/91)

Hmmm ... your message indicates that you see the error on DN2500's that
are providing file service. The DN2500's we have are all diskless machines,
which may be why we never saw this particular problem during the beta-test.

This strengthens my conviction on the need for a pre-beta-release testing
lab ...


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

kts@quintro.uucp (Kenneth T. Smelcer) (01/29/91)

In article <HANCHE.91Jan27181040@hufsa.imf.unit.no> hanche@imf.unit.no (Harald Hanche-Olsen) writes:
>In article <1991Jan26.181425.21685@quintro.uucp> kts@quintro.uucp (Kenneth T. Smelcer) writes:
>
>> In article <1991Jan25.160628.10897@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>>>There is a severe bug in SR10.3 on DN2500's (only DN2500's according to the
>>>Hotline) - the system clock goes crazy, gaining 1-10 minutes for every
>>>minute of real time, with the gains being worse the heavier the load
>>>is on the system.
>>
>> Well, I don't know about anyone else, but I've been running SR10.3
>> on my DN2500 (16MB, 200M disk) for about three weeks with very few
>> problems.  The system clock is fairly stable (it loses about 8 seconds
>> a day), and X windows works fine.
>
>We have seen problems similar to those described by Mike.  The most
>reproducible way to provoke this behaviour is as follows: Someone
>logged in on a node other than the 2500 starts compiling a big file in
>a directory which is on the 2500.  [...]

I think this is why I haven't had any problems with my DN2500.  This
node has had some sporadic disk time-out errors, so we haven't loaded
anything on it except the OS.  It sounds like the problems are obvious
only when there's heavy usage of the 2500's local disk.

BTW, thanks to Mike for letting us know about this MAJOR problem.  I wish
HP/Apollo would let people know about problems like these (at least people
who are registered DN2500 owners).

-- 
--=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=--
Ken Smelcer        Glenayre Corp.           quintro!kts@lll-winken 
                   Quincy,  IL              tiamat!quintro!kts@uunet

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (01/29/91)

In article <1991Jan28.172841.9547@quintro.uucp> kts@quintro.uucp (Kenneth T. Smelcer) writes:
>BTW, thanks to Mike for letting us know about this MAJOR problem.  I wish
>HP/Apollo would let people know about problems like these (at least people
>who are registered DN2500 owners).

You're welcome, and I agree 100% -- this sort of information should be
forwarded IMMEDIATELY to all local offices (who should contact all
their DN2500 customers) and IMMEDIATELY to all known customers,
and posted on comp.sys.apollo by someone from HP.
A notice should also be inserted in all outgoing SR10.3 shipments as of
the day the problem was verified.
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775