[comp.sys.att] Unix-pc lockup problems - A follow-up

jeff@cjsa.UUCP (C. Jeffery Small) (12/30/87)

Thanks to everyone who sent me reports of their problems with the Unix-pc
locking up.  This appears to be a widespread problem and is being looked
into by AT&T software engineers.  The engineers have asked me a question
which I have never checked during a lockup and I thought that some of you
who are experiencing the problem could check this out the next time the
problem occurs.

During lockup, keyboard input is typically ignored.  The question is: Do
the Caps-Lock and Num-Lock lights work (i.e., come on and off) when the
machine has crashed?  Mail me the results if you [unfortunately] get the
opportunity to verify this.

I'll keep these newsgroups posted on any future results.
--
Jeffery Small          (203) 776-2000     UUCP:   uunet!---\
C. Jeffery Small and Associates	                  ihnp4!--- hsi!cjsa!jeff
123 York Street, New Haven, CT  06511          hao!noao!---/

wtm@neoucom.UUCP (Bill Mayhew) (01/03/88)

Re: the question.  Yes, the keyboard capslock and numlock LEDs
still work while the machine is brain-dead.

New info:  by doing some reboots, I noticed that on my machine,
ps -fe always shows wmgr as process 82.  It seems to always
wind up as the 82nd process.  The kernel addressing fault panic
that I cited in my previous message had -- you guessed it --
process 82 to blame.

Sighhhhhh.


--Bill

I wish the damn hotline would quit pretending like they never
heard of this problem before.

allbery@axcess.UUCP (Brandon S. Allbery) (01/04/88)

Just in case anyone's interested:  I've managed to duplicate that lockup.
Notable is that it happened not long after I started rearranging things
around the computer... most notably, that d*mned printer cable.  I still
suspect spurious interrupts; but the printer may not be the only device
capable of sending them.  (Serial ports?  Maybe even bad termination on the
expansion ports?)

I was able to get the window manager to change the current window, but text
output didn't work, and the machine never got to a point in any window where
it was ready for input.  No, I didn't think to check the Caps Lock or Num Lock.

Possibly related?  I've been having a few other oddities:

Using "windy" too many times, or loading/unloading fonts (even the ones that
come with the machine) will cause an "su" in a subwindow to echo the password,
and immediately hang.  It *can* be interrupted without any problems.  I've
noticed that this tends to make the pre-crash sequence happen much sooner...
and this time, the actual crash as well.  (It appears to be based on the
"parent" window; log out and log back in (which closes and re-opens your login
window) and "windy" again works fine... for a while.)

I saw another unusual thing as well:  a program which up until just before the
crash worked perfectly suddenly started spitting out "calloc returned NULL in
_makenew" (yes, it uses curses/terminfo) errors when run.  The pre-crash
sequence began immediately afterward, when I fired up Emacs to look at the
program source....

Conclusion:  I strongly suspect a problem where the windaemon is somehow
interacting with the page daemon.  Spurious interrupts could be causing the
latter to go into some strange state, font mounting and/or whatever "windy"
does to create new windows could be confusing the former, and the two
apparently decide to get into a fight with each other.  (Maybe windaemon is
causing a massive number of page faults?)  The page daemon's involvement would
also explain the "out of memory" aspect.
-- 
 ___  ________________,	Brandon S. Allbery	       cbosgd \
'   \/  __   __,  __,	aXcess Company		       mandrill|
 __  | /__> <__  <__	6615 Center St. #A1-105		       !ncoast!
/  ` | \__. .__> .__>	Mentor, OH 44060-4101	       necntc  | axcess!allbery
\___/\________________.	Moderator, comp.sources.misc   hoptoad/

farren@gethen.UUCP (Michael J. Farren) (01/06/88)

In article <904@neoucom.UUCP> wtm@neoucom.UUCP (Bill Mayhew) writes:
>
>The kernel addressing fault panic
>that I cited in my previous message had -- you guessed it --
>process 82 to blame.

I've gotten kernel panics and (horrors!) Double panics, and each and
every time it's been wmgr that's been the offending process.  Please
note that all of this seems to have stopped, now that I no longer
use the phone lines from the computer - only an external modem
connected to tty000.

>I wish the damn hotline would quit pretending like they never
>heard of this problem before.

Second that!

-- 
Michael J. Farren             | "INVESTIGATE your point of view, don't just 
{ucbvax, uunet, hoptoad}!     | dogmatize it!  Reflect on it and re-evaluate
        unisoft!gethen!farren | it.  You may want to change your mind someday."
gethen!farren@lll-winken.llnl.gov ----- Tom Reingold, from alt.flame 

wtm@neoucom.UUCP (Bill Mayhew) (01/08/88)

<<internal modem and panics, etc.>>

I think the problem has been narrowed down to uucico choking on
garbage characters it gets as a result of cutting off the
connection after an incoming uucp transfer.  Apparently this causes
hardware interrupts to be generated at a furious rate.  It also
prevents getty from respawning for that port.

I apologize for any gainsaying I've done upon the Hotline.  They
have cooked up a new improved uucico to try.  The nice person at
the hotline is going to download me a copy to test out.

I'll advise the Net on success or lack thereof with the new
patched-up uucico.

--Bill

Thanks to all who have helped find the gremlin.

lenny@icus.UUCP (Lenny Tropiano) (01/11/88)

Here is what happened this evening.  My machine was unattended all day
as I was out of town.  I came home at 2:00am and pressed a key to wake up
the screen saver.  Lo and behold, I noticed it was talking to one of
my UUCP connections, as indicated by the status line (a phone daemon I wrote).
Now I know not everyone is running this, and this problem existed *LONG*
before I even started work with phdaemon.  The clock indicated it was
just 9:56pm (even though it was 2:00am).  The keyboard did not echo
characters, but the CAPS/NUM LOCK keys did work (they lit up).

Oh well, had to search for that RESET button at 2am, that was a chore! :-)

The people who are on the war-path at AT&T looking for this problem should
definitely pursue the idea of the phone manager/window manager problem after
uucico dies.  NOTE:  I am running HDB UUCP, so it isn't just inherent in
the generic UUCP.

							-Lenny
-- 
============================ US MAIL:   Lenny Tropiano, ICUS Computer Group
 IIIII   CCC   U   U   SSSS             PO Box 1
   I    C   C  U   U  S                 Islip Terrace, New York  11752
   I    C      U   U   SSS   PHONE:     (516) 968-8576 [H] (516) 582-5525 [W] 
   I    C   C  U   U      S  AT&T MAIL: ...attmail!icus!lenny  TELEX: 154232428
 IIIII   CCC    UUU   SSSS   UUCP:
============================    ...{uunet!godfre, harvard!talcott}!\
                   ...{ihnp4, boulder, mtune, bc-cis, ptsfa, sbcs}! >icus!lenny 
"Usenet the final frontier"        ...{cmcl2!phri, hoptoad}!dasys1!/