[comp.sys.next] WindowManager crash-out

rogerj@batcomputer.tn.cornell.edu (Roger Jagoda) (09/12/89)

We just had a weird system crash on our NeXT cubes and I'm pleading with
the system folks at NeXT (Avi?) for some insights.
 
We run a NeXT-only net in this arrangement with one "/" file-server
(netinfo-wise) and four other client servers. There are also 25 diskless
clients that use the main server for kernel launch but find all other
programs on their respective file-servers. Basically, the main file
server no longer seems to launch WindowManager. The screen comes up
with a "login:" prompt just as if someone typed CONSOLE and hit return
twice. You can log in, and netinfo seems to be working (niutil works)
and the group loaded into "." is recognized, but if I tail the file:
 
/usr/adm/messages
 
for errors I see a BIG problem. The last few lines are as follows:
 
NFS IO error in pagein (bread)  ---pass 9
NFS IO error in pagein (bread)  ---FATAL
 
target 0 media error -- FATAL timeout
 
Now does this mean that the hard drive (a 660MB) is just trashed?
 
FSCK reports no errors even after repeated reboots. The other servers
and clients all get their window managers going fine, and SUM reports
that the files on the main server (vmunix, WindowManager) are the same
as on the other machines. BTW, we nfs mount everything among the servers
and I have discovered that netinfo and NFS don't really get along that
great. Is there anything I can do to remedy this situation? We haven't
had anything like this ever before so our experience is just a little
limited. Really, if anyone has ever seen anything like this, or someone
at NeXT knows where the problem is...PULEESE don't be shy. Post away, or
better yet mail to the address below, as this machine is really not
ours...just borrowed. Thanks in advance.
 
Roger Jagoda
Cornell University
(607) 255-8960
FQOJ@CORNELLA.CIT.CORNELL.EDU
 

avie@wb1.cs.cmu.edu (Avadis Tevanian) (09/13/89)

In article <8828@batcomputer.tn.cornell.edu> rogerj@tcgould.tn.cornell.edu (Roger Jagoda) writes:
>Basically, the main file
>server no longer seems to launch WindowManager. The screen comes up
>with a "login:" prompt just as if someone typed CONSOLE and hit return
>twice. You can log in, and netinfo seems to be working (niutil works)

At this point it looks like someone has edited your /etc/ttys file and
changed it to run a getty instead of the WindowServer.  However, after
reading on...

>and the group loaded into "." is recognized, but if I tail the file:
> 
>/usr/adm/messages
> 
>for errors I see a BIG problem. The last few lines are as follows:
> 
>NFS IO error in pagein (bread)  ---pass 9
>NFS IO error in pagein (bread)  ---FATAL
> 
>target 0 media error -- FATAL timeout
> 

I first wonder why you are getting NFS IO errors. Which server might you be
referencing, and does it seem to be functioning normally?  The target 0 media
error looks like some type of disk drive problem.

In any case, it looks like you are experiencing some type of disk failure
and should seek hardware help.
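
To keep the two symptoms apart while chasing this, something like the
following rough sketch could sort the lines in /usr/adm/messages into NFS
errors and local disk errors. It is only an illustration: the two patterns
are taken from the three lines quoted above, classify_messages is a
made-up helper name, and other message formats would need their own
patterns.

```python
import re

# Patterns based only on the three log lines quoted in this thread;
# any other message wording is an assumption and would need its own pattern.
NFS_ERROR = re.compile(r"NFS IO error")
DISK_ERROR = re.compile(r"target \d+ media error")


def classify_messages(lines):
    """Split log lines into (nfs_errors, disk_errors). Hypothetical helper."""
    nfs, disk = [], []
    for line in lines:
        text = line.strip()
        if NFS_ERROR.search(text):
            nfs.append(text)
        elif DISK_ERROR.search(text):
            disk.append(text)
    return nfs, disk


# The lines quoted from /usr/adm/messages in the original post.
sample = [
    "NFS IO error in pagein (bread)  ---pass 9",
    "NFS IO error in pagein (bread)  ---FATAL",
    "target 0 media error -- FATAL timeout",
]

nfs_errors, disk_errors = classify_messages(sample)
print(len(nfs_errors), "NFS error line(s);", len(disk_errors), "disk error line(s)")
```

To run it against the real log, the sample list could be replaced with
the lines read from the file, e.g. open("/usr/adm/messages").readlines().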

	Avie
-- 
Avadis Tevanian, Jr.    (Avie)
Manager, System Software Group / Chief Operating System Scientist
NeXT, Inc.
avie@cs.cmu.edu or avie@NeXT.com