[comp.sys.sgi] IRIX 3.2.1 maintenance release problem

stephen@mincom.OZ (Stephen Kirby) (12/01/89)

I received a maintenance release of IRIX 3.2.1 from Silicon
Graphics, which was to fix some system hangs and network
problems we experienced on IRIX 3.2 on our IRIS 140 4cpu
server system.

IRIX 3.2.1 ran for two days, then it also crashed with a
new type of condition.  This time CPU 1 went into a mode of
100% kernel activity.  CPUs 0, 2 and 3 were basically idle.
User consoles started to lock up.  The network died.  If a console
which was working started a new shell it would die.  The system
console was active and allowed us to run osview.  However we could
not kill processes.  And a powerdown did nothing.  So we had to
reset the system.

SG still seem to have a problem in getting the multiprocessor
servers to stay up for more than a day or two.

We find it very difficult to work with it at the moment.  We are using 
the server to develop software in F77 for the mining and petroleum industries.

If others are also having problems please let
SG know so we can get them to fix it.

Thank you

Stephen Kirby
MINCOM
Australia.

jmb@patton.sgi.com (Jim Barton) (12/03/89)

In article <279@mincom.OZ>, stephen@mincom.OZ (Stephen Kirby) writes:
> I received a maintenance release of IRIX 3.2.1 from Silicon
> Graphics, which was to fix some system hangs and network
> problems we experienced on IRIX 3.2 on our IRIS 140 4cpu
> server system.
> 
> IRIX 3.2.1 ran for two days, then it also crashed with a
> new type of condition.  This time CPU 1 went into a mode of
> 100% kernel activity.  CPUs 0, 2 and 3 were basically idle.
> User consoles started to lock up.  The network died.  If a console
> which was working started a new shell it would die.  The system
> console was active and allowed us to run osview.  However we could
> not kill processes.  And a powerdown did nothing.  So we had to
> reset the system.
> 
> SG still seem to have a problem in getting the multiprocessor
> servers to stay up for more than a day or two.
> 
> We find it very difficult to work with it at the moment.  We are using 
> the server to develop software in F77 for the mining and petroleum industries.
> 
> If others are also having problems please let
> SG know so we can get them to fix it.
> 
> Thank you
> 
> Stephen Kirby
> MINCOM
> Australia.

There are an awful lot of assumptions in this one, and not enough information
to diagnose (and repair) the problem.  Please tell us more!

First, the MP servers are very reliable.  Please make sure that there
aren't other conditions causing these problems (which you seem to imply
have always been going on).

Second, what is the network environment?  3.2 puts all networking activity
on processor 1, so your description immediately makes one think of some
problem with your network setup.  For instance, constant input or output
over the built-in Ethernet can do this.  Since the processor is constantly
servicing the Ethernet, and you are obviously running many things which
depend on networking, then the system could "seem" to hang without really
doing so.  Perhaps pulling the Ethernet cable when the problem occurs
could immediately decide this one.
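
As a rough illustration of the pattern described above (one CPU pegged in
kernel mode while the rest sit idle), here is a minimal Python sketch.  The
function name, the thresholds and the sample numbers are all made up for
illustration; the per-CPU kernel percentages are assumed to be read by hand
from something like osview, not taken from any real interface.

```python
def find_saturated_cpu(kernel_pct, busy=90.0, idle=10.0):
    """Return the index of the lone CPU pegged in kernel mode, or None.

    kernel_pct: kernel-time percentages, one per CPU (hypothetical input,
    e.g. figures copied by hand from a monitoring display).
    busy, idle: assumed thresholds, chosen only for this illustration.
    """
    hot = [i for i, p in enumerate(kernel_pct) if p >= busy]
    cold = [i for i, p in enumerate(kernel_pct) if p <= idle]
    # Pattern of interest: exactly one CPU saturated, every other CPU near idle.
    if len(hot) == 1 and len(cold) == len(kernel_pct) - 1:
        return hot[0]
    return None


# Made-up sample resembling the report: CPU 1 busy, CPUs 0, 2 and 3 idle.
samples = [2.0, 100.0, 1.0, 3.0]
cpu = find_saturated_cpu(samples)
if cpu is not None:
    print(f"CPU {cpu} is saturated in kernel mode; check the network first")
```

If the flagged CPU is the one handling networking, that points toward the
network as the first thing to rule out; it does not prove it.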

You may also wish to have your FE run diagnostics on the on-board Ethernet
interface to check for possible hardware problems as well.

-- Jim Barton
Silicon Graphics Computer Systems    "UNIX: Live Free Or Die!"
jmb@sgi.sgi.com, sgi!jmb@decwrl.dec.com, ...{decwrl,sun}!sgi!jmb