stephen@mincom.OZ (Stephen Kirby) (12/01/89)
I received a maintenance release of IRIX 3.2.1 from Silicon Graphics, which was to fix some system hangs and network problems we experienced on IRIX 3.2 on our IRIS 140 4cpu server system.

IRIX 3.2.1 ran for two days, then it also crashed with a new type of condition. This time CPU1 went into a mode of 100% kernel activity. CPUs 0, 2 and 3 were basically idle. User consoles started to lock up. The network died. If a console which was working started a new shell it would die. The system console was active and allowed us to run osview. However we could not kill processes. And a powerdown did nothing. So we had to reset the system.

SG still seem to have a problem in getting the multi processor servers to stay up for more than a day or two.

We find it very difficult to work with it at the moment. We are using the server to develop software in F77 for the mining and petroleum industries.

If others are also having problems please let SG know so we can get them to fix it.

Thank you

Stephen Kirby
MINCOM
Australia.
jmb@patton.sgi.com (Jim Barton) (12/03/89)
In article <279@mincom.OZ>, stephen@mincom.OZ (Stephen Kirby) writes:
> I received a maintenance release of IRIX 3.2.1 from Silicon
> Graphics, which was to fix some system hangs and network
> problems we experienced on IRIX 3.2 on our IRIS 140 4cpu
> server system.
>
> IRIX 3.2.1 ran for two days, then it also crashed with a
> new type of condition. This time CPU1 went into a mode of
> 100% kernel activity. CPUs 0, 2 and 3 were basically idle.
> User consoles started to lock up. The network died. If a console
> which was working started a new shell it would die. The system
> console was active and allowed us to run osview. However we could
> not kill processes. And a powerdown did nothing. So we had to
> reset the system.
>
> SG still seem to have a problem in getting the multi processor
> servers to stay up for more than a day or two.
>
> We find it very difficult to work with it at the moment. We are using
> the server to develop software in F77 for the mining and petroleum industries.
>
> If others are also having problems please let
> SG know so we can get them to fix it.
>
> Thank you
>
> Stephen Kirby
> MINCOM
> Australia.

There are an awful lot of assumptions in this one, and not enough information to diagnose (and repair) the problem. Please tell us more!

First, the MP servers are very reliable. Please make sure that there aren't other conditions causing these problems (which you seem to imply have always been going on).

Second, what is the network environment? 3.2 puts all networking activity on processor 1, so your description immediately makes one think of some problem with your network setup. For instance, constant input or output over the built-in Ethernet can do this. Since the processor is constantly servicing the Ethernet, and you are obviously running many things which depend on networking, the system could "seem" to hang without really doing so. Pulling the Ethernet cable when the problem occurs could settle this one immediately.

You may also wish to have your FE run diagnostics on the on-board Ethernet interface to check for possible hardware problems as well.

-- Jim Barton
Silicon Graphics Computer Systems
"UNIX: Live Free Or Die!"
jmb@sgi.sgi.com, sgi!jmb@decwrl.dec.com, ...{decwrl,sun}!sgi!jmb