sun@me.utoronto.ca (Andy Sun Anu-guest) (01/13/90)
There is a strange thing happening to an Apollo DN3500 that I've never seen before, and I can't seem to find a cure for it. We've phoned Apollo tech support and they don't have a clue either. I was wondering whether some Apollo experts on the net might shed some light on it. The problem is this:

The Apollo that has the problem is connected to an Apollo ring with two other Apollo DN3000s. It is running SR10.1. The other two machines are running SR9.7 (yucky, isn't it?). I don't know whether someone has done something to it or it decided to go on strike by itself, but it just won't load the DM. I tried a soft boot (shutdown and reboot) and a hardware reset (pressing the little white button at the back). Everything went well (disk check passed, salvage of boot volume okay, loading Init okay), but that's it. It threw me back to a "Phase II Shell" with a ")" prompt instead of loading the display manager and asking for a login. I manually typed in dm to try loading the DM from that shell, but all I got was something like (pm_$init) 3040001. I can still access its disk from the other SR9.7 machines (so the network is probably okay), but I cannot get it to do anything else. Has anyone seen this kind of thing happen before? Any help/suggestions are welcome.

Andy

--
_______________________________________________________________________________
Andy Sun                        | Internet: sun@me.utoronto.ca
University of Toronto, Canada   | UUCP    : csri.toronto.edu!me.utoronto.ca!sun
Dept. of Mechanical Engineering | BITNET  : sun@me.utoronto.BITNET
kts@quintro.uucp (Kenneth T. Smelcer) (01/16/90)
In article <1990Jan12.172440.851@me.toronto.edu> sun@me.utoronto.ca writes:
>The Apollo that has problem is connected to an Apollo ring, consisting
>of two other Apollo DN3000s. It is running SR10.1. The other two machines
>are running SR9.7 (yucky, isn't it?). I don't know if anyone has done
>something to it or it decided to go on strike itself, it just won't
>load DM. I tried soft boot it (shutdown and reboot) and hardware reset
>(press the little white button at the back of it). Everything went
>well (disk check passed, salvage boot volume okay, loading Init okay)
>but that's it. It threw me back to a "Phase II Shell" with a ")" prompt
>instead of loading the display manager and ask for login. I manually
>type in dm to try loading DM from that shell, but all I got was something
>like (pm_$init) 3040001. I can still access its disk from the other SR9.7
>machines (so the network is probably okay), but I cannot get it to do
>anything else. Anyone have seen this kind of thing happened before? Any
>help/suggestions are welcome.
>_______________________________________________________________________________
>Andy Sun | Internet: sun@me.utoronto.ca
>University of Toronto, Canada | UUCP : csri.toronto.edu!me.utoronto.ca!sun
>Dept. of Mechanical Engineering | BITNET : sun@me.utoronto.BITNET

We've had a similar problem with one of our DN3000s running 10.1. The problem is that the /etc/ttys file (actually `node_data/etc/ttys) occasionally disappears. Without this file, Init doesn't know to start DM or any other login process, so it just exits.

The solution is to re-create the file from another node, SHUTdown (not Reset) the problem node and reboot.

Hope this helps!

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Ken Smelcer        Glenayre Corp.    quintro!kts@lll-winken
                   Quincy, IL        tiamat!quintro!kts@uunet
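The check Ken describes can be done from another node before rebooting. What follows is only a minimal Python sketch, assuming the problem node's `node_data directory is reachable as an ordinary directory path from a machine that can run Python. The function name, the default file list, and the choice to treat an empty file as missing are illustrative assumptions, not Apollo tooling.

```python
from pathlib import Path

# Only etc/ttys comes from the post above; extend the tuple as needed.
REQUIRED_NODE_FILES = ("etc/ttys",)


def missing_node_files(node_data_dir, required=REQUIRED_NODE_FILES):
    """Return the required files that are absent or empty under node_data_dir."""
    base = Path(node_data_dir)
    missing = []
    for rel in required:
        candidate = base / rel
        # An empty file is treated as missing (assumption): Init would
        # presumably find nothing to start in it either.
        if not candidate.is_file() or candidate.stat().st_size == 0:
            missing.append(rel)
    return missing
```

Calling `missing_node_files(path_to_node_data)` with whatever path the sick node's `node_data is visible under returns an empty list when the file is present and non-empty.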
lampi@pnet02.gryphon.com (Michael Lampi) (01/16/90)
Sounds to me like you might have the "service mode" switch set to "service". It's a small toggle switch adjacent to the reset button. If it's in the down position, then it is in service mode.

BTW, you can start the DM at the Phase II Shell by entering the "GO" command.

Michael Lampi
MDL Corporation  213/782-7888  fax 213/782-7927
UUCP: {ames!elroy, <routing site>}!gryphon!pnet02!lampi
INET: lampi@pnet02.gryphon.com
"My opinions are that of my corporation!"
dbfunk@ICAEN.UIOWA.EDU (David B Funk) (01/16/90)
In posting <1990Jan12.172440.851@me.toronto.edu> Andy Sun <sun@me.utoronto.ca> writes:
> There is a strange thing happen to an Apollo DN3500 ... The problem is
> this:
>
> The Apollo that has problem is connected to an Apollo ring, consisting
> of two other Apollo DN3000s. It is running SR10.1. The other two machines
> are running SR9.7 (yucky, isn't it?). I don't know if anyone has done
> something to it or it decided to go on strike itself, it just won't
> load DM. I tried soft boot it (shutdown and reboot) and hardware reset
> (press the little white button at the back of it). Everything went
> well (disk check passed, salvage boot volume okay, loading Init okay)
> but that's it. It threw me back to a "Phase II Shell" with a ")" prompt
> instead of loading the display manager and ask for login. I manually
> type in dm to try loading DM from that shell, but all I got was something
> like (pm_$init) 3040001. I can still access its disk from the other SR9.7
> machines (so the network is probably okay), but I cannot get it to do
> anything else. Anyone have seen this kind of thing happened before? Any
> help/suggestions are welcome.

I saw something like this once. I saw 2 machines at another site that were sick, one wouldn't get out of the Phase II shell, the other wouldn't load the DM. In both cases it was a user (dumb sys_admin) that caused the problem. This person had logged in as "root" and had opened a DM edit pad on "/etc/init" on the first machine and on "/etc/dm_or_spm" on the second. This seriously wounded these 2 programs so that they couldn't do their job.

I would guess that if the DM itself (/sys/dm/dm) were shot you'd get that kind of problem. There are possibly other system files that could cause such problems if they were messed up (such as /sys/env, /etc/sys.conf, or something in /lib). Try poking around in /sys, /sys/node_data, /etc, & /lib and look for a system file that has been modified recently that shouldn't have been.
Look in `node_data/system_logs on that machine to see if there's some kind of error log file that you can read. If you had another sr10 machine handy it would help a lot. You could just start doing 'cmt's on various system directories.

Try starting the "SPM" to see if it is just the DM that is hurt or something more basic. At the Phase II shell prompt, type in "spm" to force it to start the server process manager. If the SPM starts OK then it's something directly connected with the DM. If the SPM croaks too then it's more basic, like a messed up library or /sys/env.

Dave Funk
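Dave's advice to look for system files that were modified recently can be sketched as a small directory walk. This is a hedged Python illustration only, assuming the node's /sys, /etc and /lib trees are reachable as ordinary directories and that their modification times are meaningful. The function name and the 7-day default are assumptions for the example.

```python
import os
import time


def recently_modified(directories, max_age_days=7, now=None):
    """Return (path, mtime) pairs for files changed within max_age_days, newest first."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    hits = []
    for top in directories:
        for dirpath, _dirnames, filenames in os.walk(top):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    # lstat so a dangling link does not abort the scan
                    mtime = os.lstat(path).st_mtime
                except OSError:
                    continue
                if mtime >= cutoff:
                    hits.append((path, mtime))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

Pointing it at the directories Dave lists gives a short candidate list to inspect by hand. It does not decide which changes "shouldn't have been".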
dente@els.uucp (Colin Dente) (01/16/90)
In article <1990Jan12.172440.851@me.toronto.edu> sun@me.utoronto.ca writes:
>There is a strange thing happen to an Apollo DN3500 that I've never
>seen and doesn't seem to find a cure for it. We've phoned up Apollo

Help might be at hand...

>... it just won't
>load DM. I tried soft boot it (shutdown and reboot) and hardware reset
>(press the little white button at the back of it). Everything went
>well (disk check passed, salvage boot volume okay, loading Init okay)
>but that's it. It threw me back to a "Phase II Shell" with a ")" prompt
>instead of loading the display manager and ask for login. I manually
>type in dm to try loading DM from that shell, but all I got was something
>like (pm_$init) 3040001. I can still access its disk from the other SR9.7
>machines (so the network is probably okay), but I cannot get it to do
>anything else. Anyone have seen this kind of thing happened before?

I do remember having a problem like this, though I don't recall getting a returned status of 3040001 (cleanup handler released out of order) - sounds *weird*. The problem that I had was that /dev/null had somehow been corrupted, causing the DM to exit gracelessly when you tried to start it. I tried starting the DM with debug set to 1 - no help. In the end, out of desperation, I invoked spm with debug (dunno if the debug was necessary) - and lo and behold - the last thing it did before dying was print something like 'cant open /dev/null'. One new /dev/null later, and everything was hunky-dory - far more pleasant than the OS reload that Apollo suggested.

Colin

Colin Dente                     | JANET: dente@uk.ac.man.ee.els
Dept. of Electrical Engineering | ARPA:  dente@els.ee.man.ac.uk
University of Manchester        | UUCP:  ...!mcvax!ukc!man.ee.els!dente
England                         | These might work now, but then again...
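On a conventional Unix system a healthy /dev/null is a character device, so a quick sanity check is possible before reaching for an OS reload. The Python sketch below assumes POSIX file types. The thread does not say how Domain/OS represents /dev/null, so treat the result as a hint rather than a diagnosis. The function name is made up for the example.

```python
import os
import stat


def is_char_device(path):
    """Return True if path exists and is a character device, else False."""
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing or unreadable counts as "not a usable device" here.
        return False
    return stat.S_ISCHR(mode)
```

A regular file sitting where /dev/null should be, or no entry at all, makes `is_char_device("/dev/null")` return False.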
crh@BLUEBIRD.ENG.OHIO-STATE.EDU (Charlotte Hawley) (01/16/90)
I've seen this happen twice. The first time was on one of the first systems where I installed SR10.1; I rebooted from the tape and redid the install as an update. A couple of files were read in and everything worked fine. A friend of mine just had this problem; it seems Apollo had the wrong disk controller. He was going from 10.1 to 9.7 to run some special application software.
krowitz%richter@UMIX.CC.UMICH.EDU (David Krowitz) (01/17/90)
Typing the command "stcode 3040001" gets me:

    cleanup handler released out of order (process manager/process fault manager)

Sounds like your SR10 software installation has been mucked up ... probably some of the stuff in /sys/node_data. One of the likely candidates is the /sys/boot_shell file (it's the one *I* always trash!). You can copy a new version from another node.

-- David Krowitz

krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
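The stcode output names a subsystem and a module, which suggests the status word is packed from several fields. The Python sketch below shows one plausible split. The layout (1 fail bit, 7-bit subsystem, 8-bit module, 16-bit code, printed in hex) is an assumption about the Domain/OS status word and is not confirmed by this thread. The function name is hypothetical, and the mapping from numbers to names still needs stcode itself.

```python
def split_status(hex_text):
    """Split a hex status word into fields, under the assumed 1/7/8/16-bit layout."""
    value = int(hex_text, 16)
    return {
        "fail": bool(value >> 31),          # assumed top bit
        "subsystem": (value >> 24) & 0x7F,  # assumed next 7 bits
        "module": (value >> 16) & 0xFF,     # assumed next 8 bits
        "code": value & 0xFFFF,             # assumed low 16 bits
    }


print(split_status("3040001"))
# {'fail': False, 'subsystem': 3, 'module': 4, 'code': 1}
```

Under that assumption, 3040001 would be subsystem 3, module 4, code 1, which at least fits the "subsystem/module" shape of the stcode message.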
ross@cancol.oz (Ross Johnson) (01/17/90)
In article <1990Jan12.172440.851@me.toronto.edu>, sun@me.utoronto.ca (Andy Sun Anu-guest) writes:
> There is a strange thing happen to an Apollo DN3500 that I've never
> seen and doesn't seem to find a cure for it. We've phoned up Apollo

The same thing has happened to me twice on a DN4500 running 10.1 (including 10.1.2 I think). I believe I fixed the problem each time by copying /sys/dm/dm from another node and rebooting. I say "believe" because as far as I could tell, the old and new dm's were the same in all respects (maybe it just needs some attention now and then :-)

This is the only machine this has happened to (we have a couple of DN3x00 machines in heavy use as well). The 4500 is a routing node and a server for 8 PC's running PCI.

Hope this helps.

+----------------------+---+
| Ross Johnson         |:-)|  ACSnet: ross@cancol.oz.au
| Info. Sciences & Eng.|___|  ARPA:   ross%cancol.oz.au@uunet.uu.net
| Uni. of Canberra         |  UUCP:   {uunet,ukc}!munnari!cancol.oz.au!ross
| P.O. Box 1               |  CSNET:  ross%cancol.oz@australia
| Belconnen A.C.T. 2616    |  JANET:  ross%au.oz.cancol@EAN-RELAY
| AUSTRALIA                |  BITNET: ross%cancol.oz.au@relay.cs.net
+--------------------------+
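Ross's remark that the old and new dm's looked the same can be checked more strictly with a byte-for-byte comparison. This is a minimal Python sketch, assuming both copies are reachable as ordinary file paths. The function name and the returned keys are illustrative, and it says nothing about Apollo-specific attributes such as object type or ACLs.

```python
import filecmp
import os


def compare_files(local_path, reference_path):
    """Report whether two files match in size and in content."""
    same_size = os.path.getsize(local_path) == os.path.getsize(reference_path)
    # shallow=False compares the file contents instead of trusting
    # matching stat data (size and modification time).
    same_bytes = same_size and filecmp.cmp(local_path, reference_path, shallow=False)
    return {"same_size": same_size, "same_bytes": same_bytes}
```

If both values come back True for the suspect file and a known-good copy, the damage is more likely in the file's metadata or somewhere else entirely.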