[comp.sys.apollo] Diskless node booting problems

rich@eddie.MIT.EDU (Richard Caloggero) (07/02/88)

     We recently encountered problems booting one of our diskless
nodes.  It seemed that the node couldn't find its partner.  We
checked all the relevant files and made sure netman was alive on the
partner.  All seemed in order to us, but the diskless node just
refused to cooperate with us.

     Finally, we decided to boot it explicitly from another disked node
(one other than its original partner).  This seemed to work.  We
then tried booting explicitly from its original partner.  Again, this
worked fine.  So, the question remains: why can't this poor diskless
node get help from its partner?  Is the partner not responding to the
node's plea for help, or is the node not asking for help?  Does anyone
out there have any ideas?  Please e-mail your responses to me and
I'll post the results.


-- 
						-- Rich (rich@eddie.mit.edu).
	The circle is open, but unbroken.
	Merry meet, merry part,
	and merry meet again.

jec@iuvax.cs.indiana.edu (James E. Conley) (07/02/88)

	We've had something like this happen on occasion, and it has usually
turned out to be either:

(1) The /sys/net/diskless_list was wrong on the boot partner, or

(2) The node was listed in multiple /sys/net/diskless_list files.
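
To check for cause (2) across several boot partners, something like the
following Python sketch might help.  It is only an illustration: it assumes
each diskless_list holds one hex node ID per line with '#' starting a
comment (the file format is not spelled out in this thread), and that you
have already collected the text of each partner's file yourself.  The
function names and the partner labels are hypothetical.

```python
from collections import defaultdict


def parse_diskless_list(text):
    """Return the node IDs found in one diskless_list.

    Assumed format (not confirmed by the thread): one hex node ID per
    line, blank lines allowed, '#' starts a comment.
    """
    ids = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            # Keep only the first token in case extra fields follow the ID.
            ids.append(entry.split()[0])
    return ids


def find_conflicts(lists_by_partner):
    """Map each node ID listed by more than one partner to those partners."""
    seen = defaultdict(set)
    for partner, text in lists_by_partner.items():
        for node_id in parse_diskless_list(text):
            seen[node_id].add(partner)
    return {n: sorted(p) for n, p in seen.items() if len(p) > 1}
```

Pass it a dictionary keyed by partner name, with each value being the
contents of that partner's /sys/net/diskless_list.  Any node ID in the
result is claimed by more than one partner and is worth a closer look.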

    III			Usenet:     iuvax!jec
UUU  I  UUU		ARPANet:    jec@iuvax.cs.indiana.edu
 U   I   U		Phone:      (812) 335-7729
 U   I   U		U.S. Mail:  Indiana University
 U   I   U			    Dept. of Computer Science
  UUUIUUU			    021-E Lindley Hall
     I				    Bloomington, IN. 47405
    III (Home of Bob Knight and the Indiana Hoosiers)

jec@iuvax.cs.indiana.edu (James E. Conley) (07/02/88)

	I should point out that we upgraded the boot PROMs on all of our
diskless nodes about a year ago.  It seems there were some bugs a while
back, but I doubt that Apollo has been shipping those PROMs recently.  If
you have an older diskless machine (DN3000s in our case), you might check
the PROM version (do an 'RE' at the '>' prompt).  We are now running with
version 5.3 PROMs and things work better.


tmac@caen.engin.umich.edu (thomas allen mcleary) (07/02/88)

In article <9622@eddie.MIT.EDU>, rich@eddie.MIT.EDU (Richard Caloggero) writes:
>      We recently encountered problems booting one of our diskless
>      nodes.  It seemed that the node couldn't find its partner.  We
> checked all the relevant files and made sure netman was alive on the
> partner.  All seemed in order to us, but the diskless node just
> refused to cooperate with us.

If you send me e-mail detailing exactly what the screen displays during
your boot attempts, I can probably help you.

Some things to check:
	1) See if the node you're booting off of has been netsvc'ed
	2) Make sure the disked node has a /sys/node_data.<nodeid>
	3) Try RE then DI N <disked node id> then EX AEGIS
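
Check 2) can be scripted if you have several diskless nodes to look after.
The Python sketch below is only an illustration: it assumes the partner's
/sys directory is reachable as an ordinary directory path and that the
node ID appears in the directory name exactly as you supply it.  The
function name is hypothetical.

```python
import os


def missing_node_data(sys_dir, node_ids):
    """Return the node IDs that have no node_data.<nodeid> directory.

    sys_dir is the path to the disked partner's /sys directory; node_ids
    are the IDs of the diskless nodes that partner is expected to serve.
    """
    return [
        node_id
        for node_id in node_ids
        if not os.path.isdir(os.path.join(sys_dir, "node_data." + node_id))
    ]
```

An empty result means every listed node has its directory; anything
returned is a node whose node_data directory is missing on that partner.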

Best way for me to help you is if you e-mail me.

--------------------------------------------------------------------------------
ARPAnet: tmac@caen.engin.umich.edu
USMAILnet: Tom McLeary                             "It's not whether you
	   Computer Operations Support                win or lose;
	   Univ. of Michigan/CAEN                   it's what you drive
	   231 Chrysler Center                        home in."
	   Ann Arbor, Mi. 48109
BELLnet: (313) 936-3497
--------------------------------------------------------------------------------

rees@A.CC.UMICH.EDU (Jim Rees) (07/02/88)

         We recently encountered problems booting one of our diskless
         nodes.  It seemed that the node couldn't find its partner.  We
    checked all the relevant files and made sure netman was alive on the
    partner.  All seemed in order to us, but the diskless node just
    refused to cooperate with us.

One way to narrow this down is to kill off the netman on the disked
partner, then restart it in a window (just run /sys/net/netman).
Then you'll be able to see if it actually got the request, and if so,
what it did with it.
-------

achille@cernvax.UUCP (achille) (07/04/88)

I've seen another problem with diskless nodes that appeared at sr9.5
(sr9.6 actually):
When a diskless node crashes (or is reset), it leaves its `node_data/boot_shell
file locked, and the next boot then fails with a 'uid request failed'
message. The workaround (???) I've found is to "dlf -du" the boot_shell
file at the end of the `node_data/startup.whatever file. This seems to
work fine.
Hope this helps,
	Achille Petrilli