[comp.sys.hp] Booting discless over multiple LANs

perry@hpfcdc.HP.COM (Perry Scott) (02/01/90)

I worked on Diskless for the 6.0 release, so my information might be old.
At the time, schedule pressure prevented us from supporting Diskless
protocol traffic on two LANs (ROI and all that).  Maybe that has changed.

So while running another rbootd might boot a kernel, the Server may not
be listening to the second LAN, and you'll get "Server not responding",
or some such.

Notes isn't very reliable at getting questions like these answered - if
you have a support contract, use it.  I just happened to be scanning
through comp.sys.hp.

Perry Scott
HP Ft. Collins

markl@hpbbi4.HP.COM (#Mark Lufkin) (02/02/90)

> I had everything correctly connected, and then tried to add the discless
> machine using reconfig.  I entered the proper Link Level Address, gave it
> an Internet address corresponding to the proper LAN card and waited.  
> Unfortunately it didn't boot.  Here's the reason:  In the system admin
> manual, while following the flowcharts in the trouble shooting section,
> it said that *** the remote boot daemon can only run on one LAN ***.
> I then confirmed this with Response Center.  This means I can't boot
> the machine in the other building as a discless cnode.
> 
> My question:  Has anyone managed to get around this?  Are there patches
> to the rbootd code to allow more than one LAN?  My present solution is
> to take a 55 Meg Tape/Disk drive with barebones HP-UX 6.21 up to the 
> remote machine and make it standalone.  This causes me to sysadmin two
> machines instead of one along with some possible software licensing
> problems.  My other alternatives all involve even uglier situations
> (extending lan0 to other building, remote boot off other depts. hp's on
> the lan, making it a full system{$$}).

	This is a well known feature of the HPUX diskless. The point is that
	when your system boots, it does a broadcast. If there happens to be
	a gateway or router or whatever between your two LANs then the
	message will not be passed across. The reason for implementing
	the boot procedure in this way is to make the system be able to 
	recover from network errors -  a diskless system is able to check the
	LAN if it has a failure. If it detects that the LAN is broken then
	the client will wait indefinitely for the LAN to be fixed and the
	server to respond. Another point is that it is not a good idea to
	boot across the backbone LAN as this will cause performance problems -
	swapping on the LAN, filesystem transfers and broadcasts (well, 
	multicasts really).

Mark Lufkin
EMC/CPS OS Technical Support
HP/Apollo Boeblingen
West Germany.

daveg@hpfcdc.HP.COM (Dave Gutierrez) (02/02/90)

Sorry, the configuration shown below is not supported. The HP discless
solution, as it exists today, will not work through gateways which is what
your 360 server now represents. The diskless transport will only work over
one of the LAN cards at a time, in this case the aiesds1 318 and aiesds2 
340 client being served on lan0. The only way to do what you want is to
put the server and all clients on the backbone, assuming that the buildings
are tied together via a bridge or repeater (i.e. not another gateway),
-or- physically extend your lan to the other building (which you specified as
undesirable). Sorry..

daveg@hpfclg

>
>
>I have a 9000/360 server in a discless environment serving a 9000/318
>and a 9000/340.  The 360 is running HP-UX 6.5.  This discless cluster
>operates fine.  All networking services and even PC-NFS work great.  We
>decided to connect our little LAN up to the corporation backbone so that 
>we could be the server for a discless client (9000/340) in the next building,
>or so we thought.  This involved adding another LAN card and modifying the
>appropriate software (/etc/rc and /etc/netlinkrc).  See figure below for
>setup.
>                                           <- building ->
>        -------               -------             |
>        |     |               |     |             |
>        | 318 |discless       | 340 |discless     |
>        |     |               |     |             |
>        -------               -------             |
>  PC's     |aiesds1              |aiesds2         |
>   |       |192.46.233.11        |192.46.233.12   |
>   ---------------------o---------                |
>                        | aiehost                 |
>                   lan0 | 192.46.233.10           |
>                    ---------                     |         -------
>                    |       |                     |         |     |
>                    |  360  |server               |         | 340 |discless(?)
>                    |       |                     |         |     |
>                    ---------                     |         -------
>                   lan1 |aiegate                  |            |aietwk1
>                        |208.10.1.254             |            |208.10.1.253
> backbone --------------o----------------------=======---------o---------
>                                                  |
>                                                  |
>                                                  |
>
>

tesler@hpcupt1.HP.COM (Joel Tesler) (02/02/90)

Another problem with this is that Diskless assumes that all clients in the
cluster can talk to one another, and your configuration violates this.  One
example that I know will break is that if a client sets the time of day in
this configuration the client will hang.  There may very well be other
problems (if it runs at all).

Joel Tesler
tesler@hpda
telnet 447-6970

ken@hpubrcf.HP.COM (Ken Green) (02/02/90)

> / hpubrcf:comp.sys.hp / rocky@hpfcmgw.HP.COM (Rocky Craig) /  6:10 pm  Jan 31, 1990 /
> > My question:  Has anyone managed to get around this?
> 
> Have you tried running another rbootd, giving it the device file of your
> backbone card?  I'm not saying it WILL work, but I can't think of a reason
> it WON'T.
> 
> By the way, even if it does work it is still UNSUPPORTED and any problems
> will be your own.  Caveat programmor.

	rbootd will try to boot the node but it seems the csps can't handle 
	multiple lans.

					Ken Green
					--ukcrc--

henkp@ruuinf.cs.ruu.nl (Henk P. Penning) (02/02/90)

Could somebody please explain why it would be difficult
for a server to serve clients on more than one ethernet.

Do the protocols rely heavily on broadcasts ?
Has it to do with synchronization ?
Or is it something that could be added quite easily ?
-- 
Henk P. Penning, Dept of Computer Science, Utrecht University.
Padualaan 14, P.O. Box 80.089, 3508 TB Utrecht, The Netherlands.
Telephone: +31-30-534106
e-mail   : henkp@cs.ruu.nl (uucp to hp4nl!ruuinf!henkp)

milburn@me10.lbl.gov (John Milburn) (02/05/90)

In article <1720001@hpbbi4.HP.COM> markl@hpbbi4.HP.COM (#Mark Lufkin) writes:
>	This is a well known feature of the HPUX discless. The point is that
>	when your system boots, it does a broadcast. If there happens to be
>	a gateway or router or whatever between your two LANs then the
>	message will not be passed across.

I think you are missing the point somewhat, Mark.  The discless
machine does indeed broadcast, but the server is the router,
so it picks up the broadcast.  It just doesn't respond.

The problem is that the server machine is unable to serve clients on
two networks, even if the server has direct interfaces to both
networks.  This is somewhat limiting.

>       [...] Another point is that it is not a good idea to
>	boot across the backbone LAN as this will cause performance problems -
>	swapping on the LAN, file system transfers and broadcasts (well, 
>	multicasts really).

This is entirely dependent on the site.  A well designed Ethernet can
handle quite a few discless machines without severe performance
penalty.  Hell, our backbone traffic consists mostly of LAVC (Local
Area Vax Cluster) traffic, even with tens of discless clients on the
main net.  All of this traffic is minimized by intelligent placement
of LanBridges.
John Milburn - Advanced Light Source - Lawrence Berkeley Laboratory
INTERNET: JEMilburn@lbl.gov   BITNET:    JEMilburn@LBL.bitnet
UUCP:      {...}!ucbvax!lbl.gov!JEMilburn
SnailMail: 1 Cyclotron Road 46-161 Berkeley, Ca. 94720  Ph:  (415) 486-6969

burzio@mmlai.UUCP (Tony Burzio) (02/05/90)

In article <1720001@hpbbi4.HP.COM>, markl@hpbbi4.HP.COM (#Mark Lufkin) writes:
> 	The reason for implementing
> 	the boot procedure in this way is to make the system be able to 
> 	recover from network errors -  a diskless system is able to check the
> 	LAN if it has a failure. If it detects that the LAN is broken then
> 	the client will wait indefinitely for the LAN to be fixed and the
> 	server to respond.

Why can't a discless node realize this after it is running?  If you
unplug a discless node from the network (accidentally one hopes :-)
it almost immediately panics.  Couldn't there be a way to make it
wait for a few minutes before panicking?

*********************************************************************
Tony Burzio               * Don't touch that ..FRZZZZZT.. cord!
Martin Marietta Labs      *	Sigh...
mmlai!burzio@uunet.uu.net *
*********************************************************************

daveg@hpfcdc.HP.COM (Dave Gutierrez) (02/07/90)

The current design of HP diskless is to limit the cnodes to a single logical
LAN (i.e. repeaters and/or bridges are OK, but not gateways). Once the kernel
is downloaded to the diskless cnode, the cnode expects its root file services,
and potentially its swap resources, to come from the machine that provided
it with the kernel. While it is true that the HP diskless implementation
provides LAN break detection, this feature has nothing to do with root server
support being limited to a single LAN card, although repeaters and bridges act
as terminators and it is difficult to detect breaks on the other side. The
inability to work over a gateway is a result of using link-level addressing
for inter-cluster communication. The use of broadcast messages is also very
limited in the implementation.
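
The bridge-versus-gateway distinction above can be illustrated with a
minimal Python sketch. This is a simplified model, not the HP-UX transport
code: the Frame fields, the "diskless" protocol label, the station addresses,
and both function names are hypothetical, and the bridge model ignores
address learning and filtering.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    dst_link_addr: str  # link-level (station) address of the destination
    protocol: str       # "ip" or "diskless" (hypothetical labels)


def bridge_forwards(frame: Frame) -> bool:
    """Simplified: a bridge or repeater relays frames whatever protocol
    they carry, so both segments behave as one logical LAN."""
    return True


def gateway_forwards(frame: Frame, gateway_link_addr: str) -> bool:
    """Simplified: a gateway routes only IP traffic that was handed to
    its own link-level address; other link-level frames stop here."""
    return frame.protocol == "ip" and frame.dst_link_addr == gateway_link_addr


# Hypothetical station addresses for a root server and a gateway card.
SERVER = "080009000010"
GATEWAY = "080009000020"

diskless_frame = Frame(dst_link_addr=SERVER, protocol="diskless")
ip_frame = Frame(dst_link_addr=GATEWAY, protocol="ip")
```

In this model the diskless frame crosses a bridge but never a gateway,
because it is addressed at the link level to the server rather than routed
by IP.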

To answer the question, yes the implementation "could" be extended
to work across multiple LAN cards on the server. The rbootd daemon does
not care which card it attaches to and would be perfectly capable of
downloading a kernel from the server over two cards. The changes to support
this type of topology would require changes to the transport code and
would not be real difficult. It would require keeping track of which card
a request came in on and making sure the reply went out on the same card.
Handling broadcasts would be a little more difficult.
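
As a rough illustration of the bookkeeping described above (remember which
card a request came in on, reply on the same card), here is a minimal Python
sketch. The class name, method names, card names, and station addresses are
all hypothetical; the real transport code is in the kernel and is not shown
in this thread. The broadcast method takes the naive approach of sending on
every card, which is an assumption and not a statement about how HP would
have done it.

```python
from typing import Dict, List, Optional


class ReplyRouter:
    """Tracks the LAN card each client was last heard on (hypothetical)."""

    def __init__(self, cards: List[str]) -> None:
        self._cards = list(cards)
        self._card_for_client: Dict[str, str] = {}

    def on_request(self, client_link_addr: str, card: str) -> None:
        # Record the card this client's request arrived on.
        if card not in self._cards:
            raise ValueError("unknown LAN card: " + card)
        self._card_for_client[client_link_addr] = card

    def card_for_reply(self, client_link_addr: str) -> Optional[str]:
        # Reply on the card the request came in on; None if never seen.
        return self._card_for_client.get(client_link_addr)

    def cards_for_broadcast(self) -> List[str]:
        # Naive assumption: a broadcast goes out on every configured card.
        return list(self._cards)


router = ReplyRouter(["lan0", "lan1"])
router.on_request("080009000001", "lan0")  # client on the local LAN
router.on_request("080009000002", "lan1")  # client on the backbone
```

The per-client lookup is the easy part; the broadcast case is harder because
every card must carry the message and duplicates may need to be suppressed.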

Overall high level algorithms play a key role in performance of distributed
systems. Special purpose light-weight networking protocols, server processes,
and network buffer management routines must all play together in the design
of such a system. Good performance not only requires a system view of the
goals to be achieved but the implementation of the design must also be
efficient.

Special purpose designs like the HP-UX diskless implementation have their
advantages and disadvantages. The advantages are considerable in the context
of a closely knit work group where a single file system view, high speed
intra-cluster communication, and transparent sharing of files and access
of data is extremely important. As long as the special purpose design allows a
peaceful co-existence and complete inter-connectivity with the outside
world via standard and evolving networking services (Arpa/Berkeley, NFS,
etc), the user is then provided with a powerful combination of capabilities.
The HP-UX diskless operating system provides these capabilities.
It is only in the more limited context of wide-area connectivity for diskless
cnodes that the special purpose design shows disadvantages. Specifically,
the inability to operate across a gateway limits the range of
inter-connectivity possible. However, it is precisely this type of situation
that places other undesirable limits on the design and tends to hinder
the ability to achieve higher levels of performance.

It can certainly be argued that with new Van Jacobson TCP/IP implementations
and improved buffer and data management, special purpose networking protocols
are unnecessary. The HP diskless implementation precedes these IP modifications.

regards: daveg ( #include <disclaimer> )

daveg@hpfcdc.HP.COM (Dave Gutierrez) (02/08/90)

>
>> 	The reason for implementing
>> 	the boot procedure in this way is to make the system be able to 
>> 	recover from network errors -  a diskless system is able to check the
>> 	LAN if it has a failure. If it detects that the LAN is broken then
>> 	the client will wait indefinitely for the LAN to be fixed and the
>> 	server to respond.

	Actually, the booting procedures have nothing to do with LAN
	Break detection. Network errors fall into a class all their own,
	that is not really applicable to the topic.
>
>Why can't a discless node realize this after it is running?  If you
>unplug a discless node from the network (accidentally one hopes :-)
>it almost immediately panics.  Couldn't there be a way to make it
>wait for a few minutes before panicking?
>
>*********************************************************************
>Tony Burzio               * Don't touch that ..FRZZZZZT.. cord!
>Martin Marietta Labs      *	Sigh...
>mmlai!burzio@uunet.uu.net *
>*********************************************************************

I guess I will try to provide a high-level description of how the
lan-break detection works.

The diskless HP-UX protocol in conjunction with the recovery and selftest
code is capable of frequently surviving a broken or unterminated LAN
cable [1]. However, there are LAN cable topological configurations that must be
considered prior to configuring a diskless cluster.

At all times the integrity and survivability of the diskless cluster 
should be maintained.  The diskless LAN break detection and recovery code
will not detect a broken or unterminated LAN if the diskless cnodes and
their respective root server cnode are on opposite sides of either a LAN bridge
box, LAN repeater, or any other device that acts as a terminator (Fig 1).
In addition, if the MAU or AUI cable is disconnected from a diskless cnode,
the rootserver, after certain selftest periods, will probably declare the
cnode dead. This situation is only detectable on the diskless cnode in
question; the rest of the backbone is still functioning correctly. The
converse situation is where the root server's MAU or AUI cable is disconnected.
This will most likely result in the diskless cnodes losing contact with
the root server.

If the LAN cable is broken or unterminated the following messages may be
received.  These messages are considered recoverable temporary failures.

	o Suspected backbone cable not properly terminated.
	o Suspected backbone cable not properly terminated or MAU disconnect.
	o Suspected AUI cable disconnected from MAU or grounded backbone cable.

The following messages may be received and are non-recoverable failures.

	o Panic(Diskless: LAN Failure, Unknown Cause)
	o Panic(FATAL ERROR: DISKLESS LAN FAILURE: Card State = X)
	  (where X is a number that is interpreted for you as one of the
	   following panics).

	Panics:  
		Panic(Diskless: LAN Interface Card Failure)
		Panic(Diskless: LAN Link Failure)
		Panic(Diskless: LAN Hardware Failure)
		Panic(Diskless: Lan Failure, Invalid Card State)


	---------	---------	---------	
	|  W	|	|  W	|	|  W	|
	|	|	|	|	|	|
	---------	---------	---------	
	   |                |               |
    o---------------------------------------------------o  LAN Segment A
							|
						  -------------
						  |   Bridge  | 
						  ----- or ----
						  |  Repeater  |
						  -------------
							|
    o---------------------------------------------------o LAN Segment B
	   |                |               |
	---------	---------	---------	
	|  W	|	|  W	|	|  S	|
	|	|	|	|	|	|
	---------	---------	---------	


	If segment B were broken or unterminated, the diskless cnodes (W)
	on segment A would lose contact with their root server (S). The
	diskless cnodes and root server on segment B would recognize the 
	problem and continue local processing (if possible) and wait 
	until the LAN is repaired. There may be a slight delay after the
	LAN is repaired for the recovery code to declare the LAN as
	being UP and for transmissions/receptions to continue.

			Fig. 1
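
The reasoning behind Fig. 1 can be sketched in Python as follows. This is
only a toy model: the node names W1..W5, the function name, and the outcome
strings are hypothetical. Only the "segment B broken" case is described in
the text above; the other branches are extrapolations and are marked as
assumptions in the comments.

```python
from typing import Dict

# Topology of Fig. 1: node name -> LAN segment (names are hypothetical).
TOPOLOGY: Dict[str, str] = {
    "W1": "A", "W2": "A", "W3": "A",
    "W4": "B", "W5": "B", "S": "B",
}
ROOT_SERVER = "S"


def outcome_after_break(topology: Dict[str, str],
                        root_server: str,
                        broken_segment: str) -> Dict[str, str]:
    """Classify what each node sees when one segment breaks."""
    server_segment = topology[root_server]
    result: Dict[str, str] = {}
    for node, segment in topology.items():
        if segment == broken_segment:
            # Nodes on the broken segment can see the fault locally.
            result[node] = "detects break, waits for repair"
        elif server_segment == broken_segment:
            # The bridge/repeater hides the break from the far side.
            result[node] = "loses contact with root server"
        else:
            # Assumption: this case is not covered in the text above.
            result[node] = "no local fault seen"
    return result


segment_b_broken = outcome_after_break(TOPOLOGY, ROOT_SERVER, "B")
```

With segment B broken, the cnodes on segment A cannot see the fault through
the bridge or repeater and simply lose their root server, which matches the
description under the figure.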



			Footnotes
			--------
[1] At no time is it recommended that backbone cable reconfigurations be done
on an active diskless cluster. Good practice dictates that the entire
cluster be shut down prior to performing any sort of backbone
cable maintenance.

markl@hpbbi4.HP.COM (#Mark Lufkin) (02/10/90)

	OK ... so I didn't read the question properly ( guess it was
	Monday morning or something). Anyway, having looked a bit further
	into the question (and seen some of the other responses) the answer
	is obviously that it is not supported. rbootd may well work (and
	according to Ken Green, it does) however diskless relies to
	a large extent on being able to broadcast (clustercast) to all the
	nodes in the cluster for info. The clustercard hardware address is
	listed at the beginning of the clusterconf file -> just one lan card
	is accepted, therefore no clustercast to the other nodes. It does not
	help to add a second line - diskless just complains (ccck).

----

Mark "got to learn to read :-(" Lufkin
OS Tech Support
EMC/CPS
HP/Apollo GmbH

jim@hpuinda.HP.COM (Jim Cooper) (02/11/90)

You may want to look at the kernel entry: check_alive_period

jim cooper
Indianapolis