[comp.sys.sun] Booting from server over network

poffen@lookitthat (Russ Poffenberger) (09/01/89)

I thought that maybe it was something just at my site, but it seems
somebody else has seen it too. Maybe somebody out there can shed some
light on this.

In a recent article, somebody wrote about how to install a CDC Wren V
drive.  In the story, they indicated that after configuring the disk, they
forgot to run installboot (hasn't everybody done this at least once?) so
the machine would not boot. No problem they said, boot as a client of
another machine.

Now the interesting part...

They said it took a long time just to load the boot program over tftp, and
that they got a lot of "not responding" messages. I have seen the same
thing on my network.
Sometimes (not often anymore though) I needed to boot one of my servers as
a client of the other server so I could change the disks and controllers
and such. While trying to tftpboot, the spinning prompt literally CRAWLS
and takes upwards of 10-15 minutes to load boot. Then the same story for
loading vmunix. This was very frustrating. Once booted, it seems OK, with
no NFS problems with the other server or anything like that. I have noticed
this when the client is nearly as powerful as, or more powerful than, the
server; e.g. booting a 3/260 from a 3/260 doesn't work well. Booting a
3/160, 3/50, or 3/75 from a 3/260 works fine, although a couple of our
diskless 3/60's exhibit this problem to a lesser degree. Sometimes
aborting (L1-a) and re-booting will get things in sync and it will go
smoothly.

We are running a network of ~40 sun clients from 2 3/260 servers under 4.0.3.

Does anybody have any suggestions?


Here is a part of the article somebody posted that prompted my article...

>Once we got all of our files restored from the remote tape drive.  We
>tried to reboot.  Oops, we were supposed to run installboot.  Now our disk
>would not boot, and our tape drive was inoperative.  There is always
>another way.  We proceeded to install the Sun3 SunOS on a nearby Sun4, so
>that we could boot it in client mode.  After much frustration, we got it
>to work.  However, while booting from the Ethernet, it got caught in the
>Size: prompt where it says 'Size: #####|/-\####|/-\#####'.  When it got to
>the first spinning prompt, it slowed way down, and took a long time.
>Every now and then it would say something like the server was not
>responding, and then it would recover and we would see a flurry of network
>activity.  What made things worse, was that sometimes once it booted, it
>complained that it could not find its domain, and we had to reboot.


Thanks in advance,
Russ Poffenberger
Schlumberger Technologies
poffen@sj.ate.slb.com

carl@doctor.tymnet.com (10/04/89)

We have experienced exactly the same "slow" response.  We have 6 3/280s
acting as a large NFS system and wanted to use an alternate root/swap/usr
partition (as a client of another server) in case of catastrophic failure
of the primary disk on each server.

It took more than 15 minutes to boot the server as a client of another one
of our servers.  This is appalling!!!

-Carl

+-----------------------------------------------------------------------------+
| TYMNET:  CARL@D35, carl@doctor UUCP: {ames|pyramid}oliveb!tymix!doctor!carl |
| INTERNET:carl@doctor.Tymnet.COM         PHONE: Carl Baltrunas (408)922-6206 |
+-----------------------------------------------------------------------------+

jim@eda.com (Jim Budler) (10/06/89)

carl@doctor.tymnet.com writes:

} We have experienced exactly the same "slow" response.  We have 6 3/280s
} acting as a large NFS system and wanted to use an alternate root/swap/usr
} partition (as a client of another server) in case of catastrophic failure
} of the primary disk on each server.

} It took more than 15 minutes to boot the server as a client of another one
} of our servers.  This is appalling!!!

} -Carl

I had this problem. I'm booting off another workstation since our
fileserver is not yet upgraded to 4.0.

Doing a pstat -T on the serving workstation during boot, I found that 100%
of the inodes were in use. Having the user close several windows at that
point enabled the diskless station to complete booting.

Repeating these tests with various kernels on the serving workstation,
under various conditions, led me to the conclusion that *during boot* the
serving workstation must require a huge number of inodes for the boot
process. After loading the boot program with tftp, the usage appeared to
drop to much more reasonable levels.

Upping maxusers on the serving workstation helped by increasing the number
of inodes available. Of course this has some detrimental effect on the
user's normal operation if you carry it too far and make the kernel
manipulate huge tables all the time.

My conclusions:

USAGE                   maxusers
Text Processing            4       (graphic equipped machine)
News Server               10       (my poor lil' 386i 8^)
Programming/Dbxtool        8       (graphic equipped machine)
Prog./Dbxtool/Serving one 10       (graphic equipped machine)

Add one or two maxusers for each additional client. Sun's comments in the
config file say one, my impression was two. Your call.

Under 3.5 my fileserver (no graphics) boots 4 clients fine with maxusers
8, but I suspect that under 4.0 maxusers 12 will be more appropriate.
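
The rule of thumb above can be written out as a small calculation. This is
only a sketch of the arithmetic in the table and the "one or two per
additional client" rule; the function name, the dictionary keys, and the
default of two per client are hypothetical choices made for illustration,
not anything taken from Sun's config files.

```python
# Hypothetical helper: starting maxusers values copied from the table above.
BASE_MAXUSERS = {
    "text_processing": 4,
    "news_server": 10,
    "programming_dbxtool": 8,
    "programming_dbxtool_serving_one": 10,
}


def suggest_maxusers(usage, extra_clients=0, per_client=2):
    """Return a starting maxusers value for a given usage profile.

    extra_clients counts diskless clients beyond those already covered by
    the table row; per_client is 1 (Sun's config comment, per the post) or
    2 (the poster's own impression).
    """
    if per_client not in (1, 2):
        raise ValueError("per_client should be 1 or 2")
    if extra_clients < 0:
        raise ValueError("extra_clients cannot be negative")
    return BASE_MAXUSERS[usage] + per_client * extra_clients


if __name__ == "__main__":
    # A graphics workstation already serving one client, plus three more.
    print(suggest_maxusers("programming_dbxtool_serving_one", extra_clients=3))
    # prints 16
```

The result is only a starting point for the kernel config; as noted above,
raising maxusers too far makes the kernel carry larger tables all the time.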

Now, does anyone know why tftpboot uses so many resources? I watched, and
it's just the process of getting that first boot program that does it.
Although it's hard to be sure, my impression was that loading the kernel
took far fewer resources than tftp'ing the boot program, even though the
kernel is much larger.

Jim Budler   address = uucp: ...!{decwrl,uunet}!eda!jim
                     domain: jim@eda.com
                 compuserve: 72415,1200
voice     = +1 408 986-9585    fax     = +1 408 748-1032

gary@svx.sv.dg.com (10/12/89)

I'll trot my ignorance out as a possible solution to your slow net boot.
We have a sub-netted 128 address - let's call it 128.1.2.3 where 2 is the
subnet and 3 is the host address - netmask 0xffffff00. When we did

b le(0,0,3) -a

The net began booting 128.1.0.3 at the speed of dark. We tried it a few
times and it did the same thing. Then it hit me that tftp had no notion of
our netmask - it was broadcasting all over our 128.1 net and eventually
getting a reply ("all over" being about 4000 miles in this case). When we
did

b le(0,0,203) -a

it booted in about 4 seconds - from a 3/50. Now it was asking for
128.1.2.3!
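
The arithmetic behind the two boot commands can be sketched as follows.
The sketch assumes, purely from the behaviour described above, that the
third field of le(0,0,N) is read as a hexadecimal number and supplies the
address bits below the classful 128.1 network number; that is an inference
from this post, not something checked against PROM documentation, and the
function names are hypothetical.

```python
import ipaddress


def split_address(addr, netmask):
    """Split a dotted-quad address into (subnet address, host number)."""
    value = int(ipaddress.IPv4Address(addr))
    network = value & netmask
    host = value & ~netmask & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(network)), host


def boot_host_argument(addr, classful_mask=0xFFFF0000):
    """Hex digits of the address bits below the classful network number.

    Assumption: this is what the third le(...) field has to contain so
    that the subnet octet is included in the server address.
    """
    value = int(ipaddress.IPv4Address(addr))
    return format(value & ~classful_mask & 0xFFFFFFFF, "x")


if __name__ == "__main__":
    print(split_address("128.1.2.3", 0xFFFFFF00))  # ('128.1.2.0', 3)
    print(boot_host_argument("128.1.2.3"))         # 203
    print(boot_host_argument("128.1.0.3"))         # 3
```

Under that assumption, an argument of 3 leaves the subnet octet at zero
(128.1.0.3), while 203 carries subnet 2 and host 3 (128.1.2.3), which
matches what was seen.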

Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or 
{amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.