poffen@lookitthat (Russ Poffenberger) (09/01/89)
I thought that maybe it was something just at my site, but it seems somebody else has seen it too. Maybe somebody out there can shed some light on this.

In a recent article, somebody wrote about how to install a CDC Wren V drive. In the story, they indicated that after configuring the disk, they forgot to run installboot (hasn't everybody done this at least once?) so the machine would not boot. No problem, they said, boot as a client of another machine. Now the interesting part... They said it took a long time just to load tftpboot and such, and they got a lot of "not responding" messages.

I have seen the same thing on my network. Sometimes (not often anymore, though) I needed to boot one of my servers as a client of the other server so I could change the disks and controllers and such. While trying to tftpboot, the spinning prompt literally CRAWLS and takes upwards of 10-15 minutes to load boot. Then the same story for loading vmunix. This was very frustrating. Once booted, it seems OK, no NFS problems with the other server or anything like that.

I have noticed this if the client is almost as powerful as, or more powerful than, the server, i.e. booting a 3/260 from a 3/260 doesn't work well. Booting a 3/160, 3/50, or 3/75 from a 3/260 works fine, although we have a couple of diskless 3/60's that exhibit this problem to a lesser degree. Sometimes aborting (L1-a) and re-booting will get things in sync and it will go smoothly.

We are running a network of ~40 Sun clients from two 3/260 servers under 4.0.3. Does anybody have any suggestions?

Here is a part of the article somebody posted that prompted my article...

>Once we got all of our files restored from the remote tape drive. We
>tried to reboot. Oops, we were supposed to run installboot. Now our disk
>would not boot, and our tape drive was inoperative. There is always
>another way. We proceeded to install the Sun3 SunOS on a nearby Sun4, so
>that we could boot it in client mode. After much frustration, we got it
>to work. However, while booting from the Ethernet, it got caught in the
>Size: prompt where it says 'Size: #####|/-\####|/-\#####'. When it got to
>the first spinning prompt, it slowed way down, and took a long time.
>Every now and then it would say something like the server was not
>responding, and then it would recover and we would see a flurry of network
>activity. What made things worse, was that sometimes once it booted, it
>complained that it could not find its domain, and we had to reboot.

Thanks in advance,

Russ Poffenberger
Schlumberger Technologies
poffen@sj.ate.slb.com
carl@doctor.tymnet.com (10/04/89)
We have experienced exactly the same "slow" response. We have 6 3/280s acting as a large NFS system and wanted to use an alternate root/swap/usr partition (as a client of another server) in case of catastrophic failure of the primary disk on each server.

It took more than 15 minutes to boot the server as a client of another one of our servers. This is appalling!!!

-Carl

+-----------------------------------------------------------------------------+
| TYMNET: CARL@D35, carl@doctor  UUCP: {ames|pyramid}oliveb!tymix!doctor!carl |
| INTERNET:carl@doctor.Tymnet.COM PHONE: Carl Baltrunas (408)922-6206         |
+-----------------------------------------------------------------------------+
jim@eda.com (Jim Budler) (10/06/89)
carl@doctor.tymnet.com writes:
} We have experienced exactly the same "slow" response. We have 6 3/280s
} acting as a large NFS system and wanted to use an alternate root/swap/usr
} partition (as a client of another server) in case of catastrophic failure
} of the primary disk on each server.
} It took more than 15 minutes to boot the server as a client of another one
} of our servers. This is appalling!!!
} -Carl

I had this problem. I'm booting off another workstation since our fileserver is not yet upgraded to 4.0.

Doing a pstat -T on the serving workstation during boot, I found that 100% of the inodes were in use. Having the user close several windows at this time enabled the diskless station to complete booting.

Repeating these tests with various kernels on the serving workstation, under various conditions, led me to the conclusion that *during boot* the serving workstation must require a huge number of inodes for the boot process. After loading the boot with tftp, the usage appeared to drop to much more reasonable levels.

Upping maxusers on the serving workstation helped by increasing the number of inodes available. Of course this has some detrimental effect on the user's normal operation if you carry it too far and make the kernel manipulate huge tables all the time.

My conclusions:

USAGE                        maxusers
Text Processing               4   (graphic equipped machine)
News Server                  10   (my poor lil' 386i 8^)
Programming/Dbxtool           8   (graphic equipped machine)
Prog./Dbxtool/Serving one    10   (graphic equipped machine)

Add one or two maxusers for each additional client. Sun's comments in the config file say one, my impression was two. Your call.

Under 3.5 my fileserver (no graphics) boots 4 clients fine with maxusers 8, but I suspect that under 4.0 maxusers 12 will be more appropriate.

Now, does anyone know why tftpboot uses so much resource? I watched, and it's just the process of getting that first boot program that does it. Although it's hard to be sure, my impression was that loading the kernel took far less resource than tftp'ing the boot program. And the kernel is much larger.

Jim Budler
address = uucp: ...!{decwrl,uunet}!eda!jim
          domain: jim@eda.com
          compuserve: 72415,1200
voice = +1 408 986-9585
fax = +1 408 748-1032
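The sizing rule of thumb in the post above can be written down as a small calculation. This is only a sketch of that rule: the base values are the ones from the table in the post, and the names `BASE_MAXUSERS` and `suggested_maxusers` are made up for illustration. It says nothing about how the kernel actually derives its inode table size from maxusers.

```python
# Base maxusers values taken from the table in the post (keys lowercased).
BASE_MAXUSERS = {
    "text processing": 4,
    "news server": 10,
    "programming/dbxtool": 8,
    "prog./dbxtool/serving one": 10,
}


def suggested_maxusers(usage: str, extra_clients: int = 0, per_client: int = 2) -> int:
    """Return a maxusers guess: base value plus 1 or 2 per additional client.

    per_client=1 follows the comment the post attributes to Sun's config file;
    per_client=2 follows the poster's own impression.
    """
    if per_client not in (1, 2):
        raise ValueError("the post suggests one or two maxusers per client")
    if extra_clients < 0:
        raise ValueError("extra_clients must not be negative")
    return BASE_MAXUSERS[usage.lower()] + per_client * extra_clients


if __name__ == "__main__":
    # A programming workstation already serving one client, plus two more.
    print(suggested_maxusers("Prog./Dbxtool/Serving one", extra_clients=2))
```

Treat the result as a starting point to check against pstat -T on the serving machine, since the post itself notes that oversizing the tables has a cost.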
gary@svx.sv.dg.com (10/12/89)
I'll trot my ignorance out as a possible solution to your slow net boot.

We have a sub-netted 128 address - let's call it 128.1.2.3, where 2 is the subnet and 3 is the host address - netmask 0xffffff00. When we did

    b le(0,0,3) -a

the net began booting 128.1.0.3 at the speed of dark. We tried it a few times and it did the same thing. Then it hit me that tftp had no notion of our netmask - it was broadcasting all over our 128.1 net and eventually getting a reply ("all over" being about 4000 miles in this case). When we did

    b le(0,0,203) -a

it booted in about 4 seconds - from a 3/50. Now it was asking for 128.1.2.3!

Gary Bridgewater, Data General Corp., Sunnyvale Ca.
gary@sv4.ceo.sv.dg.com or {amdahl,aeras,amdcad,mas1,matra3}!dgcad.SV.DG.COM!gary
No good deed goes unpunished.
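Reading the two boot commands in the post above side by side, the third argument to le() appears to be a hexadecimal number that ends up in the low-order bytes of the class B address: 3 gave 128.1.0.3 and 203 gave 128.1.2.3. The sketch below works through that arithmetic. The interpretation is inferred only from those two observations, not from PROM documentation, and the function names `boot_target` and `host_arg_for` are hypothetical.

```python
import ipaddress


def boot_target(network: str, host_arg_hex: str) -> str:
    """Combine a network (e.g. "128.1.0.0/16") with the hex boot argument."""
    net = ipaddress.ip_network(network)
    host = int(host_arg_hex, 16)
    if host > int(net.hostmask):
        raise ValueError("host argument does not fit in the host part")
    return str(net.network_address + host)


def host_arg_for(address: str, classful_prefix: int = 16) -> str:
    """Return the hex argument that would select `address` on its classful net."""
    addr = ipaddress.ip_address(address)
    low_bits = (1 << (32 - classful_prefix)) - 1
    return format(int(addr) & low_bits, "x")


if __name__ == "__main__":
    print(boot_target("128.1.0.0/16", "3"))    # subnet byte left at zero
    print(boot_target("128.1.0.0/16", "203"))  # subnet 2, host 3
    print(host_arg_for("128.1.2.3"))
```

Under this reading, the argument has to carry the subnet byte as well as the host byte, because the boot code in the post evidently did not apply the 0xffffff00 netmask itself.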