DHOLAKIA@intellicorp.com (Functionally Obsolete) (02/23/89)
Here is a note on some diskless boot problems that we had with SunOS 3.5.

Setup : Diskless nodes (3/150s, 3/60s etc) running off two servers (3/280s), all running SunOS 3.5. One server is the yellow pages (yp) master, the other is the slave; both run ypbind and rarpd.

Problem : Some of the diskless nodes would broadcast their ether address and then hang (getting no response to the request for the internet address). The only way to get them to boot was to kill ypbind and rarpd on the associated server and then restart rarpd. This would have the problem nodes booting like crazy, and you could then restart ypbind (ypbind and a new rarpd actually...). This seemed to indicate that the yellow pages was somehow screwing up the boot process. There was no discernible pattern (architecture, date of install, prom revision etc...) among the problem nodes.

Solution : After some fruitless conversations with Sun Support, we gathered the evidence, stared at it, and noticed that all of the diskless Suns that had this problem had a single digit someplace in the last two fields of the ethernet address.

    e.g.  8:0:20:1:3:D8   or   8:0:20:1:AE:1
                   ^                       ^

Since ether addresses also come in the flavour 8:0:20:1:68:73, the ethers file had at some point in time been prettied up by padding the single digits with zeroes.

    e.g.  8:0:20:1:03:D8   or   8:0:20:1:AE:01
                   ^^                       ^^

This was a hangover from pre-yp times and posed no problem to the diskless boot path taken in the absence of yp. Yp, however, turns out to be sensitive to this matter.

    ypmatch 8:0:20:1:3:D8 ethers.byaddr    ---->  myopia (or whatever)
                     ^
    ypmatch 8:0:20:1:03:D8 ethers.byaddr   ---->  nil
                     ^^

This was easy enough to fix: we took the padding zeroes back out of the ethers file.

MORAL : It is not just the p's and q's that matter, check the zeroes as well.

-Rajiv

Comment about Sun support...

A few comments about the Sun support person who was assigned to help us out. I don't know if very many people have had such experiences, but this one was particularly unfortunate.

(a) The first exchange between us was "Can you set up an account for me to log into your system and fix the problem...?". I'd hope to hear a few questions and some diagnostic recommendations before a suggestion like that. I don't know if this is SOP in the Sun community and support circles, but I am not thrilled about letting a foreign object into my system. [[ SOP == Standard Operating Procedure. --wnl ]]

(b) This person billed himself as a yp expert, and we went through the usual checks (tabs in hosts, ethers files etc). After none of these yielded anything, the tone of the call degraded markedly. Our yp setup takes advantage of a feature that allows you to bunch your yp files into a single directory instead of scattering them in /etc; in our case this was /etc/ypfiles on the ypserver. Also, the ypserver's passwd file does not consult yellow pages, so as to restrict access to the ypserver. We were told that "your yp setup is all messed up..." and had to put up with grumbling about how "...can't expect to get support for non standard setups" in the hope of getting some clue to the problem. When we finally did point him to the appropriate sections of the manual, which illustrate the use of such a setup, we were told that that option "did not work" and that the supporter always told his customers not to use it. We asked if this was a known bug documented someplace, because there was no change in the SunOS 4.0 manual either. We got a whole bunch of hand waving about it being documented someplace in the software release bulletins (no specific reference available...).

(c) The sad part about this is that the tone of the support call was more accusatory than helpful and could easily have intimidated a new user.
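
For anyone who wants to check their own ethers file for the zero-padding problem described in the note above, here is a minimal sketch in Python. It assumes the usual "address hostname" layout with `#` comment lines. The function names and the second sample host are made up for illustration; only the addresses quoted in the note and the host name myopia come from the original. It only reports suspect entries and does not touch the file or the yp maps.

```python
def strip_padding(addr):
    """Remove leading zeroes from each colon-separated field of an ether address.

    Letter case is left alone, since only the zero padding is known to matter here.
    """
    fields = addr.split(":")
    # lstrip("0") turns "03" into "3"; a field that is all zeroes becomes "0".
    return ":".join(field.lstrip("0") or "0" for field in fields)


def find_padded_entries(lines):
    """Yield (line_number, original_address, unpadded_address) for padded entries."""
    for number, line in enumerate(lines, 1):
        text = line.strip()
        if not text or text.startswith("#"):
            continue  # skip blank lines and comments
        addr = text.split(None, 1)[0]
        fixed = strip_padding(addr)
        if fixed != addr:
            yield number, addr, fixed


if __name__ == "__main__":
    # Hypothetical sample in ethers-file layout; "hyperopia" is an invented host.
    sample = [
        "# ether address    host",
        "8:0:20:1:03:D8     myopia",
        "8:0:20:1:68:73     hyperopia",
    ]
    for number, addr, fixed in find_padded_entries(sample):
        print("line %d: %s should probably be %s" % (number, addr, fixed))
```

In practice one would feed it the lines of the real ethers source file (for example `open("/etc/ethers")`, or wherever your yp source files live) and then fix the reported entries by hand before rebuilding the map.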