klm@goon.cme.nbs.gov (Ken Manheimer) (05/06/89)
Issue: Sun OS 4.0.1 rc.local hangs if the designated subnet mask differs from the standard one. This is fixable by one transposition in the rc.local sequence.

Environment
-----------
OS: Sun OS 4.0.1, both with and without "Net-Security" patches applied
CPUs: Sun 3/280 (file server), 3/180, 3/60, 3/50 (clients)
Subnet mask: 255.255.255.0, designated in /etc/netmasks file

Problem Description
-------------------
Boot hangs forever on remote mounts, if there are any. The boot will also hang on any of nfsd, rarpd, sendmail, and (maybe) rpc.statd, and skipping them using ^C will shortly result in a message like:

    Cannot register service: RPC: Unable to send; errno = Network is unreachable
    rpc.statd: unable to register service (SM_PROG, SM_VERS, udp)

A boot that is "unhung" by using ^C's to progress past hanging initializations is useless: you cannot log (or remote log) into the machine, and none of the net services mentioned above are provided.

Eliminating activation of ypbind (by either commenting out the relevant rc.local lines or 'mv'ing /usr/etc/ypbind aside so it is not found) will circumvent all these problems but sacrifices access to the yellow pages. Starting ypbind after boot is completed compensates for this to some degree but is chancy: it will work, but ypbind is more than normally susceptible to hanging at random times.

Fix
---
The fix simply entails transposing the initiation of routed (in.routed) in /etc/rc.local from just after where the netmask is set (the rc.local ifconfig lines) to just before where the netmask is set. All the above problems are alleviated and ypbind appears to operate normally.

NOTE that it is probably essential to have /etc/gateways explicitly designate the local host as the default gateway for this to work, though I have not yet confirmed this.

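After editing rc.local it may be worth confirming that the ordering really came out as described above. The following is only a rough sketch: routed_before_netmask is a hypothetical helper, not anything shipped with Sun OS, and it assumes the netmask is set by an ifconfig line containing the word "netmask" and that routed is started by a line mentioning in.routed. Adjust the patterns if your rc.local differs.

```shell
#!/bin/sh
# Hypothetical helper (not part of Sun OS): succeeds (exit status 0) only if
# the first uncommented line mentioning in.routed comes before the first
# uncommented ifconfig line that sets a netmask.
routed_before_netmask() {
    rcfile=$1
    awk '
        /^[ \t]*#/ { next }                        # ignore comment lines
        !r && /in\.routed/ { r = NR }              # first mention of in.routed
        !m && /ifconfig/ && /netmask/ { m = NR }   # first netmask-setting line
        END { exit !(r && m && r < m) }            # status 0 only if routed is first
    ' "$rcfile"
}
```

Calling routed_before_netmask /etc/rc.local returns success when the routed startup precedes the netmask setting, and failure otherwise (including when either line is not found). It checks only the order of the lines, not whether the boot will actually complete.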
Conjectures
-----------
Although I have no certain knowledge of the mechanism involved (nor do I have access to source), I do have a conjecture about what is causing the problem (with credit to Barry Warsaw for helping hash this provisional explanation out).

The essence of my suspicion is that just after the netmask change (or perhaps during it, what with the snazzy new features in ifconfig that engage yp for setting it), but before routed is started, ypbind is called upon to establish a connection that it is not able to make, and it hangs trying to do so. Consequently, everything that subsequently depends on queries to ypbind hangs as well, hence the behaviors of mount, nfsd, and so forth.

Sun has released upgrades to its OS before (3.3, 3.4) that had obvious unexercised subnetworking faults. These have caused myriad and grievous problems for those of us who rely on subnetwork partitioning to deal with what would otherwise be unmanageably large networks. This appears to be very similar to, if not the same as, the problem that occurred before. I hope that if this does, in fact, turn out to be the case, more pre-distribution attention will be paid to it in the future.

Aargh,
Ken Manheimer
klm@cme.nbs.gov or ..!uunet!cme-durer!klm
National Institute of Standards and Technology
(Formerly "National Bureau of Standards")
CME Factory Automation Systems, Software Support