gnb@bby.oz (Gregory N. Bond) (09/26/89)
Here is what seems to be a bug in PC-NFS V3.0, either in the implementation or in the protocol.

Background: We have 2 YP servers (melba and sid), with melba as the master. Melba (a Sun 3/260 running SunOS 3.5 Export) has 2 Ethernet cards: ie0 is on the main Sun net, and ie1 is on a net with about 6 PCs running PC-NFS V3.0. Sid is on the main net (ie0). All PCs use RARP and YP, and only the YP domain name is configured on the PC (i.e. everything else is determined dynamically). Both nets are class C nets, with no subnets.

This morning the main net was ruptured by workmen, and the portmapper on melba died in the ensuing net.chaos. We took melba down to single-user mode and rebooted it.

Problem: When the main server was back up, we rebooted the PCs. All of them got as far as "NET START RDR" and hung. Much probing with etherfind etc. showed that every PC was sending UDP packets to port 1027 on melba-gw (the ie1 interface of melba), about one per second. The sequence was 4 packets to port 1027, then 1 to port 5379, which was ypbind on melba. Port 1027 was NOT registered with the melba portmapper, nor is it in /etc/services or /etc/servers.

Cause: During the net.chaos, melba's ypbind had bound to sid, even though melba was the YP master. In fact, port 1027 on sid was the actual ypserv port on sid (as shown by rpcinfo -p). The fix was simple: use /usr/etc/ypset to bind melba to melba. The PCs then continued the boot sequence without any further intervention.

Analysis: This is my tentative analysis of what happens. During the boot sequence, the PC broadcasts a request to find a YP server. The ypbind on melba replies, saying that the server is on port x, but doesn't mention that it is host sid. The PC then assumes that this means port x on melba, and hangs waiting for YP service.
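The binding step in the analysis can be sketched in Python. This is a minimal illustration, not PC-NFS or SunOS code: it assumes the YPBINDPROC_DOMAIN reply body follows the ypbind_resp layout in Sun's yp.x protocol file (a status word, then a 4-byte server address and a 2-byte port, both in network order, with the port padded to 4 bytes by XDR). The function names and the addresses in the usage section are hypothetical. The trust_replier flag models the suspected behaviour: keep the port from the reply, but send to whichever host answered.

```python
import ipaddress
import struct

# Assumed values, following the ypbind_resptype enum in Sun's yp.x.
YPBIND_SUCC_VAL = 1
YPBIND_FAIL_VAL = 2


def decode_ypbind_resp(body: bytes) -> tuple[str, int]:
    """Decode the XDR body of a ypbind reply (assumed yp.x layout).

    Returns (server_ip, server_port) on success; raises on failure.
    """
    (status,) = struct.unpack_from(">I", body, 0)
    if status == YPBIND_FAIL_VAL:
        (error,) = struct.unpack_from(">I", body, 4)
        raise RuntimeError(f"ypbind failed, ypbind_error={error}")
    if status != YPBIND_SUCC_VAL:
        raise ValueError(f"unexpected ypbind_status {status}")
    # opaque ypbind_binding_addr[4]: exactly 4 bytes, no XDR padding needed.
    addr = ipaddress.IPv4Address(body[4:8])
    # opaque ypbind_binding_port[2]: 2 bytes, followed by 2 bytes of padding.
    (port,) = struct.unpack_from(">H", body, 8)
    return str(addr), port


def yp_target(replier: str, body: bytes, trust_replier: bool = False) -> tuple[str, int]:
    """Return the (host, port) a client would send YP requests to.

    With trust_replier=True (hypothetical model of the suspected bug), the
    client keeps the port from the reply but uses the replying host.
    """
    server, port = decode_ypbind_resp(body)
    return (replier if trust_replier else server), port


if __name__ == "__main__":
    # Hypothetical addresses: 192.0.2.2 stands in for sid, 198.51.100.1 for melba-gw.
    reply = (struct.pack(">I", YPBIND_SUCC_VAL)
             + ipaddress.IPv4Address("192.0.2.2").packed
             + struct.pack(">H", 1027) + b"\x00\x00")  # XDR pads to 4 bytes
    print(yp_target("198.51.100.1", reply))                      # ('192.0.2.2', 1027)
    print(yp_target("198.51.100.1", reply, trust_replier=True))  # ('198.51.100.1', 1027)
```

A client that honours the decoded address goes to sid's ypserv port; one that trusts the replier ends up sending to port 1027 on melba-gw, which would match the repeating packets in the etherfind trace.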
Here is the output of etherfind during a boot attempt:

    lnth  proto  source  destination  src port  dst port  icmp type
      64  icmp   rsm3    0xffffffff                       17
      64  icmp   rsm3    0xffffffff                       17
      64  arp    rsm3    broadcast
     100  udp    rsm3    melba-gw     1500      sunrpc
      64  udp    rsm3    broadcast    discard   discard
      92  udp    rsm3    melba-gw     1500      5371
     100  udp    rsm3    melba-gw     1501      sunrpc
      92  udp    rsm3    melba-gw     1501      5379
     124  udp    rsm3    melba-gw     1501      1027
     124  udp    rsm3    melba-gw     1501      1027
     124  udp    rsm3    melba-gw     1501      1027
     124  udp    rsm3    melba-gw     1501      1027
    [ Last 5 lines repeated indefinitely, about 1 per second ]

(I have a fuller dump if people are interested.)

Can anyone confirm my analysis? Is the bug on the SunOS side or the PC side?

Greg.
-- 
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au
non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb