mullen@NRL-CSS.ARPA (Preston Mullen) (08/08/87)
Here's a new one to add to the list of things that can induce broadcast storms and other serious problems. It involves an unpleasant interaction between SunOS 3.3 and 3.4 and Wollongong's WIN/VX software version 3.0. (Ironically, this problem was noticed when the Suns and VAXes were upgraded to what is generally considered to be better networking software.)

When a diskless Sun 3 workstation running SunOS 3.3 or 3.4 boots, at some point it sends out a broadcast ICMP Address Mask Request. This is in accordance with RFC 950; unfortunately, an incorrect reply from any machine on the network can be accepted by the workstation, and some incorrect masks can induce the workstation to start sending all packets as Ethernet broadcasts, which instantly leads to a broadcast storm.

If this happens, the workstation will probably fail to finish booting completely, usually stopping during the NFS mount with "RPC: not registered" messages, but sometimes sooner. Also, the workstation may itself then generate an incorrect reply (sent as an Ethernet broadcast) to a subsequent ICMP Address Mask Request from some other machine, thus spreading the virus. Other symptoms that may be observed include extremely sluggish operation of diskless machines and "no carrier" and "Ethernet jammed" notices from Suns (aside from those generated during broadcast storms).

This happened here on a non-subnetted Class B network when the 3.0 release of Wollongong's IP/TCP software was installed on two VAXes running VMS. Both VAXes replied to a Sun's ICMP Address Mask Request with network masks of 0000FFFF instead of the correct FFFF0000. The diskless Sun gullibly swallowed this and began to encapsulate every IP packet in an Ethernet broadcast packet. Even after we isolated our network from the offending VAXes, any Sun in this state would respond with the bogus netmask to a broadcast from another Sun and thus continue the problem.
The solution was to disconnect the offending VAXes from the network, halt ALL the diskless Suns, then reboot them. After that, we could let the VAXes back on the network, but it was not really safe since any diskless Sun reboot would start the cycle over again.

The people who own the VAXes that started the problem have told me that the problem is really in the Wollongong software (i.e., not a configuration error) and is a holdover from some "broken 4.3bsd code". Evidently there are also problems with setting the address mask. They report that Wollongong has sent them a fix.

==> Potential users of Wollongong 3.0 software should get a fix from Wollongong before running 3.0 on a network where a machine might broadcast an ICMP address mask request.

==> Sun should make their networking code smarter. There is no way a netmask of 0000FFFF could ever be valid, since it must include the normal network part of the address as well as the subnet part. The Suns should have ignored the faulty replies instead of being driven berserk by them. I've reported this to Sun. The problem doesn't affect Suns running 3.2 or earlier releases since those releases have no subnetting support.

It would probably be better if the diskless Sun did not broadcast the Address Mask Request but instead sent it directly to the server. I think that the request is broadcast after the ifconfig of the diskless Sun's Ethernet interface (anyway, an "address mask set to FFFF0000" report appears on the console quite a while before the problem shows itself). If that is so, then it's not clear why an ICMP Address Mask Request needs to be sent at all, since the network mask can be specified in the ifconfig in the rc.boot file.

By the way, our class B network is partitioned by level 2 bridges (Digital DEBETs). Needless to say, they passed every bad packet and broadcast right through. (The VAXes were on the other side of a bridge from my Suns.) Yep, I'll be moving my stuff behind an IP gateway now.
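The sanity check argued for above (a mask must cover at least the normal network part of the address) can be sketched as follows. This is only an illustration under stated assumptions, not Sun's or Wollongong's code: the function names and the sample address 128.1.2.3 are hypothetical, and the check uses the classful A/B/C network masks that applied at the time.

```python
import ipaddress


def classful_network_mask(addr: int) -> int:
    """Return the default (classful) network mask for a 32-bit IP address."""
    if addr >> 31 == 0b0:
        return 0xFF000000  # class A
    if addr >> 30 == 0b10:
        return 0xFFFF0000  # class B
    if addr >> 29 == 0b110:
        return 0xFFFFFF00  # class C
    raise ValueError("not a class A, B, or C address")


def is_plausible_mask(addr: int, mask: int) -> bool:
    """Accept a mask only if it includes every bit of the classful network part.

    Hypothetical helper: it checks only the one property argued in the
    message, not contiguity or anything else.
    """
    net = classful_network_mask(addr)
    return (mask & net) == net


# Hypothetical host on a class B network (address chosen for illustration).
host = int(ipaddress.IPv4Address("128.1.2.3"))

print(is_plausible_mask(host, 0xFFFF0000))  # correct non-subnetted class B mask
print(is_plausible_mask(host, 0xFFFFFF00))  # class B with an 8-bit subnet field
print(is_plausible_mask(host, 0x0000FFFF))  # the reversed mask from the faulty replies
```

A receiver applying a test like this would drop the 0000FFFF reply and keep whatever mask it had already configured.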
Many, many thanks to Van Jacobson for 'tcpdump', which proved instrumental in tracking this down. I hope Sun will let Van release the source code for tcpdump.

Preston Mullen
Computer Science and Systems Branch
Information Technology Division
Naval Research Laboratory
Washington DC 20375-5000

P.S. Why do diskless Sun workstations running SunOS 3.4 broadcast an ARP request for IP address 0.0.60.216 very early in the boot sequence? (The ARP packet asks that replies go to 0.0.60.216.) This appears to be wired into the Sun networking software. It's harmless enough, but it should not be there. Maybe someone forgot to take out some debugging code.
hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (08/09/87)
I believe your ARP request for 0.0.60.216 is used during the boot sequence, at a point when the Suns don't know their Internet address. They seem to use their serial number as an address. You'll also find such addressing in the routing tables in certain versions of SunOS. I suggest that you take 60 * 256 + 216, and see if you don't have a Sun with that serial number.
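The arithmetic suggested above can be worked out as a short sketch. It simply reads the dotted-quad address as a 32-bit integer; whether that integer really matches a Sun serial number is the conjecture made in this message, not something the code confirms. The function name is hypothetical.

```python
import ipaddress


def address_as_integer(dotted: str) -> int:
    """Interpret a dotted-quad IP address as a single 32-bit integer."""
    return int(ipaddress.IPv4Address(dotted))


# 0.0.60.216 -> 60 * 256 + 216
print(address_as_integer("0.0.60.216"))  # prints 15576
```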
timk@NEWTON.NCSA.UIUC.EDU (Tim Krauskopf) (08/10/87)
We encountered the exact same problem (diagnosed with tcpdump) between our diskless Sun 3/50 (SunOS 3.3) and a Vaxstation (TWG under VMS). The icmpmask response from TWG is backwards.

Easy work-around while waiting for bug fixes from TWG: Install the netmask parameter in the rc.boot files (ifconfig lines) of as many diskless Suns as you feel like. Each of the Suns with the mask installed will operate correctly. For the rest of the machines, the Russian roulette has much better odds: When the ICMP request goes out, there are several Suns waiting to respond and they are faster at responding than the VAXen. Whoever gets there first wins.

Isn't this fun?

Tim Krauskopf
National Center for Supercomputing Applications
University of Illinois
timk%newton@uxc.cso.uiuc.edu
jerry@twg.arpa ("Jerry Scott") (08/14/87)
On 8 Aug 87, Preston Mullen <mullen@nrl-css.arpa> wrote:

>>==> Potential users of Wollongong 3.0 software should get a fix from
>>Wollongong before running 3.0 on a network where a machine might
>>broadcast an ICMP address mask request.

A fix may be obtained for this problem by using anonymous ftp to host twg.arpa (26.5.0.73) and obtaining image ip_icmp.o. The file is in binary/record format and should be transferred from a Wollongong host running either release 2.3 or 3.0.

Also, image named472.sav can be obtained at this time. This is the new version of the bind name server from D. Kingston, et al. The image is in VMS Backup format and should be obtained using binary/record mode from a Wollongong host.

Regards,
Jerry Scott
------