cpw%sneezy@LANL.GOV (C. Philip Wood) (08/26/87)
This subject may have been discussed before, but LANL was experiencing broadcast storms on an Ethernet. About 100 hosts on the ether would send a message back to a lone Ultrix V2.0-1 (MicroVAX) system running the rwho daemon every few minutes. It was broadcasting to 128.165.255.255, and all these hosts were sending back some kind of ARP to find out who the guy was, probably so they could send him some kind of response (Connection refused?).

Anyhow, as in the problem with reverse ARP responses of zero (a previous missive), I spent some time tracking down the culprit. Once I found him, the fix was to terminate the rwho daemon (and remove its entry from the startup file). Network management in a heterogeneous environment, anyone?

Phil Wood (cpw@lanl.gov)
sy.Ken@CU20B.COLUMBIA.EDU (Ken Rossman) (08/27/87)
Please forgive my ignorance in these matters (I'm still pretty new to a lot of this), but shouldn't the decision as to whether to address broadcasts using an IP address like X.Y.255.255, as opposed to X.Y.0.0, be made by the kernel and not some daemon (rwho)? Again, I'm not clear on this, as I only got part of an explanation of this a while back, but isn't the X.Y.255.255 format an old style (obsolete?) broadcast format, and X.Y.0.0 the current way to broadcast? Or are they both valid but used differently? /Ken -------
hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (08/27/87)
Maybe we need to keep a collection of known causes of broadcast storms, and send them to the list once a month.

Almost certainly what is going on is that most of your machines are configured to expect 128.165.0.0 as a broadcast address (this would be the default for 4.2), and they have ipforwarding turned on. Thus the 128.165.255.255 looks to them like an attempt to send a message to a host with this address. Since ipforwarding is on, they try to be nice and forward the message. Thus they ARP 128.165.255.255.

The real fix is to make sure that every one of your machines has the stormproofing code in it that I have posted several times. In brief:

  1) In ipintr, make sure that all possible broadcast addresses are recognized.
  2) In udp_input, fix the code that sends unreachables so that its test for broadcast addresses includes all possible addresses.
  3) In ip_forward, when ipforwarding is off, discard the packet in all cases with no error message. [4.3 has fixed this already, but not 4.2.]

Leave ipforwarding off except on actual gateways, and if you use Unix hosts as gateways, make sure that proper Martian filtering, etc., is done.

However, it is often impossible to modify the code on every one of your machines. In that case, a reasonable approach is to make sure that every one of your machines agrees about the broadcast address. 4.2 systems will use net.0.0. 4.3 systems and Ultrix will default to net.255.255, but allow an option -broadcast in ifconfig to set a different address. I suggest that you set your Ultrix machine to use 128.165.0.0 as its broadcast address. This is a violation of the standards, but it's better to have everyone on your network agree than to have one lone machine be right.

I do not recommend rwho on big networks in any case. But it should not cause storms. There are enough other uses of broadcasts that you should make sure they are safe. I would set the broadcast address to 128.165.0.0 and turn rwho back on.
Now use Etherfind on a Sun (or netwatch on a PC, but if you have 100 machines on an Ethernet, it is worth buying a Sun just to run Etherfind) and verify that you are not seeing any ARPs for 128.165.0.0, nor any ICMP unreachables. If you see either of these, start tracking down the hosts one by one and fixing them. If you are using level 2 bridges, you are asking for this sort of thing. In that case, you should do this sort of test periodically to make sure no new problems have crept into your network. If there are any hosts that insist on sending garbage in response to broadcasts, isolate them from the rest of your network with a gateway.
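[Illustrative sketch, not the stormproofing code referred to above.] The three checks listed in the post can be modelled in a few lines of Python. Everything here is an assumption made for illustration: the function names `broadcast_forms`, `is_broadcast` and `disposition` are hypothetical, the set of "all possible broadcast addresses" is one reading (host part all zeros, host part all ones, 0.0.0.0 and 255.255.255.255), and subnetting is ignored. The real fixes live in the kernel's C code.

```python
import ipaddress


def broadcast_forms(network: str) -> set:
    """Addresses a stormproofed host would treat as broadcast on `network`.

    Assumption: "all possible broadcast addresses" means the four forms below.
    """
    net = ipaddress.ip_network(network)
    return {
        net.network_address,                      # old 4.2BSD form, e.g. 128.165.0.0
        net.broadcast_address,                    # newer form, e.g. 128.165.255.255
        ipaddress.ip_address("0.0.0.0"),          # all-zeros broadcast
        ipaddress.ip_address("255.255.255.255"),  # all-ones broadcast
    }


def is_broadcast(dest: str, network: str) -> bool:
    """True if `dest` matches any recognized broadcast form for `network`."""
    return ipaddress.ip_address(dest) in broadcast_forms(network)


def disposition(dest: str, local_addr: str, network: str,
                ipforwarding: bool = False) -> str:
    """Decide what a host does with an incoming datagram addressed to `dest`.

    Returns "deliver", "drop" or "forward". With ipforwarding off, anything
    not addressed to this host is dropped silently (no ARP, no ICMP error).
    """
    if is_broadcast(dest, network):
        return "deliver"
    if ipaddress.ip_address(dest) == ipaddress.ip_address(local_addr):
        return "deliver"
    if not ipforwarding:
        return "drop"
    return "forward"
```

With this model, a host on 128.165.0.0/16 that receives a datagram for 128.165.255.255 delivers it locally instead of trying to forward it, so it never ARPs for the broadcast address.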
hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (08/27/87)
X.Y.255.255 is the new form. X.Y.0.0 is the old form. Unfortunately it is easier to convince new software to use the old than old software to use the new. So until everybody updates, it may be easiest to stick with the old format. The decision is in fact made in the kernel, not rwho. You use ifconfig to choose the broadcast address. Ifconfig is a program that primarily does system calls to set up various options in the kernel.
sy.Ken@CU20B.COLUMBIA.EDU (Ken Rossman) (08/27/87)
> X.Y.255.255 is the new form. X.Y.0.0 is the old form. Unfortunately
> it is easier to convince new software to use the old than old software
> to use the new. So until everybody updates, it may be easiest to
> stick with the old format. The decision is in fact made in the
> kernel, not rwho. You use ifconfig to choose the broadcast address.
> Ifconfig is a program that primarily does system calls to set up
> various options in the kernel.

I guess the reason I asked about this was because we've seen those broadcast storms here a number of times, and so apparently have other sites, yet in answer to this problem, folks keep saying the "fix" is to kill the rwho daemon, as though rwho was directly responsible for net storms. Rwho is doing its job properly -- it's the kernel that's goofing up. Rwho ain't the only thing around that's gonna cause ARP broadcast anyway, is it? /Ken -------
david@elroy.Jpl.Nasa.Gov (David Robinson) (08/28/87)
In article <8708270922.AA06419@topaz.rutgers.edu>, hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) writes:
> X.Y.255.255 is the new form. X.Y.0.0 is the old form. Unfortunately
> it is easier to convince new software to use the old than old software
> to use the new. So until everybody updates, it may be easiest to
> stick with the old format.

Agreed, until you have a net where 100% can understand new style broadcasts you should leave it as the old style.

> The decision is in fact made in the
> kernel, not rwho. You use ifconfig to choose the broadcast address.
> Ifconfig is a program that primarily does system calls to set up
> various options in the kernel.

WRONG! All of the programs in the 4.3bsd distribution that broadcast make an ioctl call to get the kernel broadcast address which is set via /etc/ifconfig. SunOS 3.[3-4] suffers from the fact that none of their programs that broadcast check the kernel broadcast address and insist on using old style 4.2bsd broadcasts.

--
David Robinson    elroy!david@csvax.caltech.edu    ARPA
                  david@elroy.jpl.nasa.gov (new)
                  seismo!cit-vax!elroy!david       UUCP
Disclaimer: No one listens to me anyway!
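[Illustrative sketch, not code from the 4.3bsd distribution.] The "ioctl call to get the kernel broadcast address" can be shown in Python, assuming a present-day Linux host. The helper name `get_broadcast_address` is hypothetical, and the numeric value of `SIOCGIFBRDADDR` and the byte offsets into `struct ifreq` are Linux-specific assumptions; BSD-derived systems use different values.

```python
import fcntl
import socket
import struct

# Assumption: Linux value of the "get interface broadcast address" request.
SIOCGIFBRDADDR = 0x8919


def get_broadcast_address(ifname: str):
    """Ask the kernel which broadcast address is configured on `ifname`.

    Returns a dotted-quad string, or None if the interface does not exist
    or the request fails.
    """
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
            # struct ifreq starts with a 16-byte, NUL-padded interface name.
            ifreq = struct.pack("256s", ifname.encode()[:15])
            result = fcntl.ioctl(s.fileno(), SIOCGIFBRDADDR, ifreq)
    except OSError:
        return None
    # Bytes 16..19 hold the sockaddr family and port; 20..23 the IPv4 address.
    return socket.inet_ntoa(result[20:24])
```

A program that broadcasts would send to whatever address this returns, rather than hard-coding net.0.0 or net.255.255, so that a change made with ifconfig takes effect without touching the program.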
ron@TOPAZ.RUTGERS.EDU (Ron Natalie) (08/28/87)
More than likely they were arping because they were attempting to forward the datagram. -Ron