[comp.protocols.tcp-ip] Broadcast Storms

cpw%sneezy@LANL.GOV (C. Philip Wood) (08/26/87)

This subject may have been discussed before, but
LANL was experiencing broadcast storms on an Ethernet.
About 100 hosts on the ether would send a message back to
a lone Ultrix V2.0-1  (Microvax) system running the rwho daemon 
every few minutes.  It was broadcasting to 128.165.255.255 and all
these hosts were sending back some kind of arp to find out who the
guy was so they could probably send him some kind of response
(Connection refused?).

Anyhow, as in the problem with reverse arp responses of zero (a previous
missive) I spent some time tracking down the culprit.  Once I found
him, the fix was to terminate the rwho daemon (and remove its entry from
the startup file).

Network management in a heterogeneous environment, anyone?

Phil Wood  (cpw@lanl.gov)

sy.Ken@CU20B.COLUMBIA.EDU (Ken Rossman) (08/27/87)

Please forgive my ignorance in these matters (I'm still pretty new to a lot
of this), but shouldn't the decision as to whether to address broadcasts
using an IP address like X.Y.255.255, as opposed to X.Y.0.0 be done by the
kernel and not some daemon (rwho)?  Again, I'm not clear on this, as I only
got part of an explanation of this a while back, but isn't the X.Y.255.255
format an old style (obsolete?) broadcast format, and X.Y.0.0 the current
way to broadcast?  Or are they both valid but used differently?  /Ken
-------

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (08/27/87)

Maybe we need to keep a collection of known causes of broadcast
storms, and send them to the list once a month.  Almost certainly what
is going on is that most of your machines are configured to expect
128.165.0.0 as a broadcast address (this would be the default for
4.2), and they have ipforwarding turned on.  Thus the 128.165.255.255
looks to them like an attempt to send a message to a host with this
address.  Since ipforwarding is on, they try to be nice and forward
the message.  Thus they ARP 128.165.255.255.  The real fix is to make
sure that every one of your machines has the stormproofing code in it
that I have posted several times.  (In brief,
 1) in ipintr, make sure that all possible broadcast addresses
	are recognized.
 2) in udp_input, fix the code that sends unreachables so that its
	test for broadcast addresses includes all possible addresses
 3) in ip_forward, when ipforwarding is off, discard the packet
	in all cases with no error message.  [4.3 has fixed this
	already, but not 4.2.]  Leave ipforwarding off except on
	actual gateways, and if you use Unix hosts as gateways,
	make sure that proper Martian filtering, etc., is done.
) However it is often impossible to modify the code on every one of
your machines.  In that case, a reasonable approach is to make sure
that every one of your machines agrees about the broadcast address.
4.2 systems will use net.0.0.  4.3 systems and Ultrix will default to
net.255.255, but allow an option -broadcast in ifconfig to set a
different address.  I suggest that you set your Ultrix machine to use
128.165.0.0 as its broadcast address.  This is a violation of the
standards, but it's better to have everyone on your network agree than
to have one lone machine be right.
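
A minimal sketch of points 1 and 3 above, in Python rather than kernel C.
It is an illustration, not the stormproofing patch itself: the names
broadcast_forms, is_broadcast and classify are made up here, and subnet
broadcasts are ignored. The idea is that a host accepts every form of
broadcast as "for me", and a host with ipforwarding off drops anything
else silently instead of ARPing for it or sending an error.

```python
import ipaddress

def broadcast_forms(network):
    """Every address a host should treat as a broadcast on `network`."""
    net = ipaddress.ip_network(network)
    return {
        net.network_address,                       # old 4.2 style: host part all zeros
        net.broadcast_address,                     # new style: host part all ones
        ipaddress.ip_address("255.255.255.255"),   # all-ones limited broadcast
        ipaddress.ip_address("0.0.0.0"),           # all-zeros limited broadcast
    }

def is_broadcast(dst, network):
    return ipaddress.ip_address(dst) in broadcast_forms(network)

def classify(dst, local_addr, network, ipforwarding):
    """Return 'deliver', 'forward' or 'drop' for an incoming packet."""
    if ipaddress.ip_address(dst) == ipaddress.ip_address(local_addr):
        return "deliver"
    if is_broadcast(dst, network):
        return "deliver"      # never forwarded, so never ARPed for
    if ipforwarding:
        return "forward"      # only real gateways should get here
    return "drop"             # silently: no ICMP error is generated
```

With this check in place, a 4.2-style host on net 128.165 that receives a
packet for 128.165.255.255 delivers it locally instead of trying to
forward it.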

I do not recommend rwho on big networks in any case.  But it should
not cause storms.  There are enough other uses of broadcasts that you
should make sure they are safe.  I would set the broadcast address to
128.165.0.0 and turn rwho back on.  Now use Etherfind on a Sun (or
netwatch on a PC, but if you have 100 machines on an Ethernet, it is
worth buying a Sun just to run Etherfind) and verify that you are not
seeing any ARP's for 128.165.0.0, nor any ICMP unreachables.  If you
see either of these, start tracking down the hosts one by one and
fixing them.
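
The check described above can be sketched as a small filter over
observed traffic. This is only an illustration: the record format
(source, kind, target) and the name find_offenders are assumptions made
for the example, not the output format of Etherfind or netwatch, and
the two broadcast addresses are the ones for net 128.165.

```python
from collections import defaultdict

# Both forms of the directed broadcast address for net 128.165.
BROADCASTS = {"128.165.0.0", "128.165.255.255"}

def find_offenders(records):
    """Map each misbehaving source host to the bad traffic it sent.

    `records` is an iterable of (source, kind, target) tuples, a
    hypothetical format a capture would first have to be converted to.
    """
    offenders = defaultdict(set)
    for src, kind, target in records:
        if kind == "arp-request" and target in BROADCASTS:
            offenders[src].add("ARP for " + target)
        elif kind == "icmp-unreachable":
            offenders[src].add("ICMP unreachable")
    return dict(offenders)
```

An empty result after a broadcast has gone out is the goal; any host
that shows up in the result is one to track down and fix.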

If you are using level 2 Bridges, you are asking for this sort of
thing.  In that case, you should do this sort of test periodically to
make sure no new problems have crept into your network.

If there are any hosts that insist on sending garbage in response to
broadcasts, isolate them from the rest of your network with a gateway.

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (08/27/87)

X.Y.255.255 is the new form.  X.Y.0.0 is the old form.  Unfortunately
it is easier to convince new software to use the old than old software
to use the new.  So until everybody updates, it may be easiest to
stick with the old format.  The decision is in fact made in the
kernel, not rwho.  You use ifconfig to choose the broadcast address.
Ifconfig is a program that primarily does system calls to set up
various options in the kernel.

sy.Ken@CU20B.COLUMBIA.EDU (Ken Rossman) (08/27/87)

  X.Y.255.255 is the new form.  X.Y.0.0 is the old form.  Unfortunately it
  is easier to convince new software to use the old than old software to
  use the new.  So until everybody updates, it may be easiest to stick with
  the old format.  The decision is in fact made in the kernel, not rwho.
  You use ifconfig to choose the broadcast address.  Ifconfig is a program
  that primarily does system calls to set up various options in the kernel.

I guess the reason I asked about this was because we've seen those
broadcast storms here a number of times, and so apparently have other
sites, yet in answer to this problem, folks keep saying the "fix" is to
kill the rwho daemon, as though rwho was directly responsible for net
storms.  Rwho is doing its job properly -- it's the kernel that's goofing
up.  Rwho ain't the only thing around that's gonna cause ARP broadcasts
anyway, is it?  /Ken
-------

david@elroy.Jpl.Nasa.Gov (David Robinson) (08/28/87)

In article <8708270922.AA06419@topaz.rutgers.edu>, hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) writes:
> X.Y.255.255 is the new form.  X.Y.0.0 is the old form.  Unfortunately
> it is easier to convince new software to use the old than old software
> to use the new.  So until everybody updates, it may be easiest to
> stick with the old format. 
Agreed, until you have a net where 100% of the hosts can understand
new style broadcasts you should leave it as the old style.

> The decision is in fact made in the
> kernel, not rwho.  You use ifconfig to choose the broadcast address.
> Ifconfig is a program that primarily does system calls to set up
> various options in the kernel.

WRONG!  All of the programs in the 4.3bsd distribution that broadcast
make an ioctl call to get the kernel broadcast address which is
set via /etc/ifconfig.  SunOS 3.[3-4] suffers from the fact that
none of their programs that broadcast check the kernel broadcast
address and insist on using old style 4.2bsd broadcasts.
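
A rough sketch of the kind of ioctl lookup described above, written in
Python for a present-day Linux system rather than 4.3bsd. The request
number 0x8919 is the Linux value of SIOCGIFBRDADDR and the buffer layout
is the Linux struct ifreq; both are assumptions that do not carry over
to other systems, and get_broadcast is a name made up for the example.

```python
import fcntl
import socket
import struct

SIOCGIFBRDADDR = 0x8919  # Linux-specific request number (assumption)

def get_broadcast(ifname):
    """Ask the kernel for the broadcast address configured on `ifname`."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        # struct ifreq starts with the interface name, padded with NULs.
        request = struct.pack("256s", ifname.encode()[:15])
        reply = fcntl.ioctl(s.fileno(), SIOCGIFBRDADDR, request)
    # The IPv4 address sits at bytes 20..23 of the returned ifreq.
    return socket.inet_ntoa(reply[20:24])
```

A broadcasting program that asks the kernel this way follows whatever
ifconfig set, instead of hard-coding one broadcast style. The call
raises OSError if the interface does not exist or has no address.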



-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov (new)
				seismo!cit-vax!elroy!david UUCP
Disclaimer: No one listens to me anyway!

ron@TOPAZ.RUTGERS.EDU (Ron Natalie) (08/28/87)

More than likely they were ARPing because they were attempting
to forward the datagram.

-Ron