[comp.sys.sun] Broadcast Storms on Ethernet

hollandm@prl.philips.co.uk (Martin Holland) (02/14/90)

We have three Ethernet cables here: one main backbone and two smaller
networks. Each smaller network is joined to the main backbone by a SUN
workstation with two Ethernet cards in it. We use subnetting with a mask
of 255.255.255.0.  We have found that if any UNIX node on the main
backbone sends a broadcast, usually from a RWHO daemon, it causes the SUN
gateways to send a storm of broadcasts onto the network, each containing
the nodename of the node that sent the original broadcast.  These storms
cause the Ethernet performance to degrade badly.  Is there anything we can
do to stop the gateway nodes from sending broadcast storms?  These storms
only originate from gateway machines; other SUN workstations on the main
backbone are unaffected.

Martin C. Holland       Internal e/mail: HOLLANDM@PHIRHV1
Philips Research Labs.  External e/mail: HOLLANDM@prl.philips.co.uk
Redhill, Surrey. U.K.
Tel: 0293 785544 X 5911 

spurgeon@emx.utexas.edu (Charles Spurgeon) (02/16/90)

In article <4957@brazos.Rice.edu> hollandm@prl.philips.co.uk (Martin Holland) writes:
>X-Sun-Spots-Digest: Volume 9, Issue 33, message 10
>
>We have found that if any UNIX node on the main
>backbone sends a broadcast, usually from a RWHO daemon, it causes the SUN
>gateways to send a storm of broadcasts onto the network, each containing
>the nodename of the node that sent the original broadcast.  

Right.  SunOS 4.n has a serious bug such that it will forward datagrams
sent to an IP broadcast address it doesn't understand, back out the
interface it heard them on, over and over until the Time To Live field in
the packet finally expires.  In the case of the usual rwho packet, the TTL
appears to be 255.  Another recent example I saw was a timed broadcast
from a Silicon Graphics machine which resulted in 30 new broadcasts. This
has to be the world's most painful way to discover what the TTL of a
datagram is. :-(
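
To make the arithmetic of the loop concrete, here is a minimal Python
sketch of a single gateway that keeps sending one datagram back out the
interface it arrived on.  It is a simplified model, not SunOS code: the
function name rebroadcast_count is hypothetical, and it assumes the
gateway decrements the TTL by one per pass and drops the datagram once
the TTL reaches zero.

```python
def rebroadcast_count(initial_ttl: int) -> int:
    """Count how often one looping gateway re-sends a single datagram.

    Assumed model (not taken from SunOS source): each pass decrements
    the TTL by one, and the datagram is dropped when the TTL hits zero.
    """
    if not 0 <= initial_ttl <= 255:
        raise ValueError("IP TTL is an 8-bit field (0..255)")
    ttl = initial_ttl
    sent = 0
    while True:
        ttl -= 1      # a router decrements the TTL before forwarding
        if ttl <= 0:  # expired: the datagram is finally discarded
            break
        sent += 1     # buggy step: sent back out the arrival interface
    return sent


if __name__ == "__main__":
    for ttl in (1, 31, 255):
        print(ttl, rebroadcast_count(ttl))
```

Under this model a datagram that starts with a TTL of 255 is re-sent
254 times by one gateway.  With several gateways on the same cable each
one hears the others' copies as well, so the real traffic can be
considerably worse than this single-gateway count suggests.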

The bug is triggered when the Sun is set up as a router.  It doesn't need
to have two interfaces; just playing with SLIP and adding a second IP
address will set things off, as we learned to our regret here at UT Austin
over the weekend.  On a large bridged network, such as UT Austin has, the
havoc this can raise is considerable.  Proving once again, should anyone
still need proof, that large bridged campus networks are a lousy idea.

Here's a quick fix, known locally as the Ghastly Routing Hack.  Add a
route to the unknown IP broadcast address, and point the route to the
host's own loopback interface.  Since UT Austin is bridged and has lots of
hosts running old SunOS, we try to get everyone to use the old IP
broadcast address of net.zeros.  On our network that translates to
128.83.0.0.  That means that the Sun gateway machine can't figure out what
to do with packets sent to 128.83.255.255, which results in the damaging
forwarding behavior.  So the following command gives the broken Sun gateway
software some place relatively safe to send the problem packet (like in
its own ear!):

route add 128.83.255.255 127.0.0.1 0

This seems to work for us.  Your mileage may vary...
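
For other sites, the two broadcast forms and the matching route command
can be worked out mechanically.  The Python sketch below is only an
illustration: the helper names broadcast_forms and
loopback_route_command are hypothetical, and the /16 prefix assumes an
unsubnetted class B network like 128.83.  It only builds the command
string; it does not touch the routing table.

```python
import ipaddress


def broadcast_forms(network: str):
    """Return (net.zeros, net.ones) broadcast addresses for a network."""
    net = ipaddress.ip_network(network)
    # network_address is the old all-zeros form,
    # broadcast_address is the all-ones form.
    return str(net.network_address), str(net.broadcast_address)


def loopback_route_command(unknown_broadcast: str) -> str:
    """Build the route command in the same form as the one shown above."""
    return f"route add {unknown_broadcast} 127.0.0.1 0"


if __name__ == "__main__":
    # Assumption: 128.83 treated as an unsubnetted class B (/16).
    zeros, ones = broadcast_forms("128.83.0.0/16")
    print("net.zeros broadcast:", zeros)
    print("net.ones broadcast: ", ones)
    # The gateway here is configured for net.zeros, so the all-ones
    # form is the one it does not understand.
    print(loopback_route_command(ones))
```

Which of the two forms is the "unknown" one depends on how the gateway
is configured, so check the interface's broadcast setting before
adding a route.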

Much better would be for Sun to fix the bug, of course.  I've tried
several times to let people at Sun know about it, but since this place
operates without Sun software support, I think my complaints have fallen
on deaf ears.