[comp.protocols.tcp-ip] Ethernet Overload Problem

jeffreyb@dasys1.UUCP (Jeffrey L Bromberger) (04/06/88)

This posting is for a friend on a machine with no outgoing newsfeed.
He/I/we have no idea why this happens (no outgoing, yet incoming
news), but that goes in a different group :-)

Here is his article
===========   cut here  ==============

I am new to the problems of TCP/IP on an ethernet and am slowly
learning my way around.  We have a problem which I can "fix" but which
I don't understand.  The observable problem is that the ethernet
becomes so filled with packets that none of the computers connected to
it can get any work done because they are busy servicing interrupts.
Since I don't know what the packets contain, I hesitate to label it a
broadcast storm.
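
One way to find out what the packets contain is to tally captured frames by their Ethernet type field and by whether they are addressed to the broadcast address. The sketch below is only an illustration: it assumes raw frames have already been captured by some other means, and the names classify() and tally() are hypothetical. The type values are the standard ones for IP (0x0800), ARP (0x0806), and the trailer range (0x1000 through 0x100F).

```python
import struct
from collections import Counter

ETHERTYPE_IP = 0x0800
ETHERTYPE_ARP = 0x0806
ETHERTYPE_TRAIL = 0x1000      # first trailer type
ETHERTYPE_NTRAILER = 16       # number of trailer types
BROADCAST = b"\xff" * 6


def classify(frame):
    """Return (label, is_broadcast) for one raw Ethernet frame (hypothetical helper)."""
    if len(frame) < 14:
        return ("runt", False)          # too short to hold an Ethernet header
    dst = frame[0:6]
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype == ETHERTYPE_IP:
        label = "ip"
    elif ethertype == ETHERTYPE_ARP:
        label = "arp"
    elif ETHERTYPE_TRAIL <= ethertype < ETHERTYPE_TRAIL + ETHERTYPE_NTRAILER:
        label = "trailer"
    else:
        label = "other"
    return (label, dst == BROADCAST)


def tally(frames):
    """Count frames per label, plus a separate count of broadcast frames."""
    counts = Counter()
    for frame in frames:
        label, is_broadcast = classify(frame)
        counts[label] += 1
        if is_broadcast:
            counts["broadcast"] += 1
    return counts
```

If most of the traffic turns out to be unicast ARP between two hosts rather than broadcast frames, "broadcast storm" would be the wrong label for it.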

Let me describe our setup more completely.  We have a Vax/780, a
Vax/750, a Celerity 1260, and a Bridge Communications CS/100 connected
together by a DELNI.  We run 4.3BSD Unix on the 3 computers and use
TCP/IP.

The problem first occurred when we connected an Evans & Sutherland PS350
to the DELNI.  Booting up the PS350 would send the interrupt rate on
the 780 to a level where it wasn't even an adequate single user
machine.  I eventually persuaded myself that it was related to
trailers and the way to get rid of the problem was to use arp -s to
define the name and ethernet address for the PS350 (leaving off the
trailer attribute).  None of the other hosts on the "ethernet" had arp
entries in the boot startup file (/etc/rc.local).  I did not add them.

I think now that the above analysis of the problem was incorrect.

The reason that I changed my mind is that the problem came back after
being gone for about 9 months.  The occasion of the reappearance was a
rebuilding of the system disk for the CS/100.  The CS/100 had been a
bit flaky recently and so it seemed like a good idea to start with a
fresh system diskette.  It was rebuilt with all the same parameters as
the original.  Booting up the CS/100 caused the problem.  Rebooting
seems to have eliminated the problem for the time being.  Both the
Vaxen have been down since the change and the problem has not come
back.  (It has been 2 weeks now.)

Now, the questions.  What tools do I have in BSD Unix to diagnose this
problem?  Should I use arp -s to define all of the hosts on the
ethernet?  Why?  Any ideas what is causing the problem?

I would like to have a better understanding of what is going on here
because in a short while we will be running a real ethernet with at
least another Bridge hung off it.

Thanks for your help.  If these questions are too elementary to have
answers of general interest then mail to me instead of posting.

Dan Schlitt
UUCP: backbone!cmcl2!{phri,cucard}!ccnysci!dan
BITNET: dan@ccnysci.BITNET

=========  cut here  ================

There it is.  Can you help us please, as this bogging down of the
ethernet is making life/work here abominable.

Thanks in advance!!
-- 
*---Jeffrey Bromberger -- 2847 West 22nd Street Brooklyn, NY 11224---*
|   Compu$erve:  71171,730                   /dasys1!jeffreyb        |
|   UUCP:                 cmcl2!{cucard,phri}!ccnysci!jeffrey        |
*---Disclaimer: "My school disavows any knowledge of my actions!" ---*

eshop@saturn.ucsc.edu (Jim Warner) (04/06/88)

Your problem was described in a message from Tom Ferrin at UCSF:

+The ARP code for negotiating the use of trailers with another host
+is broken.  If the remote host cannot understand trailers, the
+algorithm can get stuck in a loop continually exchanging ARPOP_REPLY
+packets with the remote host.  This floods the ethernet with LOTS
+of packets for several minutes until the algorithm gives up trying.

+The scenario is as follows: 1) the local host tries to resolve an
+ethernet address that is not in its arp table by sending out an
+ARPOP_REQUEST packet; 2) the remote host sends an ARPOP_REPLY with
+the ETHERTYPE_IP protocol field set indicating that it does
+not wish to receive trailers; 3) local host sends an ARPOP_REPLY
+announcing that it does wish to receive trailers; 4) remote host
+sends another ETHERTYPE_IP reply indicating it does not wish to
+send trailers either.  This reply is indistinguishable from #2
+above and the whole cycle repeats.
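
The four steps above can be modelled as a toy simulation. This is a sketch under stated assumptions, not the 4.3BSD or E&S code: local_host(), remote_host(), simulate(), and the reply_to_replies switch are all hypothetical names, and the max_packets cap merely stands in for the point at which the real algorithm gives up.

```python
ARPOP_REQUEST, ARPOP_REPLY = 1, 2
ETHERTYPE_IP, ETHERTYPE_TRAIL = 0x0800, 0x1000


def local_host(pkt, reply_to_replies=True):
    """Hypothetical model of the host that negotiates trailers (step 3).

    With reply_to_replies=True it answers every plain-IP ARP reply with a
    trailer reply, which is the behaviour described in the scenario.
    """
    op, proto = pkt
    if op == ARPOP_REPLY and proto == ETHERTYPE_IP and reply_to_replies:
        return (ARPOP_REPLY, ETHERTYPE_TRAIL)
    return None  # nothing further to send


def remote_host(pkt):
    """Hypothetical model of the host that cannot understand trailers.

    It answers any ARP packet, request or reply, with a plain-IP reply
    (steps 2 and 4), so its answers all look alike.
    """
    return (ARPOP_REPLY, ETHERTYPE_IP)


def simulate(max_packets=1000, reply_to_replies=True):
    """Return how many packets are exchanged before the hosts go quiet or the cap is hit."""
    pkt = (ARPOP_REQUEST, ETHERTYPE_IP)   # step 1: the initial request
    count = 1
    sender_is_local = True
    while count < max_packets:
        if sender_is_local:
            pkt = remote_host(pkt)
        else:
            pkt = local_host(pkt, reply_to_replies)
        if pkt is None:
            break
        count += 1
        sender_is_local = not sender_is_local
    return count
```

With the defaults the exchange only stops at the cap, while simulate(reply_to_replies=False) ends after the request and a single reply. Refusing to answer a reply with another reply is just one conceivable way to break the cycle (at the cost of never announcing trailers in this toy model); it is not a claim about what either the VAX mods or the E&S revision actually do.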

Tom goes on to describe mods to VAX unix to break the cycle.  
Revised code is now available, however, from E&S that fixes the
problem on the PS3xx end.  That is probably the preferable solution.
Contact E&S for an upgrade.

jim