jeffreyb@dasys1.UUCP (Jeffrey L Bromberger) (04/06/88)
This posting is for a friend on a machine with no outgoing newsfeed. He/I/we have no idea why this happens (no outgoing, yet incoming news), but that goes in a different group :-) Here is his article

=========== cut here ==============

I am new to the problems of TCP/IP on an ethernet and am slowly learning my way around. We have a problem which I can "fix" but which I don't understand. The observable problem is that the ethernet becomes so filled with packets that none of the computers connected to it can get any work done because they are busy servicing interrupts. Since I don't know what the packets contain, I hesitate to label it a broadcast storm.

Let me describe our setup more completely. We have a Vax/780, a Vax/750, a Celerity 1260, and a Bridge Communications CS/100 connected together by a DELNI. We run 4.3BSD Unix on the 3 computers and use TCP/IP.

The problem first occurred when we connected an Evans&Sutherland PS350 to the DELNI. Booting up the PS350 would send the interrupt rate on the 780 to a level where it wasn't even an adequate single user machine. I eventually persuaded myself that it was related to trailers and the way to get rid of the problem was to use arp -s to define the name and ethernet address for the PS350 (leaving off the trailer attribute). None of the other hosts on the "ethernet" had arp entries in the boot startup file (/etc/rc.local). I did not add them.

I think now that the above analysis of the problem was incorrect. The reason that I changed my mind is that the problem came back after being gone for about 9 months. The occasion of the reappearance was a rebuilding of the system disk for the CS/100. The CS/100 had been a bit flaky recently and so it seemed like a good idea to start with a fresh system diskette. It was rebuilt with all the same parameters as the original. Booting up the CS/100 caused the problem. Rebooting seems to have eliminated the problem for the time being. Both the Vaxen have been down since the change and the problem has not come back. (It has been 2 weeks now.)

Now, the questions. What tools do I have in BSD Unix to diagnose this problem? Should I use arp -s to define all of the hosts on the ethernet? Why? Any ideas what is causing the problem?

I would like to have a better understanding of what is going on here because in a short while we will be running a real ethernet with at least another Bridge hung off it.

Thanks for your help. If these questions are too elementary to have answers of general interest then mail to me instead of posting.

Dan Schlitt
UUCP: backbone!cmcl2!{phri,cucard}!ccnysci!dan
BITNET: dan@ccnysci.BITNET

========= cut here ================

There it is. Can you help us please, as this bogging down of the ethernet is making life/work here abominable. Thanks in advance!!

--
*---Jeffrey Bromberger -- 2847 West 22nd Street Brooklyn, NY 11224---*
| Compu$erve: 71171,730 /dasys1!jeffreyb |
| UUCP: cmcl2!{cucard,phri}!ccnysci!jeffrey |
*---Disclaimer: "My school disavows any knowledge of my actions!" ---*
eshop@saturn.ucsc.edu (Jim Warner) (04/06/88)
Your problem was described in a message from Tom Ferrin at UCSF:

+The ARP code for negotiating the use of trailers with another host
+is broken. If the remote host cannot understand trailers, the
+algorithm can get stuck in a loop continually exchanging ARP_REPLY
+packets with the remote host. This floods the ethernet with LOTS
+of packets for several minutes until the algorithm gives up trying.
+The scenario is as follows: 1) the local host tries to resolve a
+ethernet address that is not in it's arp table by sending out a
+ARPOP_REQUEST packet; 2) the remote host sends a ARPOP_REPLY with
+the ETHERTYPE_IP protocol field set indicating that it does
+not wish to receive trailers; 3) local host sends a ARPOP_REPLY
+announcing that it does wish to receive trailers; 4) remote host
+sends another ETHERTYPE_IP reply indicating it does not wish to
+send trailers either. This reply is indistinguishable from #2
+above and the whole cycle repeats.

Tom goes on to describe mods to VAX unix to break the cycle. Revised code is now available, however, from E&S that fixes the problem on the PS3xx end. That is probably the preferable solution. Contact E&S for an upgrade.

jim
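
For anyone who wants to see why the exchange in steps 1) through 4) never settles, here is a toy model of it. This is only a sketch of the scenario as Tom describes it, not the 4.3BSD ARP code: the two handler functions, the simulate() helper and the max_packets cap are made up for illustration, and only the constant values are the usual ones from the BSD headers.

```python
# Toy model of the trailer-negotiation loop. Not the real kernel code.
ETHERTYPE_IP = 0x0800
ETHERTYPE_TRAIL = 0x1000
ARPOP_REQUEST = 1
ARPOP_REPLY = 2


def local_host(op, proto):
    """Assumed buggy behaviour: every plain IP reply is answered with a
    trailer reply, because the host cannot tell step 4 from step 2."""
    if op == ARPOP_REPLY and proto == ETHERTYPE_IP:
        return (ARPOP_REPLY, ETHERTYPE_TRAIL)
    return None


def remote_host(op, proto):
    """Assumed behaviour of a host without trailer support: a request or a
    trailer reply is answered with a plain IP reply."""
    if op == ARPOP_REQUEST or proto == ETHERTYPE_TRAIL:
        return (ARPOP_REPLY, ETHERTYPE_IP)
    return None


def simulate(max_packets):
    """Return the packets seen on the wire as (sender, op, proto) tuples.

    max_packets is an artificial cap; without it the loop would not end.
    """
    packets = []
    packet = (ARPOP_REQUEST, ETHERTYPE_IP)  # step 1: local host asks
    sender_is_local = True
    while packet is not None and len(packets) < max_packets:
        packets.append(("local" if sender_is_local else "remote", *packet))
        handler = remote_host if sender_is_local else local_host
        packet = handler(*packet)
        sender_is_local = not sender_is_local
    return packets


if __name__ == "__main__":
    for sender, op, proto in simulate(9):
        print(sender, op, hex(proto))
```

After the initial request, the trace alternates between the remote host's IP reply and the local host's trailer reply until the cap is hit. In this model, nothing in the packets themselves lets either side stop.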