Sun-Spots-Request@RICE.EDU (William LeFebvre) (04/01/88)
SUN-SPOTS DIGEST        Thursday, 31 March 1988      Volume 6 : Issue 40

Today's Topics:
                       Re: Ethernet problems (2)
                  Re: Ethernet problems/ collision rates
                   Re: Strange Ethernet error messages
                    Re: Mysterious Ethernet problems
                  Re: TCP packet size bug in 3.4 AND 3.5
              Re: Sun-3/2xx user level vs. kernel level bcopy

Send contributions to:  sun-spots@rice.edu
Send subscription add/delete requests to:  sun-spots-request@rice.edu
Bitnet readers can subscribe directly with the CMS command:
    TELL LISTSERV AT RICE SUBSCRIBE SUNSPOTS My Full Name
Recent backissues are stored on "titan.rice.edu".  For volume X, issue Y,
"get sun-spots/vXnY".  They are also accessible through the archive
server:  mail the word "help" to "archive-server@rice.edu".

----------------------------------------------------------------------

Date:    Fri, 18 Mar 88 20:48:04 PST
From:    Craig Leres <leres%lbl-helios@lbl-rtsg.arpa>
Subject: Re: Ethernet problems (1)
Reference: v6n20,v6n28

Actually, this problem is the result of the collaboration of two bugs in
the SunOS kernel. One bug is the failure to correctly recognize all
possible ip broadcast addresses. But this wouldn't hurt if it weren't for
another bug that makes the system think it's ok to forward a packet when
it isn't. Clearly, you shouldn't forward a packet if you only have one
network interface. Nor should you forward a packet out the same interface
it came in on.

> Hosts simply should not forward packets of any sort, and they certainly
> should not *under any circumstances* forward a broadcast packet. I don't

I don't think you really mean this; if hosts stopped forwarding packets,
the internet would cease to exist! Perhaps your definition of a host is a
system with only one network interface, as opposed to a gateway, which has
more than one network interface? In any case, neither hosts nor gateways
should forward broadcast packets.
> Of course, there is this nice kernel variable "ipforwarding" which can be
> used to disable forwarding and which you might think can be used to stop
> this antisocial behavior. Guess again. In a 4.2BSD system, if you turn
> off ipforwarding, all that will happen is that you'll swap ICMP Network
> Unreachable messages for ARPs (at a possible packet savings, as you'll

ARP requests are broadcasts; they must be received by all stations but are
of interest only to the one station being ARPed for. Bogus ARP requests
must be received and discarded by all stations. So as it turns out,
turning off ipforwarding is a BIG win. Instead of wasting cycles on all
systems, you only waste cycles on the system that did the broadcast.

Another way to reduce this problem is to reduce the number of broadcasts
that occur on your Ethernet. One way we've done this at lbl is to outlaw
rwho. It's just too expensive when you have more than a handful of hosts
participating. Our Net Police have worked overtime to disable ipforwarding
and turn off rwho; as a result, our lab-wide Ethernet is pretty healthy.

		Craig

------------------------------

Date:    Sun, 20 Mar 88 12:14:25 EST
From:    steve@cs.umd.edu (Steven D. Miller)
Subject: Re: Ethernet problems (2)
Reference: v6n20,v6n28

From: Craig Leres <leres%lbl-helios@lbl-rtsg.arpa>
> ...One bug is the failure to correctly recognize all possible ip
> broadcast addresses.

Agreed. I wonder... is there any time when one might want to forward a
packet that was sent to the local broadcast address? I can't think of any,
but someone else may have different ideas. If one never wants to forward
local-wire broadcasts, it would be nice if the device drivers were
modified to pass back an indication of whether or not a particular
incoming IP packet had been sent to the local-wire broadcast address. If
so, one could hack the code so that it would never be forwarded.
All this mucking about with IP addresses and guessing whether the sender
was broadcasting would then go away.

> Clearly, you shouldn't forward a packet if you only have one network
> interface. Nor should you forward a packet out the same interface it came
> in on.

Agreed. The "forward iff > 1 IP interface" rule holds in 4.3BSD. I
disagree that the second should hold; what if you're playing gateway, and
someone sends you a packet that should have gone to another gateway on the
local net? You should send a redirect, but it would be nice to forward the
packet anyway.

> ...Perhaps your definition of a host is a system with only one network
> interface as opposed to a gateway which has more than one network
> interface?

Same meaning, different terminology.

> ...So as it turns out, turning off ipforwarding is a BIG win. Instead of
> wasting cycles on all systems, you only waste cycles on the system that
> did the broadcast.

Processing an incoming ARP doesn't take a whole lot of code, and should
not take much CPU. I think that trashing the net is the real problem here,
not wasting a few cycles.

> ...One way we've done this at lbl is to outlaw rwho....

This sounds reasonable, but I admit that I'm hooked on rwho.

> Our Net Police have worked overtime to disable ipforwarding and turn off
> rwho...

This, too, sounds familiar. One must be ever-vigilant to keep this sort of
behavior from cropping up; one well-meaning OS upgrade or system
configuration change, and a quiet Ethernet can turn noisy again.

	-Steve

------------------------------

Date:    Fri, 18 Mar 88 10:59:43 PST
From:    celeste@coherent.com (Celeste C. Stokely)
Subject: Re: Ethernet problems/ collision rates
Reference: v6n20,v6n28

This is in reply to the person with the le0 error messages, and also the
person asking about collision rates.

1.
Concerning the messages:

	le0: Received packet with ENP bit in rmd cleared
	le0: Received packet with STP bit in rmd cleared

Under normal operation the LANCE driver should never encounter receive
descriptors with either the ENP or STP bit cleared. The driver sets up its
buffers to be large enough to hold the maximum-size packet allowed by the
Ethernet spec. This means that it has no need to chain receive buffers
together so that an individual packet straddles multiple receive buffers.
Translating this into receive-descriptor bits, there should be exactly one
descriptor for each incoming packet, and that descriptor should have both
the start-of-packet (STP) and end-of-packet (ENP) bits set.

However, if there's traffic on the net in violation of the Ethernet spec,
it's possible for an incoming packet to be too big to fit into a single
receive buffer. In this case, the packet will span multiple descriptors,
with the ENP bit clear on all but the last and the STP bit clear on all
but the first. That's where the first two error messages are coming from.

The error message about babbling confirms the condition indicated by the
first two messages: the chip has decided that things are screwed up enough
that it should stop its transmitter and receiver sections, and it has done
so. The driver will restart them upon getting this error.

The bottom line is that it is very likely that there's other equipment on
the net operating in violation of the Ethernet spec by sending out giant
packets.

2. Concerning what are reasonable Ethernet collision rates:

Collisions are completely expected on Ethernet; collision is one of the
ways that more than one machine can live on the cable. The problem comes
when there are too many collisions. Here are my rules of thumb for "how
many":

0% - 2%  --All is well. Textbook-perfect, healthy (collision-wise) net.
2.5%-5%  --Not super, but ok. I expect this with a lot of nd clients on
           a net.
5% - 10% --Uh-oh, bad problems developing. Get out the diagnostic tools.
           Find where the problem is, and fix it. (Has a machine lost the
           ability to detect collisions, and so is blabbering whenever it
           feels like it?)
> 10%    --Serious trouble. Users probably complaining loudly. You should
           have fixed the problem before it got this bad, but at least it
           should be easy to find by now. Fix it now.

Of course, these are my guidelines, but they've worked well for me over
the years. ALSO, please remember that the formula for computing the
collision rate is:

	(Collisions / Opkts) * 100 = collision rate

[Opkts is the number you get from netstat -i.]

..Celeste Stokely, Coherent Thought Inc.
UUCP:     ...!{ames,sun,uunet}!coherent!celeste
Domain:   celeste@coherent.com
Internet: coherent!celeste@ames.arpa or ...@sun.com or ...@uunet.uu.net
VOX:      415-493-8805
SNAIL:    3350 W. Bayshore Rd. #205, Palo Alto CA 93404

------------------------------

Date:    Fri, 18 Mar 88 20:56:44 PST
From:    Craig Leres <leres%lbl-helios@lbl-rtsg.arpa>
Subject: Re: Strange Ethernet error messages
Reference: v6n28

Here's a rehash of a posting I made to sun-spots last June.

The Ethernet driver gives the LANCE chip a block of memory large enough to
hold 40 full-sized packets. The errors:

	le0: Received packet with STP bit in rmd cleared
	le0: Received packet with ENP bit in rmd cleared

are indications that the LANCE received packets that were bigger than the
driver was expecting. The error:

	le0: Receive buffer error - BUFF bit set in rmd

indicates that the LANCE chip ran out of memory to put incoming packets
in. The Intel interface will spew the message:

	ie0: giant packet

when it receives packets that are too big.

One source of large packets is devices that violate the minimum
inter-packet spacing. Some combinations of transceivers and interfaces see
these too-closely-spaced back-to-back packets as a single large packet.
Another source of large packets is old DEQNA Ethernet interfaces.
When they receive more packets than they can handle, they transmit all
ones for a short spell. This garbage looks like an impossibly large
broadcast packet.

		Craig

------------------------------

Date:    Mon, 21 Mar 88 20:31:32 PST
From:    paula@boeing.com
Subject: Re: Mysterious Ethernet problems
Reference: v6n30

In Sun-Spots v6n30, leonid@TAURUS.BITNET described a problem that looks
identical to what we're seeing with five of our 3/280 servers. With one
exception (see below), these machines are running 3.4. Our building is
wired with thick Ethernet, and most machines connect to the cable through
at least one IsoLan fan-out unit. We have ~60 Suns (80% diskless) and
perhaps another 60 other Ethernet boxes, ranging from Ungermann-Bass NIUs
and PCs with 3Com cards to Xerox and Symbolics Lisp machines. The
'traffic' tool consistently shows a steady 30% background load with
frequent, much higher spikes.

For the past month or so, we've averaged about one server per day going
down with a continuous stream of

	ie0: lost interrupt: resetting

console messages. Once a machine gets into this state, the only recourse
is to reboot. The accumulating evidence seems to be pointing to a problem
with these servers' connection to the net.

- One of the servers was moved to another room about two weeks ago and
  has not crashed since.

- Replacement of the cpu in one of the servers did not prevent that
  machine from crashing.

- I vaguely remember hearing that this problem results from a bug in the
  3.4 Ethernet driver. I have installed 3.5 on one of the machines, but
  it is too soon to tell if that fixed it.

- The man who maintains our building's Ethernet tells me that 4-6 weeks
  ago he changed the way those five servers connect to the backbone
  cable. Previously, the servers were connected to an IsoLan fan-out unit
  which connected to the backbone. Now, the fan-out unit connects to the
  backbone through another IsoLan. He's working on rearranging things
  back to the old configuration.
I have been talking to Sun about this. I initially called it in as a
software problem. The fellow who took the call was very helpful, but
really didn't think it was a software problem. The call was redirected to
hardware, and our local field engineer was out the next day to try a new
cpu in one of the machines. That now appears not to have corrected the
problem, which seems to be a software bug exacerbated by some
configuration of cables and/or fan-out units. As I learn more, I will let
you know. Is there anyone else out there who has seen this problem?

Paul Allen
Boeing Advanced Technology Center
paula@boeing.com
...!uw-beaver!ssc-vax!bcsaic!paula

------------------------------

Date:    Sun, 20 Mar 88 15:11:55 cst
From:    grunwald%guitar.cs.uiuc.edu@a.cs.uiuc.edu (Dirk Grunwald)
Subject: Re: TCP packet size bug in 3.4 AND 3.5
Reference: v6n28

I applied the patch to allow larger TCP packets and measured the results.
The server is a 3/280 (idle during the tests); the client is a 3/50 (which
was used to run the tests). Here's what I found:

    Test                  w/512 bytes      w/1024 bytes
    ---------------------------------------------------
    cp latex /dev/null    11 (9 -> 13)     15 (14 -> 19)
    cp latex /usr/tmp     1:04             1:03
    latex paper.tex       2:22 -> 2:03     2:28 -> 2:05
    rcc                   1:32             1:36

Times are in seconds or minutes:seconds, with the range (if available)
marked as low -> high.

The first two tests just measure disk throughput. As you can see, the
change actually seemed to degrade things for the simple test, but when you
put some contention on the wire, the difference seems meaningless.

The third test basically checks paging and more random disk traffic. Latex
is a big program on our hosts, and the paper is pretty big -- lots of
files get read in. However, the difference doesn't seem very great.

The 'rcc' task uses 'rsh' to do a remote 'cc' on an Intel 310 system. It
should use the Ethernet a lot, since files get copied there and back.
Again, the difference is in the noise.
Because the difference isn't that great, I left everything at 512-byte
packets, mainly because test #1 ran faster that way and everything else
seemed about the same. I didn't measure the performance change on the
3/280; I doubt that it's all that great.

Dirk Grunwald
Univ. of Illinois
grunwald@m.cs.uiuc.edu

------------------------------

Date:    Fri, 18 Mar 88 11:01:08 PST
From:    root@lll-crg.llnl.gov (Gluteus Vaximus)
Subject: Re: Sun-3/2xx user level vs. kernel level bcopy

> From: suneast!ozone!murph@sun.com (Joe Murphy, Manager ECD Hardware)
> One thing to be wary of BTW on the Sun3 when considering user level
> "bcopy"'s is that the 3/2xx series has special bcopy hardware that the
> kernel takes advantage off to keep the large amount of non repeating
> sequential accesses from trashing the cache.

That's a fine observation, but what's the solution? We had a group at UC
Berkeley trying to do astronomical image processing on a 3/260. They
required the ability to copy 1/4 Mb 30 times a second. Their first effort
gave them 1/3 the performance of a 3/160. I got into the picture when they
started asking around about how to turn the cache off. (They were
pragmatic about the problem: they were stuck with the machine; now they
needed to make it usable.)

Eventually I was able to reverse-engineer a semi-solution from a
description of the cache addressing algorithm: it happened that the source
and destination arrays in the image processing system were a multiple of
64 Kbytes offset from each other. By moving one of the arrays 24 bytes
relative to the other, we were actually able to get slightly better than
3/160 performance (only a few percent better). There's some interesting
periodic math about why a 24-byte offset (and any offset with
mod(abs(D-S),65536) > 16 and mod(abs(D-S),16) = 8) is optimal, but the
real question is: how do we get even better performance? Is there any way
to get the kernel to do our copies for us using its internal bcopy?
The whole reason the group at Berkeley bought the 3/260 instead of a 3/160
was that they'd determined the 3/160 just wouldn't be fast enough. Now
they find themselves stuck with this slow memory bandwidth.

It should be pointed out, by the way, that no cache should exhibit this
kind of brain-damaged behavior. Worst-case performance should be memory
bandwidth, not sub-memory bandwidth. The behavior of the 3/2xx cache is
totally unacceptable in this case.

I'm also curious: does the 4/xxx series suffer the same cache problems?
It's trivial to test; just write a program that copies 1/4 Mb 100 times
and time it from the shell. E.g.:

	#define N (1024*256)

	char src[N];
	/* char spacer[24]; /**/	/* uncomment to offset dst from src */
	char dst[N];

	main()
	{
		int i;

		for (i = 100; i; i--)
			bcopy(src, dst, N);
	}

	Casey

------------------------------

End of SUN-Spots Digest
***********************