[comp.dcom.lans] Latest news on our ether errors

david@ms.uky.edu (David Herron -- Resident E-mail Hack) (02/20/88)

Well, a little bit of sleuthing uncovered the fact that we were
having a broadcast storm, and didn't even know it.  I guess you
wouldn't call it anything worse than a light rainshower -- but
anyway, it was there.

If you remember our configuration, we do have 4.3 hosts along
with hosts which have 4.2 derived software.  Well, turning on
tcpdump and looking for arp gave me an eyeful -- a constant
stream of 3b2s/3b1s and a sequent all arping for 128.163.255.255.

Do you know how boring it is to add almost the same line to
20-30 /etc/rc files?  Anyway, I told 'em all to use .0.0 as the
broadcast address and rebooted 'em all.  That, at least, cleared
up the broadcast storm.

However we are still seeing ether errors, but not nearly as
badly as before.  Something I noticed today is that our most
error-full machine is also the one that serves most of the home
directories for the people with workstations.  Strange coincidence
there.  Fortunately we have some more uVaxen arriving to use
as servers... so this load'll be spread out some.

um, one last thing.  I mention this in order to find out the
truth of the matter and I certainly don't want to make anybody
mad ... but in an earlier posting I related a memory from a DEC
salesman that Sun ethernet equipment had some sort of problem ...

We now have better word on what this problem is.  The claim is
that Sun ethernet drivers will shove packets out with too small
of a time-gap between them.  Specifically 1 micro-second, and that
the 802.3 spec wants a 10 microsecond gap.  How true is this?  I
vaguely recall reading something along those lines recently --
it seems it was a Sun person being proud that their hardware is
able to keep up a sustained rate on input AND output at the
max speed allowed by the spec.

-- 
<---- David Herron -- The E-Mail guy            <david@ms.uky.edu>
<---- or:                {rutgers,uunet,cbosgd}!ukma!david, david@UKMA.BITNET
<----
<---- It takes more than a good memory to have good memories.

eshop@saturn.ucsc.edu (Jim Warner) (02/21/88)

In article <8403@g.ms.uky.edu> david@ms.uky.edu
	(David Herron -- Resident E-mail Hack) writes:
>
>We now have better word on what this problem is.  The claim is
>that Sun ethernet drivers will shove packets out with too small
>of a time-gap between them.  Specifically 1 micro-second, and that
>the 802.3 spec wants a 10 microsecond gap.  How true is this?
>
The heartbeat test takes place in the interpacket gap.  The window
for the test is between 4 and 8 microseconds after the Sun finishes
each packet.  If Suns were to violate the spec and shorten the
interpacket gap, the net news would be full of complaints that
Suns don't work with IEEE transceivers.  This is not the case.
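
For reference, here is a quick sketch of the arithmetic behind the
"10 microsecond" figure. It assumes a 10 Mb/s bit rate and the minimum
interframe gap of 96 bit times; the function and constant names are
only illustrative. A test window of 4 to 8 microseconds fits inside
the resulting gap, which is the point of the argument above.

```python
# Sketch: convert the minimum interframe gap from bit times to microseconds.
# Assumptions: 10 Mb/s Ethernet and a minimum gap of 96 bit times.
BIT_RATE_BPS = 10_000_000      # 10 Mb/s
INTERFRAME_GAP_BITS = 96       # minimum gap between frames, in bit times


def gap_microseconds(bits: int = INTERFRAME_GAP_BITS,
                     rate_bps: int = BIT_RATE_BPS) -> float:
    """Return the duration of `bits` bit times in microseconds."""
    return bits * 1_000_000 / rate_bps


if __name__ == "__main__":
    print(f"{gap_microseconds():.1f} microseconds")
```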

casey@lll-crg.llnl.gov (Casey Leedom) (02/23/88)

In article <8403@g.ms.uky.edu> david@ms.uky.edu (David Herron) writes:
>Well, a little bit of sleuthing uncovered the fact that we were
>having a broadcast storm, and didn't even know it. ... a constant
>stream of 3b2s/3b1s and a sequent all ARPing for 128.163.255.255.

  I'm curious, I've noticed this behavior with 4.2BSD based networking
implementations also.  It's my understanding that the gratuitous ARP
responses for packets sent to the local-network/all-ones-host-part
address are an attempt to negotiate trailer encapsulation on a global basis
instead of the current 4.3BSD method which does the trailer encapsulation
negotiation when an ARP request is received.  Am I talking out my hat?
For my own and others' edification would someone explain exactly why
4.2BSD based networking responds with gratuitous ARPs for packets
addressed to xxx.xxx.255.255, etc.?  Thanks in advance.

Casey

romkey@kaos.UUCP (John Romkey) (02/24/88)

In article <4097@lll-winken.llnl.gov> casey@lll-crg.llnl.gov.UUCP (Casey Leedom) writes:
>For my own and others' edification would someone explain exactly why
>4.2BSD based networking responds with gratuitous ARPs for packets
>addressed to xxx.xxx.255.255, etc.?  Thanks in advance.

When 4.2BSD was released there was no defined standard telling how to 
broadcast IP datagrams. Berkeley followed an informal standard that said
to set the host part (the part that's not net and not subnet) of the
IP address to all 0's. So you might see 128.127.0.0 as a broadcast IP address
from 4.2, and the 4.2 drivers would know not to bother ARP'ing this packet but
to just send it to the ethernet broadcast address instead.

Now...later on, in an RFC whose number escapes me at the moment, it
was specified that a broadcast datagram should have the host part of
its address set to all 1's (128.127.255.255). Since that RFC was made
a part of the TCP/IP specification, this is the correct way to do
things now.  And most systems released since then support that
properly. But if you have an application that uses broadcast and
follows that standard of using all 1's and you backport it to 4.2BSD,
then it will tell the 4.2 kernel to send packets to 128.127.255.255 and
4.2 won't know that's the IP broadcast address and will ARP it.

That's why 4.2 might ARP 128.127.255.255 on its own.
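
To make the two conventions concrete, here is a minimal sketch that
derives both broadcast forms for a network. The /16 prefix is an
assumption (a class B net with no subnetting), and the function name
is made up for illustration.

```python
import ipaddress


def broadcast_forms(network: str) -> dict:
    """Return the old (all 0's) and new (all 1's) broadcast addresses."""
    net = ipaddress.ip_network(network)
    return {
        "all_zeros": str(net.network_address),    # 4.2BSD convention
        "all_ones": str(net.broadcast_address),   # later standard
    }


if __name__ == "__main__":
    # Assumed prefix length: /16, i.e. host part is the last two octets.
    print(broadcast_forms("128.127.0.0/16"))
```

A 4.2 host only treats the first form as broadcast, so a datagram
sent to the second form looks to it like traffic for an ordinary host.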

4.2BSD might ARP in response to correctly formatted broadcast packets
because it tries very hard to be an IP router, even if it has only one
network interface.  When it receives a packet that's not for it (and
it won't recognize 128.127.255.255 as an IP broadcast that it should
process itself) it tries to forward it. The IP routing code says "Yes,
this is for my local net", so the kernel then tries to ARP
128.127.255.255... 

That should only happen if you have a mix of 4.2 machines and later systems
which broadcast according to spec on the same ethernet. There's a variable
in the kernel which controls IP forwarding and you can use adb to turn
it off, but I don't remember the name of the variable.
-- 
			- john romkey
		...harvard!spdcc!kaos!romkey
		       romkey@kaos.uucp
		    romkey@xx.lcs.mit.edu

casey@lll-crg.llnl.gov (Casey Leedom) (02/24/88)

  I got the following reply to my question about why packets addressed to
xxx.xxx.255.255 cause broadcast storms from 4.2BSD based networking
implementations.

Casey

-----
Date: Tue, 23 Feb 88 07:25:13 PST
From: Jim Warner <eshop%saturn.UCSC.EDU@ucscc.UCSC.EDU>

In article <4097@lll-winken.llnl.gov> you write:
> For my own and others' edification would someone explain exactly why
> 4.2BSD based networking responds with gratuitous ARPs for packets
> addressed to xxx.xxx.255.255, etc.?  Thanks in advance.

These packets were sent as ethernet broadcasts.  They were received at
the 4.2BSD hosts.  When the IP layer opens the packet and looks at the
destination address, it sees that the packet is not addressed to this
host.  It also does not recognize the 255.255 as being the IP broadcast
address.  It concludes (falsely) that there is a real host at address
255.255 which should have received this misdelivered packet.  The host
would like to deliver this packet to its proper destination.  To do that,
the host will need the ethernet address of the destination.  An Address
Resolution (ARP) request is issued.  But there is no host at this IP
address and there is no response.  The ARP request is therefore repeated
by each 4.2BSD machine once for each misunderstood ethernet broadcast.
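
The sequence above can be sketched as a small model. This is not the
kernel code, only an illustration of the decision it describes. The
host address 128.163.1.7, the /16 prefix, and the assumption that
later hosts accept both broadcast forms are mine.

```python
import ipaddress


def recognized_broadcasts(network: str, style: str) -> set:
    """IP addresses a host treats as broadcast on its own net."""
    net = ipaddress.ip_network(network)
    all_zeros = str(net.network_address)
    all_ones = str(net.broadcast_address)
    if style == "4.2":
        return {all_zeros}            # old informal convention only
    return {all_zeros, all_ones}      # assumption: later hosts accept both


def on_ethernet_broadcast(dst: str, my_ip: str, network: str, style: str) -> str:
    """What the IP layer does with a packet received as an ethernet broadcast."""
    if dst == my_ip or dst in recognized_broadcasts(network, style):
        return "deliver locally"
    if ipaddress.ip_address(dst) in ipaddress.ip_network(network):
        # Looks like some other host on the local net: try to pass it along,
        # which first needs that host's ethernet address.
        return f"ARP for {dst}"
    return "route elsewhere"


if __name__ == "__main__":
    for style in ("4.2", "4.3"):
        action = on_ethernet_broadcast(
            "128.163.255.255", "128.163.1.7", "128.163.0.0/16", style)
        print(style, "->", action)
```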

Hope that answers your question.

Jim Warner

pdb@sei.cmu.edu (Patrick Barron) (02/24/88)

In article <678@kaos.UUCP> romkey@kaos.UUCP (John Romkey) writes:
>That should only happen if you have a mix of 4.2 machines and later systems
>which broadcast according to spec on the same ethernet. There's a variable
>in the kernel which controls IP forwarding and you can use adb to turn
>it off, but I don't remember the name of the variable.

It's called, oddly enough, "ipforwarding".  There's another variable called
"ipprintfs" which, if set to 1 while ipforwarding is set to 1, will print
a message on the console every time the machine attempts to forward a packet.

--Pat.

dudek@ubglue.ksr.com (Glen Dudek) (02/25/88)

In article <678@kaos.UUCP> romkey@kaos.UUCP (John Romkey) writes:
>
>There's a variable
>in the kernel which controls IP forwarding and you can use adb to turn
>it off, but I don't remember the name of the variable.
>

Unfortunately, if I remember my 4.2BSD ip code correctly, turning off
"ipforwarding" will cause the host to send an ICMP error to the
broadcasting host for each broadcast packet.  A complete fix requires
patching ip_forward() to free the ip packet and return without sending
the ICMP error.  I did this on my pre-3.4SunOS Suns when we brought up
subnetting at Harvard - you need to patch in a jump at the beginning of
ip_forward() to the location in ip_forward() which calls m_freem() and
returns.
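
As a rough summary of the three cases, here is a sketch of what a 4.2
host does with a packet it does not recognize as its own. It follows
my recollection above rather than the actual source, and the function
and parameter names are invented for the illustration.

```python
def unrecognized_packet_action(ipforwarding: bool, patched: bool = False) -> str:
    """Model the outcome for a packet the host does not think is for itself.

    patched: ip_forward() has been patched to jump straight to the code
    that frees the packet and returns (assumed to win over the flag,
    since the jump sits at the start of the routine).
    """
    if patched:
        return "free packet, send nothing"
    if ipforwarding:
        return "try to forward: ARP for the destination"
    return "free packet, send ICMP error to the sender"


if __name__ == "__main__":
    print("stock, forwarding on :", unrecognized_packet_action(True))
    print("stock, forwarding off:", unrecognized_packet_action(False))
    print("patched              :", unrecognized_packet_action(False, patched=True))
```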

--
Glen Dudek
Kendall Square Research
Disclaimer: #include <canonical_disclaimer.h>

ron@topaz.rutgers.edu (Ron Natalie) (02/26/88)

THESE AND OTHER PROBLEMS CAN BE SOLVED EASILY!

TURN OFF IP FORWARDING ON THINGS THAT ARE NOT GATEWAYS.

If machines didn't try to forward apparently misaddressed
packets (or truly broken ones misdirected to them), these
cycles wouldn't occur.  A machine that has one interface that
is not performing some gateway function should just consider these
packets an error and discard them.

Below is a sample ADB session which will show you how to turn off ip
forwarding on machines that you don't have source for (provided
they are 4.2 like).  If you have source, set the ipforwarding
variable to zero.


$ su				<--  You need to be root
Password:			<---   Can't help you here :-)
# adb -w -k /vmunix /dev/mem	<--  ADB the kernel
sbr f0711fc slr      649	<--  Crud output by ADB
physmem 1fe

_ipforwarding/X			<--  Find the current state
_ipforwarding:   1		<--  was turned on
_ipforwarding/W 0		<--  Not any more!
_ipforwarding:  0x0             =       0x0
_ipforwarding?W 0		<--  Fix it for the next reboot.
_ipforwarding:  0x0             =       0x0

hans@umd5.umd.edu (Hans Breitenlohner) (02/26/88)

In article <678@kaos.UUCP> romkey@kaos.UUCP (John Romkey) writes:
  [ he explains why some machines will ARP for addresses x.x.x.255.
    Then he states: ]
> ... There's a variable
>in the kernel which controls IP forwarding and you can use adb to turn
>it off, but I don't remember the name of the variable.
>-- 


I have no first-hand experience with this, but I have been told that you
lose either way.  If you turn forwarding off, then you will get ICMP
unreachable messages instead of the ARPs.

kre@munnari.oz (Robert Elz) (02/28/88)

> >There's a variable in the kernel which controls IP forwarding
> 
> If you turn forwarding off, then you will get ICMP
> unreachable messages instead of the ARPs.

If you don't want to, or can't hack your IP code, then ..

There's one hack that you can do if you have a host that can publish
proxy ARP's .. arrange to have the "bad" IP address (the thing with the
trailing 255's) published by some ARP server, with a totally bogus
ethernet address for it (anything that doesn't exist on your cable).

Hosts that know 255 is broadcast will never arp for it, others will
learn the bogus address and forward future packets to that.

This doesn't save any ethernet traffic, but keeps it out of the way
of all the hosts that neither want to receive a hundred ARP requests
nor a hundred ICMP's when 100 old 4.2 hosts receive a new broadcast.
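
A toy count of the effect, as a sketch only. It assumes ARP cache
entries never expire, that every old host has forwarding on, and that
the packet which triggered the ARP is sent once the reply arrives.
The function name and counters are made up for the illustration.

```python
def frames_for_broadcasts(n_old_hosts: int, n_broadcasts: int,
                          bogus_published: bool) -> dict:
    """Count frames old 4.2 hosts emit in reaction to new-style broadcasts."""
    arp_broadcasts = 0        # ARP requests, seen by every host on the cable
    unicast_to_bogus = 0      # forwarded packets sent to the bogus address
    cached = [False] * n_old_hosts   # has host h learned the bogus address?
    for _ in range(n_broadcasts):
        for h in range(n_old_hosts):
            if cached[h]:
                unicast_to_bogus += 1
                continue
            arp_broadcasts += 1
            if bogus_published:
                # The ARP server answers, so the host caches the bogus
                # address and sends the pending packet to it.
                cached[h] = True
                unicast_to_bogus += 1
    return {"arp_broadcasts": arp_broadcasts,
            "unicast_to_bogus": unicast_to_bogus}


if __name__ == "__main__":
    print("no proxy entry  :", frames_for_broadcasts(100, 3, False))
    print("bogus entry used:", frames_for_broadcasts(100, 3, True))
```

The frames do not go away, but after the first round they are
unicasts to an address nobody owns, so no host has to process them.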

kre