[mod.protocols.tcp-ip] more oddities from the swamp

HEDRICK@RED.RUTGERS.EDU (Charles Hedrick) (05/21/86)

I have come up with a somewhat better kludge for notifying my hosts
about the gateway than using broadcast ICMP redirects. The machines I am
worried about are normally configured to run routed as they come out of
the box.  It turns out that it is possible for a gateway to advertise
via routed that network 0 is one of the destinations it can forward to.
This will set up the gateway as a default gateway on the routed clients.
This seems a more controlled way of doing what I want than unsolicited
ICMP redirects, since routed is the protocol that gateways are supposed
to use to announce themselves.
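
As a rough illustration of what such an advertisement looks like on the
wire, here is a minimal sketch.  It assumes the RIP version 1 packet
layout that was later written down in RFC 1058 (the routed protocol had
no published spec at the time of this thread).  The function name and
the metric of 1 are hypothetical choices, and the sketch only builds the
bytes; it does not send them.

```python
import socket
import struct

RIP_RESPONSE = 2   # command field: response / unsolicited update
RIP_VERSION = 1    # assumed: RIP version 1 layout
AF_INET_RIP = 2    # address family identifier for IP inside a RIP entry


def rip_default_route_advert(metric=1):
    """Build a RIP response advertising network 0 (0.0.0.0).

    Hypothetical helper for illustration only.
    """
    # 4-byte header: command, version, two zero bytes.
    header = struct.pack("!BBH", RIP_RESPONSE, RIP_VERSION, 0)
    # 20-byte route entry: family, zero, address, two zero words, metric.
    entry = struct.pack(
        "!HH4sIII",
        AF_INET_RIP, 0, socket.inet_aton("0.0.0.0"), 0, 0, metric,
    )
    return header + entry


packet = rip_default_route_advert()
print(len(packet))  # 4-byte header plus one 20-byte entry
```

A routed client that receives an entry for destination 0.0.0.0 would
install the sender as its default gateway, which is the effect described
above.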

However there is another interesting problem with either approach. They
both require broadcasts, but it is not clear whether it is safe to do
broadcasts on our networks.  Our problem is that half of our hosts know
about subnetting and the other half don't.  We have to make our Pyramid
and Sun kernels know about subnetting, because we have machines of both
kinds with more than one interface.  But we have lots of machines from
other vendors that do not support subnets and do not supply source.
Generally this is not a big deal. They can talk to each other.  Routing
entries have to be a bit different for the two classes of machines,
since one thinks all of 128.6 is one network, whereas the other knows
about 128.6.n.  But our gateway can handle the ARP hack, so if a
non-subnet machine ARPs a machine on a different subnet, the gateway
handles it.  The problem comes in when we send a broadcast.  The only
broadcast address that Unix will understand is network.0.  (I think this
is probably a bug.  But it doesn't do me any good to send a broadcast
which meets all the RFC's but nobody will receive.  So I'm stuck.)  The
question is, what network number should I use.  If I send to 128.6.0.0,
the non-subnet machines accept it just fine.  But the subnet machines
all think I am trying to send a broadcast on subnet 0.  Since that is
not their subnet, they do not process the broadcast.  For some reason
that I don't quite understand, we get back a bunch of ICMP TTL exceeded
messages.  As far as I can tell, some of the machines try to forward the
message to subnet zero, and get into a routing loop. If I send to
128.6.4.0, the subnet machines accept it just fine.  But the non-subnet
machines think I am trying to send to host 4.0 on network 128.6.  They
helpfully attempt to forward it.  Thus I suddenly get an ARP request
from every non-subnet machine, asking for the address of 128.6.4.0.  Our
Arpanet gateway, which also does not know we have subnets, also thinks
we are trying to get it to forward packets to 128.6.4.0.  But it notices
that this is on the same network, pronounces it an utterly bogus request,
and sends back an ICMP redirect to the original sender, saying "do it
yourself, idiot."  [By the way, is this a standard use of ICMP redirect?
What is a gateway supposed to do when a host asks it to forward a packet
back onto the same network that it came in from?  Intuitively, it seems
reasonable that an ICMP redirect should occur, but it is not clear what
gateway address one should use.  Our Arpanet gateway uses the
originator's address.  This is somehow morally correct and esthetically
satisfying, but no implementation that I know of knows what to do when
given an ICMP redirect pointing back at itself.]
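
The two failure cases above can be sketched as follows.  This is only a
model of the behaviour described, not code from any kernel.  It assumes
the subnet number is the third octet (mask 255.255.255.0), that the
receiving host is 128.6.4.7 (a made-up address), and that a host treats
"host part all zero" as its only broadcast form.  The function name and
the result labels are hypothetical.

```python
import ipaddress

CLASS_B_MASK = 0xFFFF0000  # what a non-subnet host assumes for 128.6
SUBNET_MASK = 0xFFFFFF00   # assumed: third octet is the subnet number


def classify(dest, me, mask):
    """Model how a host with the given mask treats a packet sent to dest."""
    d = int(ipaddress.IPv4Address(dest))
    m = int(ipaddress.IPv4Address(me))
    host_bits = ~mask & 0xFFFFFFFF
    if (d & mask) != (m & mask):
        # Not my (sub)network: not processed locally, may be forwarded.
        return "forward-offnet"
    if (d & host_bits) == 0:
        # Host part is zero on my own (sub)network: zero-style broadcast.
        return "broadcast"
    # Looks like some other host on my network: forward, so ARP for it.
    return "forward-onnet"


me = "128.6.4.7"  # hypothetical receiving host
for dest in ("128.6.0.0", "128.6.4.0"):
    print(dest,
          "non-subnet:", classify(dest, me, CLASS_B_MASK),
          "subnet:", classify(dest, me, SUBNET_MASK))
```

Neither destination is classified as a broadcast by both kinds of host,
which is why either choice sets off forwarding attempts somewhere.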

The net effect of all of this is that any broadcast results in a flurry
of other network traffic.  This makes it seem unreasonable to use any
protocol that requires a broadcast every 30 sec.

By the way, this is not the worst symptom of networks containing a mix
of subnet and nonsubnet machines.  Now and then somebody brings up a Sun
kernel that does not have the subnet patches.  If you try to boot a
diskless Sun on a network containing a mix of subnet and nonsubnet
machines, the entire network comes to a halt for a period of a minute or
so.  (Actually to be fair, the only machines that come to a halt are
other Suns and Celerities.  Our Pyramids and DEC-20s seem unaffected.)


-------

JNC@XX.LCS.MIT.EDU.UUCP (05/21/86)

	Sorry, my bogometer just tripped when I got to the phrase
'routed is the protocol that gateways are supposed to use to announce
themselves'. First, this protocol is not documented anywhere. Second,
it's not an official Internet protocol. Third, it is a gross violation
of IP design philosophy to have hosts know about a gateway routing
protocol for any reason, let alone just to find out where gateways
are.
	The Internet Engineering committee has talked about a new ICMP
message for hosts to find gateways; I'll try to get this written up
soon.

	I sympathize with your problems with Unix and broadcast
addresses.  We get shafted by the same thing; the upward compatibility
with all the nonstandard stuff and gratuitous functionality (such as
non-subnet hosts forwarding packets to 128.6.4.0) in 4.2 is an
incredible drag.
	It is to prevent confusion about exactly who is being
broadcast to that I maintain that the 'default' broadcast address to
use is all 1's, rather than use net.1's or subnet.1's, etc. It's much
harder to lose using all 1's; the only possible meaning is 'local
wire'.
	Unfortunately, it's way too late to put the 4.2 genie back in
the bottle. Maybe we should all quit and become wood carvers?

	Noel
-------

HEDRICK@RED.RUTGERS.EDU (Charles Hedrick) (05/21/86)

I have been through this with Mills also.  Everything that you say
is right.  But it is also nearly irrelevant for those of us out
in the swamp surrounded by binary-only alligators.  I would be
quite happy with an ICMP who is my gateway or ICMP I am the default
gateway.  But until such a thing shows up, I use what I have, and
that is routed.  I think this list needs to discuss both what the
standards should be and what we should do until it becomes practical
to use the standards.  My best estimate is that it will take about
3 years between the time you define something and when we can
depend upon using it.  At the moment it is likely to take longer
than 3 years, since anything that is done now will have just missed
a new release of Berkeley Unix.  A number of vendors don't do anything
until a feature shows up in BSD.  So any new ICMP things are likely
to have to wait for 4.4, then another year or so until all the vendors
incorporate the 4.4 features.  Lest you think I am being overly
pessimistic, look at the slow spread of subnets.  A change in broadcast
address is going to be particularly messy, since unless everybody does
it at the same time (a manifest impossibility), we could be in for
some incompatibility.  I would recommend that implementations should be
prepared to accept any of 
  net.0
  net.subnet.0
  -1
for the moment.  Once everyone recognizes -1, all senders would start
using it, and the 0's could be phased out.  Alternatively, your
subnet RFC could have specified that the correct broadcast address
for implementations that use 0's is net.0, not net.subnet.0.  Then
we would have one less incompatibility to deal with.
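
A receiver that follows the recommendation above might look roughly like
this.  It is a minimal sketch, assuming a host at 128.6.4.7 with a
255.255.0.0 network mask and a 255.255.255.0 subnet mask; the function
name and the example addresses are hypothetical.

```python
import ipaddress

ALL_ONES = 0xFFFFFFFF  # the "-1" address


def is_broadcast(dest, my_addr, net_mask, subnet_mask):
    """Accept net.0, net.subnet.0, or all ones as a broadcast destination."""
    d = int(ipaddress.IPv4Address(dest))
    m = int(ipaddress.IPv4Address(my_addr))
    accepted = {
        m & net_mask,     # net.0
        m & subnet_mask,  # net.subnet.0
        ALL_ONES,         # -1
    }
    return d in accepted


me = "128.6.4.7"  # hypothetical host address
for dest in ("128.6.0.0", "128.6.4.0", "255.255.255.255", "128.6.5.0"):
    print(dest, is_broadcast(dest, me, 0xFFFF0000, 0xFFFFFF00))
```

This covers only the receiving side; what a sender should emit during
the transition is the harder question raised above.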

By the way, a month or so ago we got this shiny new collection of
Internet protocol documents.  I assume this is what most vendors
use to do their implementations.  I didn't see subnetting in it.
I didn't see any signs of -1 being specified as the broadcast address.
How many vendors do you think know that they are supposed to be
changing the broadcast address?
-------

mar@MONK.PROTEON.COM (Mark A. Rosenstein) (05/22/86)

There is something that can be done to cut down on the flurry of
packets in response to a broadcast.  On all of the 4.2 Unix machines
which are not serving as gateways, turn off IP forwarding in the
kernel.  This will cause them to no longer send ICMP messages or
forward packets that were not specifically addressed to them in the
first place.  Forwarding is controlled by the variable _ipforwarding.
If this is set to zero, these extra packets will not be generated.  On
binary-only systems, you still can turn it off using adb on /vmunix.
					-Mark
-------

JNC@XX.LCS.MIT.EDU ("J. Noel Chiappa") (05/22/86)

	The reason the subnet RFC didn't speak about net.0 versus
net.subnet.0 is a) we wanted net.broadcast to mean something different
from net.subnet.broadcast, and b) that spec was written, although
unfortunately not released, before 4.2 came out.
	I have already complained fairly strongly to Jon Postel about
the lack of subnet stuff in the IP protocol documents. I didn't
realize that broadcast was also missing.

	The whole problem of out of date implementations attached to
the network system is one of the key organizational (as opposed to
technical) problems the system faces right now. You are correct that
currently the time around the loop is several years. The problem is
that this kind of delay is not acceptable if the system is going to
continue to grow successfully at the rate that it has been growing.
Things are going to break and need to be replaced, and we can't wait
around for years until vendors feel like fixing things.
	I think we need two things: a certification process, and
pressure from buyers. The need for the first is clear; right now there
is no comprehensive test of whether or not something correctly obeys
all the protocols. It's hard to beat up vendors when you don't have a
definite measuring stick for them. Second, people have to get militant
about timely support. Don't buy something if you don't get either a)
source, or b) a written commitment to pass the testing suite within N
months of changes.
	Such products may cost more, but look at the other side: you
will waste countless skilled people hours pasting things together if
you buy whatever's cheapest. But people *have* to exercise discipline
and say to vendors "No, your stuff does not meet the spec, *go away*".

	Noel
-------

louden@MITRE-GATEWAY.ARPA.UUCP (05/22/86)

The "shiny new collection of Internet protocol documents" does have
the "-1" address. See volume 3 page 177 last paragraph.
It does not say "-1" because that makes assumptions about your hardware;
instead it is described as 'addresses of all ones are to be interpreted as
meaning "all"'.  The word broadcast is also not used because it makes
assumptions about your hardware.

Subnets are mentioned in volume 3 page 244 but I did not see any detailed
discussion.

The documents make an excellent starting point but may not be complete for
all applications.