[mod.protocols.tcp-ip] a couple of implementation issues

hedrick@TOPAZ.RUTGERS.EDU (Charles Hedrick) (01/12/86)
I would like to describe Rutgers' current network configuration, and then
mention some of the problems we are looking into at the moment.  I would
like to see whether my ideas seem reasonable to this community, and 
whether others have any better approaches.  The major issues will involve
addressing in an environment that uses a mix of Ethernet-level and
IP-level gateways, and how to set up a system with redundant IP gateways
so that it will survive gateway failures.

First, the configuration.  We have 5 Ethernets currently in operation,
with several others coming on line shortly.  Four of them are
connected by an IP gateway built using a design from Stanford.  It is
a 68000 multibus system (Forward Technology SUN board), with 3Com
Ethernet interfaces.  The software handles PUP, IP, and XNS.  It is a
full PUP gateway, handling PUP directories and routing protocols.  IP
support is more limited, including only ARP and ICMP echo.  The IP
support assumes that subnetting is in use, with 8-bit host addresses
and 8-bit subnet addresses.  It implements the "ARP hack", so that
hosts can use it even if they don't know about subnets.  Stanford
estimates a capacity of about 250 packets per second.  However, recent
tweaking of the code has probably increased this.  (We haven't pushed
it hard enough to see this limit yet. The only limit we have seen is
that Sun 3's that use NFS through the gateway have to have some
non-default parameter settings.  This is a known problem with the 3Com
Ethernet interface, which also affects some older Sun 2's.) [For
those who may be interested in duplicating this, there are now
commercial equivalents of this gateway.  Proteon sells one that should
be fairly similar, though with higher performance and more IP support.
It should handle EGP.  Len Bosack from Stanford has apparently started
a company that will market a re-engineered version of the Stanford
gateway.  You might also check Bridge Communications and DEC.]

For hosts in isolated buildings, we are installing a broadband cable
system.  We plan to use Applitek Ethernet bridges.  That is, each
building will have an Ethernet.  The Ethernets will be connected via
the broadband cable.  The Applitek bridges work at the Ethernet level.
That is, they watch every packet on the Ethernet.  They dynamically
build a list of all machines on the local Ethernet.  When they see a
packet addressed to a machine that is not on the local Ethernet, they
forward it to the proper Ethernet via the broadband.  (Actually, there
is somewhat more control available if you need it.)  They forward all
broadcast packets to all Ethernets.  We do not yet have throughput
data on it, as the system is new and is still in test.  It does seem
to be able to handle Sun 3 NFS transmissions with default parameter
settings on the Sun.  The Applitek bridges are 68000-based systems,
with a fair amount of hardware in them.  I'm fairly sure there is more
than one 68000 in each.  Each uses a modern Ethernet interface with
its own processor.  The broadband communications use one 6 MHz channel,
and can handle 10 Mbits/sec.  (Yes, it is possible to get more bits in
a channel than its bandwidth.  This has always seemed to me to violate
some basic principle, but sophisticated communications technology can
get more bits/sec than Hz.)  Our first setup, which will probably be
put in operation this week, will connect two Ethernets, one of which
is also on the gateway described in the previous paragraph.  [If you
are in the market for one of these, other vendors that I know of with
similar products are Proteon and possibly Bridge Communications.  Both
of these products will use IP gateways between the local Ethernet and
their long-haul network.  This has both advantages and disadvantages.
It allows some improvements in support of TCP/IP, but it also means
that you can't handle DECnet and other protocols.]
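
The Ethernet-level forwarding described above can be sketched roughly
as follows.  This is only a toy Python model, not Applitek's actual
implementation: the class and function names are hypothetical,
addresses are plain strings, and the broadband is modeled as a shared
channel that every other bridge listens to.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

class Bridge:
    """Toy model of one Ethernet-level bridge (hypothetical names)."""

    def __init__(self, name):
        self.name = name
        self.local = set()   # Ethernet addresses seen as senders on the local cable

    def hear_local(self, src, dst):
        """Watch a frame on the local Ethernet.

        Learns the sender's address and returns True if the frame
        should be put on the broadband.
        """
        self.local.add(src)
        return dst == BROADCAST or dst not in self.local

    def accept_from_broadband(self, dst):
        """Return True if a frame from the broadband belongs on this Ethernet."""
        return dst == BROADCAST or dst in self.local


def send(bridges, origin, src, dst):
    """Return the names of the remote Ethernets on which the frame reappears."""
    if not origin.hear_local(src, dst):
        return []            # destination is local, nothing crosses the broadband
    return [b.name for b in bridges
            if b is not origin and b.accept_from_broadband(dst)]
```

In this model a broadcast (an ARP request, say) from any host shows up
on every other Ethernet, while traffic between two hosts that the
bridge has already seen on the same cable never leaves it.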

The first issue is how to set up IP addresses for the Ethernets to be
connected via the Applitek bridges.  Initially we figured that each
Ethernet would be a subnet, just like those connected by the IP
gateway.  However, on second thought, I believe that is a mistake.
Consider the following situation:
   subnets 6 and 7 are connected via Applitek bridge
   subnets 4 and 6 are connected via IP gateway
   a host on subnet 6 wants to talk to a host on subnet 7.
The conversation will have to go through the Applitek bridge.  Recall
that this operates at Ethernet level.  That means that the source host
will have to send an Ethernet packet with the final destination's
Ethernet address in it.  In order to find this address, it will have
to issue an ARP.  If the host on 6 knows about subnets, it will
consider subnet 7 to be a separate network.  It will not issue an ARP
to try to find the host.  Rather, it will expect to find a gateway in
its gateway table (or use its default gateway).  With all subnet
implementations that I know, there is no way to tell a host to use a
gateway to talk to subnet 4, but to issue ARP's and talk directly to
subnet 7.  Once you turn on subnetting, it will expect to find
gateways for all subnets.  Obviously we could change this behavior.
But we are reluctant to adopt a network design that violates the
subnetting RFC's, and requires us to make kernel changes to systems that
use it.  Thus we reluctantly conclude that all of the Ethernets that
are connected by the Applitek bridge must be considered a single 
subnet.  I don't much like this, because I think eventually we are
going to end up using IP gateways.  In order to install an IP gateway
between two Ethernets that are currently connected by the Applitek
bridges, we would have to remove the Applitek bridge from one of them,
give it a different subnet number, change the addresses of all of
its hosts, and then install the IP gateway.  Does anyone see something
I am missing?
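
To make the problem concrete, here is a rough Python sketch of the
decision a host makes when it sends a packet.  It is not taken from
any particular kernel.  The function name, the gateway table layout,
and the addresses are all assumptions for illustration (an arbitrary
class B network 128.6.0.0 with 8-bit subnets, and a gateway at
128.6.6.1).

```python
import ipaddress

def next_hop(src, dst, mask, gateways, default_gateway):
    """Decide how a host reaches dst (hypothetical helper).

    Returns ("arp", dst) when dst falls inside the sender's own
    (sub)net as defined by mask, so the host ARPs for it directly.
    Otherwise returns ("gateway", address) from the gateway table,
    falling back to the default gateway.
    """
    local_net = ipaddress.ip_network(f"{src}/{mask}", strict=False)
    dst_addr = ipaddress.ip_address(dst)
    if dst_addr in local_net:
        return ("arp", dst)
    for net, gw in gateways.items():
        if dst_addr in ipaddress.ip_network(net):
            return ("gateway", gw)
    return ("gateway", default_gateway)

# Assumed gateway table on a subnet-6 host: subnet 4 is behind the IP gateway.
table = {"128.6.4.0/24": "128.6.6.1"}

# Subnet-aware host on 6 talking to subnet 7: it looks for a gateway, never ARPs.
print(next_hop("128.6.6.10", "128.6.7.20", "255.255.255.0", table, "128.6.6.1"))

# Same host without subnet support (class B mask): it ARPs, which the bridge can carry.
print(next_hop("128.6.6.10", "128.6.7.20", "255.255.0.0", table, "128.6.6.1"))

# Both bridged Ethernets numbered as subnet 6: the subnet-aware host ARPs as well.
print(next_hop("128.6.6.10", "128.6.6.200", "255.255.255.0", table, "128.6.6.1"))
```

The first case is the one that fails across the bridge.  The third is
the single-subnet arrangement we are reluctantly adopting.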

The second issue involves gateway reliability.  This is not a problem
that is immediately pressing.  The gateway code from Stanford is the
only piece of software I have used that has never crashed.  But now
and then we do take it down for development work, and we do get
complaints from people who are suddenly disconnected.  We have several
Unix systems with more than one Ethernet interface.  These hosts could
act as gateways.  While their performance as gateways would not be as
good as a dedicated 68000 gateway, they would be fine as backup
gateways.  The question is, how do we set things up so that a
connection will move from one gateway to an alternate when the first
one goes down?  4.3BSD has some hint of the basic ability needed.  When
TCP is about to time out a connection, it first tries to compute a new
route.  However, in order for this to help, two things must be true:
  - the system has to know that a gateway is in use.  This means
	that we can't use the ARP hack.  We have to install subnet
	support on all the hosts.
  - something has to change in the system's routing database, or it
	will choose the same bad route again.  This seems to imply
	that all of the hosts must be running routed or EGP, and
	that the gateways must all support it.
Initially I had hoped that all of the intelligence could be put
into the gateways.  However this seems to be incompatible with the
current design of Unix.  Here's how I would do it with TOPS-20:
The gateways would know about each other.  They would exchange
EGP, so they know if the other is up.  Dual-homed hosts would
know that it is better to use the dedicated gateway if it is up.
So any attempt to use a dual-homed host as a gateway would result
in an ICMP redirect telling the sender to try the dedicated gateway,
unless the dedicated gateway is down.  Here is what a normal host
would do:
  - its gateway table would list both the dedicated gateway and
	the dual-homed host.  (If there were lots of gateways
	accessible to it, only 2 or 3 would need to be listed.)
  - when starting a connection, if the system didn't already have
	a route to the destination system, it would send the packet
	to a randomly chosen "prime" gateway.  If it chose the
	wrong one (e.g. a dual-homed host, when the dedicated
	gateway is up), it would be directed to the right one
	via ICMP redirect.
  - it periodically pings all gateways that it knows about.  If
	one goes down, it is marked as such, and a new route is
	used in the future.
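
The host-side behavior in the three steps above might look roughly
like this in Python.  This is only a sketch under my own assumptions,
not TOPS-20 code: the class and method names are invented, and the
ping and ICMP machinery is reduced to method calls that something
else is assumed to make.

```python
import random

class GatewayTable:
    """Toy model of a host's gateway table (hypothetical names)."""

    def __init__(self, gateways, rng=None):
        self.up = {gw: True for gw in gateways}   # gateway -> believed up?
        self.routes = {}                          # destination -> gateway in use
        self.rng = rng or random.Random()

    def route_for(self, dest):
        """Return the gateway to use for dest, picking one at random
        if there is no route yet or the current gateway is marked down."""
        gw = self.routes.get(dest)
        if gw is None or not self.up[gw]:
            candidates = [g for g, ok in self.up.items() if ok]
            if not candidates:
                raise RuntimeError("no gateway is up")
            gw = self.rng.choice(candidates)
            self.routes[dest] = gw
        return gw

    def on_redirect(self, dest, better_gw):
        """An ICMP redirect arrived: use better_gw for dest from now on."""
        self.up.setdefault(better_gw, True)
        self.routes[dest] = better_gw

    def on_ping_result(self, gw, alive):
        """Record the result of the periodic ping of a known gateway."""
        self.up[gw] = alive
```

Note that in this sketch a host that has fallen back to the dual-homed
host keeps using it after the dedicated gateway returns.  It moves
back only when the dual-homed host sends it a redirect.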
Since we have a mix of Unix and TOPS-20 systems, it looks like
we may have to do one of the following:
  - add routed support to TOPS-20
or
  - add EGP support to Unix and TOPS-20.  (This assumes that it is
	practical to use EGP on every host.  I have a suspicion that
	EGP was really only intended for use between gateways.)
or (both of the following)
  - add code to Unix to mark gateways as down when connections
	using them time out.  (It is not clear quite how we
	would find that they are up again.)
  - add code to Unix so that dual-homed gateways issue ICMP
	redirects if they are asked to forward a packet for which
	they know of a better gateway
Does anybody have reason to prefer one or the other approach?
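
For the last item, the check a dual-homed host would make before
forwarding might be sketched like this in Python.  The function name
and both tables are hypothetical.  How the host learns that the
better gateway is up (routed, EGP, pings) is left open, as it is
above.

```python
def redirect_target(dest_subnet, better_gateways, gateway_is_up):
    """Return the gateway to name in an ICMP redirect, or None.

    better_gateways: destination subnet -> preferred gateway (assumed table)
    gateway_is_up:   gateway -> bool, however that is learned

    The packet itself is forwarded either way.  The redirect only
    tells the sender where to send later packets.
    """
    preferred = better_gateways.get(dest_subnet)
    if preferred is not None and gateway_is_up.get(preferred, False):
        return preferred
    return None
```

With this rule the dual-homed host quietly carries the traffic
whenever the dedicated gateway is down, and pushes senders back to
the dedicated gateway as soon as it is known to be up again.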