[comp.sys.celerity] problem with RARP in newest celerity release

hedrick@topaz.rutgers.edu (Charles Hedrick) (10/22/87)

We have found a fairly serious problem with 3.4.88.  It attempts to
implement RARP.  This is a facility by which diskless machines can
find out their Internet address when they are booting.  RARP is used
by Suns as part of their boot process.  It is also used by our network
gateways when they boot.  On the Sun, at least in SunOS 3.2, RARP is
implemented as a program.  This program reads a file, /etc/ethers,
which contains a list of hosts for which the machine is prepared to
respond to RARP requests, and their Internet and Ethernet addresses.
This approach works fine (though the implementation has a problem that
causes it not to work for us; but that's another story).  Celerity
chose to implement RARP in the kernel.  Instead of reading
/etc/ethers, the kernel RARP support uses the kernel's ARP table.
From external behavior, it looks like RARP takes the Ethernet address,
looks for an entry in the ARP table with this Ethernet address, and
returns its Internet address.  Unfortunately, the ARP table is not
intended to provide mappings in this direction.  It is designed to go
from Internet address to Ethernet address.  If you are using proxy
ARP, using the table in reverse will result in bad information.

For example, our gateways use proxy ARP.  This is a technique by which
gateways respond to ARP requests for hosts on the other side of the
gateway.  Thus there are typically several entries in the ARP table
containing the Ethernet address of the gateway.  If we reboot the
gateway, when it is coming up, it issues an RARP request.  The
Celerities will respond to this request by giving the Internet address
of some system for which the gateway had been forwarding packets.
Thus the gateway comes up with a completely bogus Internet address.
In my opinion, it is a bad practice to do RARP based on the ARP table.
Indeed it is probably a bad idea to do RARP at all unless it is
requested.  I think the approach of a user-mode utility program is a
better one.  I have reported this bug to Celerity.  I'm posting it
here because it can create symptoms elsewhere on the network that are
nearly impossible to trace.  Now you are warned...
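
To make the failure concrete, here is a minimal sketch in Python of what a reverse lookup through an ARP table does once proxy ARP entries are present. It is not Celerity's kernel code; the addresses, the table contents, and the function names are all made up for illustration, and "first match wins" is only a guess at the behavior seen from outside.

```python
# Hypothetical ARP table: Internet address -> Ethernet address.
# Every address here is invented for illustration.
GATEWAY_IP = "192.0.2.1"
GATEWAY_MAC = "08:00:20:00:00:01"

arp_table = {
    "192.0.2.17": GATEWAY_MAC,          # proxy ARP entry for a host behind the gateway
    "192.0.2.19": GATEWAY_MAC,          # another proxy ARP entry
    GATEWAY_IP: GATEWAY_MAC,            # the gateway's own address
    "192.0.2.20": "08:00:20:00:00:02",  # an ordinary host on the local cable
}


def arp_lookup(ip):
    """ARP direction: each Internet address maps to exactly one Ethernet address."""
    return arp_table.get(ip)


def reverse_candidates(mac):
    """Every Internet address currently mapped to this Ethernet address."""
    return [ip for ip, entry_mac in arp_table.items() if entry_mac == mac]


def naive_rarp(mac):
    """Assumed behavior: answer with the first table entry that matches."""
    matches = reverse_candidates(mac)
    return matches[0] if matches else None
```

The forward direction is unambiguous, but `reverse_candidates(GATEWAY_MAC)` returns three addresses, and `naive_rarp` picks whichever happens to come first. Nothing in the table says which one belongs to the gateway itself.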

The other disadvantage of using the ARP table is that when a whole
room of machines is coming up after a power failure, all the ARP
tables will start out empty.  In order to handle power failures, any
machine that acts as an RARP server must not depend upon the table
being dynamically built.  Thus rc.local would have to have calls to
/etc/arp to add the appropriate entries statically.  So this approach
does not get around the need for static tables.  I think /etc/ethers
is a better place for this information...
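
As a rough sketch of the static-table approach, the Python below answers a RARP query only from explicitly listed machines. It assumes the common two-column ethers layout (Ethernet address, then host name) with host names resolved through a hosts-style table; the exact file format on SunOS 3.2 may differ, and the sample data and function names are hypothetical.

```python
def _fields(text):
    """Yield whitespace-split fields for each non-blank, non-comment line."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            yield line.split()


def parse_ethers(text):
    """Return {ethernet_address: hostname} from ethers-style text."""
    table = {}
    for fields in _fields(text):
        if len(fields) >= 2:  # skip malformed lines
            table[fields[0].lower()] = fields[1]
    return table


def parse_hosts(text):
    """Return {hostname: internet_address} from hosts-style text."""
    table = {}
    for fields in _fields(text):
        for name in fields[1:]:
            table.setdefault(name, fields[0])
    return table


def rarp_reply(mac, ethers, hosts):
    """Answer only for machines listed explicitly; otherwise stay silent."""
    host = ethers.get(mac.lower())
    return hosts.get(host) if host is not None else None


# Invented sample data.
ETHERS = """\
# Ethernet address    host
08:00:20:00:00:01     gateway
08:00:20:00:00:03     diskless1
"""
HOSTS = """\
192.0.2.1    gateway
192.0.2.30   diskless1
"""

ethers = parse_ethers(ETHERS)
hosts = parse_hosts(HOSTS)
```

Because the table is read from a file, it is complete the moment the server boots, and a machine that is not listed simply gets no answer from this server.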

roger@celtics.UUCP (Roger B.A. Klorese) (10/22/87)

In article <15814@topaz.rutgers.edu> hedrick@topaz.rutgers.edu (Charles Hedrick) writes:
>We have found a fairly serious problem with 3.4.88.

To clarify, this is a release built out of source at Rutgers, based on
Celerity release 3.4.77, which did indeed contain the improper RARP
code.  Do not confuse this with a *Celerity* 3.4.88 or later.  The 
problem code was disabled in binary release 3.4.78 and all later 3.4
versions.  RARP will be implemented fully and properly in a later
release.
-- 
 ///==\\   (Your message here...)
///        Roger B.A. Klorese, CELERITY (Northeast Area)
\\\        40 Speen St., Framingham, MA 01701  +1 617 872-1552
 \\\==//   celtics!roger@necntc.nec.com - necntc!celtics!roger

ron@celerity.UUCP (Ron McDaniels) (10/23/87)

Mea culpa! The code in if_ether.c implementing a RARP capability in the
kernel in Celerity UNIX field release 3.4.77 is not, and was not intended
to be, a general purpose implementation. It was included as a consequence
of some diskless Sun work I have been doing and became part of the release
by accident. It stayed part of the release because it wasn't supposed to
do any harm.  My apologies for the difficulty it caused. Using my highly
developed 20/20 hindsight, I rather wish it hadn't gone out. 

We became aware of the problem within a few days of the first installations
of 3.4.77 and we quickly released 3.4.78 with the "feature" (rhymes with
creature) disabled and distributed the corrected release to the complaining
sites.

It has always been our intention to implement and release a RARP capability
functionally equivalent to Sun's, including /etc/ethers.

An additional issue disclosed by this problem is the potential confusion
in identifying release levels for source-licensed customers. The RARP
problem existed in 3.4.77, was corrected in 3.4.78 and was reported on
the net as being in 3.4.88. We haven't gotten to 3.4.88 yet!
To avoid this confusion, I would suggest that source-license holders
adopt the convention of naming their kernels as:
'local-organization.celerity-release-level.local-release-level',
e.g., vmunix.rutgers.3.4.77.88. This may be accomplished by setting the
'version' file in the root of the source directory to the string:

			rutgers 3.4.77 88

The last field in the 'version' file is incremented by the script 'newvers.sh'
each time a build is done, so subsequent builds will have an ascending
revision number. If you want to change the punctuation, take a look at
'newvers.sh'.
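
For illustration only, the naming convention can be sketched in Python as below. This is not the contents of 'newvers.sh'; it assumes the three-field 'version' line shown above, and the function names are hypothetical.

```python
def kernel_name(version_line, prefix="vmunix"):
    """Build a kernel name such as vmunix.rutgers.3.4.77.88 from the version line."""
    org, release, local = version_line.split()
    return ".".join([prefix, org, release, local])


def bump_local_release(version_line):
    """Return the version line with the last (local release) field incremented."""
    org, release, local = version_line.split()
    return "{} {} {}".format(org, release, int(local) + 1)
```

With the line "rutgers 3.4.77 88", `kernel_name` yields vmunix.rutgers.3.4.77.88 and `bump_local_release` yields "rutgers 3.4.77 89", so the vendor release level stays visible however many local builds are made.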



Ron McDaniels

CELERITY COMPUTING . 9692 Via Excelencia Way . San Diego, California . 92126
(619) 271-9940 . {decvax || ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ron
"Yes, my Precious. . . we hates them socket(2)eses!"