[comp.protocols.nfs] RARP failure on subnets?

milne@ICS.UCI.EDU (Alastair Milne) (04/13/91)

   This is something we found a long time ago with PC-NFS 3.0.1.
   I have in fact described it a couple of times on the net already,
   in answer to other posts, but it never occurred to me to see if
   Geoff could say something about it.

   We have a net of about 10 PCs, all on a thinnet connected to a
   Sun 3 or 4 in the next building over.  This thinnet is a subnet,
   and the Sun of course is the gateway to the backbone of the department's
   net.  When we originally installed, we wanted to use RARP.  Very
   attractive, after all: you don't even need a simple hosts table on
   the individual PC.  (Geoff, telnet is scarcely letting me get 20
   characters typed, sometimes hardly 5, without freezing in its tracks
   for several seconds.)

   But it kept failing.  The boot would often hang at the net start, and
   nothing would happen.  Our network administrator did some investigating,
   and determined that the broadcast packet RARP was sending was asking not
   "are you a YP server?" but "who is your YP server?".  Whatever it was told
   it would try to assign as the YP server for the booting PC.

   Now, there were then a couple of Suns that could provide YP service within
   the subnet, but many more that could not.  But RARP didn't seem to be
   telling the difference.  So it often gave the client PC a YP server the PC
   couldn't use.

   After we stopped using RARP, these problems disappeared; instead, the PC
   makes it through that phase with a minimal 'hosts' table, then turns on YP
   and grabs a server.  Not bad, but I still wouldn't mind having a fully
   working RARP.  Among other things, our site had at least 2 changes of
   IP addresses, and if our network services hadn't told me about them,
   I need never have known.  RARP made them that smooth.


   So my question to Geoff: a) is our diagnosis correct?  Is this how RARP is
   working?  and b) are its problems being fixed?  I suspect the latter is
   rather slow, because I've just seen a user's reaction to PC-NFS 3.5 on the
   net, and it appears RARP is still not working properly.


   Thanks for whatever you can tell me.

   Alastair Milne,
   Educational Technology Center,
   UC Irvine

gah@hood.hood.caltech.edu (Glen Herrmannsfeldt) (04/14/91)

About RARP and subnets....

First, you have to run a separate rarpd for each ethernet interface.
This is obvious.

The second parameter to rarpd is the address, or name that translates
to the address, of the interface.  This is the address that rarpd
uses as the from address in its reply packets.
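
To make the role of that address concrete, here is a minimal sketch of
what a RARP reply frame looks like on the wire, following the packet
layout in RFC 903.  It assumes that the "from address" above is the
sender protocol address field of the reply; that mapping is my reading,
not something taken from the rarpd source.  The function names and the
sample MAC and IP addresses are made up for illustration.

```python
import socket
import struct

ETHERTYPE_RARP = 0x8035   # EtherType assigned to RARP
RARP_REPLY = 4            # opcode 4 = reverse reply (RFC 903)


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_rarp_reply(server_mac: str, server_ip: str,
                     client_mac: str, client_ip: str) -> bytes:
    """Build an Ethernet frame carrying a RARP reply (hypothetical helper)."""
    # Ethernet header: destination, source, EtherType.
    eth_header = (mac_bytes(client_mac) + mac_bytes(server_mac)
                  + struct.pack("!H", ETHERTYPE_RARP))
    # RARP reuses the ARP packet layout.
    body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                             # hardware type: Ethernet
        0x0800,                        # protocol type: IP
        6,                             # hardware address length
        4,                             # protocol address length
        RARP_REPLY,
        mac_bytes(server_mac),         # sender hardware address
        socket.inet_aton(server_ip),   # sender protocol address (the "from" address)
        mac_bytes(client_mac),         # target hardware address
        socket.inet_aton(client_ip),   # target protocol address (the answer)
    )
    return eth_header + body


frame = build_rarp_reply("08:00:20:00:00:01", "192.0.2.1",
                         "08:00:20:00:00:02", "192.0.2.20")
print(len(frame), socket.inet_ntoa(frame[28:32]))
```

If rarpd on the subnet interface is given the address of some other
interface, the sender protocol address in this frame would carry that
other address, and a client that pays attention to it would see a
server address that is not on its own subnet.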

Some machines do not care where the reply packets come from.

We recently bought some Sun SLC's, and added them to a subnet that
had a diskless 4/110.  I could not get the SLC's to net boot.

A little etherfind showed that they ignored the replies from
bootparamd.  It turned out that the replies came from what the SLC
thought was the wrong address.  The SLC expected the bootparamd reply
to come from the same machine that sent its rarp reply.  It did, but
the return address in the rarp reply was wrong, so the SLC didn't
believe it.
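
The check described above can be sketched roughly as follows.  This is
only an illustration of the behaviour seen on the wire, not Sun's boot
PROM code: the function names are hypothetical, and it assumes the
client remembers the sender protocol address from the RARP reply and
compares it with the source address of the later bootparamd reply.

```python
import socket
import struct

RARP_REPLY = 4  # opcode 4 = reverse reply (RFC 903)


def rarp_sender_ip(rarp_body: bytes) -> str:
    """Return the sender protocol address from a RARP reply body."""
    htype, ptype, hlen, plen, oper = struct.unpack("!HHBBH", rarp_body[:8])
    if oper != RARP_REPLY or plen != 4:
        raise ValueError("not an IPv4 RARP reply")
    start = 8 + hlen                     # skip the sender hardware address
    return socket.inet_ntoa(rarp_body[start:start + plen])


def accept_bootparam_reply(rarp_body: bytes, reply_source_ip: str) -> bool:
    """Accept the later reply only if it comes from the RARP 'from' address."""
    return rarp_sender_ip(rarp_body) == reply_source_ip


# Sample RARP reply body whose sender address was set to 192.0.2.1.
body = struct.pack(
    "!HHBBH6s4s6s4s", 1, 0x0800, 6, 4, RARP_REPLY,
    bytes.fromhex("080020000001"), socket.inet_aton("192.0.2.1"),
    bytes.fromhex("080020000002"), socket.inet_aton("192.0.2.20"),
)
print(accept_bootparam_reply(body, "192.0.2.1"))     # same address
print(accept_bootparam_reply(body, "198.51.100.1"))  # different address
```

A client that skips the comparison boots regardless of what rarpd was
told, which would explain why the 4/110 never showed the problem.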

Anyway, check that address.  Some machines may ignore it, so you won't
notice anything wrong.  Some may not, and you won't know why they fail.

I hope this helps someone.  It took me a while to track down, including
calls to Sun.  They had no idea, even though I sent etherfind -x output.