[comp.protocols.nfs] Is RPC in SVR4 implemented in kernel or user level?

rock@cbnews.att.com (Y. Rock Lee) (07/10/90)

In the beta version of 3B2 SVR4 source directory, I found two rpc/ 
directories: one in the kernel area and another in the user lib area.
Files sys/rpc/svc.c and librpc/svc.c look extremely similar to me.
Does this mean SVR4 has two "duplicate" RPCs for different purposes?
I am totally lost here. Can some guru out there give me a hint?
Thanks.

Y. Rock Lee, att!cblph!rock
             rock@cblph.ATT.COM

thurlow@convex.com (Robert Thurlow) (07/11/90)

rock@cbnews.att.com (Y. Rock Lee) writes:

>In the beta version of 3B2 SVR4 source directory, I found two rpc/ 
>directories: one in the kernel area and another in the user lib area.
>Files sys/rpc/svc.c and librpc/svc.c look extremely similar to me.
>Does this mean SVR4 has two "duplicate" RPCs for different purposes?
>I am totally lost here. Can some guru out there give me a hint?

The kernel has to do RPC over UDP/IP for NFS accesses, so some files
are needed there.  More files are needed to do RPC for general user
level applications, and they live in the libraries.  The common files
in the Sun NFSSRC reference port are identical; our revision control
system has links in some underlying directories to ensure that the
changes made are made to both kernel and user level code.

Rob T
--
Rob Thurlow, thurlow@convex.com or thurlow%convex.com@uxc.cso.uiuc.edu
----------------------------------------------------------------------
A geek is a terrible thing to waste.  Help support the 1990 "Get A Life"
Initiative!  Send your contributions now! Operators are standing by!

guy@auspex.auspex.com (Guy Harris) (07/12/90)

>The kernel has to do RPC over UDP/IP for NFS accesses, so some files
>are needed there.  More files are needed to do RPC for general user
>level applications, and they live in the libraries.  
>The common files in the Sun NFSSRC reference port are identical;

In fact, in the SunOS source tree, the kernel RPC source directory is a
symlink to the "libc" RPC source directory (or maybe it's the other way
around, it's been a while).  That doesn't seem to be the case in NFSSRC,
for some reason.

Dunno why they aren't identical in S5R4 beta.  I sincerely hope it's for
a good reason, not just "we forgot they had to be kept in sync".

In fact, the only reason I'd consider acceptable for not making them
symlinks - which S5R4 *does* support, on both the UFS (BSD) and S5 (V7)
file systems - would be to let people read the source tree in on
machines that don't support symlinks.  (That might be the reason on
NFSSRC, if there aren't any other symlinks there.  4.3BSD, upon which
NFSSRC 4.0 is based, supports them, but they may have wanted to let
people read the source tree in under other UNIX systems.)

liam@cs.qmw.ac.uk (William Roberts) (07/12/90)

In <103795@convex.convex.com> thurlow@convex.com (Robert Thurlow) writes:

>The kernel has to do RPC over UDP/IP for NFS accesses, so some files
>are needed there.  More files are needed to do RPC for general user
>level applications, and they live in the libraries.  The common files
>in the Sun NFSSRC reference port are identical; our revision control
>system has links in some underlying directories to ensure that the
>changes made are made to both kernel and user level code.

Interesting. Back in NFS 3.0 days there was a significant
difference in that kernel RPC is assumed to be for NFS purposes
and so assumes idempotence, whereas that user-level RPC stuff
doesn't make this assumption.

The actual difference this makes is in the handling of the xid
when a request times out. In the kernel case (idempotent) the
xid for the retransmission is the same as the original request
and so a delayed reply to the original message would be
acceptable. In the user level case (non idempotent) the xid is
incremented so a delayed reply to the original request will
actually be rejected.
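
A minimal sketch of the distinction as claimed here, in Python. The function
names are made up for illustration; this is not the Sun RPC code, and it only
models the claim as stated, not verified behaviour of either implementation.

```python
def retransmit_xid(xid, reuse):
    """Return the xid to put on a retransmission.

    reuse=True models the 'idempotent' behaviour claimed for kernel RPC,
    reuse=False models the 'non-idempotent' behaviour claimed for user level.
    """
    return xid if reuse else xid + 1


def accept_reply(outstanding_xid, reply_xid):
    """A client only accepts a reply whose xid matches the request it is waiting on."""
    return reply_xid == outstanding_xid


original_xid = 100

# Same xid on retransmission: a late reply to the original still matches.
kernel_style = retransmit_xid(original_xid, reuse=True)

# Incremented xid on retransmission: a late reply to the original is rejected.
user_style = retransmit_xid(original_xid, reuse=False)
```

With these definitions, accept_reply(kernel_style, original_xid) is true and
accept_reply(user_style, original_xid) is false.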

The non-idempotence assumption is why the dreadful slowness of
the mountd daemon actually causes severe problems (see annual
discussions on this group). I have been saying for years that
SunRPC should allow you to specify which behaviour you want,
but nobody ever seems to take any notice.

Back to Rob's message: if you have identical source, do you
still have this distinction in the behaviour and if so, how do
you achieve it?
-- 

William Roberts                 ARPA: liam@cs.qmw.ac.uk
Queen Mary & Westfield College  UUCP: liam@qmw-cs.UUCP
Mile End Road                   AppleLink: UK0087
LONDON, E1 4NS, UK              Tel:  071-975 5250 (Fax: 081-980 6533)

guy@auspex.auspex.com (Guy Harris) (07/13/90)

 >Interesting. Back in NFS 3.0 days there was a significant
 >difference in that kernel RPC is assumed to be for NFS purposes
 >and so assumes idempotence, whereas that user-level RPC stuff
 >doesn't make this assumption.
 >
 >The actual difference this makes is in the handling of the xid
 >when a request times out. In the kernel case (idempotent) the
 >xid for the retransmission is the same as the original request

Really?  Do you mean a timeout internal to RPC - i.e., one where the RPC
layer does the retransmission - or a timeout that RPC reflects to its
caller, which does the retransmission itself?

The NFSSRC4.0 user-mode UDP transport code appears to bump the XID on
each call (i.e., when "clntudp_call" is first called, and whenever it
decides to refresh its credentials), but not on RPC-level
retransmissions.

The only difference in the NFSSRC4.0 kernel-mode UDP transport code
appears to be that it doesn't bump the XID when it decides to refresh
its credentials.

In other words, neither the user-mode nor the kernel-mode UDP client
code appears to transmit the same XID on retransmissions initiated by
the caller of the RPC layer; in particular, once you get "NFS server xxx
not responding still trying", and the NFS code is doing its own
retransmissions for a hard mount, the XID gets bumped.  (This is
especially nasty on create operations; somebody turned off RPC-layer
retransmissions for create operations, so no retransmitted RPC operation
*ever* goes out with the same XID as the original operation, and the
duplicate request cache can't do its job.)

brent@terra.Eng.Sun.COM (Brent Callaghan) (07/14/90)

In article <2503@sequent.cs.qmw.ac.uk>, liam@cs.qmw.ac.uk (William Roberts) writes:
> In <103795@convex.convex.com> thurlow@convex.com (Robert Thurlow) writes:
> 
> >The kernel has to do RPC over UDP/IP for NFS accesses, so some files
> >are needed there.  More files are needed to do RPC for general user
> >level applications, and they live in the libraries.  The common files
> >in the Sun NFSSRC reference port are identical; our revision control
> >system has links in some underlying directories to ensure that the
> >changes made are made to both kernel and user level code.
> 
> Interesting. Back in NFS 3.0 days there was a significant
> difference in that kernel RPC is assumed to be for NFS purposes
> and so assumes idempotence, whereas that user-level RPC stuff
> doesn't make this assumption.
> 
> The actual difference this makes is in the handling of the xid
> when a request times out. In the kernel case (idempotent) the
> xid for the retransmission is the same as the original request
> and so a delayed reply to the original message would be
> acceptable. In the user level case (non idempotent) the xid is
> incremented so a delayed reply to the original request will
> actually be rejected.
> 
> The non-idempotence assumption is why the dreadful slowness of
> the mountd daemon actually causes severe problems (see annual
> discussions on this group). I have been saying for years that
> SunRPC should allow you to specify which behaviour you want,
> but nobody ever seems to take any notice.

This isn't right.  User-level Sun RPC has always had two levels
of timeout.  In clnt_create() you specify a retry timeout
to be used within the RPC code.  Retries based on this timeout
keep the same xid.  Prior to SunOS 4.1 the timeout was constant 
i.e. if you specified 2 sec then the client will retry every
two sec until a response is received.  In SunOS 4.1 this was
changed to be an exponential backoff i.e. if you specify 2 sec
then the retry intervals will be 2,4,8,.. with a limit of 30 sec.
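
The two retry schedules can be sketched roughly as follows, in Python. The
function name and parameters are hypothetical, and the sketch assumes the
30 sec limit applies to each individual interval, which is one reading of
the description above rather than something checked against the source.

```python
def retry_intervals(initial, total, cap=30.0, exponential=True):
    """Return the waits between transmissions that fit within `total` seconds.

    exponential=False models the constant retry timeout described for
    releases before SunOS 4.1; exponential=True doubles the wait each
    time, never exceeding `cap` (assumed to be a per-interval limit).
    """
    intervals = []
    elapsed = 0.0
    wait = float(initial)
    while elapsed + wait <= total:
        intervals.append(wait)
        elapsed += wait
        if exponential:
            wait = min(wait * 2, cap)
    return intervals
```

For example, retry_intervals(2, 60) gives waits of 2, 4, 8, 16 and 30 sec,
while retry_intervals(2, 10, exponential=False) gives five waits of 2 sec.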

For each clnt_call() you can specify a total timeout.  Within this
total timeout the lower level RPC code will retry at intervals
based on the retry timeout.  Since a new xid is allocated for
each clnt_call(), the server will treat each clnt_call() as a
new request i.e. you cannot make use of an xid-based duplicate
request cache on the server if you're retrying with clnt_call's.
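
A toy model of the two timeout levels, in Python. The class and its fields
are invented for illustration and are not the clntudp code; credential
refresh, which is also said elsewhere in this thread to affect the xid, is
not modelled.

```python
import itertools


class UdpClient:
    """Toy UDP RPC client handle with a fixed retry timeout."""

    def __init__(self, retry_timeout, first_xid=1):
        self.retry_timeout = retry_timeout
        self._xids = itertools.count(first_xid)
        self.sent = []  # (xid, procedure) for every datagram "transmitted"

    def call(self, procedure, total_timeout, server_replies=False):
        """Model one clnt_call(): one new xid, reused for RPC-level retries."""
        xid = next(self._xids)
        elapsed = 0
        while elapsed < total_timeout:
            # First transmission and every RPC-level retry carry the same xid.
            self.sent.append((xid, procedure))
            if server_replies:
                return True
            elapsed += self.retry_timeout
        # Total timeout expired; the failure is reflected to the caller.
        return False
```

If a caller invokes call("CREATE", total_timeout=6) twice on a handle built
with retry_timeout=2 and the server never answers, the model records three
datagrams with one xid followed by three with the next xid. The second group
is what a server would see as a brand-new request.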

In SunOS 4.1 a duplicate request cache was implemented on the
server to detect duplicate NFS retransmissions.  Some changes
were required in the NFS client-side code for 4.1 to keep the
xid constant across retransmissions so that the duplicate request
cache would be effective.  This same change was made in Ultrix
V3.0 for the same reason (see Chet Juszczak's winter '89
USENIX paper).
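
Why the xid has to stay constant can be seen in a small sketch of an
xid-keyed cache, in Python. Everything here (class name, keying on client
plus xid, replaying the saved reply, no expiry) is an assumption made for
illustration; real implementations differ in what they keep and for how long.

```python
class DupRequestCache:
    """Toy server-side cache of replies, keyed on (client, xid)."""

    def __init__(self):
        self._replies = {}

    def handle(self, client, xid, execute):
        key = (client, xid)
        if key in self._replies:
            # Retransmission of a request already executed: replay the reply.
            return self._replies[key]
        reply = execute()
        self._replies[key] = reply
        return reply


def make_create(existing):
    """Return a non-idempotent 'create' operation over the set `existing`."""
    def create(name):
        if name in existing:
            return "EEXIST"
        existing.add(name)
        return "OK"
    return create
```

A create retransmitted with the same xid gets the saved "OK" back; the same
create sent with a new xid is executed again and fails with "EEXIST", even
though the original request succeeded.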
--

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

thurlow@convex.com (Robert Thurlow) (07/14/90)

liam@cs.qmw.ac.uk (William Roberts) writes:

>Interesting. Back in NFS 3.0 days there was a significant
>difference in that kernel RPC is assumed to be for NFS purposes
>and so assumes idempotence, whereas that user-level RPC stuff
>doesn't make this assumption.

Oh, there's still a significant difference; there are four files
that do "kernel RPC" and do not get included in libc.  My point
was that the majority of the files were the same and so were
symlinked or replicated as appropriate.  As to mount/mountd,
I don't know enough about the user-level side to comment, but
I do know different call routines are used.

Rob T