[comp.protocols.misc] A comparison of Commercial RPC Systems

mishkin@apollo.COM (Nathaniel Mishkin) (07/25/89)

In article <6569@joshua.athertn.Atherton.COM> joshua@Atherton.COM (Flame Bait) writes:
>I've finished work on a paper entitled "A Comparison of Commercial RPC
>Protocols" which I gave at NCF: The Network Computing Forum, and also
>as a work in progress at USENIX.
>
>It compares RPC (Remote Procedure Call) systems from Apollo, Sun and 
>Netwise for speed and dependablity.  I'm posting the paper (in troff 
>format) to comp.misc, so interested people can get it.  ...

I am seeking to end any cross-posting concerning this article by
posting my response to comp.protocols.misc.  I suggest all further
discussion take place there.

-- 
                    -- Nat Mishkin
                       Apollo Computer Inc., Chelmsford, MA
                       mishkin@apollo.com

mishkin@apollo.COM (Nathaniel Mishkin) (07/25/89)

In article <6569@joshua.athertn.Atherton.COM> joshua@Atherton.COM (Flame Bait) writes:
>I've finished work on a paper entitled "A Comparison of Commercial RPC
>Protocols" which I gave at NCF: The Network Computing Forum, and also
>as a work in progress at USENIX.

Below are some comments on Joshua Levy's "A Comparison of Commercial
RPC Systems" recently posted on comp.misc.

In section 3.1 Levy discusses the performance implications of the size
of UDP/IP datagrams.  It is important to note that, in general, use
of 8K UDP datagrams is a losing proposition.  This is because systems
must fragment such datagrams into pieces that will fit into data-link
(MAC) level packets.  The size of such packets on ethernet is around
1500 bytes.  Thus, on ethernet, an 8K UDP datagram is actually sent as
5 (or so -- I'm not bothering to figure out the size of headers and such)
ethernet packets.  The problem with this is that, given the way IP does
fragmentation (and reassembly), if any of the 5 fragments is lost, they
might as well all have been lost -- neither UDP nor IP has any support
for figuring out what got lost and telling the sender what to resend.
The receiving application sees none of the fragments.
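The fragment arithmetic above can be sketched in a few lines (a rough model: the 1500-byte Ethernet MTU and 20-byte IP header are standard figures, but the per-fragment loss rate is invented for illustration):

```python
import math

MTU = 1500         # typical Ethernet MTU, in bytes
IP_HDR = 20        # IP header without options
UDP_PAYLOAD = 8192

def fragment_count(payload=UDP_PAYLOAD, mtu=MTU, ip_hdr=IP_HDR):
    # Each fragment carries at most (mtu - ip_hdr) bytes of data,
    # rounded down to a multiple of 8 as IP fragmentation requires;
    # the 8-byte UDP header rides in the first fragment's data.
    per_frag = (mtu - ip_hdr) // 8 * 8
    return math.ceil((payload + 8) / per_frag)

def datagram_survival(p_frag_loss, n_frags):
    # The datagram is reassembled only if EVERY fragment arrives:
    # losing any one fragment loses the whole datagram.
    return (1 - p_frag_loss) ** n_frags

n = fragment_count()                  # 6 fragments for 8K over Ethernet
print(n, datagram_survival(0.01, n))
```

So even a modest 1% per-fragment loss rate leaves the full 8K datagram arriving intact only about 94% of the time.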

I have observed systems that are simply unable to handle 5 back-to-back
packets, as they would have to in order to successfully process a call
that had 8K of input parameters.  In such a situation, the caller could
retransmit its 8K UDP datagram until the cows come home and it might
never be received intact.  In general, fragmentation works only when
you're using TCP/IP, which has mechanisms for retransmitting lost pieces
of TCP streams.

The problem with large UDP datagrams is exacerbated if the path between
sender and receiver traverses one or more gateways.  There are two reasons
for this: (1) If many senders of large UDP datagrams are trying to use
the same gateway simultaneously, the gateway will get lots of fragments
sprayed to it at a time.  The odds that it will drop one is thus increased.
(2) As the number of gateways between sender and receiver increases,
so does the probability that a packet will be lost (the total probability
being a cumulative function of the probability of each single gateway
dropping a packet).
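Point (2) can be put in toy-model terms, assuming independent drops at each gateway (the drop rate and hop counts here are invented for illustration):

```python
def all_fragments_survive(per_hop_drop, hops, fragments):
    # A fragmented datagram is delivered only if every fragment
    # survives every hop, so loss compounds in the exponent.
    return (1 - per_hop_drop) ** (hops * fragments)

# One gateway vs. three, for a 6-fragment (8K) datagram at a 1% drop rate:
print(all_fragments_survive(0.01, 1, 6))
print(all_fragments_survive(0.01, 3, 6))
```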

On the basis of the preceding, I would claim that the performance numbers
for Sun RPC on UDP are, in general, irrelevant, because, in general,
it can't work.  Sure, it'll work in some restricted cases, but people
need to understand the serious restrictions.

In section 3.3 Levy uses the example of (putative) machines with 9 bit
bytes and how this would present a problem for Apollo's data representation
scheme (NDR, not RMR as in Levy's posting that announced his paper).
In fact, in case it wasn't clear, Sun's scheme (XDR) and (presumably) ASN.1
have the same problem -- none of the schemes has any special support for
9-bit bytes.  They would presumably represent them all in some larger
standard type.  Thus, the fact that NDR doesn't have a tag that says
"I use 9 bit bytes" produces no problems unique to NDR.

I want to take this opportunity to clear up one misconception about NDR
(which I know Levy doesn't have but which I know many people do):  NDR's
representation suite is not open-ended.  NDR is best thought of as a
"multi-canonical" scheme.  I.e. the sender of data can choose from a
variety (but fixed) set of representations.  We believe the set is usefully
large, but not so large as to be onerous for receivers.  If a sender's
native representation happens to not exist in the set, it is the sender's
responsibility to convert to a one of the NDR set before sending the
data.  Note however, that all the most common representation schemes
currently in existence are covered in NDR's suite. Thus, for systems
with those representations, no conversion is required on sending.  Further,
for such systems, the number of conversion routines required for any
single system is small: 1 for integers (from big to little endian, or
from little to big endian), 1 for characters (from ascii to ebcdic, or
from ebcdic to ascii), and 3 for floating point (from 3 of the 4 possible
floating representation to the native representation).  A total of 5
routines.  Not a big deal.  (Yes, the total number of conversion routines
across all systems is larger, but that's just a matter of coding them
up once, and you're done.)
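The small, fixed conversion suite described above can be sketched as a dispatch table; the tags, function names, and two-format integer case here are invented for illustration, not actual NDR:

```python
def from_little(data):
    # Decode a little-endian wire integer.
    return int.from_bytes(data, "little")

def from_big(data):
    # Decode a big-endian wire integer.
    return int.from_bytes(data, "big")

# One routine per representation in the (fixed) suite; a receiver
# whose native order matches the tag pays nothing extra.
INT_DECODERS = {"little": from_little, "big": from_big}

def decode_int(tag, data):
    # The sender labels data with its representation tag; the
    # receiver picks the matching routine to convert to native form.
    return INT_DECODERS[tag](data)

print(decode_int("little", b"\x2a\x00\x00\x00"))  # 42
print(decode_int("big", b"\x00\x00\x00\x2a"))     # 42
```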

In section 4.1 Levy discusses the "silent errors" possible with Sun RPC
over UDP as a result of duplicated UDP datagrams.  Two other problem
cases arise in this mode:  (1) If the input or output parameters are
larger than 8K, the UDP solution cannot be used at all -- you must use
Sun RPC over TCP.  (2) The caller must be able to approximate the time
it will take for the call to execute at the server end.  In general,
this is a hard number to know (it may be a function of the speed and
load of the server and of the complexity of the input data presented
on each call).  Guessing the wrong value will lead to spurious call
failures.
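Problem (2) can be sketched as a client-side retry loop over a connected UDP socket; the timeout policy and buffer sizes below are hypothetical, not Sun RPC's actual algorithm:

```python
import socket

def call_with_retries(sock, request, timeout=1.0, max_tries=5):
    """Retransmit a request over a connected UDP socket with
    exponential backoff.  If the server is merely slow rather than
    dead, a retry duplicates the call -- which is exactly where the
    'silent error' duplicates arise for non-idempotent operations."""
    for attempt in range(max_tries):
        sock.settimeout(timeout * (2 ** attempt))
        sock.send(request)
        try:
            return sock.recv(65536)
        except socket.timeout:
            continue  # guessed too short a timeout: retransmit
    raise TimeoutError("call failed after %d tries" % max_tries)
```

Guessing the initial timeout too low causes spurious retransmissions; guessing it too high makes genuine failures slow to detect.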

-- 
                    -- Nat Mishkin
                       Apollo Computer Inc., Chelmsford, MA
                       mishkin@apollo.com

brent%terra@Sun.COM (Brent Callaghan) (07/29/89)

In article <449d9c67.12879@apollo.COM>, mishkin@apollo.COM (Nathaniel Mishkin) writes:
> of 8K UDP datagrams is a losing proposition.  This is because systems
> must fragment such datagrams into pieces that will fit into data-link
> (MAC) level packets.  The size of such packets on ethernet is around
> 1500 bytes.  Thus, on ethernet, an 8K UDP datagram is actually sent as
> 5 (or so -- I'm not bothering to figure out the size of headers and such)
> ethernet packets.  The problem with this is that, given the way IP does
> fragmentation (and reassembly), if any of the 5 fragments is lost, they
> might as well all have been lost -- neither UDP nor IP has any support
> for figuring out what got lost and telling the sender what to resend.
> The receiving application sees none of the fragments.
> 
> I have observed systems that are simply unable to handle 5 back-to-back
> packets, as it would have to in order to successfully process a call
> that had 8K of input parameters.  In such a situation, the caller could
> retransmit its 8K UDP datagram until the cows come home and it might
> never be received intact.  In general, fragmentation works only when
> you're using TCP/IP, which has mechanisms for retransmitting lost pieces
> of TCP streams.
> 
> The problem with large UDP datagrams is exacerbated if the path between
> sender and receiver traverses one or more gateways.  There are two reasons
> for this: (1) If many senders of large UDP datagrams are trying to use
> the same gateway simultaneously, the gateway will get lots of fragments
> sprayed to it at a time.  The odds that it will drop one is thus increased.
> (2) As the number of gateways between sender and receiver increases,
> so does the probability that a packet will be lost (the total probability
> being a cumulative function of the probability of each single gateway
> dropping a packet).
> 
> On the basis of the preceding, I would claim that the performance numbers
> for Sun RPC on UDP are, in general, irrelevant, because, in general,
> it can't work.  Sure, it'll work in some restricted cases, but people
> need to understand the serious restrictions.

I can't follow this argument at all.  You seem to be saying that 8K UDP
datagrams are no good because you *might* have problems.  I agree
with all your observations on the problems that might happen - but in
practice the problems that you described are rare.

For RPC's it's a big performance win if you can encapsulate multiple
request-response packets into a single UDP packet.  Even if this
packet is fragmented, it's still much more efficient if the network
is reliable enough that packet drops are not a problem.  In practice
local area networks ARE reliable enough to support 8K UDP packets
efficiently.  
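The amortization argument here can be illustrated with a back-of-the-envelope cost model; the per-request overhead and per-byte cost are invented figures, chosen only to show the shape of the tradeoff:

```python
def transfer_cost_ms(total_bytes, request_size,
                     per_request_overhead_ms=2.0, per_byte_us=1.0):
    # Fixed per-request cost (syscalls, RPC headers, scheduling)
    # plus a linear per-byte cost; ceiling division for request count.
    requests = -(-total_bytes // request_size)
    return requests * per_request_overhead_ms + total_bytes * per_byte_us / 1000

# Moving 1 MB in 8K requests vs. 1K requests:
print(transfer_cost_ms(1 << 20, 8192))   # fewer requests, less fixed overhead
print(transfer_cost_ms(1 << 20, 1024))   # eight times the per-request cost
```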

Where packet drops are a problem (NFS across slow gateways and unreliable
links) the answer is obvious - just reduce the packet size so that you
take less of a hit from drops.  NFS users can control this with "rsize"
and "wsize" mount options.  NFS users will also complain that performance
is noticeably worse when the packet size limit is reduced from 8K.


Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

wesommer@athena.mit.edu (William Sommerfeld) (07/30/89)

Brent,

Are you claiming that it's acceptable for a protocol to have to be
tuned by the user in order to avoid congesting gateways or even to
work *at all*?

Are you also claiming that it's acceptable for someone writing an
application using a remote procedure call package to make packet sizes
user-visible?

There's no excuse for explicitly using UDP fragmentation.  It will get
you into trouble.

By the way, Brent, please check your facts.  SUN RPC does NOT
encapsulate multiple request or response packets into a single UDP
packet.. NFS merely makes larger single requests if told to do so.

					- Bill
--

brent%terra@Sun.COM (Brent Callaghan) (07/31/89)

In article <WESOMMER.89Jul29153901@anubis.athena.mit.edu>, wesommer@athena.mit.edu (William Sommerfeld) writes:
> Are you claiming that it's acceptable for a protocol to have to be
> tuned by the user in order to avoid congesting gateways or even to
> work *at all*?

No, I'm not claiming that it's a desirable property of an NFS implementation
that a sysadmin has to fudge around with read and write sizes and timeouts.
Such tuning would be unnecessary if it were provided by the transport layer
(where it properly belongs) -- unfortunately, UDP doesn't offer anything
useful here.  There's nothing to prevent an NFS implementation from varying
timeouts and UDP packet sizes automatically based on server response
statistics.  I agree, congestion control should be automatic.
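Such automatic variation might look like the following sketch; the sizes and the additive-increase/multiplicative-decrease policy are my own illustration, not any actual NFS implementation:

```python
def adjust_size(current, succeeded, min_size=1024, max_size=8192):
    # Back off sharply on a timeout, creep back up on success
    # (AIMD-style, as congestion-control folklore recommends).
    if succeeded:
        return min(max_size, current + 1024)  # additive increase
    return max(min_size, current // 2)        # multiplicative decrease

size = 8192
for ok in (False, False, True, True):         # two timeouts, then recovery
    size = adjust_size(size, ok)
print(size)  # 4096
```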

> Are you also claiming that it's acceptable for someone writing an
> application using a remote procedure call package to make packet sizes
> user-visible?

No, I'm not claiming that. It depends on the transport you use.  If you run
the RPC on UDP then you must acknowledge an upper limit on the message size.
If you use TCP instead then you don't really have to worry. 

If I'm using an unreliable datagram protocol like UDP then I have to acknowledge
a tenet of datacomm theory that short messages have a better chance of getting
through OK than long ones.  If you have problems sending long messages then it's
worth trying shorter ones.

> There's no excuse for explicitly using UDP fragmentation.  It will get
> you into trouble.

Yes, it will get you into trouble if you have a lot of drops between client
and server. If drops are negligible then you're a fool not to exploit big
packets.

I can only argue from experience - it seems that NFS runs quite happily in
most networks with fragmentation.  Users complain about poor performance
when you don't fragment.  What would you rather have?

> By the way, Brent, please check your facts.  SUN RPC does NOT
> encapsulate multiple request or response packets into a single UDP
> packet. NFS merely makes larger single requests if told to do so.

I'm sorry if I was unclear.  I was trying to say that a single
fragmented UDP packet can encapsulate the "effect" of multiple smaller
packets that are not fragmented.  If you have to move lots of data
between client and server with RPC's it's much more efficient to do it
in a few big requests than in lots of little ones. Even if the big
requests get fragmented - the fragments don't have to be acknowledged
individually by the destination.  This presupposes an acceptably small
number of drops which appears to be true for most users of our NFS
implementation.

Are there any NFS implementations that constrain their users from using
fragmented UDP packets?

Made in New Zealand -->  Brent Callaghan  @ Sun Microsystems
			 uucp: sun!bcallaghan
			 phone: (415) 336 1051

mishkin@apollo.HP.COM (Nathaniel Mishkin) (07/31/89)

In article <118445@sun.Eng.Sun.COM> brent%terra@Sun.COM (Brent Callaghan) writes:
>In article <449d9c67.12879@apollo.COM>, mishkin@apollo.COM (Nathaniel Mishkin) writes:
>> of 8K UDP datagrams is a losing proposition.  ...
>
>I can`t follow this argument at all.  You seem to be saying that 8K UDP
>datagrams are no good because you *might* have problems.  I agree
>with all your observations on the problems that might happen - but in
>practice the problems that you described are rare.

I guess I have to disagree with the "rare".  I have seen some systems
systematically fail to handle 5 back-to-back packets.

>Where packet drops are a problem (NFS across slow gateways and unreliable
>links) the answer is obvious - just reduce the packet size so that you
>take less of a hit from drops.  NFS users can control this with "rsize"
>and "wsize" mount options.

The idea of reducing the packet size (i.e. the number of packets sent
in a burst) is fine.  What's not fine is expecting users or applications
to do the reducing.  It's the business of the underlying RPC system to
handle this.  How would people react to a TCP implementation that had
the property that you'd have to notice that your "send"s were failing
and then make some call to reduce (say) the TCP window size?
-- 
                    -- Nat Mishkin
                       Apollo Computer Inc., Chelmsford, MA
                       mishkin@apollo.com

beepy%commuter@Sun.COM (Brian Pawlowski) (07/31/89)

In article <449d9c67.12879@apollo.COM>, 
mishkin@apollo.COM (Nathaniel Mishkin) writes:
 
> Below are some comments on Joshua Levy's "A Comparison of Commercial
> RPC Systems" recently posted on comp.misc.
 
> In section 3.1 Levy discusses the implications on performance of the
> size of UDP/IP datagrams.  It is important to note that, in general, use
> of 8K UDP datagrams is a losing proposition.

I have to object to the term "in general". I would respond that "in general,
the use of 8K UDP datagrams is a win for transferring large amounts
of data, and has shown to be a win in applications such as NFS."
Performance for smaller transfer sizes (blocks sizes) entails greater
processing overhead in the application layer, and traverses through
the networking layers to accomplish the same overall data transfer.
This would imply to me that the total time to transfer the same
amount of data using smaller packets is greater.

> ethernet packets.  The problem with this is that, given the way IP does
> fragmentation (and reassembly), if any of the 5 fragments is lost, they
> might as well all have been lost -- neither UDP nor IP has any support
> for figuring out what got lost and telling the sender what to resend.
> The receiving application sees none of the fragments.

For "noisy" or "loaded" networks with lots of packet loss, 
smaller packets could make more sense. However, it is pessimistic 
to put this forward as the "general" case.

> I have observed systems that are simply unable to handle 5 back-to-back
> packets, as it would have to in order to successfully process a call
> that had 8K of input parameters.  In such a situation, the caller could
> retransmit its 8K UDP datagram until the cows come home and it might
> never be received intact.  In general, fragmentation works only when
> you're using TCP/IP, which has mechanisms for retransmitting lost pieces
> of TCP streams.

Hmmmm... again, the "in general".  In general, systems incapable of handling
back-to-back packets (sometimes only two, I've seen) have problems.
Again, smaller packets would make sense so as not to thrash the
poor performing system. In general, over an extensive network at
Sun, 8K UDP packets provide greater throughput, less overhead.
(You know: Tastes better - less filling.) I would argue against
proposing "least common denominators" as the general case; this
is a pessimistic strategy. I think this points to "the general"
need for flexibility in a distributed application, using a datagram
protocol, to allow static or dynamic (preferred) configuration of
parameters such as UDP packet size (and retransmissions, with backoff
strategies).
 
> On the basis of the preceding, I would claim that the performance numbers
> for Sun RPC on UDP are, in general, irrelevant, because, in general,
> it can't work.  Sure, it'll work in some restricted cases, but people
> need to understand the serious restrictions.

I like your argument, EXCEPT FOR THE FACT THAT IT FLIES IN THE FACE
OF WHAT WE SEE IN LARGE NFS NETWORK INSTALLATIONS. I think you are
approaching this from a pessimistic, worst case scenario which is
a bad way to deal with network throughput. In general, I have to assure
you - it works.

> -- 
>                     -- Nat Mishkin
>                        Apollo Computer Inc., Chelmsford, MA
>                        mishkin@apollo.com


			Brian Pawlowski <beepy@sun.com> <sun!beepy>
			Sun Microsystems, NFS Development

craig@bbn.com (Craig Partridge) (07/31/89)

In article <118603@sun.Eng.Sun.COM> beepy%commuter@Sun.COM (Brian Pawlowski) writes:
>I have to object to the term "in general". I would respond that "in general,
>the use of 8K UDP datagrams is a win for transferring large amounts
>of data, and has shown to be a win in applications such as NFS."

I think this argument about "in general" is missing the key point.
Fragmentation is known not to be robust -- see Mogul and Kent's article
on this in the 1987 Proceedings of SIGCOMM.  The basic rule is that *if you rely
on fragmentation to work you are building protocols that are guaranteed
to fail in many common situations.*  Indeed, this is why many of us
were pleased to see Sun experiment with methods for dynamically determining
the transmission unit size for NFS (see Bill Nowicki's article in the April '89
issue of ACM SIGCOMM Computer Communication Review).

>I like your argument, EXCEPT FOR THE FACT THAT IT FLIES IN THE FACE
>OF WHAT WE SEE IN LARGE nfs NETWORK INSTALLATIONS. I think you are
>approaching this from a pessimistic, worst case scenario which is
>a bad way to deal with network throughput. In general, I have to assure
>you - it works.

This is a poor counterargument.  Saying "we haven't seen problems"
when there are demonstrable research results that show you will have
problems in certain situations is an ostrich approach.  It leads to
cruddy protocols.

Craig

PS: I understand that there's an element of SUN vs. Apollo protocols here.
All I'm asking is let's stick with agreed upon basic principles of network
protocol design in the debate -- one such principle is that fragmentation is
a bad idea.

TIHOR@CMCL1.NYU.EDU (Stephen Tihor) (08/01/89)

If we are talking real-world, you-bet-your-business sorts of issues, then
protocols must not fail in the WORST practical case.  From observations of
events in the Internet, I assert that to the extent that the SUN internal
network convinces you that 8K UDP barrages are reasonable, it is not a good
model for much of the Internet.


It would have been straightforward to design simple feedback loops with a
hysteresis parameter into such protocols as the NFS packet size/multi-packet
code or RWHO, avoiding the known network problems of too direct a feedback
loop.

Networking protocols should BE DESIGNED NOT TO FAIL IN THE GENERAL CASE.
Protocol designs have to be PESSIMISTIC because you cannot ensure that the
world is a simple catenet of ethernets with moderately compatible machines on
them.  Degrade performance if you can't figure out how to autotune down, but
don't just fail and require an end user to solve an internetwork-level problem.

The more widely used a protocol is intended to be, the better it should be at
handling bad environments.  [I did not find the original explanation for using
UDP instead of TCP for NFS RPC to be very convincing once SUN started pushing
NFS as a standard for remote file systems.]

-------

TIHOR@cmcl1.nyu.edu (Stephen Tihor) (08/01/89)

When SUN chose not to use TCP, which is intended to do much of this work,
they had to choose either to close their eyes and pray or to implement
the missing functionality in other ways.

It seems that they replaced a sledgehammer with a ball-peen hammer and a
prayer to save weight.

-------

joshua@athertn.Atherton.COM (Flame Bait) (08/02/89)

Summary of the RPC wars to date:
    I posted a paper showing that Apollo RPC was slower than Sun RPC.
    Tony (of Netwise) posted a paper showing the same thing.
    Nat (of Apollo) pointed out that Sun's UDP based RPC used 8K packets
        which causes fragmentation to occur at the IP layer.  Apollo uses
        1K packets which never fragment. He says this fragmentation is a 
        major problem.
    Other folks pointed out that (in general) fragmentation is a bad thing.
    Brian et al. (of Sun) says that fragmentation is not a problem in most
        (or almost all) configurations, and thinks that Apollo is needlessly
        slowing everyone down to a least common denominator speed.

I have three comments on this:

    1. If for any reason UDP does not work, a person using Sun's RPC system
       can use TCP.  (This option is not available to Apollo's RPC users,
       but is available to Netwise RPC users.)  Except for very small packets
       Sun's TCP based RPC protocol is faster than Apollo's RPC protocol.

       Because of the differences in reliability described in my paper,
       I always compare Apollo to Sun TCP.  Since they are both equally
       reliable I think this is the fairest comparison.  

    2. The fragmentation which Nat is worried about is happening at the IP
       layer, which is two protocol layers below RPC.  I think it is
       unfair to blame the RPC implementor for fragmentation which is 
       happening outside of his control in the protocol stack.  I might 
       complain to Sun's UDP implementation group about the fragmentation, 
       but not to their RPC group.

    3. Apollo is much slower than Sun:

           For half K packets Sun TCP is about the same speed as Apollo.
           For 8K packets Sun TCP is about three times faster than Apollo.
           For 16K packets Sun TCP is about four times faster than Apollo.

The exact numbers are in my paper.  If you want a copy look through the back 
issues of comp.misc, or email me.  I respond to all email, so if you have not 
heard from me, try again with a different path, or give me a call.

Joshua Levy
--------                Quote: "You can stand me up at the gates of Hell,
Addresses:                      but I won't back down!"  -- Tom Petty
joshua@atherton.com          
{decwrl|sun|hpda}!athertn!joshua    work: (408)734-9822   home: (415)968-3718

mishkin@apollo.HP.COM (Nathaniel Mishkin) (08/03/89)

In article <9574@joshua.athertn.Atherton.COM> joshua@atherton.com writes:
>    1. If for any reason UDP does not work, a person using Sun's RPC system
>       can use TCP.  (This option is not available to Apollo's RPC users,
>       but is available to Netwise RPC users.)  

Look:  The theory is that the option is simply not necessary.  You pose
the point as if it is a deficiency that the application isn't obliged
to consider what underlying protocol to use.  I just don't see it that
way.  If NCS/RPC (on UDP) is slower -- is intrinsically
slower -- than some other RPC running on TCP, then that's bad for NCS.
I claim that NCS/RPC is not intrinsically slower.  There's no reason
it can't be at least comparable to (if not faster than) TCP/IP in bulk
throughput.  And in short interchanges (a single client making single calls
in turn to a number of servers), it should be noticeably better than TCP/IP.

>    2. The fragmentation which Nat is worried about is happening at the IP
>       layer, which is two protocol layers below RPC.  I think it is
>       unfair to blame the RPC implementor for fragmentation which is 
>       happening outside of his control in the protocol stack.  I might 
>       complain to Sun's UDP implementation group about the fragmentation, 
>       but not to their RPC group.

As I thought had been made clear, the problem isn't the UDP implementation --
it's in the nature of UDP and in IP fragmentation.      

>    3. Apollo is much slower than Sun:
>
>           For half K packets Sun TCP is about the same speed as Apollo.
>           For 8K packets Sun TCP is about three times faster than Apollo.
>           For 16K packets Sun TCP is about four times faster than Apollo.

I know you make it clear in your paper, but I want to make it explicit
here:  Your tests were run with a version of NCS that simply did bulk
data throughput in the most naive way conceivable, under the assumption
that the primary use of RPC was not to move tons of data.  I recognize
this to be a very small-minded assumption, but that's life.  In any case,
I've repented for my sins and my observations are that the latest version
of NCS (released on Apollos, in beta on non-Apollos) has increased its
bulk data throughput by 2-3 times.  Performance work continues and I
hope to be able to provide some new numbers soon.

-- 
                    -- Nat Mishkin
                       Apollo Computer Inc., Chelmsford, MA
                       mishkin@apollo.com

joshua@athertn.Atherton.COM (Flame Bait) (08/04/89)

Nat mishkin@apollo.com writes:
>In article <9574@joshua.athertn.Atherton.COM> joshua@atherton.com writes:
>>    1. If for any reason UDP does not work, a person using Sun's RPC system
>>       can use TCP.  (This option is not available to Apollo's RPC users,
>>       but is available to Netwise RPC users.)  
>
>Look:  The theory is that the option is simply not necessary.  You pose
>the point as if it is a deficiency with that the application isn't obliged
>to make the consideration as to what underlying protocol to use.  I just
>don't see it that way.  

This is an important difference between us. I say that some applications are
better suited for UDP, others for TCP or VMTP.  Since the application 
programmer understands his application the best, he should choose which 
protocol to use.  You say that NCS's beefed up UDP protocol is always the 
best, for all (or almost all) applications.  This may be true in the future.
(I doubt it, though).  It is certainly not true now.

If we ever do find a single communications protocol which is best for all 
(or almost all) applications we should call it HGP, standing for "Holy
Grail Protocol." :-)  In the meantime vendors should provide flexible RPC 
systems.


Joshua Levy
--------                Quote: "The Street finds its own uses for technology."
Addresses:                                  -- William Gibson
joshua@atherton.com          
{decwrl|sun|hpda}!athertn!joshua    work:(408)734-9822    home:(415)968-3718