[comp.unix.wizards] How to use Sun RPC for large slow procedure calls

gnb@bby.oz (Gregory N. Bond) (10/04/89)

This is on Sun 3/60 and Sun 3/260, running SunOS 3.5, soon to be on a
Solbourne server, running SunOS 4.0.3 all round.

I have an application that occasionally requires lookups to a
database.  These lookups are often quite lengthy, taking up to a
minute or more.  As the database code is quite large, and a lot of
processing code is associated with it, I'd rather not include all this
code in every copy of the application.  (Not to mention that a
database backend for each copy of the application would murder the
server in no time.)

So I have used the Sun RPC compiler to create a database server task
that runs on the main machine, and an rpc client library to link into
the application.

Now this works quite well for the small fast calls, as they are
usually answered within the timeouts.  However, on the more complex
queries (and it is not easy to tell what's complex from the client as
a lot of complexity is hidden in the server), the RPC times out and
resends.  This banks up extra work for the server that isn't
necessary, doing one difficult job 5 times.  What is worse, the rpc
client routines often time out totally, and return an error, even
though the server is happily chugging away on the request.

Also, if the server machine is busy, or the server process has been
swapped etc, or there is lock contention for the tables, then even the
short queries time out and fail, only to work when immediately
retried.  This is quite unfriendly for the (largely non-technical)
user base!

I have tried using the TCP transport, which one would have thought
meant that timeouts were not appropriate, but that doesn't seem to be
the case.  Setting very large timeouts is also not an option as I
would like quick indication that the server is unavailable.

It seems that Sun RPC is designed for many small and quick calls (e.g.
NFS), rather than large slow calls (as in my example, or for things
like numerical compute engines inverting large matrices).  This is
shown by the heritage of using UDP and timeouts rather than the flow
control and reliability in TCP.

Does anyone have any idea how to approach this sort of RPC
application?  Is there some trick or section of TFM that I have
overlooked?  Does anyone have (oh joy oh bliss) some code for this
type of RPC work?  Or do I drop Sun RPC entirely as the wrong tool and
handcraft something using "raw" TCP sockets?

Greg.  The network sort of computes!
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

Kemp@DOCKMASTER.NCSC.MIL (10/06/89)

Gregory Bond writes:
 > [description of RPC based database application that sometimes
 >  takes a long time for lookups]
 >
 > Does anyone have any idea how to approach this sort of RPC
 > application?  [...] Or do I drop Sun RPC entirely as the wrong
 > tool and handcraft something using "raw" TCP sockets?

No need to use sockets.  However you are correct in stating that
lengthening the RPC timeouts is a bad idea.

The accepted solution to this very common situation is to use callback
RPC's, in a scenario something like this:

 1. Client requests an action
 2. Server does whatever can be guaranteed to finish quickly (checking
    request validity, user authentication, whatever) and acknowledges
    the request
 3. Server does the lengthy job (computation or database lookup) and
    when it's finished, sends a callback RPC to the client
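
[Editor's sketch] The three steps above can be illustrated roughly as
follows.  This is not Sun RPC code: plain Python threads and a queue stand
in for the two RPC directions, and the names Client, Server, submit and
callback are hypothetical.  It only shows the shape of "acknowledge
quickly, call back later".

```python
import queue
import threading


class Client:
    """Stand-in for an RPC client that can also receive callbacks."""

    def __init__(self):
        self.results = queue.Queue()

    def callback(self, request_id, result):
        # Step 3 lands here: the "server" calls back with the answer.
        self.results.put((request_id, result))


class Server:
    """Stand-in for the database server (hypothetical)."""

    def __init__(self):
        self._next_id = 0
        self._lock = threading.Lock()

    def submit(self, client, query):
        # Step 2: do only the quick checks, then acknowledge at once.
        if not query.strip():
            raise ValueError("empty query")
        with self._lock:
            self._next_id += 1
            request_id = self._next_id
        worker = threading.Thread(
            target=self._run, args=(client, request_id, query), daemon=True
        )
        worker.start()
        return request_id  # the acknowledgement

    def _run(self, client, request_id, query):
        # Step 3: the lengthy job, then the callback to the client.
        result = query.upper()  # placeholder for the real lookup
        client.callback(request_id, result)


def demo():
    client, server = Client(), Server()
    request_id = server.submit(client, "select name from staff")  # step 1
    # The client is free to do other work here; this demo simply waits.
    done_id, result = client.results.get(timeout=5)
    return request_id, done_id, result
```

The request id returned by the acknowledgement is what lets the client
match a later callback to the query that caused it.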

This is analogous to asynchronous I/O, and requires a similar amount of
care by the programmer to do the job right.  Nonetheless, it *is* the
right way to do it.  I am constantly annoyed by people who write Sunview
programs that do lots of computation in notifier event routines, instead
of just starting a job and returning.  What you are left with is a
window that just sits there for seconds or minutes, refusing to refresh
itself.

If you want to get fancy, you could have a 'progress meter' that gives
the user some feedback as to how his database query is progressing (and
lets him know if the database server or the net goes down), and gives
him some opportunity to abort the job if it is taking too long or not
going in the right direction.

   Dave Kemp <Kemp@dockmaster.ncsc.mil>

mishkin@apollo.HP.COM (Nathaniel Mishkin) (10/06/89)

In article <GNB.89Oct4171744@baby.bby.oz> gnb@bby.oz (Gregory N. Bond) writes:
>It seems that Sun RPC is designed for many small and quick calls (e.g.
>NFS), rather than large slow calls (as in my example, or for things
>like numerical compute engines inverting large matrices).  This is
>shown by the heritage of using UDP and timeouts rather than the flow
>control and reliability in TCP.
>
>Does anyone have any idea how to approach this sort of RPC
>application?  Is there some trick or section of TFM that I have
>overlooked?  Does anyone have (oh joy oh bliss) some code for this
>type of RPC work?  Or do I drop Sun RPC entirely as the wrong tool and
>handcraft something using "raw" TCP sockets?

You could use NCS RPC (available for Suns from HP/Apollo).  NCS uses
UDP, but you don't supply timeout values.  It pings the server periodically
(and with lower frequency as the call proceeds without problems) itself.
If the server becomes unresponsive, the client finds out about it.  There
are various other technical differences which I won't enumerate, for fear
of being accused of self-promotion.
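
[Editor's sketch] The idea of pinging instead of using a fixed timeout
might look roughly like the following.  This is not the NCS
implementation or its API: the function wait_with_pings, its parameters,
and the interval-doubling rule are all assumptions made for illustration.

```python
import time


class ServerUnresponsive(Exception):
    """Raised when a ping goes unanswered."""


def wait_with_pings(is_done, ping, first_interval=1.0, max_interval=30.0,
                    sleep=None):
    """Wait for a long call by pinging the server, with no overall timeout.

    is_done() -> bool : has the reply arrived?
    ping()    -> bool : did the server answer that it is still working?
    Returns the list of intervals waited.
    """
    if sleep is None:
        sleep = time.sleep
    interval = first_interval
    waited = []
    while not is_done():
        sleep(interval)
        waited.append(interval)
        if not ping():
            # The client learns of a dead server within one interval.
            raise ServerUnresponsive("no answer to ping")
        # Ping less often while the call keeps proceeding normally
        # (doubling is an assumed policy, not taken from NCS).
        interval = min(interval * 2, max_interval)
    return waited
```

A call can then run for as long as the server keeps answering pings,
while a dead server is noticed within one ping interval.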
                    -- Nat Mishkin
                       Hewlett Packard Company / Apollo Systems Division
                       mishkin@apollo.com

ka@cs.washington.edu (Kenneth Almquist) (10/08/89)

> Gregory Bond writes:
>> [description of RPC based database application that sometimes
>>  takes a long time for lookups]

Kemp@DOCKMASTER.NCSC.MIL replies:
> The accepted solution to this very common situation is to use callback
> RPC's, in a scenario something like this:
>
>  1. Client requests an action
>  2. Server does whatever can be guaranteed to finish quickly (checking
>     request validity, user authentication, whatever) and acknowledges
>     the request
>  3. Server does the lengthy job (computation or database lookup) and
>     when it's finished, sends a callback RPC to the client

Accepted by whom?  The technical problems of implementing RPC systems
are basically solved.  The political problem of keeping SUN from foisting
lousy software on the world is harder to solve, but it may be that
Gregory can manage to get hold of another RPC system.  (Nat Mishkin
mentions one such system.)  Lacking a good RPC system, I would be
inclined to use sockets rather than SUN RPC.  Kemp acknowledges of
the kludge he describes:

> This is [analogous] to asynchronous I/O, and requires a similar amount
> of care by the programmer to do the job right.

Care of course translates into programmer time.  It won't necessarily
take any longer to in effect build your own specialized RPC mechanism
on top of sockets.  And this latter approach won't force you to mangle
your code to mesh with the IPC mechanism.
				Kenneth Almquist

weiser.pa@xerox.com (10/08/89)

An alternative approach to callbacks is to keep things under the control of
the client using "status" calls.  For instance, we use this in a document
retrieval application here, like this:

Client initiates action, gets back a session handle and a completion
estimate.  The completion estimate is used by the client for a time to
callback and see how things are going (using that same session handle), and
get another completion estimate.  And so on.  Clients that want progress
reports more frequently call back more often--the completion estimate is
just a hint.

This way RPC clients don't have to also know how to be servers, and/or pass
around procedure call handles through RPC's (which would be the extension
of the usual callback method to RPCs).  Server code is also relatively
simple--it is generally pretty easy to keep some kind of status report about
how things are going, and to return completion estimate hints that not only
take into account how long the thing is actually going to take but how busy
you are (and so how often you want to hear from the client).

Completion estimates are a three part field: seconds until done, integer
that increases while there is progress, and percent done.  Clients that
want to give progress feedback to their users use the percent done field to
show work getting completed.
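
[Editor's sketch] A rough illustration of the status-call scheme with a
three-part completion estimate.  This is not the document retrieval
application's code: the class and field names are hypothetical, and the
"server" just pretends one unit of work finishes between status calls.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Estimate:
    seconds_until_done: float  # hint: when to ask again
    progress_counter: int      # increases while there is progress
    percent_done: int          # for the user's progress display


class RetrievalServer:
    """Stand-in server; each job is a fixed number of work units."""

    def __init__(self):
        self._handles = itertools.count(1)
        self._jobs = {}  # session handle -> [units_done, units_total]

    def start(self, units_total):
        handle = next(self._handles)
        self._jobs[handle] = [0, units_total]
        return handle, self._estimate(handle)

    def status(self, handle):
        job = self._jobs[handle]
        # Placeholder: pretend one unit of work finished since last call.
        job[0] = min(job[0] + 1, job[1])
        return self._estimate(handle)

    def _estimate(self, handle):
        done, total = self._jobs[handle]
        return Estimate(
            seconds_until_done=float(total - done),
            progress_counter=done,
            percent_done=100 * done // total,
        )


def run_to_completion(server, units_total, sleep):
    """Client loop: start a job, then poll using the server's hints."""
    handle, est = server.start(units_total)
    seen = [est.percent_done]
    while est.percent_done < 100:
        sleep(est.seconds_until_done)  # take the hint, then call back
        est = server.status(handle)
        seen.append(est.percent_done)
    return seen
```

Every call here is short and client-initiated, so ordinary RPC timeouts
still detect a dead server quickly.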

Naturally, clients that can't take a hint can bring the system to its
knees.  But that is a universal.

-mark-

Kemp@DOCKMASTER.NCSC.MIL (10/10/89)

Kenneth Almquist writes:
 > Dave Kemp (me) replies:
 >> The accepted solution is to use callback RPC's [...]
 >
 > Accepted by whom?  The technical problems of implementing RPC systems
 > are basically solved.  The political problem of keeping SUN from foisting
 > lousy software on the world is harder to solve, but it may be that
 > Gregory can manage to get hold of another RPC system.  (Nat Mishkin
 > mentions one such system.)  Lacking a good RPC system, I would be
 > inclined to use sockets rather than SUN RPC.  Kemp acknowledges of
 > the kludge he describes:
 >
 >> This is analogous to asynchronous I/O, and requires a similar amount
 >> of care by the programmer to do the job right.
 >
 > Care of course translates into programmer time.

Or at least programmer thought.  I certainly wouldn't characterize
callbacks as a "kludge", any more than device interrupts are a "kludge"
to get around the problems of polled I/O.  Interrupts, and signals, and
callbacks, (the concepts, not necessarily the particular
implementations) are the elegant solution to the problem of maximizing
the productivity of a multi-threaded system.

The database application can be handled synchronously without much
thought:
 1) Issue a query
 2) Wait for the answer (for as long as it takes)
 3) Continue

or it could be handled asynchronously, with a little more thought:
 1) Issue a query
 2) Do something else (like update the progress meter)
 3) Get signalled when the query is finished
 4) Continue (use the results of the query)
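
[Editor's sketch] The asynchronous sequence above, seen from the client
side, might be sketched like this.  A thread pool stands in for the
remote server, and slow_query and run_async are hypothetical names.  The
"release" event exists only so the demo controls when the query ends.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def slow_query(text, release):
    release.wait()      # placeholder for a long database lookup
    return text[::-1]   # placeholder result


def run_async(text):
    release = threading.Event()   # demo control: lets the query finish
    finished = threading.Event()  # set when the query completes
    ticks = 0
    with ThreadPoolExecutor(max_workers=1) as pool:
        # 1) Issue a query
        future = pool.submit(slow_query, text, release)
        # 3) Get signalled when the query is finished
        future.add_done_callback(lambda f: finished.set())
        # 2) Do something else (here: tick a progress meter)
        while not finished.wait(timeout=0.01):
            ticks += 1
            if ticks == 3:
                release.set()
        # 4) Continue (use the results of the query)
        return future.result(), ticks
```

The client never blocks for longer than one short wait, so it can keep
a display refreshed or offer the user a chance to abort.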

The programmer has to decide what to do while waiting for his answer,
regardless of the mechanism used to get it (sockets, Sun RPC, HP/Apollo
RPC, or whatever).  The synchronous method is a no-brainer.
Asynchronous is not a kludge, it's a matter of good careful human
engineering.

I'll take good tools wherever I can get them.  If GNU RPC :-) does the
job better than SUN RPC, I'll use it.  If you find raw sockets easier to
use and document and port and maintain, then by all means use sockets.
What I don't understand is your crack about "the political problem of
keeping Sun from foisting lousy software on the world".  If you don't
like SunOS, then you certainly don't have to pay for it; you can buy DEC
or HP or Data General workstations instead.  Or, as Mr.  Mishkin
suggests, buy HP RPC to run on your Sun.

The original poster was looking for an answer to his particular problem;
i.e.  he has an application running *now* on Suns, and he needs a
solution.  To say that Sun RPC is 'lousy', that the technical problem is
solved, and that you would use sockets (without giving any examples) is
not very constructive.

   Dave Kemp <Kemp@dockmaster.ncsc.mil>

rsalz@bbn.com (Rich Salz) (10/11/89)

In <21090@adm.BRL.MIL> Kemp@DOCKMASTER.NCSC.MIL writes:
>  I certainly wouldn't characterize
>callbacks as a "kludge", any more than device interrupts are a "kludge"
>to get around the problems of polled I/O.
If the RPC system decides to do time-outs, and you have to do something
to circumvent that, then yes, it is a gross hack.

Sun RPC should check to see if the (remote) CPU and the receiving process
have died before blindly resubmitting the request.

Anything else requires the application level to work around the IPC level,
and that's just about the best definition of a kludge I've ever heard of.
	/r$
-- 
Please send comp.sources.unix-related mail to rsalz@uunet.uu.net.
Use a domain-based address or give alternate paths, or you may lose out.