[comp.sys.sun] Deadlock in rpc interprocess communication

thomas%apgraph%ap542@ztivax.siemens.com (Thomas Oeser (DI AP 313)) (09/16/90)

Hi RPC Specialists!

Can anyone help me solve the following deadlock situation:

		             1 calls 2
                     +-----------------------+
		     |                       |
                     |                      \ /
		+----+----+		+----+----+
		|         |		|         |
    Start ----->| Process |		| Process |
		|    1    |		|    2    |
		|         |		|         |
		+---------+		+---------+
		    / \                      |
                     |                       |
                     +-----------------------+
			     2 calls 1

I have a set of interworking processes communicating via RPCs. They
implement a distributed object-oriented environment in which the methods of
a particular object are implemented in several processes (e.g. Mailing,
Editing, Storing, etc.). In addition, most of these processes are driven by a
user interface (i.e. they are interactive and may act as the initiator of an
action).

In this scenario the following situation may happen:

	- Process 1 calls Process 2 and this call returns;
	- initiated by this call, Process 2 continues processing, calls
	  Process 1 and waits for a return;
	- Now, if another cycle of this type is started (i.e. Process 1 calls
	  Process 2) just as Process 2 is calling Process 1, both processes
	  hang in call_rpc(), each waiting for the other's reply; neither
	  gets back to its select() loop to serve the other's incoming call.

The point is that in this scenario I no longer have processes that are
pure clients or servers. Both roles are found in all processes, since
they call each other.

	NOTE - The multiplexing of the channels is done (of course) using
	select() or poll(), depending on the system.

The best solution to this problem would be to separate the sending part of
call_rpc() from the receiving part, i.e. to construct a function

	call_rpc_nowait( ......, returnHandler )

that returns immediately after the RPC has been sent, with the return
handler called once select() indicates that data (i.e. the reply) is
available on the corresponding channel.
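The call_rpc_nowait() idea can be sketched with plain UDP sockets. This is only an illustration of the send-now, handle-the-reply-later pattern, not the RPC toolkit: send_nowait(), dispatch_replies(), store_reply() and the 4-byte transaction-id framing are hypothetical names and choices made for this sketch, whereas a real RPC client takes its xid and encoding from the RPC/XDR layer. Retransmission and expiry of unanswered requests are deliberately left out.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical handler type: receives the xid and the reply payload. */
typedef void (*reply_handler)(uint32_t xid, const char *reply, size_t len);

#define MAX_PENDING 16

/* Outstanding requests, keyed by transaction id (xid). */
static struct { uint32_t xid; reply_handler fn; int used; } pending[MAX_PENDING];

/* Send "xid + payload" as one datagram, record the handler and return
   immediately; no reply is awaited here. Returns 0 on success. */
int send_nowait(int fd, const struct sockaddr_in *to, uint32_t xid,
                const char *payload, reply_handler fn)
{
    assert(fd >= 0 && to != NULL && payload != NULL && fn != NULL);
    char buf[512];
    uint32_t nxid = htonl(xid);
    size_t len = strlen(payload);
    int slot = -1;

    if (len > sizeof buf - sizeof nxid)
        return -1;
    for (int i = 0; i < MAX_PENDING && slot < 0; i++)
        if (!pending[i].used)
            slot = i;
    if (slot < 0)
        return -1;                       /* too many outstanding requests */
    memcpy(buf, &nxid, sizeof nxid);
    memcpy(buf + sizeof nxid, payload, len);
    if (sendto(fd, buf, sizeof nxid + len, 0,
               (const struct sockaddr *)to, sizeof *to) < 0)
        return -1;
    pending[slot].xid = xid;
    pending[slot].fn = fn;
    pending[slot].used = 1;
    return 0;
}

/* Wait up to timeout_ms for a datagram on fd; if its xid matches a
   pending request, free the slot and run that request's handler.
   Returns 1 if a handler ran, 0 otherwise. */
int dispatch_replies(int fd, int timeout_ms)
{
    char buf[512];
    uint32_t nxid;
    fd_set rfds;
    struct timeval tv = { timeout_ms / 1000, (timeout_ms % 1000) * 1000 };

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &tv) <= 0)
        return 0;
    ssize_t n = recv(fd, buf, sizeof buf, 0);
    if (n < (ssize_t)sizeof nxid)
        return 0;
    memcpy(&nxid, buf, sizeof nxid);
    uint32_t xid = ntohl(nxid);
    for (int i = 0; i < MAX_PENDING; i++) {
        if (pending[i].used && pending[i].xid == xid) {
            pending[i].used = 0;
            pending[i].fn(xid, buf + sizeof nxid, (size_t)n - sizeof nxid);
            return 1;
        }
    }
    return 0;
}

/* Example handler: keep the last reply as a C string. */
static char last_reply[64];
void store_reply(uint32_t xid, const char *reply, size_t len)
{
    (void)xid;
    if (len >= sizeof last_reply)
        len = sizeof last_reply - 1;
    memcpy(last_reply, reply, len);
    last_reply[len] = '\0';
}
```

In a process like the ones described above, the client socket would simply join the select() set already used for the server channels, and dispatch_replies() would be called with a zero timeout when that socket becomes readable, so the process never sits blocked while the other side is trying to call it.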

Forking the calling process to handle the RPC is not an option, since
I already have a couple of processes interworking and I don't want to
stress this resource too much (BTW, I believe this should not be done
anyway unless you have threads :-) ).

Ideally I would solve the problem with standard functions from the
RPC toolkit, since I have to support 3 different Unix operating systems
(SINIX, which is System 3.2 based; SCO; and Unix System V Release 4.0).

Thanks in advance

	Thomas Oeser

Internet:	thomas%apgraph%ap542@ztivax.siemens.com
Europe:		thomas%apgraph%ap542@ztivax.UUCP
UUCP:		...!uunet!mcsun!unido!ztivax!ap542!apgraph!thomas
Phone:		+ 49 89 636 47537
Fax:		+ 49 89 636 45522
Postal Mail:	Siemens AG, DI AP 313, Carl-Wery-Str. 22, D-8000 Munich 83
		West Germany

lewis@bevsun.bev.lbl.gov (Steve Lewis) (10/08/90)

In article <1990Sep16.130726.16148@rice.edu> thomas%apgraph%ap542@ztivax.siemens.com (Thomas Oeser (DI AP 313)) writes:
>
>I have a set of interworking processes communicating via RPCs. ...
>In this scenario the following situation may happen:
>
>	- Process 1 calls Process 2 and this call returns;
>	- initiated by this call Process 2 continues processing and calls 
>	  Process 1 and waits for a return;
>	- Now, if another cycle of this type is started (i.e. call of 2 by 1)
>	  when process 2 is just calling process 1, both processes hang in
>	  the call_rpc() procedure waiting for the reply of the other process.

I solved this problem in the following way. If the `timeout' parameter of
clnt_call() holds the value {0,0}, the function returns immediately
with the status RPC_TIMEDOUT. It calls only the low-level sendto()
and never enters select() to wait for a reply: you have sent a simple
datagram.

The receiving program eventually does a `call-back'. The caller's address
can be decoded from the transport handle using svc_getcallerp(x) (although
I redefined it as ``#define svc_getcallerp(x) ((x)->xp_raddr)'').
Alternatively, you can pass your IP address. In either case you will
have to pass (or know) the {version, program} tuple and set up a reverse
client handle. (Although the former technique seems simpler, it fails if
the caller used clnt_broadcast(): the callee will find his local portmap's
address (which is his own), not the distant caller's.)
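The same mechanism can be illustrated outside the RPC library with plain UDP sockets. In this sketch send_oneway() stands in for clnt_call() with a {0,0} timeout (send and return, no waiting), and the callee takes the sender's address from recvfrom(), the plain-socket counterpart of reading xp_raddr, then answers with a fresh one-way datagram. The function names are hypothetical, and the sketch simplifies one thing: each side uses a single socket, so the source address is also the call-back address, whereas a real RPC call-back goes to the caller's server through a reverse client handle built from the {version, program} tuple. The clnt_broadcast() caveat is not modelled.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 on an ephemeral port; fills in *addr. */
int bind_loopback_udp(struct sockaddr_in *addr)
{
    socklen_t alen = sizeof *addr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)addr, sizeof *addr) < 0 ||
        getsockname(fd, (struct sockaddr *)addr, &alen) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Caller side: send the message and return at once, without waiting
   for any reply (the role clnt_call() plays with a {0,0} timeout). */
int send_oneway(int fd, const struct sockaddr_in *to, const char *msg)
{
    size_t len = strlen(msg);
    ssize_t n = sendto(fd, msg, len, 0, (const struct sockaddr *)to, sizeof *to);
    return n == (ssize_t)len ? 0 : -1;
}

/* Callee side: read one request, take the caller's address from the
   transport (cf. xp_raddr), and answer with a separate one-way
   "call-back" datagram rather than a reply the caller blocks on. */
int serve_and_call_back(int fd, const char *answer)
{
    char buf[256];
    struct sockaddr_in caller;
    socklen_t clen = sizeof caller;

    assert(answer != NULL);
    if (recvfrom(fd, buf, sizeof buf, 0, (struct sockaddr *)&caller, &clen) < 0)
        return -1;
    return send_oneway(fd, &caller, answer);
}
```

Because neither send_oneway() nor serve_and_call_back() ever waits for the peer, two processes built this way can cross requests without either one being stuck inside a call.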

Note that it is sufficient for ONE of the two processes to use this
technique; the deadlock will not occur if the other waits for a reply.

	Steve Lewis, Project Leader		SALewis@LBL.gov
	Bevalac Controls Group			Mail Stop 64-121
	Lawrence Berkeley Laboratory		Tel: 415/486-7702
	Berkeley, CA 94720			Fax: 415/486-5788