[comp.lang.fortran] Distributed processing in FORTRAN

mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/08/91)

The recent post of a "Play Linda" implementation using files to
hold the tuples got me thinking about trying the same trick in a
FORTRAN distributed application.

(1) The application is an explicitly integrated, three-dimensional
geophysical fluid dynamical model using multiple domains each being
integrated by an independent process.  On a single cpu, these
processes communicate via files, using a code section that looks
like:
	DO TIME=1,INFINITY
	    Integrate the equations one time step

	    OPEN(out_unit,file=out_file,form='unformatted')
	    REWIND(out_unit)		! don't count on this by default!
	    WRITE(out_unit) boundary data for other processes
	    CLOSE(out_unit)		! to force flushing the buffers

	    OPEN(in_unit,file=in_file,form='unformatted')
	    REWIND(in_unit)
 8888	    READ(in_unit,ERR=8888,END=8888) boundary data from other processes
	    CLOSE(in_unit)

	    Swap time levels
	END DO

Now this is a really simple and apparently completely portable
approach to multi-process synchronization.  So How Well Does It Work?
This naive approach gives about 95% efficiency for two processes.
This is defined as the wall time of the single job doing all the work
divided by the wall time of the machine running the two jobs splitting
the work. At first glance, 95% looks OK, but it probably is not going
to scale well to lots of cpus (where lots means maybe 8 to 16).

(2) My next approach was to modify the tight READ loop (statement 8888).
This statement is going to waste lots of cpu time waiting to read the
data, when it should really release the CPU to the other process.
On the Silicon Graphics machine, there is a nifty FORTRAN function
called 'sginap', used as follows:

	    OPEN(in_unit,....
	    REWIND(in_unit)
 8888	    idummy = sginap(0)
	    READ(in_unit,ERR=8888,END=8888) boundary data
	    CLOSE(in_unit)

With an argument of 0, the sginap() function releases control of the
cpu to let any process with equal or higher priority take over.
Because of the dynamic nature of UNIX priorities, this is not
guaranteed to do anything, but the proof is in the pudding....

This modified version of the code runs with an efficiency in excess
of 98% for two processes on the same cpu.  The next step is to find
out if similar functions are available on other O/S's and to start
running the code using NFS to share the files between machines.
--
John D. McCalpin			mccalpin@perelandra.cms.udel.edu
Assistant Professor			mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.	J.MCCALPIN/OMNET

patrick@convex.COM (Patrick F. McGehearty) (03/08/91)

In article <MCCALPIN.91Mar7205707@pereland.cms.udel.edu> mccalpin@perelandra.cms.udel.edu (John D. McCalpin) writes:
Some info on a generalized method of using Fortran File I/O
to share data in an application.

>(1) The application is an explicitly integrated, three-dimensional
>geophysical fluid dynamical model using multiple domains each being
>integrated by an independent process.  On a single cpu, these
>processes communicate via files, using a code section that looks
>like:
>	DO TIME=1,INFINITY
>	    Integrate the equations one time step
>
>	    OPEN(out_unit,file=out_file,form='unformatted')
>	    REWIND(out_unit)		! don't count on this by default!
>	    WRITE(out_unit) boundary data for other processes
>	    CLOSE(out_unit)		! to force flushing the buffers
>
>	    OPEN(in_unit,file=in_file,form='unformatted')
>	    REWIND(in_unit)
> 8888	    READ(in_unit,ERR=8888,END=8888) boundary data from other processes
>	    CLOSE(in_unit)
>
>	    Swap time levels
>	END DO


My questions are: 
(1) How much data is being written? (kilobytes or megabytes?)
	If kilobytes, then the data should be able to stay in the
	buffer cache for most Unix file systems.
(2) How long does this "integrate the equations" time step take?
	The combined answers to (1) and (2) strongly determine whether
	overhead will dominate computation.  If the WRITE/READ can
	be done in a small fraction ( < 0.01 ) of the integration
	time, then there is a reasonable opportunity for moderate
	degrees of parallelism.
(3) A properly implemented parallel Unix OS will implicitly provide
	the functionality of the "sginap(0)" syscall whenever the
	READ operation would block.  The call should be unnecessary.


>This modified version of the code runs with an efficiency of in excess
>of 98% for two processes on the same cpu.  The next step is to find
>out if similar functions are available on other O/S's and to start
>running the code using NFS to share the files between machines.

NFS has higher overheads and lower throughput than direct mounted disk I/O.
If the ratio computed in (2) is small enough, and the granularity of
computation is greater than seconds, then great parallelism is available.
