mccalpin@perelandra.cms.udel.edu (John D. McCalpin) (03/08/91)
The recent post of a "Play Linda" implementation using files to hold
the tuples got me thinking about trying the same trick in a FORTRAN
distributed application.

(1) The application is an explicitly integrated, three-dimensional
geophysical fluid dynamical model using multiple domains, each
integrated by an independent process.  On a single cpu, these
processes communicate via files, using a code section that looks like:

      DO TIME=1,INFINITY
         Integrate the equations one time step

         OPEN(out_unit,file=out_file,form='unformatted')
         REWIND(out_unit)            ! don't count on this by default!
         WRITE(out_unit) boundary data for other processes
         CLOSE(out_unit)             ! to force flushing the buffers

         OPEN(in_unit,file=in_file,form='unformatted')
         REWIND(in_unit)
 8888    READ(in_unit,ERR=8888,END=8888) boundary data from other processes
         CLOSE(in_unit)

         Swap time levels
      END DO

This is a really simple and apparently completely portable approach to
multi-process synchronization.

So How Well Does It Work?

This naive approach gives about 95% efficiency for two processes,
where efficiency is defined as the wall time of a single job doing all
the work divided by the wall time of the machine running the two jobs
splitting the work.  At first glance, 95% looks OK, but it probably is
not going to scale well to lots of cpus (where "lots" means maybe 8 to
16).

(2) My next approach was to modify the tight READ loop (statement
8888).  This statement wastes lots of cpu time waiting to read the
data, when it should really release the CPU to the other process.  On
the Silicon Graphics machines, there is a nifty FORTRAN function
called 'sginap', used as follows:

      OPEN(in_unit,....
      REWIND(in_unit)
 8888 idummy = sginap(0)
      READ(in_unit,ERR=8888,END=8888) boundary data
      CLOSE(in_unit)

With an argument of 0, the sginap() function releases control of the
cpu to let any process with equal or higher priority take over.
Because of the dynamic nature of UNIX priorities, this is not
guaranteed to do anything, but the proof is in the pudding....

This modified version of the code runs with an efficiency in excess of
98% for two processes on the same cpu.  The next step is to find out
whether similar functions are available on other O/S's, and to start
running the code using NFS to share the files between machines.
--
John D. McCalpin                      mccalpin@perelandra.cms.udel.edu
Assistant Professor                   mccalpin@brahms.udel.edu
College of Marine Studies, U. Del.    J.MCCALPIN/OMNET
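[The handshake above can be sketched in C so it runs on any POSIX
system: a forked child plays the partner process and writes the
boundary file, while the parent retries the read until a complete
record appears, releasing the cpu between attempts in the spirit of
sginap(0).  sginap() is IRIX-specific; sched_yield() is used here as
the analogous call.  The file name and data values are illustrative,
not from the original model.]

      /* Two-process sketch of the file-based boundary exchange.
         Child = partner process: integrates (sleeps), writes, closes.
         Parent: polls the file, yielding the cpu between attempts. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <sched.h>
      #include <unistd.h>
      #include <sys/wait.h>

      int main(void) {
          const char *path = "boundary.dat";   /* hypothetical exchange file */
          remove(path);                        /* start from a clean state */

          pid_t pid = fork();
          if (pid == 0) {                      /* child: the other process */
              usleep(100000);                  /* pretend to integrate a step */
              double out[3] = {1.0, 2.0, 3.0}; /* stand-in boundary data */
              FILE *f = fopen(path, "wb");
              fwrite(out, sizeof(double), 3, f);
              fclose(f);                       /* CLOSE flushes the buffers */
              _exit(0);
          }

          /* parent: the tight READ loop, but yielding instead of spinning */
          double in[3];
          size_t got = 0;
          while (got != 3) {
              FILE *f = fopen(path, "rb");
              if (f) {
                  got = fread(in, sizeof(double), 3, f);
                  fclose(f);
              }
              if (got != 3)
                  sched_yield();               /* analogous to sginap(0) */
          }
          wait(NULL);
          remove(path);
          printf("received %.0f %.0f %.0f\n", in[0], in[1], in[2]);
          return 0;
      }

[Note the same structure as the Fortran loop: the partial read simply
fails and is retried, so no extra locking is needed as long as each
file has a single writer.]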
patrick@convex.COM (Patrick F. McGehearty) (03/08/91)
In article <MCCALPIN.91Mar7205707@pereland.cms.udel.edu>
mccalpin@perelandra.cms.udel.edu (John D. McCalpin) gives some info on
a generalized method of using Fortran file I/O to share data in an
application:

>(1) The application is an explicitly integrated, three-dimensional
>geophysical fluid dynamical model using multiple domains each being
>integrated by an independent process.  On a single cpu, these
>processes communicate via files, using a code section that looks
>like:
>      DO TIME=1,INFINITY
>         Integrate the equations one time step
>
>         OPEN(out_unit,file=out_file,form='unformatted')
>         REWIND(out_unit)            ! don't count on this by default!
>         WRITE(out_unit) boundary data for other processes
>         CLOSE(out_unit)             ! to force flushing the buffers
>
>         OPEN(in_unit,file=in_file,form='unformatted')
>         REWIND(in_unit)
> 8888    READ(in_unit,ERR=8888,END=8888) boundary data from other processes
>         CLOSE(in_unit)
>
>         Swap time levels
>      END DO

My questions are:

(1) How much data is being written?  (kilobytes or megabytes?)
    If kilobytes, then the data should be able to stay in the buffer
    cache for most Unix file systems.

(2) How long does this "integrate the equations" time step take?
    The combined answers to (1) and (2) strongly determine whether
    overhead will dominate computation.  If the WRITE/READ can be done
    in a small fraction (< 0.01) of the integration time, then there
    is a reasonable opportunity for moderate degrees of parallelism.

(3) A properly implemented parallel Unix OS will implicitly provide
    the functionality of the "sginap(0)" call whenever the READ
    operation would block.  The call should be unnecessary.

>This modified version of the code runs with an efficiency in excess
>of 98% for two processes on the same cpu.  The next step is to find
>out if similar functions are available on other O/S's and to start
>running the code using NFS to share the files between machines.

NFS has higher overheads and lower throughput than direct-mounted disk
I/O.
If the ratio computed in (2) is small enough, and the granularity of
computation is greater than seconds, then great parallelism is
available.
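[The ratio criterion in (2) can be put in rough numbers.  Under the
simplifying, admittedly crude assumption that each step's file I/O
adds a serialized fraction f on top of the compute time, efficiency is
about 1/(1+f).  The f values below are illustrative only, chosen to
bracket the 95% and 98% figures reported in the original post.]

      /* Back-of-envelope efficiency for a per-step I/O fraction f. */
      #include <stdio.h>

      int main(void) {
          double fracs[] = {0.05, 0.02, 0.01};   /* hypothetical I/O fractions */
          for (int i = 0; i < 3; i++) {
              /* efficiency = compute / (compute + serialized I/O) */
              double eff = 1.0 / (1.0 + fracs[i]);
              printf("f = %.2f -> efficiency ~ %.1f%%\n",
                     fracs[i], 100.0 * eff);
          }
          return 0;
      }

[On this reading, the < 0.01 criterion corresponds to roughly 99%
efficiency for two processes, and leaves headroom for more.]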