[comp.lang.fortran] Parallelizing Techniques

dpl@cisunx.UUCP (David P. Lithgow) (04/04/89)

To all compiler gurus:

	I am looking for references to Cray/VAX/FPS (and any other system)
and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
opportunities for parallelism or vectorization inside and outside the compiler.
Since many languages can be used to write code for parallel systems
but support it only outside the compiler (by providing explicit
library calls with which the programmer implements it him/herself),
I am not restricting my search.

	I know of the VAX/VMS PPL$ library routines, and I'd like to
find a pointer or two to Cray Micro/Macro tasking, and other compilers'
means of detecting parallelism (or vectorizable code).  Note that some
of the most productive work seems to have gone ahead in VLIW systems...

	Are there any current Ada compilers which automatically vectorize or
parallelize?

					-David P. Lithgow
--
David P. Lithgow                Sr. Systems Analy./Pgmr., Univ. of Pittsburgh
USENET:  {allegra,bellcore,ihpn4!cadre,decvax!idis,psuvax1}!pitt!cisunx!dpl
CCnet(DECnet): CISVM{S123}::DPL       BITnet(Jnet):  DPL@PITTVMS

bron@bronze.SGI.COM (Bron Campbell Nelson) (04/07/89)

In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) writes:
> To all compiler gurus:
> 
> 	I am looking for references to Cray/VAX/FPS (and any other system)
> and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> opportunities for parallelism or vectorization inside and outside the compiler.

Silicon Graphics is now building a symmetric multi-processor system.  We
offer a Fortran compiler option that has automatic parallelism detection.
The existing product detects parallelism at the DO loop level, and will
run different iterations of the loop on different processors.  We have
gotten very good results for compute-intensive loops.
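
To give a concrete (if made-up) example of the sort of thing the
compiler looks for: in the loop below, every iteration reads and
writes only its own array elements, so different iterations can
safely run on different processors.  This is just an illustrative
sketch (the routine name is invented), not output from our product:

      SUBROUTINE AXPYP(N, A, X, Y)
C     Illustrative sketch: no iteration depends on any other, so a
C     loop-level parallelizer can hand different blocks of
C     iterations to different processors.
      INTEGER N, I
      REAL A, X(N), Y(N)
      DO 10 I = 1, N
         Y(I) = A*X(I) + Y(I)
   10 CONTINUE
      RETURN
      END

By contrast, a loop with a carried dependence, such as
Y(I) = Y(I-1) + X(I), cannot be split up this way without
restructuring.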

I could speak endlessly on this topic, but I'd start to sound like a
commercial.  If you have specific questions, feel free to email to me.

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.

urjlew@ecsvax.UUCP (Rostyk Lewyckyj) (04/09/89)

In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) writes:
@ To all compiler gurus:
@ 
@ 	I am looking for references to Cray/VAX/FPS (and any other system)
@ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
@ opportunities for parallelism or vectorization inside and outside the compiler.
@ Since many languages can be used to write code for parallel systems
@ but support it only outside the compiler (by providing explicit
@ library calls with which the programmer implements it him/herself),
@ I am not restricting my search.
@ 
@ 	I know of the VAX/VMS PPL$ library routines, and I'd like to
@ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
@ means of detecting parallelism (or vectorizable code).  Note that some
@ of the most productive work seems to have gone ahead in VLIW systems...
@ 
@ 	Are there any current Ada compilers which automatically vectorize or
@ parallelize?
@ 
@ 					-David P. Lithgow
@ --
@ David P. Lithgow                Sr. Systems Analy./Pgmr., Univ. of Pittsburgh

Hie thee to your local IBM representative and let him inform you
about IBM's parallel Fortran products for the 3090 supercomputers:
compiler, debugging tools (PTOOL), libraries (ESSL v3), etc.
Contact the good folk at the Cornell National Supercomputer Facility.
They can tell you about non-IBM tools.
Contact Rice University (Dr. Kemeny ?)

Parallelization is a compiler option that the user must ask for,
just as different levels of optimization, vectorization, or a machine
code listing must be asked for.  In addition to the automatic
parallelization, there are IBM extensions to the Fortran language
which allow the programmer to explicitly control parallelism: fork
processes, set up private vs. shared storage, synchronize, divide
work among available processors, etc. (a rough sketch of the flavor
appears below).
I don't think that the current IBM 3090 architecture is capable of
scheduling individual arithmetic operations within a single Fortran
statement across multiple processors economically, so the smallest
granularity of parallelization is across DO loops; i.e., there is
nothing equivalent to CRAY micro/auto tasking.
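
(I don't have the Parallel Fortran manual at hand, so take the
fragment below as a sketch of the flavor of those extensions,
not as exact syntax:

C     Sketch only -- keyword spelling is from memory and should be
C     checked against the IBM Parallel FORTRAN manual.  The intent
C     is that the programmer explicitly asks for the iterations of
C     the loop to be divided among the available processors.
      PARALLEL LOOP 10 I = 1, N
         X(I) = Y(I) + Z(I)
   10 CONTINUE

The automatic option gets much the same effect without explicit
statements, whenever the compiler can prove the iterations
independent.)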
-----------------------------------------------
  Reply-To:  Rostyslaw Jarema Lewyckyj
             urjlew@ecsvax.UUCP ,  urjlew@tucc.bitnet
       or    urjlew@tucc.tucc.edu    (ARPA,SURA,NSF etc. internet)
       tel.  (919)-962-9107

urjlew@ecsvax.UUCP (Rostyk Lewyckyj) (04/11/89)

> In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) 
> Sr. Systems Analy./Pgmr., Univ. of Pittsburgh asked
> @ 
> @ 	 for references to Cray/VAX/FPS (and any other system)
> @ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> @ opportunities for parallelism or vectorization inside and outside the compiler.
> @  ... 
> @ 	I know of the VAX/VMS PPL$ library routines, and I'd like to
> @ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
> @ means of detecting parallelism (or vectorizable code).  
> @ --
In article <6790@ecsvax.UUCP>, I (urjlew@ecsvax.UUCP, Rostyk Lewyckyj) wrote:
> 
> Hie thee to your local IBM representative and let him inform you
> about IBMs parallel FOrtran products for the 3090 supercomputers:
> compiler, debugging tools (PTOOL) etc., libraries (ESSL v3.) etc.
  
> Contact Rice University (Dr. Kemeny ?)
   Should have been - Contact Dr. Ken Kennedy (ken@rice.edu) on whose
work IBM's parallel compilers and tools are based.
  
>       ........ 
> statement across multiple processors economically, so the smallest
> granularity of parallelization is across DO loops; i.e., there is
> nothing equivalent to CRAY micro/auto tasking.
 Actually, CRAY micro/autotasking is also parallelization across
DO loops, just as on the IBM.  I think that microtasking requires
specific compiler control statements inserted in the code, while
autotasking is more like a compiler switch.  I don't know of any
CRAY Fortran language extensions for parallelization.
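For the curious, the microtasking control statements are directives
that look like comments to an ordinary Fortran compiler.  From
memory (so check the Cray documentation), a microtasked loop is
marked roughly like this:

CMIC$ DO GLOBAL
      DO 20 I = 1, N
         S(I) = T(I) + U(I)
   20 CONTINUE

Because the CMIC$ lines are comments to any other compiler, the same
source still compiles and runs serially elsewhere.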
On CRAY Y-MPs and X-MPs the hardware is capable of chaining together
operations of the vector processing units, so that for a loop such as

      DO 10 I = 1, BIGN
         D(I) = A(I)*B(I) + C(I)
C        ......
   10 CONTINUE

the addition of the results of A(I)*B(I) to C(I) is started in the
adder pipe before all the multiplications are out of the multiply
pipe.  This gives effective within-statement parallelism for even
medium-length vectors.  I don't know how the details of dependence
checking are done.  Perhaps the compiler analysis for vectorization
is enough, and no further checks are needed for chaining.
-----------------------------------------------
  Reply-To:  Rostyslaw Jarema Lewyckyj
             urjlew@ecsvax.UUCP ,  urjlew@tucc.bitnet
       or    urjlew@tucc.tucc.edu    (ARPA,SURA,NSF etc. internet)
       tel.  (919)-962-9107