dpl@cisunx.UUCP (David P. Lithgow) (04/04/89)
To all compiler gurus:

I am looking for references to Cray/VAX/FPS (and any other system)
and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
opportunities for parallelism or vectorization inside and outside the
compiler. Since many languages can be used to write code for parallel
systems yet support it only outside the compiler (by providing explicit
library calls with which the programmer implements it him/herself), I am
not restricting my search.

I know of the VAX/VMS PPL$ library routines, and I'd like to find a
pointer or two to Cray Micro/Macro tasking, and to other compilers'
means of detecting parallelism (or vectorizable code). Note that some
of the most productive work seems to have gone ahead in VLIW systems...

Are there any current Ada compilers which automatically vectorize or
parallelize?

-David P. Lithgow
--
David P. Lithgow      Sr. Systems Analy./Pgmr., Univ. of Pittsburgh
USENET:  {allegra,bellcore,ihpn4!cadre,decvax!idis,psuvax1}!pitt!cisunx!dpl
CCnet(DECnet):  CISVM{S123}::DPL        BITnet(Jnet):  DPL@PITTVMS
bron@bronze.SGI.COM (Bron Campbell Nelson) (04/07/89)
In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) writes:
> To all compiler gurus:
>
> I am looking for references to Cray/VAX/FPS (and any other system)
> and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> opportunities for parallelism or vectorization inside and outside the compiler.

Silicon Graphics is now building a symmetric multi-processor system. We
offer a Fortran compiler option that performs automatic parallelism
detection. The existing product detects parallelism at the DO-loop
level, and will run different iterations of the loop on different
processors. We have gotten very good results for compute-intensive
loops.

I could speak endlessly on this topic, but I'd start to sound like a
commercial. If you have specific questions, feel free to email me.
--
Bron Campbell Nelson      bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.
urjlew@ecsvax.UUCP (Rostyk Lewyckyj) (04/09/89)
In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow) writes:
@ To all compiler gurus:
@
@ I am looking for references to Cray/VAX/FPS (and any other system)
@ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
@ opportunities for parallelism or vectorization inside and outside the compiler.
@ Since many languages can be used to write code for parallel systems, but they
@ support it outside the compiler (by providing explicit library calls to
@ the programmer to implement it for him/herself), I am not restricting my
@ search.
@
@ I know of the VAX/VMS PPL$ library routines, and I'd like to
@ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
@ means of detecting parallelism (or vectorizable code). Note that some
@ of the most productive work seems to have gone ahead in VLIW systems...
@
@ Are there any current Ada compilers which automatically vectorize or
@ parallelize?
@
@ -David P. Lithgow
@ --
@ David P. Lithgow Sr. Systems Analy./Pgmr., Univ. of Pittsburgh
Hie thee to your local IBM representative and let him inform you
about IBM's parallel Fortran products for the 3090 supercomputers:
compiler, debugging tools (PTOOL) etc., libraries (ESSL v3.) etc.
Contact the good folk at the Cornell National Supercomputer Facility.
They can tell you about non-IBM tools.
Contact Rice University (Dr. Kemeny ?)
Parallelization is a compiler option that the user must ask for,
just as different levels of optimization, vectorization, or a machine-code
listing must be asked for. In addition to the automatic parallelization,
there are IBM extensions to the Fortran language which allow the programmer
to explicitly control parallelism: fork processes, set up private vs.
shared storage, synchronize, divide work among available processors,
etc.
I don't think that the current IBM 3090 architecture is capable of
scheduling individual arithmetic operations within a single Fortran
statement across multiple processors economically, so the smallest
granularity of parallelization is across DO loops; i.e., there is
nothing equivalent to CRAY micro/auto tasking.
-----------------------------------------------
Reply-To: Rostyslaw Jarema Lewyckyj
urjlew@ecsvax.UUCP , urjlew@tucc.bitnet
or urjlew@tucc.tucc.edu (ARPA,SURA,NSF etc. internet)
tel. (919)-962-9107
urjlew@ecsvax.UUCP (Rostyk Lewyckyj) (04/11/89)
> In article <17287@cisunx.UUCP>, dpl@cisunx.UUCP (David P. Lithgow)
> Sr. Systems Analy./Pgmr., Univ. of Pittsburgh asked
> @
> @ for references to Cray/VAX/FPS (and any other system)
> @ and their FORTRAN (or other, perhaps Ada) compilers' ability to detect
> @ opportunities for parallelism or vectorization inside and outside the compiler.
> @ ...
> @ I know of the VAX/VMS PPL$ library routines, and I'd like to
> @ find a pointer or two to Cray Micro/Macro tasking, and other compilers'
> @ means of detecting parallelism (or vectorizable code).
> @ --

In article <6790@ecsvax.UUCP>, I urjlew@ecsvax.UUCP (Rostyk Lewyckyj) wrote:

> Hie thee to your local IBM representative and let him inform you
> about IBMs parallel FOrtran products for the 3090 supercomputers:
> compiler, debugging tools (PTOOL) etc., libraries (ESSL v3.) etc.
> Contact Rice University (Dr. Kemeny ?)

Should have been: Contact Dr. Ken Kennedy (ken@rice.edu), on whose work
IBM's parallel compilers and tools are based.

> ........
> statement across multiple processors economically, so the smallest
> granularity of parallelization is across do loops. i.e. there is
> nothing equivalent to CRAY micro/auto tasking.

Actually, CRAY micro/auto tasking is also parallelization across DO
loops, just as on the IBM. I think that microtasking requires specific
compiler control statements inserted in the code, and autotasking is
like a compiler switch. I don't know of any CRAY Fortran language
extensions for parallelization.

On CRAY Y-MPs and X-MPs the hardware is capable of chaining together
operations of the vector processing units, so that for a loop such as

      DO ... I=1,bigN
        D(I) = A(I)*B(I) + C(I)
        ......

the addition of the results of A(I)*B(I) to C(I) is started in the
adder pipe before all the multiplications are out of the multiply pipe.
This gives effective within-statement parallelism for even medium-length
vectors. I don't know how the details of dependence checking are done.
Perhaps the compiler analysis for vectorization is enough, and there
are no further checks needed for chaining.
-----------------------------------------------
Reply-To:  Rostyslaw Jarema Lewyckyj
           urjlew@ecsvax.UUCP , urjlew@tucc.bitnet
       or  urjlew@tucc.tucc.edu  (ARPA,SURA,NSF etc. internet)
           tel. (919)-962-9107