[comp.lang.fortran] Missing the whole point

ajayshah@almaak.usc.edu (Ajay Shah) (12/01/90)

I think this thread is missing the whole point by arguing about
C vs. Fortran on efficiency alone.  Most of the people involved
in this debate get annual paychecks > $50k (conservative), and
therein lies the major point.  The way things are going, hardware
is doubling in speed every 2 to 2.5 years while holding costs
roughly constant.  Languages and systems which help me get my
next Maximum Likelihood program debugged and running in the
shortest possible time are of the essence here, not grubby
differences between optimisers.

Face it: optimisers can give you 2x gains *at best*.  Hardly the
kind of thing to be basing an entire computational strategy on.

The essential reason why I am repelled by Fortran, dusty decks 
in Fortran and the mindset of traditional (read: went to graduate
school before 1975) Fortran programmers is the terrible
look-and-feel.  Beautiful (read: efficient on my time)
programming comes with a rich appreciation of how algorithms +
data structures make programs, with careful attention to the way
modules interact, with fine-tuning the scope and visibility of data
across modules, with building really reusable modules with sterile
interfaces, etc.  This whole mindset is neither supported nor
encouraged by Fortran, old Fortran code, and old Fortran
programmers, to whom everything is a problem for a few for loops
and common-equivalence parameter passing.  I've seen
single-file programs in Fortran written two years ago (not the
dark ages of the 60s) of 10000 lines!  (not more than a few
hundred lines of comments, obviously).  That is disastrous, and
that is the essence of the problem to me.

-- 
_______________________________________________________________________________
Ajay Shah, (213)734-3930, ajayshah@usc.edu
                              The more things change, the more they stay insane.
_______________________________________________________________________________

jlg@lanl.gov (Jim Giles) (12/02/90)

From article <28548@usc>, by ajayshah@almaak.usc.edu (Ajay Shah):
> [...]
> The essential reason why I am repelled by Fortran, dusty decks 
> in Fortran and the mindset of traditional (read: went to graduate
> school before 1975) Fortran programmers is the terrible
> look-and-feel.  [...]

You are right in changing the subject line.  This has little to do
with the Fortran vs. C debate.  Especially since the look-and-feel
of C is much worse for the problem domain appropriate to Fortran.

> [...]           Beautiful (read: efficient on my time)
> programming comes with a rich appreciation of how algorithms +
> data structures make programs, with careful attention to the way
> modules interact, with fine-tuning the scope and visibility of data
> across modules, with building really reusable modules with sterile
> interfaces, etc.  [...]

You are thinking of programming as only a self-directed activity.
If _you_ are the only end user, then your time is the only human
efficiency consideration.  Most programmers are concerned with
providing a useful capability for a large set of end users (including
themselves).  The effect of a slow program on those end users is
too complicated to be simply measured by the additional CPU time
cost.  If a program takes an hour to run, I will use it differently
(and less flexibly) than if it takes only a few minutes to run.
Fast calculations are changing the way people work in all sorts
of problem domains.  Fast simulations, for example, allow designers
to iterate several ideas through the code (basing each new try on
the results of the previous one) and are improving the design of
everything from aircraft to computers.

If there were a widely available language that allowed the programmer
to do all the things you mention above and _still_ generated fast
code, there would be no question about switching to it.  But the
economics of computing still values speed over programmer time
for most of the interesting problem domains.  The increase in
raw speed of computers is not likely to change that much since
this only expands the domain of viable problems to a larger
set.  There are few problems for which code is already as fast
as anyone will ever need.  End-user expectations rise as fast
as the hardware improves.

> [...]                                      I've seen
> single-file programs in Fortran written two years ago (not the
> dark ages of the 60s) of 10000 lines!  (not more than a few
> hundred lines of comments, obviously).  That is disastrous, and
> that is the essence of the problem to me.

This is ambiguous.  A single file may contain any number of
independent Fortran procedures (so your apparent assumption
that the program in question is not modular is faulty).
Further, you didn't tell us what the program _does_, so
we can't determine whether 10000 lines is overly large or
remarkably compact.  Complex problems require large programs:
there is no "Royal road" to good programs.

But, let us assume that you mean that the program was really a single
procedure.  This still doesn't mean that it was disastrous.  It may
have been carefully crafted using all the techniques that you support
and _then_ all the procedure calls were "inlined" for speed.  This is
not a bad technique - it will only disappear when "inlining" becomes
a widely available feature of programming languages.  There is no
a priori reason that Fortran could not be extended to do this (it is
one of the things that Fortran Extended's 'internal' procedures should
be able to do).
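For illustration, here is a hedged C sketch (invented names and code, not
taken from any program discussed in this thread) of what "inlining" a small
procedure by hand amounts to:

```c
#include <assert.h>

/* Hypothetical illustration: the same sum of squares computed two ways. */

/* Modular form: a small helper procedure. */
static double sq(double v) { return v * v; }

double sumsq_modular(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += sq(x[i]);          /* one call per element */
    return s;
}

/* "Inlined" form: the helper's body substituted at the call site,
   trading modular source for call-free object code. */
double sumsq_inlined(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i] * x[i];
    return s;
}
```

The point is that the flat version may well have started life as the
modular one; the two compute identical results.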

In any case, I am in complete agreement that language designers
should try to include features that promote more efficient coding
and maintenance.  But so far, no language is widely available
which does so AND is as efficient as Fortran-like languages.

J. Giles

gl8f@astsun.astro.Virginia.EDU (Greg Lindahl) (12/02/90)

In article <28548@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:

>The essential reason why I am repelled by Fortran, dusty decks 
>in Fortran and the mindset of traditional (read: went to graduate
>school before 1975) Fortran programmers is the terrible
>look-and-feel.

This isn't necessarily true. There are lots of "modern" fortran
programmers out there who write reuseable modules and use modern
programming techniques. What annoys me about this whole discussion are
people who stereotype everyone else, and make general claims which
might be true for them but aren't for others. The "best language" is
relative to not only the problem but the programmer.

>  Beautiful (read: efficient on my time)
>programming comes with a rich appreciation of how algorithms +
>data structures make programs,

Algorithms + Data Structures = Quiche

Slogans are just words.

>Face it: optimisers can give you 2x gains *at best*.  Hardly the
>kind of thing to be basing an entire computational strategy on.

This has not been my experience. The thing I like about FORTRAN is
that I can put down a formula in a loop, and I don't have to worry
about little things like vector directives, common sub-expression
elimination, register assignment, pointer aliasing, and unrolling. Not
having to worry about such things makes me a more efficient
programmer.
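As a hedged C sketch of the kind of hand work being avoided (the formula
and names are invented for illustration), compare a formula written
straight from the paper with the same loop after manual common
subexpression elimination:

```c
#include <assert.h>

/* What you'd like to write: the formula as given. */
double moment_naive(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i] + (a[i] * b[i]) * 0.5;  /* a[i]*b[i] appears twice */
    return s;
}

/* What hand optimization looks like: the common subexpression hoisted.
   The arithmetic is performed in the same order, so results match. */
double moment_tuned(const double *a, const double *b, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++) {
        double t = a[i] * b[i];   /* computed once */
        s += t + t * 0.5;
    }
    return s;
}
```

A compiler that does this reliably frees the programmer to write the first
form and get the second.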

Your mileage may differ.

shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) (12/03/90)

In article <28548@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>The essential reason why I am repelled by Fortran, dusty decks 
>in Fortran and the mindset of traditional (read: went to graduate
>school before 1975) Fortran programmers is the terrible
>look-and-feel...
>
>....  I've seen
>single-file programs in Fortran written two years ago (not the
>dark ages of the 60s) of 10000 lines!  (not more than a few
>hundred lines of comments, obviously).  That is disastrous, and
>that is the essence of the problem to me.

Well, if that's the essence of the problem, you might as well give up
now.  If you could convince these same guys (who are probably brilliant
engineers and applied mathematicians) to write in C, I would put money on
the proposition that their C code would look like their Fortran code.  
Admittedly, C has a tradition of structured coding (my, how old-fashioned 
that word sounds these days), whereas Fortran does not, but structured
programming can be done in Fortran, and my guess is that most contributors 
to this newsgroup program this way.

Here's *my* idea of what the essence of the problem is.  I would like to
phrase it as a followup to the following dialog, which appeared earlier
in the discussion.

I had written:
>>... the above observations would seem to imply that if the programmer
>>simply restricts him/herself to a Fortran-like "highly optimizable subset"
>>of C, then he/she can expect Fortran-like performance out of any reasonably
>>good C compiler.

gwyn@smoke.brl.mil (Doug Gwyn) replied:
>It doesn't matter whether that is true or not; such crippled programming
>would negate much of the advantage of using C in the first place.  Use
>the right tool for the job and stop worrying about code optimization!

First, I think my idea -- that there might be a "highly optimizable subset"
of C which would give code that runs as fast as Fortran code given current
compiler technology -- has been amply refuted in the subsequent discussion; so
I'm no longer proposing that.

But in response to Doug's comment, I can cite my own experience as follows.
I wrote a package a few years ago that spent 90% of its time doing numerically
intensive calculations, which is where Fortran excels.  However, the code
implementing the numerical functionality constituted only about 10% of the
program.  Most of the program involved interacting with the user, doing
book-keeping on the internal data structures and on the state of the program,
parsing user commands, issuing reports, doing I/O, and so on.  Now, I found C 
much more suitable for this 90% of the code, and when it came down to the 
other 10%, which is where the program spends most of its time, I said, "Hell, 
I'll just do that in C, too."  My alternatives were (1) write the whole thing 
in Fortran, or (2) write just the numerical part in Fortran, or (for
completeness only -- not seriously considered!) (3) use some other language or
combination of languages.  Any of these would have been workable, but under 
the circumstances I took the path of least resistance, while still wishing I 
could have had my cake and eaten it too.

So what this whole discussion is really about, for me, is: "Isn't there, or
at least couldn't there be, a way for me to have my cake and eat it, too,"
or at least to simultaneously have and eat a larger fraction of it than
is now possible?  Now, I realize, some would answer, "Yes, the answer is
Fortran 90," and some might answer, "Yes, do inter-language procedure calls,"
and some might answer, "Yes, just get people to put conformant arrays and
noalias into C," but whatever the answer is, this is the question, for me.

	-P.
************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY  10027
(212)854-1418  shenkin@cunixf.cc.columbia.edu(Internet)  shenkin@cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***

tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) (12/03/90)

In article <7552@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
   But the economics of computing still values speed over programmer time
   for most of the interesting problem domains.

This is the economics of computing for some people. If you pay
for your Cray time and if your programs take days to run even
on a Cray, then you may be concerned with speeding them up
by a factor of 2 or 3.

In our environment, we use workstations (and the Connection Machine).
Computer time is essentially free. For many applications, it makes
little difference to me whether my program finishes in 1 hour or 1
day, and if I have really serious number crunching to do, I can often
split up the work among a number of workstations. I think many
scientists are in similar situations.

It does make a significant difference, however, how long it takes me
to write the program in the first place, how long it takes me to debug
it, and how confident I can be that it is correct. If I really care
about efficiency, I can identify the short section of code that does
the real work with a profiler and help the compiler in generating
better code (using pragmas, etc.).

   In any case, I am in complete agreement that language designers
   should try to include features that promote more efficient coding
   and maintenance.  But so far, no language is widely available
   which does so AND is as efficient as Fortran-like languages.

The major opportunities for optimization that exist in FORTRAN and
that are absent in many other programming languages are ones that
result from assumptions about the lack of aliasing. These lead
to significant speedups only on machines with some kind of
parallelism.

You can easily add this as a pragma to a language like Modula or Ada.
In many languages, a compiler can also generate separate versions for
the aliased and unaliased case and use runtime dispatching. From a
purely technical point of view, opportunity for optimization is no
reason to stick with FORTRAN, since the difference from other languages
is slight.
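Here is a hedged C sketch of that two-version-plus-dispatch idea (the
routine and the pointer-overlap test are invented for illustration; a
compiler would generate something like this internally rather than ask
the programmer to write it):

```c
#include <assert.h>

/* a[i] = b[i] + b[i+1].  If a and b may overlap, a vectorized version
   could read elements of b that an earlier iteration already overwrote,
   so a compiler must be conservative -- unless it dispatches at run time. */

static void shift_add_vector(double *a, const double *b, int n) {
    /* no overlap proven: safe for a vectorizer or parallel hardware */
    for (int i = 0; i < n; i++)
        a[i] = b[i] + b[i + 1];
}

static void shift_add_scalar(double *a, const double *b, int n) {
    /* possible overlap: strictly sequential, one element at a time */
    for (int i = 0; i < n; i++)
        a[i] = b[i] + b[i + 1];
}

void shift_add(double *a, const double *b, int n) {
    /* runtime overlap check: the ranges a[0..n) and b[0..n+1) disjoint? */
    if (a + n <= b || b + n + 1 <= a)
        shift_add_vector(a, b, n);   /* disjoint: fast version */
    else
        shift_add_scalar(a, b, n);   /* may alias: safe version */
}
```

In Fortran the no-aliasing assumption is built into the argument-passing
rules, so the dispatch (and the scalar fallback) is unnecessary.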

From a practical point of view, it is, of course, true that in
most numerically oriented computing environments, FORTRAN compilers
generate the best code. But if you are using workstations, the
situation is often reversed.

jlg@lanl.gov (Jim Giles) (12/03/90)

From article <TMB.90Dec2200247@bambleweenie57.ai.mit.edu>, by tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel):
> In article <7552@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>    But the economics of computing still values speed over programmer time
>    for most of the interesting problem domains.
>
> [...]
> In our environment, we use workstations (and the Connection Machine).
> Computer time is essentially free. For many applications, it makes
> little difference to me whether my program finishes in 1 hour or 1
> day, and if I have really serious number crunching to do, I can often
> split up the work among a number of workstations. I think many
> scientists are in similar situations.

The subject line of this discussion is still correct although it is now
you who are missing my point.

The cost of computer time is irrelevant compared to the cost of the
time spent by the end-users of a program.  If a program takes an hour
or a day to run, the end-user's use of the code will be different
than if the code runs in minutes or seconds.  If a code requires a
full day to run, a one semester research project can only do (at
most) 126 runs.  (I am disregarding here the possibility of several
runs in parallel on many different workstations.  Your colleague
down the hall _may_ have a lot of idle time he's willing to let
you use or he may have the same ravenous appetite for resources
that you do.  You can't count on being able to split up the work
onto several machines.)

A faster program would permit several runs per day - to try out more
variations of the problem for example (again, you can't necessarily
make these several runs on different machines - even if the other
machines are available - each variation you try is usually suggested
by the results of the previous run).  Or it would permit scaling up
the problem so that the same number of runs would yield more
accuracy or wider applicability.  This lack of flexibility in the
use of slow programs is much more costly than the savings of a few
weeks coding time on the part of the programmer.  Note that the
advantage of a fast code is just as applicable to end-users on
workstations as it is on supercomputers.

You are maintaining that the main cost of computing is the _human_
effort involved and that this is the cost that should be reduced.
I agree.  But my point is that the majority of this human cost is
the productivity of the end-user of a code and not that of the
programmer.  It may be that this is not true in individual research
projects in colleges or small consulting firms where the end-user
and the programmer are the same person and codes are run only once
(or a very few times) before they are rewritten or discarded.  But I
don't agree that this is the domain in which most programmers work.

Note that the above is my only objection to your position.  It is
simply not true that using modern languages and coding practices is
a priori worth a speed degradation in the resulting code.  On the
other hand, I am not saying that programmer productivity is
irrelevant.  After all, most programming _is_ done in high-level
languages and not in assembly (in spite of the speed degradation
involved with that choice).  I do not recommend that Fortran
programmers (or anyone else) just bury their heads in the sand and
refuse to consider new features (and even new languages) which would
make their own jobs easier.  But the decision whether new features
are actually an improvement or not is best made by the programmers
themselves.  After all, only they can balance their end-user's
demand for speed with their own abilities and needs.

john@ghostwheel.unm.edu (John Prentice) (12/03/90)

In article <TMB.90Dec2200247@bambleweenie57.ai.mit.edu> tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) writes:

[in responding to a comment by Jim Giles...]
>
>This is the economics of computing for some people. If you pay
>for your Cray time and if your programs take days to run even
>on a Cray, then you may be concerned with speeding them up
>by a factor of 2 or 3.
>
>In our environment, we use workstations (and the Connection Machine).
>Computer time is essentially free. For many applications, it makes
>little difference to me whether my program finishes in 1 hour or 1
>day, and if I have really serious number crunching to do, I can often
>split up the work among a number of workstations. I think many
>scientists are in similar situations.
>

I wish things were so simple.  I run on a Sun SPARC, a Cray YMP, a
Cray 2, a Connection Machine, and a couple of miscellaneous parallel
machines (mostly research machines).  I have plenty of computer time
and in general the actual cost is not the major problem I face with
having jobs run a long time.  The problem is that while a calculation
is crunching away, I am essentially idle waiting for an answer.  I
can't decide what I need to do next until this one finishes.  When
calculations take hours or days, this is a real impediment to my
productivity.  Unfortunately, calculations often take weeks and this
can be a show stopper.  Making codes run faster is a major priority
for any scientific researcher who has long running calculations and
actually looks at his results before proceeding.

John K. Prentice
john@unmfys.unm.edu

tbc@juniper04.cray.com (Tom Craig) (12/04/90)

In article <28548@usc> ajayshah@almaak.usc.edu (Ajay Shah) writes:
>
 [Valid points on the relative importance of programmer productivity
  deleted]
>
>Face it: optimisers can give you 2x gains *at best*.  Hardly the
>kind of thing to be basing an entire computational strategy on.

Hmmm, I have to differ here.  On multiple-CPU vector architectures, 
the difference between the best case (vector code running on multiple CPUs) 
and the worst case (single-threaded scalar code) is more like 100x.  There 
are still many very interesting problems that can be practically modeled on a 
supercomputer ONLY by squeezing every ounce of speed out of the computer.
When the hardware speeds up, there will be new interesting problems that
require that primary attention be given to performance.  Scientific 
programmers ALWAYS want 10% more speed than the system can deliver!
 
 [Valid tirade against bad code deleted in the interest of bandwidth]

This argument doesn't fly either.  You can write good code or bad code
in any language.  At this point (at least on Crays), if you want to
get the best performance out of a large numeric code, you either have
to use FORTRAN or restrict yourself to a FORTRAN-like subset of C, in
which case you might as well use FORTRAN.  On second thought, if you're
more comfortable with C, use it with a keen eye to vectorization and
parallelization.  It's just easier to avoid loops that don't vectorize
with FORTRAN.

Regards,
--
   Tom Craig     tbc@cray.com     Insert standard disclaimers here...

john@ghostwheel.unm.edu (John Prentice) (12/06/90)

In article <TMB.90Dec5014523@bambleweenie57.ai.mit.edu> tmb@bambleweenie57.ai.mit.edu (Thomas M. Breuel) writes:
>But it would be nice if people now stated explicitly in their code
>when they want vectorization and when they don't want it.

When I first read this, I didn't like the idea because it is extra work
for the programmer.  However, thinking about it, it is perhaps not such
a bad idea.  I would perhaps be more interested in it for parallelism
than for vectorization, however.

>
>It would be even easier if there were a couple of standard constructs
>in FORTRAN or C that state explicitly to the compiler that a loop may
>be parallelized or vectorized.

Of course, Cray Fortran has had constructs (the CDIR directives) for telling
the compiler to vectorize a loop since the beginning.  However, in general
I agree.  With regard to expressing parallelism, the people at Myrias had
the easiest expression in Fortran that I have encountered yet.  If you
wanted a parallel do-loop, you said pardo instead of do.  That was it.  They
also had ways to break non-do-loops up into parallel processes with similar
constructs.  There is an article about this in the November 1990 issue of
Supercomputing Review (pages 49-51).  I have done a fair amount of work
on the Myrias in the past, though the company just folded a few weeks ago.
The initial system Myrias fielded was very interesting as a research machine
for parallelism, but was terribly slow.  I suspect that was their undoing.
It is a pity because they were way ahead in terms of making it easy to
express parallelism in higher-level languages (at least that is my
impression, speaking as a computational physicist and not a computer
scientist.  If there are other opinions on that, I would be really
interested to learn.  This is a really, really important issue both in what
Amparo Corporation is doing in computational fluid/solid dynamics and in
research I am doing at the University of New Mexico in quantum mechanics).

Anyway, to get back to parallelism, the other way I have seen parallelism
expressed in Fortran is on the Connection Machine, where they use the
Fortran Extended array construct to signify that an operation is parallel.
I actually don't like that way of doing it, though on a SIMD machine like
the CM2 it may be reasonable.  I have more experience on MIMD machines.
However, the main problem I have with the use of the array construct is
just that I can't parallelize loops, only arithmetic expressions (again,
remember this is a SIMD machine, so that is perhaps reasonable).  More
importantly, however, it means I have to code differently on this system
than on any other system.  So I end up with a special version of my code
just for the CM2.  With a large production code, that is a real pain, and
dangerous, since every correction has to be put into every version and
that usually leads to problems.

I don't know how Cray does parallel constructs.  But I hardly know anyone
who tries to do serious parallel programming on the Cray.  Unless you are
among the chosen few who can get dedicated Cray time, it is not cost effective.
We tried it on the Cray 2 and on the YMP a couple of years ago.  We were
rarely able to get all the processors at one time.  Also, you don't have very
many processors.  This comment will no doubt offend some people, but to
be honest, my impression of Cray with regard to parallelism is that they
are really nothing but dilettantes in the field.  I don't mean that to
be antagonistic, by the way; it is just that I haven't seen Cray as much
of a force in the community.  Perhaps someone can comment on that.  I
do understand they are beginning to take the idea of massive parallelism
more seriously, however (according to last Sunday's New York Times).

If C provides more natural ways to express parallelism, then that would
be a major claim in its favor!  I would be very interested to hear more
about this.

A comment and question.  Being new to Usenet, I am not completely familiar
with the various taboos and conventions.  Specifically, a lot of the issues
I am raising, and that have been raised in recent weeks in this newsgroup,
are not specifically Fortran issues (though they usually start that way).
There is a group for parallelism, but so far it has struck me as mostly
being concerned with mechanics, not the issues of how to do numerical
work with parallelism.  Is there a newsgroup devoted just to scientific
computing (languages, methods, etc.)?  If not, is there interest in
one?  Finally, it strikes me (if there is not a scientific computation
newsgroup) that the Fortran newsgroup is not such a bad place for these
discussions, simply because it is read by scientific programmers (which I am
willing to bet most of the other language groups are not).  Comments?

John Prentice
john@unmfys.unm.edu
	

morreale@bierstadt.scd.ucar.edu (Peter Morreale) (12/07/90)

In article <1990Dec5.182145.2639@ariel.unm.edu>, john@ghostwheel.unm.edu (John Prentice) writes:
> 
> Of course, Cray Fortran has had constructs (the CDIR directives) for telling
> the compiler to vectorize a loop since the beginning.  However, in general
> I agree.  

    The Cray Fortran compilers will vectorize *every* loop (which meets
    vectorization criteria) by default.  The programmer doesn't need to
    make any modifications to his code.  (Although most do, to obtain
    increased performance; non-portable constructs are not used or
    needed.)

> With regard to expressing parallelism, the people at Myrias had
> the easiest expression in Fortran that I have encountered yet.  If you
> wanted a parallel do-loop, you said pardo instead of do.  That was it.  They

    Sounds like a very non-portable construct.  The Cray method of
    obtaining parallelism is to add directives which appear as Fortran
    comment cards.   The directives are interpreted by source code
    analyzers and translated into system calls.

> It is a pity because they were way ahead in terms of making it easy to
> express parallelism in higher languages (at least that is my impression,

     With the Cray utility "cf77" I need only specify a flag on the
     command line and I get parallelism introduced into my portable
     code.

     The Cray method is to (with the proper command line flags...) run
     the portable Fortran through various filters which eventually
     re-write the source code with the proper system calls, which is fed
     into the compiler.  Seems very easy for the user.  

     Of course, to increase performance of the code, I would look at the
     source code translation and tune it accordingly.
> 
> I don't know how Cray does parallel constructs.  But I hardly know anyone
> who tries to do serious parallel programming on the Cray.  Unless you are
> among the chosen few who can get dedicated Cray time, it is not cost effective.

     Hummm...  I do question this statement.  If I am executing a code
     on a Cray which only runs for a few minutes, you bet, it's not cost
     effective.

     How about ocean and climate models which run for literally
     *hundreds* of Cray CPU hours?  (Say a thousand wallclock hours.)  If
     I can reduce the turnaround time by a factor of 3, 4, or 6, is it
     worth it?  If I can, perhaps I can increase the resolution of the
     model in the first place.  Does better science result from
     increased resolution?  In addition, I get 64-bit results on the
     Cray for every calculation in single-precision mode.  Is this
     important?

     On the Cray, I can get subroutines and/or do-loop iterations 
     distributed across multiple CPUs.
    
> We tried it on the Cray 2 and on the YMP a couple years ago.  We were rarely
> able to get all the processors at one time.  Also, you don't have very
> many processors.  

     So?  On a Connection Machine with 64k processors, you only get the
     parallel region of the code executed on those processors.  The
     serial portion of the code is executed on a front-end.  For a
     highly parallel code, you get good results; for a "typical" user
     code, you get front-end speeds.  

> 
> If C provides more natural ways to express parallelism, then that would
> be a major claim in its favor!  I would be very interested to hear more
> about this.
> 

     I'm not a C wizard, but I suspect that with the heavy reliance on
     pointers, it would be difficult to exploit parallelism within the 
     language.  How would you update pointers consistently amongst CPUs?

-PWM

(Comments are my own, no-one else's....)
------------------------------------------------------------------
Peter W. Morreale                  email:  morreale@ncar.ucar.edu
Nat'l Center for Atmos Research    voice:  (303) 497-1293
Scientific Computing Division     
Consulting Office
------------------------------------------------------------------

john@ghostwheel.unm.edu (John Prentice) (12/07/90)

In article <9424@ncar.ucar.edu> morreale@bierstadt.scd.ucar.edu (Peter Morreale) writes:
>In article <1990Dec5.182145.2639@ariel.unm.edu>, john@ghostwheel.unm.edu (John Prentice) writes:
>> 
>> Of course, Cray Fortran has had constructs (the CDIR directives) for telling
>> the compiler to vectorize a loop since the beginning.  However, in general
>> I agree.  
>
>    The Cray Fortran compilers will vectorize *every* loop (which meets
>    vectorization criteria) by default.  The programmer doesn't need to
>    make any modifications to his code.  (although most do to obtain
>    increased performance, but non-portable constructs are not used or
>    needed)
>

This is true, but the problem is the vectorization criteria.  The Cray
compiler is much better at sensing when a loop is vectorizable than it
used to be, but one can still construct cases where it is unable to
resolve what appear to it to be vector dependencies but which are in
fact not.  That is why Cray provides the CDIR directives in the first
place.  An interesting flip side on the Cray is if you have a short
loop.  Often you need to inhibit vectorization because the overhead
exceeds the savings of vectorization.  If your loop starts with
something like
             do 10 i=1,n
the compiler has no way to know that n is small and the loop should not
be vectorized.  You have to go in and tell it what to do by hand.  The
Convex compiler is a lot better at vectorizing than the Cray one is, by
the way.  It can vectorize nested do-loops, for example, something Cray
has never been able to do.  But Cray has never been famous for their
software.
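The short-loop decision can be sketched in C as an explicit runtime
dispatch on the trip count (an invented example; the break-even value is
machine-dependent and made up here, and on a Cray the directive approach
puts this decision in the programmer's hands instead):

```c
#include <assert.h>

/* Invented sketch: vector startup overhead only pays off above some
   break-even trip count.  The value 8 is made up for illustration. */
#define VEC_BREAK_EVEN 8

void daxpy(double alpha, const double *x, double *y, int n) {
    if (n < VEC_BREAK_EVEN) {
        /* short loop: run scalar, as a CDIR directive inhibiting
           vectorization would force */
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];
    } else {
        /* long loop: leave it in the form the vectorizer handles */
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];
    }
}
```

Both branches compute the same result; only the generated code (and the
overhead) would differ on vector hardware.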

>> With regard to expressing parallelism, the people at Myrias had
>> the easiest expression in Fortran that I have encountered yet.  If you
>> wanted a parallel do-loop, you said pardo instead of do.  That was it.  They
>
>    Sounds like a very non-portable construct.  The Cray method of
>    obtaining parallelism is to add directives which appear as Fortran
>    comment cards.   The directives are interpreted by source code
>    analyzers and translated into system calls.
>

No argument, the Myrias approach is not portable, but it is easy (of course
getting it to run efficiently may not be).  We handled the portability problem
using the C preprocessor to either use a do or pardo depending on the target
computer.
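That preprocessor trick might look roughly like this, rendered here in C so
the sketch compiles anywhere (the macro name and the Myrias branch are
hypothetical; on the real machine the macro would expand to the pardo form
of the loop, which is Fortran/Myrias syntax):

```c
#include <assert.h>

/* Invented sketch of the portability trick: one macro, two expansions.
   With -DMYRIAS the macro would expand to the machine's parallel loop
   form (left as a comment, since pardo is not C); everywhere else it
   is an ordinary sequential loop. */
#ifdef MYRIAS
/* #define PARLOOP(i, n)  ...pardo-equivalent parallel loop (hypothetical) */
#else
#define PARLOOP(i, n)  for (int i = 0; i < (n); i++)
#endif

double total(const double *x, int n) {
    double s = 0.0;
    PARLOOP(i, n)
        s += x[i];
    return s;
}
```

One source file then serves both the parallel machine and everything else,
at the cost of running all loops through the preprocessor.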

>> 
>> I don't know how Cray does parallel constructs.  But I hardly know anyone
>> who tries to do serious parallel programming on the Cray.  Unless you are
>> among the chosen few who can get dedicated Cray time, it is not cost effective.
>
>     Hummm...  I do question this statement.  If I am executing a code
>     on a Cray which only runs for a few minutes, you bet, it's not cost
>     effective.
>
>     How about ocean and climate models which run for literally
>     *hundreds* of Cray CPU hours?  (say a thousand wallclock hours) If
>     I can reduce the turnaround time by a factor of 3 or 4, 6, is it
>     worth it?  If I can, perhaps I can increase the resolution of the
>     model in the first place.  Does better science result from
>     increased resolution?   In addition, I get 64bit results on the
>     Cray for every calculation in single precision mode.  Is this
>     important?
>
>     In the Cray, I can get subroutines, and/or do loop iterations 
>     passed across multiple CPUs.
>

I have no quarrel with the goal of using parallelism to reduce any
measure of the time required by a calculation.  It has just been my
experience (and that of my colleagues at Sandia) that you don't get
there using a Cray unless the system is idle but for your job.  I do
know of people doing it at Los Alamos and getting good results using
the YMP, but again, these people virtually own those systems.  My
point is not whether there is an advantage to exploiting parallelism
(quite the opposite!), it is whether the Cray is the system to do it
on.  By the way, our applications take hundreds of hours of Cray time
also, and our limitation is not wall clock time, it is money.  Even
using cheap DOE lab Cray computer time, these calculations often cost
us $50,000.  Cray parallelism has usually not helped there, because
the only cost advantage it offers is reduction of the memory integral.
We get charged per processor, so if we have N processors and the wall
clock time is now N times less (which it won't be, obviously), the
summed CPU time is the same.  All that has been saved is the memory
integral, since it is a shared memory system.  However, we have
typically not been able to get all the processors at one time (on a
crowded system), so we lose out there too.  The final point I would
make is that the limitations on resolution are just as serious due to
the unavailability of memory and disk as due to the time it takes to
run a calculation.  Our finite difference codes use meshes with many
millions of cells, each carrying 20 or so variables.  You don't have
to dump very many cycles before you exceed the available disk.  This
is a problem facing all big computing that has not been adequately
addressed.
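The billing arithmetic, made concrete (the per-CPU-hour rate is a
made-up number, purely for illustration):

   1 CPU  x 1000 hours = 1000 CPU-hours -> $50,000
   8 CPUs x  125 hours = 1000 CPU-hours -> $50,000  (ideal speedup)

On per-processor billing the summed CPU charge is unchanged even with
perfect speedup; only the memory integral (the time the job's memory
sits resident) shrinks.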

    
>> We tried it on the Cray 2 and on the YMP a couple years ago.  We were rarely
>> able to get all the processors at one time.  Also, you don't have very
>> many processors.  
>
>     So?  On a Connection machine with 64k processors, you only get the
>     parallel region of the code executed on those processors.  The
>     serial portion of the code is executed on a front-end.  For a
>     highly parallel code, you get good results, for a "typical" user
>     code, you get front-end speeds.  
>

Again, no question about it.   However, the sorts of calculations you
quoted earlier (weather, etc...) ARE highly parallelizable.  This is the
same argument people use against vectorization, yet I don't see any
massive rush from the scientific community to abandon it.  For the
"typical" user code, yes, you do get front-end speeds.  But given that
workstations give floating point performance within a factor of 2 to 5 of
a single processor YMP (for non-vectorized code, which the "typical"
user code is), why does the "typical" user need a Cray?  
The Cray is dynamite for the really big calculation that vectorizes like crazy 
or that requires tons of memory, but that is the exception, not the rule for 
"typical" users (who after all are most of the people out there).  The old
comment we used to make at Los Alamos was that 10% of the people in the
lab used 90% of the computing resources. 

John Prentice
Amparo Corporation
Albuquerque, NM

john@unmfys.unm.edu

jlg@lanl.gov (Jim Giles) (12/07/90)

From article <9424@ncar.ucar.edu>, by morreale@bierstadt.scd.ucar.edu (Peter Morreale):
> [...]
>     The Cray Fortran compilers will vectorize *every* loop (which meets
>     vectorization criteria) by default.  The programmer doesn't need to
>     make any modifications to his code.  (although most do to obtain
>     increased performance, but non-portable constructs are not used or
>     needed)

This is not _quite_ true.  It is true that the Cray compilers have
always been able to vectorize _some_ loops automatically.  But it
took many years for them to reach the degree of effectiveness they
have today.  Further, _some_ loops actually have dependencies, but
some of the algorithms which have such loops are stable if the loop
is vectorized anyway - for such cases, the CDIR$ NODEP directive
exists.  Finally, of course, there is also a directive which tells
the compiler _not_ to vectorize.  So, what you say is true to an
extent, but so was the comment you were responding to.
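An example of the kind of loop Giles describes (the NODEP spelling is
taken from his post; a Gauss-Seidel style relaxation really does carry
a dependency, yet the iteration still converges if vectorized anyway):

C        The compiler sees a(i-1) on the right-hand side and refuses
C        to vectorize; NODEP tells it to go ahead regardless.
CDIR$ NODEP
         do 10 i = 2, n
            a(i) = omega*a(i-1) + (1.0-omega)*a(i)
   10    continue

The per-iteration results differ from the scalar loop, but for an
algorithm like this the converged answer is the same.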

J. Giles

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (12/09/90)

In article <9424@ncar.ucar.edu>, morreale@bierstadt.scd.ucar.edu (Peter Morreale) writes:
>In article <1990Dec5.182145.2639@ariel.unm.edu>, john@ghostwheel.unm.edu (John Prentice) writes:
>>
<<<<deletions>>>>
>> With regard to expressing parallelism, the people at Myrias had
>> the easiest expression in Fortran that I have encountered yet.  If you
>> wanted a parallel do-loop, you said pardo instead of do.  That was it.  They
>
>    Sounds like a very non-portable construct.  The Cray method of
>    obtaining parallelism is to add directives which appear as Fortran
>    comment cards.   The directives are interpreted by source code
>    analyzers and translated into system calls.
>
 
With the death of Myrias Corp, PARDO may remain non-portable,
unless someone picks up the torch - there are not many Myrias
machines around. I don't see the construct as *very* non-portable,
though, as the syntax does not differ from that of a DO.
 
-------------------+-------------------------------------------
Al Dunbar          |
Edmonton, Alberta  |  "this mind left intentionally blank"
CANADA             |          - Manuel Writer
-------------------+-------------------------------------------