[comp.software-eng] Fortran follies

Paul_L_Schauble@cup.portal.com (05/13/88)

I'm presently working on maintaining a Fortran compiler for a mainframe
computer manufacturer. I've had a few requests lately that I'd like to
throw out for opinions. Flames accepted too. 

The machine in question has a segmented architecture. Each segment has its
own size and read/write permissions. Unfortunately, the hardware permits
only 512 segments to be visible at one time, so they can't be used for
individual arrays. The compiler has basic scalar optimization and automatic
vectorizing.

The first program is this
     program main
     real a(10000), b(10000)
     ...
     call sub (a, b, 10000)
     ...
     end
     subroutine sub (a, b, n)
     real a(1), b(1)			<-- note dimensions
     do 1 i = 1,n
1    a(i) = complex expression with a(i) and b(i)
     ....

The vectorizer looked at this and said that the maximum size of the array
is 1, therefore the maximum subscript is 1 and the vector temporary needed
in the subroutine only needs to be one word long. Of course, the real
vector size is n.

Second program is this
     program main
     ...
     common /one/ a(1)
     common /two/ space(100 000)
     common /tre/ alast
     ...
c    calculate sizes of sub arrays
     il1 = something
     il2 = something else
     il3 = yet more
c    calculate starting points of sub arrays
     ist1 = 1
     ist2 = ist1 + il1
     ist3 = ist2 + il2
c    call working subroutine
     call subbr (a(ist1), a(ist2), a(ist3), il1, il2, il3)
     ...
     end
     subroutine subbr(x, y, z, ilx, ily, ilz)
     real x(1), y(1), z(1)
c    long calculation using x, y, z as arrays ilx, ily, and ilz long
     ...
     end

It's an interesting attempt at dynamic storage allocation. This is from the
CERN library, which is apparently popular in Europe.

My problem is that the compiler puts each common block in its own segment, 
so that all of the references to a produce segment protection faults. 

Now, I know that both of these are non-standard. The latter not only assumes
that all of common is in one segment but also assumes the order in which
the common blocks are laid out in memory. The technique would work if it
were used within a single common block, as sketched below.
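
For what it's worth, here is a minimal sketch of that variant (my own
hypothetical rearrangement, not the customer's actual code), with the whole
workspace carved out of one common block so that no assumption about the
ordering of separate blocks is needed:

     program main
     common /work/ a(102000)
c    sizes of sub arrays (made-up values standing in for the
c    original's "something")
     il1 = 1000
     il2 = 2000
     il3 = 3000
c    starting points are all offsets into the SAME common block,
c    so only /work/ itself has to be big enough
     ist1 = 1
     ist2 = ist1 + il1
     ist3 = ist2 + il2
     call subbr (a(ist1), a(ist2), a(ist3), il1, il2, il3)
     end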

But they become significant issues to me when the customer bugs my
management to change the compiler to support these programs! They say that
they don't want to change their code.

I wonder if other compiler groups have hit these issues, and, if so, what
have you decided to do about them? Is there really a significant amount of
Fortran code out there that does this type of thing? Is it really possible
to do Fortran on a segmented architecture machine or do prevailing coding
practices rule it out? My thought is that these practices were ruled out of
the standard for very good reasons. But the customer is still always right.

Thanks in advance for any information,
Paul_L_Schauble@cup.portal.com or sun!portal!Paul_L_Schauble
...

geoff@desint.UUCP (Geoff Kuenning) (05/15/88)

In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:

>      subroutine sub (a, b, n)
>      real a(1), b(1)			<-- note dimensions

Since variable dimensions have been part of standard Fortran for over ten
years, there is little excuse for using this older technique.  However, it
used to be very popular, so I suppose the customer has an argument in
expecting the compiler to support it.  Isn't the vectorizer smart enough
to see that the loop overruns the array?
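
For reference, the adjustable-dimension form is simply (a minimal sketch,
with a made-up loop body standing in for the original's "complex
expression"):

     subroutine sub (a, b, n)
     integer n, i
     real a(n), b(n)
c    adjustable dimensions: the dummy arrays take their extent from n,
c    so the compiler knows the real iteration and array size
     do 1 i = 1, n
1    a(i) = 2.0*a(i) + b(i)
     return
     end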

>      common /one/ a(1)
>      common /two/ space(100 000)
>      common /tre/ alast

This is totally unacceptable.  In particular, I have used Fortran compilers
(actually linkers) that created common in order of declaration, and others
(e.g., DEC, I think) that sorted it into alphabetical order.  This code
would not work on a DEC, since "alast" would precede "space".  The standard
explicitly and loudly prohibits assumptions about the order of common.  In
this case, I think you should tell your customer to read the standard and
stuff his program in a certain dark place.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

franka@mmintl.UUCP (Frank Adams) (05/17/88)

In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
>     real a(1), b(1)			<-- note dimensions

At least one Fortran compiler I have used generated faster code with these
declarations than with the alternative a(n), b(n).  The latter did some
initialization, even when it wasn't used.

I would recommend that you regard a dimension of 1 for an argument as
meaning that the dimension is undefined.  It's not pretty, but it works.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

ok@quintus.UUCP (Richard A. O'Keefe) (05/21/88)

In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
> >     real a(1), b(1)			<-- note dimensions
> 
> I would recommend that you regard a dimension of 1 for an argument as
> meaning that the dimension is undefined.  It's not pretty, but it works.

This has never been strictly legal.  Fortran 77, unless I am much mistaken,
has a "proper" way of doing it:  the last (and only the last) dimension of
a formal array parameter may be '*'.  So this declaration should read
	real a(*), b(*)
A Fortran compiler is entitled to generate code to check the actual
subscripts against the declared dimensions.

ssd@sugar.UUCP (Scott Denham) (06/04/88)

In article <1005@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> > In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
> > I would recommend that you regard a dimension of 1 for an argument as
> > meaning that the dimension is undefined.  It's not pretty, but it works.
> This has never been strictly legal.  Fortran 77, unless I am much mistaken,
> has a "proper" way of doing it:  the last (and only the last) dimension of
> a formal array parameter may be '*'.  So this declaration should read
> 	real a(*), b(*)
> A Fortran compiler is entitled to generate code to check the actual
> subscripts against the declared dimensions.

 Agreed, the RIGHT way to do it in '77 is to use the * - I've always 
disliked the use of anything other than a variable or a * for a dummy
array as it implies information (true array extent) that's very often
not true. A Fortran compiler certainly IS entitled to do run-time sub-
script checking, but I'd hate to see what the impact would be on a real
set of production codes using a high level of subroutine nesting. It's
a great thing to have during development and debugging but it's just not
realistic in many environments.  In the case that subscript checking is
NOT being done, then the best thing the compiler designer can do for the
user is assume that the last (and only the last) dimension of ANY 
array received as an argument is in fact unknown, be it specified as 1,
100, ISIZE, or *. Why break existing code if you can avoid it ???
  
    Scott S. Denham 
    Western Atlas International
    Houston, TX

ssd@sugar.UUCP (Scott Denham) (06/22/88)

In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
> 
> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
> vectorisers and optimisers which need to know the array size. (This has to
> do with dependency analysis.)
> 

You make a good point. I have since learned that the most recent version
of IBM's vectorizing compiler makes what is probably the most reasonable
assumption that can be made: a final dimension of 1 or * on a dummy array
is treated the same way; for purposes of vectorization and optimization the
actual dimension is assumed to be unknown. Any other value is assumed to
be correct. I suppose the rationale is that if the programmer went to the
trouble to put a dimension in there, it is probably meaningful. As it turns
out, this approach is useful for us, or would be if all vector compiler
vendors used the same logic. The only other way to guide the compiler in
making decisions is through the use of directives, and these have no 
standard form at all. Further, an estimate of size is much safer than a
binary VECTOR/NOVECTOR directive, since the boundary will differ on 
different architectures and possibly on different models within the 
same architecture. 
 
 Scott Denham 
  Western Geophysical
   Houston, TX

smryan@garth.UUCP (Steven Ryan) (06/23/88)

In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>You make a good point. I have since learned that the most recent version
>of IBM's vectorizing compiler makes what is probably the most reasonable
>assumption that can be made: a final dimension of 1 or * on a dummy array
>is treated the same way; for purposes of vectorization and optimization the
>actual dimension is assumed to be unknown. Any other value is assumed to
>be correct.

As does the CDC Cyber 205 Fortran for the (?) last year. (I only know when
I coded--the powers that be decided when/if it was released.)

>                      Further, an estimate of size is much safer than a
>binary VECTOR/NOVECTOR directive, since the boundary will differ on 
>different architectures and possibly on different models within the 
>same architecture. 

64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
in existence yet?

ssd@sugar.UUCP (Scott Denham) (06/24/88)

In article <777@garth.UUCP>, smryan@garth.UUCP writes:
 Lots of stuff deleted..........
> 
> 64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
> I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
> in existence yet?

The IBM vectors are 128 elements in the current implementation, but the
architecture definition allows for 16 (I think) to 512; it's done in a
nice way so the compiler doesn't have to KNOW what it is. 
 
The Amdahl (Fujitsu) VP's have a reconfigurable register section that 
can go from something like 8 regs of 8192 to 256 regs of 256. If the
Hitachi machine is the one being marketed here by NAS, it exists, and
they claim some pretty impressive price/performance relative to the IBM
3090's. 

 
   Scott Denham 
 
*** None of this has anything to do with my employer... I heard it from
my cat.

david@titan.rice.edu (David Callahan) (06/25/88)

In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
>> 
>> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
>> vectorisers and optimisers which need to know the array size. (This has to
>> do with dependency analysis.)
>
>You make a good point. 

I'm not sure about that. Vectorizers will only rarely need the largest
dimension since it does not appear in the addressing arithmetic. For that
reason it probably will not be used by the decision procedure which
determines if a pair of references to a particular variable overlap and
so will not influence vectorization. Furthermore, unless the bound
is hardwired as a constant, it won't be very useful anyway. If you
see reduced vectorization it may be due to an assumption that the
dimension is short and hence vectorization would be unprofitable.

David Callahan
Rice University

smryan@garth.UUCP (Steven Ryan) (06/25/88)

In article <3244@s.cc.purdue.edu> ags@s.cc.purdue.edu.UUCP (Dave Seaman) writes:
>>As does the CDC Cyber 205 Fortran ....
>
>Unfortunately the Cyber 205 FTN200 compiler turns out to be nonstandard
>because of this.  You cannot treat an array with final dimension 1 as being
>indistinguishable from an assumed-size array, because the standard says the
>following is legal Fortran .......
>FTN200 used to handle this correctly, but when the change was made so that
>runtime array bounds checking (when enabled) would not apply to dummy
>arrays with a final bound of 1, an undesired side effect was to make code
>like that above fail to compile.  And yes, there are legitimate reasons for
>writing code like this.

Not to disagree. The compiler was changed to make the manager happy.
I would've preferred to make people change 1 to * when that was what they meant.

>ags@j.cc.purdue.edu
         --------

John Jackson et al?

-------------------------------------------
The sheriff looks at me and says,
"Whacha doin here, boy?
You'd better get your bags and leave."
It's the same old story,
keeping the customers satisfied....
satisfied.
                  -Paul Simon
                   (the singer not the bowtie)

smryan@garth.UUCP (Steven Ryan) (06/25/88)

In article <2168@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>                                                        it's done in a
>nice way so the compiler doesn't have to KNOW what it is. 

Actually, you want the compiler to know if you want really snazzy dependency
analysis. (Ah, yes, see this diophantine equation has a solution for n=xxx.
But my vectors are only yyy long. Oh, no problem.) Of course nobody has
dependency analysis quite that snazzy.

smryan@garth.UUCP (Steven Ryan) (06/26/88)

>I'm not sure about that. Vectorizers will only rarely need the largest
>dimension since it does not appear in the addressing arithmetic.

It is critical for dependency analysis.

Given a loop like
           for i from m to n
             a[x(i)] := f(a[y(i)])
dependency analysis determines whether x(i) = y(j) for some m <= i < j <= n
(which means a value is computed and the result subsequently used--on a
vector machine the results might still be in flight). In practice, many
subscript functions x and y have solutions for i < j if they are otherwise
unbounded. Hence it is critical to get good values for m and n. They can be
taken directly from the loop, but the resulting expressions may be nasty.
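
As a concrete Fortran illustration (my own sketch, not anything out of the
compiler):

     subroutine shift (a, n, k)
     integer n, k, i
     real a(100)
c    iteration i stores into a(i+k); iteration i+k, if it is executed,
c    reads that element back, so a flow dependence exists only when
c    0 < k < n.  If n itself is an expression, a bound taken from the
c    declared extent of a (here 100) can still settle the question.
     do 1 i = 1, n
1    a(i+k) = a(i) + 1.0
     return
     end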

If Cyber 205 Fortran is unable to safely decide, from the actual loop
bounds, whether there is a recurrence, it will try again with the array
bounds. Hence the assumption that the array bounds are valid. The fact that
the largest dimension does not affect addressing is irrelevant--it is the
iteration size that is needed.

>                                     Furthermore, unless the bound
>is hardwired as a constant, it won't be very useful anyway.

The vectoriser  handles constant bounds as a special case.  It uses symbolic
expressions for loop bounds, array dimensions, and subscript expressions.

>                                                            If you
>see reduced vectorization it may be due to an assumption that the
>dimension is short and hence vectorization would be unprofitable.

The Cyber 205's breakeven vector length is from 20 to 50 elements. To get large
enough vectors the compiler has always concentrated on vectorising a loop nest
rather than the innermost loop. (Cray, Kuck, the Good Folks at Rice only worry
about the innermost loop according to the literature.) So.....

If you have a loop nest like
      for i to m
        scalar := ....
        a[i] := ....
        for j to n
            b[i,j] := ....
        c[i] := scalar + ....

If everything is otherwise vectorisable, the j loop can be vectorised even
if n > hardware vector length by surrounding it with a scalar stripmining
loop.
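
As a rough Fortran sketch of that stripmining (hypothetical code, not
FTN200 output):

     subroutine strip (b, ld, i, n, lmax, s)
     integer ld, i, n, lmax, js, jtop, j
     real b(ld,*), s
c    the j loop is cut into chunks of at most lmax elements, where
c    lmax is the hardware vector length; the outer loop stays scalar
c    and each chunk becomes one vector operation
     do 10 js = 1, n, lmax
        jtop = min(js + lmax - 1, n)
        do 10 j = js, jtop
           b(i,j) = b(i,j) * s
10   continue
     return
     end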

If m*n<=hardware vector length, the entire nest can be vectorised. But if
m*n>hardware vector length, the i-loop as written cannot be vectorised. If the
loops are split it is possible, but such a split must correctly handle the
promoted scalar which is defined above the split and used below.

Finally to the point: if m and n are expressions, it is difficult or
impossible to compare m*n to the hardware limit. In this case, FTN200 again
hunts for constant bounds of the array. If it can find an upper bound for
m*n less than 65535, it will vectorise the entire loop nest. If it is
greater than 65535, or a constant upper bound is not known, it can only
vectorise the innermost loop.

smryan@garth.UUCP (Steven Ryan) (06/28/88)

>The Cyber 205's breakeven vector length is from 20 to 50 elements.

[A person asked where this number came from. I really don't know how to
respond personally (I only learned about *f* and *F* by accident) through
this strange network, so....]

That is the number Arden Hills always gave us. Where did they get it? I'm
not sure, but I think it was murkily derived from benchmark tests.

The vector math library routines are rather arcane. They start by checking the
vector length. If less than 20, they use scalar loops unrolled by a factor
of three (the memory handles up to three concurrent load/stores). Otherwise
they use vector instructions.
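
Very roughly (a sketch of the shape only; the real library routines are
hand coded and the name here is made up), a routine like a vector square
root might look like:

     subroutine vsqrt (x, y, n)
     integer n, i, m
     real x(*), y(*)
c    short vectors: scalar loop unrolled by three, matching the three
c    concurrent load/store streams the memory can sustain
     if (n .lt. 20) then
        m = n - mod(n,3)
        do 10 i = 1, m, 3
           y(i)   = sqrt(x(i))
           y(i+1) = sqrt(x(i+1))
           y(i+2) = sqrt(x(i+2))
10      continue
        do 20 i = m+1, n
           y(i) = sqrt(x(i))
20      continue
     else
c       long vectors: one vector operation (written here as the loop
c       the vectoriser would turn into a single vector instruction)
        do 30 i = 1, n
           y(i) = sqrt(x(i))
30      continue
     endif
     return
     end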

ags@s.cc.purdue.edu (Dave Seaman) (06/28/88)

>>The Cyber 205's breakeven vector length is from 20 to 50 elements.

I have found the breakeven length to vary from about 5 to 50 elements,
depending on the type of operations being performed.  For a simple vector
add, the breakeven length is around 5 or 6.

-- 
Dave Seaman	  					
ags@j.cc.purdue.edu

ssd@sugar.UUCP (Scott Denham) (07/01/88)

In article <801@garth.UUCP>, smryan@garth.UUCP writes:
  
> Actually, you want the compiler to know if you want really snazzy dependency
> analysis. (Ah, yes, see this diophantine equation has a solution for n=xxx.
> But my vectors ar only yyy long. Oh, no problem.) Of course nobody has
> dependency analysis quite that snazzy.


YOW - perhaps it's a good thing that nobody does, too!! I've used those
sorts of tricks when writing AP microcode and have found that though
they may yield impressive performance when done right, they may also lead
to strange and not-so-wonderful things happening when someone gets in
there and tweaks a bit.
 Still, I wouldn't turn down a compiler with that kind of snazzy 
analysis if it were offered!! :}

smryan@garth.UUCP (Steven Ryan) (07/03/88)

>YOW - perhaps it's a good thing that nobody does, too!! I've used those
>sorts of tricks when writing AP microcode and have found that though
>they may yield impressive performance when done right, they may also lead
>to strange and not-so-wonderful things happening when someone gets in
>there and tweaks a bit.

Obviously the compiler and hardware people have to talk to each other.
Because engineers are not willing to make guarantees, this trick is not used.

If the vectoriser is done right, it just means stuffing in an upper bound.
That is already done, in principle, but always with +infinity.