[comp.lang.fortran] Fortran follies

Paul_L_Schauble@cup.portal.com (05/13/88)

I'm presently working on maintaining a Fortran compiler for a mainframe
computer manufacturer. I've had a few requests lately that I'd like to
throw out for opinions. Flames accepted too. 

The machine in question is a segmented architecture. Each segment has its
own size and read/write permissions. Unfortunately, the hardware only
permits 512 segments visible at one time, so they can't be used for
individual arrays. The compiler has basic scalar optimization and automatic
vectorizing.

The first program is this
     program main
     real a(10000), b(10000)
     ...
     call sub (a, b, 10000)
     ...
     end
     subroutine sub (a, b, n)
     real a(1), b(1)			<-- note dimensions
     do 1 i = 1,n
1    a(i) = complex expression with a(i) and b(i)
     ....

The vectorizer looked at this and said that the maximum size of the array
is 1, therefore the maximum subscript is 1 and the vector temporary needed
in the subroutine only needs to be one word long. Of course, the real
vector size is n.

Second program is this
     program main
     ...
     common /one/ a(1)
     common /two/ space(100 000)
     common /tre/ alast
     ...
c    calculate sizes of sub arrays
     il1 = something
     il2 = something else
     il3 = yet more
c    calculate starting points of sub arrays
     ist1 = 1
     ist2 = ist1 + il1
     ist3 = ist2 + il2
c    call working subroutine
     call subbr (a(ist1), a(ist2), a(ist3), il1, il2, il3)
     ...
     end
     subroutine subbr(x, y, z, ilx, ily, ilz)
     real x(1), y(1), z(1)
c    long calculation using x, y, z as arrays ilx, ily, and ilz long
     ...
     end

It's an interesting attempt at dynamic storage allocation. This is from the
CERN library, which is apparently popular in Europe.

My problem is that the compiler puts each common block in its own segment, 
so that all of the references to a produce segment protection faults. 

Now, I know that both of these are non-standard. The last not only assumes
that all of common is in one segment but also assumes the order in which
the common blocks are laid down in memory. The technique would work if used
within a common block.
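For reference, the variant of the trick that stays within the rules keeps all
the sub-arrays inside one common block, so nothing is assumed about how
separate blocks are laid out. A minimal sketch (sizes and names are
hypothetical):

```fortran
c     sketch only: sizes and names are hypothetical
      program main
      common /pool/ work(100000)
      il1 = 1000
      il2 = 2000
      ist1 = 1
      ist2 = ist1 + il1
c     both actual arguments are elements of the same array WORK,
c     so no assumption about common block ordering is needed
      call subbr(work(ist1), work(ist2), il1, il2)
      end
      subroutine subbr(x, y, ilx, ily)
      real x(*), y(*)
c     ... long calculation using x and y as arrays ilx and ily long ...
      end
```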

But they become significant issues to me when the customer bugs my
management to change the compiler to support these programs! They say that
they don't want to change their code.

I wonder if other compiler groups have hit these issues, and, if so, what
have you decided to do about them? Is there really a significant amount of
Fortran code out there that does this type of thing? Is it really possible
to do Fortran on a segmented architecture machine or do prevailing coding
practices rule it out? My thought is that these practices were ruled out of
the standard for very good reasons. But the customer is still always right.

Thanks in advance for any information,
Paul_L_Schauble@cup.portal.com or sun!portal!Paul_L_Schauble
...

mcdonald@uxe.cso.uiuc.edu (05/15/88)

(not including the examples, which are long)

The second program is not only illegal, it is horrible practice
and you ought to forget about it. The first one, though, you had
better get to work right. Dimensioning things "1" when they are passed
as arguments is extremely common. I just read the appropriate sections
of the F77 standard, and I can't tell if it is legal. But if your
compiler won't work on it, your customers have a good reason to be mad.
Why can't the compiler tell how big the arrays are, hence how big
any temporary storage needs to be, from the index range of the DO loop?
If it can't, it is pretty stupid.


Actually, consider this example:

        real x(100000)
        call sub(1000, x(1),x(10000),x(20000))
        ...

        subroutine sub(n,x,y,z)
        dimension x(n), y(n), z(n)
        do 1 i= 1,n
1       z(i)= x(i)+y(i)
         end

This had also better work. I THINK that it is even legal.
Doug McDonald

geoff@desint.UUCP (Geoff Kuenning) (05/15/88)

In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:

>      subroutine sub (a, b, n)
>      real a(1), b(1)			<-- note dimensions

Since variable dimensions have been part of standard Fortran for over ten
years, there is little excuse for using this older technique.  However, it
used to be very popular, so I suppose the customer has an argument in
expecting the compiler to support it.  Isn't the vectorizer smart enough
to see that the loop overruns the array?

>      common /one/ a(1)
>      common /two/ space(100 000)
>      common /tre/ alast

This is totally unacceptable.  In particular, I have used Fortran compilers
(actually linkers) that created common in order of declaration, and others
(e.g., DEC, I think) that sorted it into alphabetical order.  This code
would not work on a DEC, since "alast" would precede "space".  The standard
explicitly and loudly prohibits assumptions about the order of common.  In
this case, I think you should tell your customer to read the standard and
stuff his program in a certain dark place.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff

franka@mmintl.UUCP (Frank Adams) (05/17/88)

In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
>     real a(1), b(1)			<-- note dimensions

At least one Fortran compiler I have used generated faster code with these
declarations than with the alternative a(n), b(n).  The latter did some
initialization, even when it wasn't used.

I would recommend that you regard a dimension of 1 for an argument as
meaning that the dimension is undefined.  It's not pretty, but it works.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

david@titan.rice.edu (David Callahan) (05/19/88)

In article <50500052@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>
>Actually, consider this example:
>
>        real x(100000)
>        call sub(1000, x(1),x(10000),x(20000))
>        ...
>
>        subroutine sub(n,x,y,z)
>        dimension x(n), y(n), z(n)
>        do 1 i= 1,n
>1       z(i)= x(i)+y(i)
>         end
>
>This had also better work. I THINK that it is even legal.
>Doug McDonald

Very common (probably essential to making libraries) but not legal:

"15.9.3.6 Restrictions on Association of Entities.  If a 
subprogram reference causes a dummy argument in the 
referenced subprogram to become associated with another 
dummy argument in the referenced subprogram, neither 
dummy argument may become defined during execution of
that subprogram. For example, if a subroutine is headed by
	SUBROUTINE XYZ(A,B)
and is referenced by 
	CALL XYZ (C,C)
then the dummy arguments A and B each become associated
with the same actual argument C and therefore with each other.
Neither A nor B may become defined during this execution
of the subroutine XYZ or by any procedure referenced by XYZ."

david callahan

dik@cwi.nl (Dik T. Winter) (05/19/88)

In article <705@thalia.rice.edu> david@titan.UUCP (David Callahan) writes:
 > In article <50500052@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
 > >
 > >        real x(100000)
 > >        call sub(1000, x(1),x(10000),x(20000))
 > >        ...
 > >
 > >        subroutine sub(n,x,y,z)
 > >        dimension x(n), y(n), z(n)
 > >This had also better work. I THINK that it is even legal.
 > Very common (probably essential to making libraries) but not legal;
 > 
 > "15.9.3.6 Restrictions on Assoication of Entities.  If a 
 > subprogram reference causes a dummy argument in the 
 > referenced subprogram to become associated with another 
 > dummy argument in the referenced subprogram, neither 
 > dummy argument may become defined during execution of
 > that subprogram. For example, if a subroutine is headed by
 > 	SUBROUTINE XYZ(A,B)
 > and is referend by 
 > 	CALL XYZ (C,C)
 > then the dummy arguments A and B each become associated
 > with the same acutal argument C and therefore with each other.
 > Neither A nor B may become defined during this execution
 > of the subroutine XYZ or by any procedure referend by ZYX."
Yes, that is already in the Fortran 66 standard, but it does not apply here.
If a dummy argument is an array, the actual argument may be an array or
an array element.  Here we have the latter case, and the three dummy
arguments are not associated with the same actual argument (the actual
arguments are not complete arrays but only disjoint sections).

(The general restriction is that aliasing is prohibited for variables
or array elements you are assigning to.  Many optimizers rely on this.)
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

johnl@ima.ISC.COM (John R. Levine) (05/19/88)

In article <705@thalia.rice.edu> david@titan.UUCP (David Callahan) writes:
>In article <50500052@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
>>
>>        real x(100000)
>>        call sub(1000, x(1),x(10000),x(20000))
>>        ...
>>        subroutine sub(n,x,y,z)
>>        dimension x(n), y(n), z(n)
>>	  ...
>Very common (probably essential to making libraries) but not legal;
>
>"15.9.3.6 Restrictions on Association of Entities. If a subprogram reference
>causes a dummy argument in the referenced subprogram to become associated with
>another dummy argument in the referenced subprogram, neither dummy argument
>may become defined during execution of that subprogram. ...

By my reading of the F77 standard, it's perfectly legal to pass disjoint
chunks of an array to a subprogram as separate arguments. 2.14 says that
association means that the same datum may be identified by different symbolic
names. The discussion of association of storage sequences and entities in
17.1.2 and 17.1.3 makes it pretty clear that two arrays are associated iff
their storage overlaps; in this case they don't so 15.9.3.6 doesn't apply.

Intuitively, the restriction in 15.9.3.6 is intended to prohibit argument
aliasing that would break calling sequences that copy in argument values at
call time and copy changed results back before the return. In this case, even
if the arrays were passed by copy/return (pretty unlikely for an array, but
still legal) the code would still work.
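To make the distinction concrete, here is a sketch of a call that 15.9.3.6
does prohibit: the actual arguments overlap, so under a copy/restore
convention the overlapping elements would be written back twice in an
unspecified order. (The names are made up for illustration.)

```fortran
c     nonconforming sketch: x and y overlap in c(5)..c(6)
      program main
      real c(10)
      call xyz(c(1), c(5), 6)
      end
      subroutine xyz(x, y, n)
      real x(*), y(*)
c     x(5..6) and y(1..2) name the same storage; assigning through x
c     while reading through y violates 15.9.3.6, and a copy/restore
c     implementation would restore the overlap in an unspecified order
      do 1 i = 1, n
 1    x(i) = y(i) + 1.0
      end
```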

Scripturally yours,
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
Rome fell, Babylon fell, Scarsdale will have its turn.  -G. B. Shaw

karzes@mfci.UUCP (Tom Karzes) (05/19/88)

In article <705@thalia.rice.edu> david@titan.UUCP (David Callahan) writes:
}In article <50500052@uxe.cso.uiuc.edu> mcdonald@uxe.cso.uiuc.edu writes:
}>
}>Actually, consider this example:
}>
}>        real x(100000)
}>        call sub(1000, x(1),x(10000),x(20000))
}>        ...
}>
}>        subroutine sub(n,x,y,z)
}>        dimension x(n), y(n), z(n)
}>        do 1 i= 1,n
}>1       z(i)= x(i)+y(i)
}>         end
}>
}>This had also better work. I THINK that it is even legal.
}>Doug McDonald
}
}Very common (probably essential to making libraries) but not legal;
}
}"15.9.3.6 Restrictions on Assoication of Entities.
}...

No, this is legal Fortran 77.  You are confused about what constitutes
an association of entities.  Just because object A is stored at a
precisely defined position relative to object B does not mean they are
associated.  They must actually overlap to be associated.  See sections
17.1.2 (Association of Storage Sequences) and 17.1.3 (Association of
Entities) for an explanation of association in Fortran 77.  Also note
where, in the example in 17.1.3, they explicitly say that C(1) and C(2)
are not associated with each other (a slightly different case, but it
illustrates the basic point).

ok@quintus.UUCP (Richard A. O'Keefe) (05/21/88)

In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
> >     real a(1), b(1)			<-- note dimensions
> 
> I would recommend that you regard a dimension of 1 for an argument as
> meaning that the dimension is undefined.  It's not pretty, but it works.

This has never been strictly legal.  Fortran 77, unless I am much mistaken,
has a "proper" way of doing it:  the last (and only the last) dimension of
a formal array parameter may be '*'.  So this declaration should read
	real a(*), b(*)
A Fortran compiler is entitled to generate code to check the actual
subscripts against the declared dimensions.

ssd@sugar.UUCP (Scott Denham) (06/04/88)

In article <1005@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> > In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
> > I would recommend that you regard a dimension of 1 for an argument as
> > meaning that the dimension is undefined.  It's not pretty, but it works.
> This has never been strictly legal.  Fortran 77, unless I am much mistaken,
> has a "proper" way of doing it:  the last (and only the last) dimension of
> a formal array parameter may be '*'.  So this declaration should read
> 	real a(*), b(*)
> A Fortran compiler is entitled to generate code to check the actual
> subscripts against the declared dimensions.

 Agreed, the RIGHT way to do it in '77 is to use the * - I've always 
disliked the use of anything other than a variable or a * for a dummy
array as it implies information (true array extent) that's very often
not true. A Fortran compiler certainly IS entitled to do run-time sub-
script checking, but I'd hate to see what the impact would be on a real
set of production codes using a high level of subroutine nesting. It's
a great thing to have during development and debugging but it's just not
realistic in many environments.  In the case that subscript checking is
NOT being done, then the best thing the compiler designer can do for the
user is assume that the last (and only the last) dimension of ANY
array received as an argument is in fact unknown, be it specified as 1,
100, ISIZE, or *. Why break existing code if you can avoid it ???
  
    Scott S. Denham 
    Western Atlas International
    Houston, TX

ssd@sugar.UUCP (Scott Denham) (06/22/88)

In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
> 
> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
> vectorisers and optimisers which need to know the array size. (This has to
> do with dependency analysis.)
> 

You make a good point. I have since learned that the most recent version
of IBM's vectorizing compiler makes what is probably the most reasonable
assumption that can be made: a final dimension of 1 or * on a dummy array
is treated the same; for purposes of vectorization and optimization the
actual dimension is assumed to be unknown. Any other value is assumed to
be correct. I suppose the rationale is that if the programmer went to the
trouble to put a dimension in there, it is probably meaningful. As it turns
out, this approach is useful for us, or would be if all vector compiler
vendors used the same logic. The only other way to guide the compiler in
making decisions is through the use of directives, and these have no 
standard form at all. Further, an estimate of size is much safer than a
binary VECTOR/NOVECTOR directive, since the boundary will differ on 
different architectures and possibly on different models within the 
same architecture. 
 
 Scott Denham 
  Western Geophysical
   Houston, TX

smryan@garth.UUCP (Steven Ryan) (06/23/88)

In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>You make a good point. I have since learned that the most recent version
>of IBM's vectorizing compiler makes what is probably the most reasonable
>assumtion that can be made: a final dimension of 1 or * on a dummy array
>are treated the same; for purposes of vectorization and optimization the
>acutal dimension is assumed to be unknown. Any other value is assumed to
>be correct.

As does the CDC Cyber 205 Fortran for the (?) last year. (I only know when
I coded--the powers that be decided when/if it was released.)

>                      Further, an estimate of size is much safer than a
>binary VECTOR/NOVECTOR directive, since the boundary will differ on 
>different architectures and possibly on different models within the 
>same architecture. 

64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
in existence yet?

eugene@pioneer.arpa (Eugene N. Miya) (06/24/88)

In article <777@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
>64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
>I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
>in existent yet?

FYI:
IBM 3090 is 128 32-bit elements or 64 64-bit elements.
Flame on:
What burns me up about these figures is that some literature has IBM
making vectors legit (e.g., didn't they invent virtual memory? ;-) "Don't mind
the man behind the curtain") and that 64-elements was determined to be
the best length by sophisticated research (probably market rather than
simulation).  Anyway flame off.
You are confusing the Hitachi and the Fujitsu.
The Hitachi S-810 line is an IBM 370 compatible long vector machine.
I've not run on it.
The Fujitsu VP-200 [also 50, 100, and 400] aka Amdahl 1200 is also
370-compatible, and its long vectors [not compatible] are 65K elements,
closer to the 205/ETA 10s.  They were built and delivered years ago
(82/3).  The VP line is the second most populous supercomputer line in
the world.
4K length vectors for the 990 sound interesting.  I should go try one.

dik@cwi.nl (Dik T. Winter) (06/24/88)

In article <10757@ames.arc.nasa.gov> eugene@pioneer.UUCP (Eugene N. Miya) writes:
 > In article <777@garth.UUCP> smryan@garth.UUCP (Steven Ryan) writes:
 > >64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
...
 > IBM 3090 is 128 32-bit elements or 64 64-bit elements.
...
 > The Hitachi S-810 line is an IBM 370 compatible long vector machine.
 > I've not run on it.
 > The Fujitsu VP-200 [also 50, 100, and 400] aka Amdahl 1200 is also
 > 370-compatible and long vectors [not compat] have 65K length vectors
 > closer to the 205/10s.  They were built and delivered years ago
 > (82/3).  The VP line is the second most populous supercomputer in the
 > world.
 > 4K length vectors for the 990 sound interesting.  I should go try one.
Interesting, but wrong.  512 elements in a vector.  (The vector length
field in an instruction is 12 bits though.)

Further: NEC SX (not IBM compatible) 128 or 256, depending on model,
with vector registers, like the Cray.  This is the fastest supercomputer
in the world.
-- 
dik t. winter, cwi, amsterdam, nederland
INTERNET   : dik@cwi.nl
BITNET/EARN: dik@mcvax

ssd@sugar.UUCP (Scott Denham) (06/24/88)

In article <777@garth.UUCP>, smryan@garth.UUCP writes:
 Lots of stuff deleted..........
> 
> 64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
> I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
> in existent yet?

The IBM vectors are 128 elements in the current implementation, but the
architecture definition allows for 16 (I think) to 512; it's done in a
nice way so the compiler doesn't have to KNOW what it is. 
 
The Amdahl (Fujitsu) VP's have a reconfigurable register section that 
can go from something like 8 regs of 8192 to 256 regs of 256. If the
Hitachi machine is the one being marketed here by NAS, it exists, and
they claim some pretty impressive price/performance relative to the IBM
3090's. 

 
   Scott Denham 
 
*** None of this has anything to do with my employer... I heard it from
my cat.

eugene@pioneer.arpa (Eugene N. Miya) (06/25/88)

Ah! yes, Dik is right, and what Scott said about the adjustable vector
length is right.  I've run on TOO many of these machines and I have to
go back and check manuals.  Dik is also
right about noting the NEC SX-2 as the fastest uniprocessor super,
but the original article never brought the SX up.  I wish to thank the
Rice people for HARC access, and the Amdahl people [indirectly,
the 1200 people know nothing of Usenet access].

Another gross generalization from
	^^right?!

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "Mailers?! HA!", "If my mail does not reach you, please accept my apology."
  {uunet,hplabs,ncar,decwrl,allegra,tektronix}!ames!aurora!eugene
  "Send mail, avoid follow-ups.  If enough, I'll summarize."

david@titan.rice.edu (David Callahan) (06/25/88)

In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
>> 
>> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
>> vectorisers and optimisers which need to know the array size. (This has to
>> do with dependency analysis.)
>
>You make a good point. 

I'm not sure about that. Vectorizers will only rarely need the largest
dimension since it does not appear in the addressing arithmetic. For that
reason it probably will not be used by the decision procedure which
determines if a pair of references to a particular variable overlap and
so will not influence vectorization. Furthermore, unless the bound
is hardwired as a constant, it won't be very useful anyway. If you
see reduced vectorization it may be due to an assumption that the
dimension is short and hence vectorization would be unprofitable.

David Callahan
Rice University

smryan@garth.UUCP (Steven Ryan) (06/25/88)

In article <3244@s.cc.purdue.edu> ags@s.cc.purdue.edu.UUCP (Dave Seaman) writes:
>>As does the CDC Cyber 205 Fortran ....
>
>Unfortunately the Cyber 205 FTN200 compiler turns out to be nonstandard
>because of this.  You cannot treat an array with final dimension 1 as being
>indistinguishable from an assumed-size array, because the standard says the
>following is legal Fortran .......
>FTN200 used to handle this correctly, but when the change was made so that
>runtime array bounds checking (when enabled) would not apply to dummy
>arrays with a final bound of 1, an undesired side effect was to make code
>like that above fail to compile.  And yes, there are legitimate reasons for
>writing code like this.

Not to disagree. The compiler was changed to make the manager happy.
I would've preferred to make people change 1 to * when that was what they meant.

>ags@j.cc.purdue.edu
         --------

John Jackson et al?

-------------------------------------------
The sheriff looks at me and says,
"Whacha doin here, boy?
You'd better get your bags and leave."
It's the same old story,
keeping the customers satisfied....
satisfied.
                  -Paul Simon
                   (the singer not the bowtie)

smryan@garth.UUCP (Steven Ryan) (06/25/88)

In article <10757@ames.arc.nasa.gov> eugene@pioneer.UUCP (Eugene N. Miya) writes:
>FYI:
>IBM 3090 is 128 32-bit elements or 64 64-bit elements.

Thank you. I really only know about Cray and CDC machines.

>What burns me up about these figures is that some literature has IBM
>making vectors legit (e.g., didn't they invent virtual memory? ;-) "Don't mind
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(I thought it was somebody English like Atlas)

>the man behind the curtain") and that 64-elements was determined to be
>the best length by sophisticated research (probably market rather than
>simulation).

I dislike IBM on general principles. (That is, besides being the other guy.)

So small sounds like vector registers rather than memory to memory vectors.

>4K length vectors for the 990 sound interesting.  I should go try one.

Disclaimer: this is a personal comment without very much knowledge of the
current situation:  The hardware is ready, but I am not sure those dinks
will ever get their *** together and produce a reasonable compiler.

990 is also a memory to memory vector machine. By the way 4K is also the
minimum page size for a 990. So a vector (except gather/scatter) resides
on at most two pages. Isn't that magical?

smryan@garth.UUCP (Steven Ryan) (06/25/88)

In article <2168@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>                                                        it's done in a
>nice way so the compiler doesn't have to KNOW what it is. 

Actually, you want the compiler to know if you want really snazzy dependency
analysis. (Ah, yes, see this diophantine equation has a solution for n=xxx.
But my vectors are only yyy long. Oh, no problem.) Of course nobody has
dependency analysis quite that snazzy.

smryan@garth.UUCP (Steven Ryan) (06/26/88)

>I'm not sure about that. Vectorizers will only rarely need the largest
>dimension since it does not appear in the addressing arithmetic.

It is critical for dependency analysis.

Given a loop like
           for i from m to n
             a[xi]:=f a[yi]
dependency analysis determines if xi=yj for m<=i<j<=n. (which means a
value is computed and the result subsequently used--on a vector machine
the results might still be in flight.) In practice, many subscript functions
x and y have solutions with i<j if they are otherwise unbounded. Hence it
is critical to get good values for m and n. They can be taken directly from
the loop, but the resulting expressions may be nasty.

If Cyber 205 Fortran is unable to safely rule out recursion using the
actual loop bounds, it will try again with the array bounds. Hence the
assumption that the array bounds are valid. The fact that the largest
dimension does not affect addressing is irrelevant--it is the iteration
count that is needed.
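A small illustration of the point, with hypothetical numbers: the declared
dimension bounds the iteration count, and that bound alone decides whether
the loop is safe to vectorise.

```fortran
c     hypothetical example: safety depends only on the bound on n
      subroutine shift(a, n)
      real a(128)
c     a(i+64) = ... a(i) has a flow dependence only when a later
c     iteration j reads what iteration i wrote, i.e. j = i + 64.
c     The declaration a(128) implies n <= 64, so j = i + 64 <= n has
c     no solution with i >= 1, and the whole loop can be issued as a
c     single vector operation
      do 1 i = 1, n
 1    a(i+64) = a(i) * 2.0
      end
```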

>                                     Furthermore, unless the bound
>is hardwired as a constant, it won't be very useful anyway.

The vectoriser  handles constant bounds as a special case.  It uses symbolic
expressions for loop bounds, array dimensions, and subscript expressions.

>                                                            If you
>see reduced vectorization it may be due to an assumption that the
>dimension is short and hence vectorization would be unprofitable.

The Cyber 205's breakeven vector length is from 20 to 50 elements. To get large
enough vectors the compiler has always concentrated on vectorising a loop nest
rather than the innermost loop. (Cray, Kuck, the Good Folks at Rice only worry
about the innermost loop according to the literature.) So.....

If you have a loop nest like,
      for i to m
        scalar := ....
        a[i] := ....
        for j to n
            b[i,j] := ....
        c[i] := scalar + ....

If everything is otherwise vectorisable, the j loop can be vectorised even
if n>hardware vector length by surrounding it with a scalar stripmining loop.

If m*n<=hardware vector length, the entire nest can be vectorised. But if
m*n>hardware vector length, the i-loop as written cannot be vectorised. If the
loops are split it is possible, but such a split must correctly handle the
promoted scalar which is defined above the split and used below.

Finally to the point: if m and n are expressions, it is difficult or
impossible to compare m*n to the hardware limit. In this case, FTN200 again
hunts for constant bounds of the array. If it can find an upper bound for
m*n less than 65535, it will vectorise the entire loop nest. If it is
greater than 65535 or a constant upper bound is not known, it can only
vectorise the innermost loop.
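For the simple single-loop case, the scalar stripmining loop mentioned above
might look like this (a sketch only, with 65535 standing in for the 205's
hardware vector length and the routine name invented):

```fortran
c     sketch: stripmine a loop of length n into vector-sized pieces
      subroutine scale(b, n)
      real b(*)
      lhw = 65535
c     outer scalar loop walks the array in strips of at most lhw
      do 2 is = 1, n, lhw
         il = min(n - is + 1, lhw)
c        the inner loop becomes one vector operation of length il
         do 1 i = is, is + il - 1
 1       b(i) = 2.0 * b(i)
 2    continue
      end
```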

smryan@garth.UUCP (Steven Ryan) (06/28/88)

>The Cyber 205's breakeven vector length is from 20 to 50 elements.

[A person asked where this number came from. I really don't know how to respond
personally (I only learned about *f* and *F* by accidents) through this strange
network, so....]

That is the number Arden Hills always gave us. Where did they get it? I'm
not sure, but I think it was murkily derived from benchmark tests.

The vector math library routines are rather arcane. They start by checking the
vector length. If less than 20, they use scalar loops unrolled by a factor
of three (the memory handles up to three concurrent load/stores). Otherwise
they use vector instructions.

bct@its63b.ed.ac.uk (B Tompsett) (06/28/88)

In article <800@garth.UUCP>  writes:
>>In article <10757@ames.arc.nasa.gov> eugene@pioneer.UUCP (Eugene N. Miya) writes:
>>What burns me up about these figures is that some literature has IBM
>>making vectors legit (e.g., didn't they invent virtual memory? ;-) "Don't mind
>                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>(I thought it was somebody English like Atlas)

 Yes. IBM purchased the world patent rights from Manchester University for some
paltry sum. The University thought it a good deal at the time :-). After all,
Government scientists of the day thought that only one or two computers would
ever be needed in the world.

  Brian.
-- 
> Brian Tompsett. Department of Computer Science, University of Edinburgh,
> JCMB, The King's Buildings, Mayfield Road, EDINBURGH, EH9 3JZ, Scotland, U.K.
> Telephone:         +44 31 667 1081 x2711.
> JANET:  bct@uk.ac.ed.ecsvax  ARPA: bct%ed.ecsvax@nss.cs.ucl.ac.uk
> USENET: bct@ecsvax.ed.ac.uk  UUCP: ...!mcvax!ukc!ed.ecsvax!bct
> BITNET: ukacrl.earn!ed.ecsvax!bct or bct%ed.ecsvax@uk.ac

ags@s.cc.purdue.edu (Dave Seaman) (06/28/88)

>>The Cyber 205's breakeven vector length is from 20 to 50 elements.

I have found the breakeven length to vary from about 5 to 50 elements,
depending on the type of operations being performed.  For a simple vector
add, the breakeven length is around 5 or 6.

-- 
Dave Seaman	  					
ags@j.cc.purdue.edu

ssd@sugar.UUCP (Scott Denham) (06/29/88)

In article <10757@ames.arc.nasa.gov>, eugene@pioneer.arpa (Eugene N. Miya) writes:
 
> IBM 3090 is 128 32-bit elements or 64 64-bit elements.
 Well, almost. In fact it is 128 elements either way: 16 registers of
 32-bit elements or 8 registers of 64-bit elements.
  
> The Fujitsu VP-200 [also 50, 100, and 400] aka Amdahl 1200 is also
> 370-compatible and long vectors [not compat] have 65K length vectors
> closer to the 205/10s.  They were built and delivered years ago
> (82/3).  The VP line is the second most populous supercomputer in the
> world.
 That's an interesting statistic - but how many of those VP's are in 
Japanese universities ????  At the time we benchmarked the VP, there 
were very few VP's at unsubsidized sites.  And do you consider the 3090
a "supercomputer" in this figure??  I find it hard to believe that there
are more VP's out there than 3090/VF's (but I could be wrong)
(P.S. I'm not siding with 3090 over VP - we have both and there are pros
and cons to each)


 Scott Denham 
 Western Atlas International
 

ssd@sugar.UUCP (Scott Denham) (07/01/88)

In article <801@garth.UUCP>, smryan@garth.UUCP writes:
  
> Actually, you want the compiler to know if you want really snazzy dependency
> analysis. (Ah, yes, see this diophantine equation has a solution for n=xxx.
> But my vectors ar only yyy long. Oh, no problem.) Of course nobody has
> dependency analysis quite that snazzy.


YOW - perhaps it's a good thing that nobody does, too!! I've used those 
sorts of tricks when writing AP microcode and have found that though
they may yield impressive performance when done right, may also lead
to strange and not-so-wonderful things happening when someone gets in
there and tweaks a bit.
 Still, I wouldn't turn down a compiler with that kind of snazzy 
analysis if it were offered!! :}

smryan@garth.UUCP (Steven Ryan) (07/03/88)

>YOW - perhaps it's a good thing that nobody does, too!! I've used those 
>sorts of tricks when writing AP microcode and have found that though
>they may yield impressive performance when done right, may also lead
>to strange and not-so-wonderful things happening when someone get in
>there and tweaks a bit. 

Obviously the compiler and hardware people have to talk to each other.
Because the engineers are not willing to make guarantees, this trick is not used.

If the vectoriser is done right, it just means stuffing in an upper bound.
That is already done, in principle, but always with +infinity.