Paul_L_Schauble@cup.portal.com (05/13/88)
I'm presently working on maintaining a Fortran compiler for a mainframe
computer manufacturer. I've had a few requests lately that I'd like to throw
out for opinions. Flames accepted too.

The machine in question is a segmented architecture. Each segment has its own
size and read/write permissions. Unfortunately, the hardware only permits 512
segments visible at one time, so they can't be used for individual arrays.
The compiler has basic scalar optimization and automatic vectorizing.

The first program is this:

      program main
      real a(10000), b(10000)
      ...
      call sub (a, b, 10000)
      ...
      end

      subroutine sub (a, b, n)
      real a(1), b(1)          <-- note dimensions
      do 1 i = 1,n
    1 a(i) = complex expression with a(i) and b(i)
      ....

The vectorizer looked at this and said that the maximum size of the array is
1, therefore the maximum subscript is 1 and the vector temporary needed in
the subroutine only needs to be one word long. Of course, the real vector
size is n.

The second program is this:

      program main
      ...
      common /one/ a(1)
      common /two/ space(100 000)
      common /tre/ alast
      ...
c calculate sizes of sub arrays
      il1 = something
      il2 = something else
      il3 = yet more
c calculate starting points of sub arrays
      ist1 = 1
      ist2 = ist1 + il1
      ist3 = ist2 + il2
c call working subroutine
      call subbr (a(ist1), a(ist2), a(ist3), il1, il2, il3)
      ...
      end

      subroutine subbr(x, y, z, ilx, ily, ilz)
      real x(1), y(1), z(1)
c long calculation using x, y, z as arrays ilx, ily, and ilz long
      ...
      end

It's an interesting attempt at dynamic storage allocation. This is from the
CERN library, which is apparently popular in Europe. My problem is that the
compiler puts each common block in its own segment, so that all of the
references to a produce segment protection faults.

Now, I know that both of these are non-standard. The last not only assumes
that all of common is in one segment but also assumes the order in which the
common blocks are laid down in memory. The technique would work if used
within a single common block.
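(In modern terms, the starting-point arithmetic is just a running sum of the
sub-array lengths. A minimal sketch of that calculation -- Python rather than
Fortran, with names invented for illustration -- mirroring ist1 = 1,
ist2 = ist1 + il1, ist3 = ist2 + il2:)

```python
def starting_points(sizes):
    """Given sub-array lengths (il1, il2, ...), return 1-based starting
    indices into one pool array, as in ist1 = 1; ist2 = ist1 + il1; ..."""
    starts = [1]                       # ist1 = 1
    for il in sizes[:-1]:              # each later start adds the prior length
        starts.append(starts[-1] + il)
    return starts

# three sub-arrays of lengths 10, 20, 30 carved out of one pool
print(starting_points([10, 20, 30]))   # [1, 11, 31]
```

(As noted above, the trick is only safe when all the sub-arrays really live
inside the one array being carved up -- that is, within a single common
block.)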
But they become significant issues to me when the customer bugs my management
to change the compiler to support these programs! They say that they don't
want to change their code.

I wonder if other compiler groups have hit these issues, and, if so, what you
have decided to do about them. Is there really a significant amount of
Fortran code out there that does this type of thing? Is it really possible to
do Fortran on a segmented architecture machine, or do prevailing coding
practices rule it out? My thought is that these practices were ruled out of
the standard for very good reasons. But the customer is still always right.

Thanks in advance for any information,
    Paul_L_Schauble@cup.portal.com  or  sun!portal!Paul_L_Schauble ...
geoff@desint.UUCP (Geoff Kuenning) (05/15/88)
In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:

>      subroutine sub (a, b, n)
>      real a(1), b(1)          <-- note dimensions

Since variable dimensions have been part of standard Fortran for over ten
years, there is little excuse for using this older technique. However, it
used to be very popular, so I suppose the customer has an argument in
expecting the compiler to support it. Isn't the vectorizer smart enough to
see that the loop overruns the array?

>      common /one/ a(1)
>      common /two/ space(100 000)
>      common /tre/ alast

This is totally unacceptable. In particular, I have used Fortran compilers
(actually linkers) that created common in order of declaration, and others
(e.g., DEC, I think) that sorted it into alphabetical order. This code would
not work on a DEC, since "alast" would precede "space". The standard
explicitly and loudly prohibits assumptions about the order of common.

In this case, I think you should tell your customer to read the standard and
stuff his program in a certain dark place.
--
Geoff Kuenning   geoff@ITcorp.com   {uunet,trwrb}!desint!geoff
franka@mmintl.UUCP (Frank Adams) (05/17/88)
In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:

>      real a(1), b(1)          <-- note dimensions

At least one Fortran compiler I have used generated faster code with these
declarations than with the alternative a(n), b(n). The latter did some
initialization, even when it wasn't used.

I would recommend that you regard a dimension of 1 for an argument as meaning
that the dimension is undefined. It's not pretty, but it works.
--
Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108
ok@quintus.UUCP (Richard A. O'Keefe) (05/21/88)
In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> In article <5377@cup.portal.com> Paul_L_Schauble@cup.portal.com writes:
> >      real a(1), b(1)          <-- note dimensions
>
> I would recommend that you regard a dimension of 1 for an argument as
> meaning that the dimension is undefined. It's not pretty, but it works.

This has never been strictly legal. Fortran 77, unless I am much mistaken,
has a "proper" way of doing it: the last (and only the last) dimension of a
formal array parameter may be '*'. So this declaration should read

      real a(*), b(*)

A Fortran compiler is entitled to generate code to check the actual
subscripts against the declared dimensions.
ssd@sugar.UUCP (Scott Denham) (06/04/88)
In article <1005@cresswell.quintus.UUCP>, ok@quintus.UUCP (Richard A. O'Keefe) writes:
> In article <2852@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> > I would recommend that you regard a dimension of 1 for an argument as
> > meaning that the dimension is undefined. It's not pretty, but it works.
> This has never been strictly legal. Fortran 77, unless I am much mistaken,
> has a "proper" way of doing it: the last (and only the last) dimension of
> a formal array parameter may be '*'. So this declaration should read
>       real a(*), b(*)
> A Fortran compiler is entitled to generate code to check the actual
> subscripts against the declared dimensions.

Agreed, the RIGHT way to do it in '77 is to use the * - I've always disliked
the use of anything other than a variable or a * for a dummy array, as it
implies information (the true array extent) that's very often not true.

A Fortran compiler certainly IS entitled to do run-time subscript checking,
but I'd hate to see what the impact would be on a real set of production
codes using a high level of subroutine nesting. It's a great thing to have
during development and debugging, but it's just not realistic in many
environments.

In the case that subscript checking is NOT being done, then the best thing
the compiler designer can do for the user is assume that the last (and only
the last) dimension of ANY array received as an argument is in fact unknown,
be it specified as 1, 100, ISIZE, or *. Why break existing code if you can
avoid it???

 Scott S. Denham
 Western Atlas International
 Houston, TX
ssd@sugar.UUCP (Scott Denham) (06/22/88)
In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
>
> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
> vectorisers and optimisers which need to know the array size. (This has to
> do with dependency analysis.)
>
You make a good point. I have since learned that the most recent version of
IBM's vectorizing compiler makes what is probably the most reasonable
assumption that can be made: a final dimension of 1 or * on a dummy array is
treated the same; for purposes of vectorization and optimization the actual
dimension is assumed to be unknown. Any other value is assumed to be correct.
I suppose the rationale is that if the programmer went to the trouble to put
a dimension in there, it is probably meaningful.

As it turns out, this approach is useful for us, or would be if all vector
compiler vendors used the same logic. The only other way to guide the
compiler in making decisions is through the use of directives, and these
have no standard form at all. Further, an estimate of size is much safer
than a binary VECTOR/NOVECTOR directive, since the boundary will differ on
different architectures and possibly on different models within the same
architecture.

  Scott Denham
   Western Geophysical
    Houston, TX
smryan@garth.UUCP (Steven Ryan) (06/23/88)
In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>You make a good point. I have since learned that the most recent version
>of IBM's vectorizing compiler makes what is probably the most reasonable
>assumption that can be made: a final dimension of 1 or * on a dummy array
>is treated the same; for purposes of vectorization and optimization the
>actual dimension is assumed to be unknown. Any other value is assumed to
>be correct.

As does the CDC Cyber 205 Fortran for the (?) last year. (I only know when I
coded it--the powers that be decided when/if it was released.)

> Further, an estimate of size is much safer than a
>binary VECTOR/NOVECTOR directive, since the boundary will differ on
>different architectures and possibly on different models within the
>same architecture.

64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx) in
existence yet?
ssd@sugar.UUCP (Scott Denham) (06/24/88)
In article <777@garth.UUCP>, smryan@garth.UUCP writes:

 [Lots of stuff deleted...]

> 64 elements for a Cray, 4096 for a Cyber 990, 65535 for a Cyber 205/ETA 10.
> I don't know what IBM vectors are like. Is the Hitachi machine (?VPxxxx)
> in existence yet?

The IBM vectors are 128 elements in the current implementation, but the
architecture definition allows for 16 (I think) to 512; it's done in a nice
way so the compiler doesn't have to KNOW what it is. The Amdahl (Fujitsu)
VP's have a reconfigurable register section that can go from something like
8 regs of 8192 to 256 regs of 256.

If the Hitachi machine is the one being marketed here by NAS, it exists, and
they claim some pretty impressive price/performance relative to the IBM
3090's.

 Scott Denham

*** None of this has anything to do with my employer... I heard it from
    my cat.
david@titan.rice.edu (David Callahan) (06/25/88)
In article <2157@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
>In article <701@garth.UUCP>, smryan@garth.UUCP (Steven Ryan) writes:
>>
>> Assuming all dummy arrays are assumed-size (largest dimension is *) breaks
>> vectorisers and optimisers which need to know the array size. (This has to
>> do with dependency analysis.)
>
>You make a good point.

I'm not sure about that. Vectorizers will only rarely need the largest
dimension, since it does not appear in the addressing arithmetic. For that
reason it probably will not be used by the decision procedure which
determines whether a pair of references to a particular variable overlap,
and so will not influence vectorization. Furthermore, unless the bound is
hardwired as a constant, it won't be very useful anyway. If you see reduced
vectorization, it may be due to an assumption that the dimension is short
and hence that vectorization would be unprofitable.

David Callahan
Rice University
smryan@garth.UUCP (Steven Ryan) (06/25/88)
In article <3244@s.cc.purdue.edu> ags@s.cc.purdue.edu.UUCP (Dave Seaman) writes:
>>As does the CDC Cyber 205 Fortran ....
>
>Unfortunately the Cyber 205 FTN200 compiler turns out to be nonstandard
>because of this. You cannot treat an array with final dimension 1 as being
>indistinguishable from an assumed-size array, because the standard says the
>following is legal Fortran ....
>
>FTN200 used to handle this correctly, but when the change was made so that
>runtime array bounds checking (when enabled) would not apply to dummy
>arrays with a final bound of 1, an undesired side effect was to make code
>like that above fail to compile. And yes, there are legitimate reasons for
>writing code like this.

Not to disagree. The compiler was changed to make the manager happy. I
would've preferred to make people change 1 to * when that was what they
meant.

>ags@j.cc.purdue.edu -------- John Jackson et al?
-------------------------------------------
The sheriff looks at me and says, "Whatcha doin here, boy?
You'd better get your bags and leave."
It's the same old story, keeping the customers satisfied... satisfied.
                       -Paul Simon (the singer, not the bowtie)
smryan@garth.UUCP (Steven Ryan) (06/25/88)
In article <2168@sugar.UUCP> ssd@sugar.UUCP (Scott Denham) writes:
> it's done in a
>nice way so the compiler doesn't have to KNOW what it is.

Actually, you want the compiler to know if you want really snazzy dependency
analysis. (Ah, yes, see, this diophantine equation has a solution for n=xxx.
But my vectors are only yyy long. Oh, no problem.) Of course, nobody has
dependency analysis quite that snazzy.
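(For the curious: the "diophantine equation" quip refers to the classic GCD
dependence test. A minimal sketch -- in Python, with invented names -- of
deciding whether references like a(s1*i + o1) and a(s2*j + o2) can ever
touch the same element:)

```python
from math import gcd

def may_depend(s1, o1, s2, o2):
    """GCD test: can s1*i + o1 == s2*j + o2 for some integers i, j?
    The linear diophantine equation s1*i - s2*j = o2 - o1 has integer
    solutions iff gcd(s1, s2) divides o2 - o1. Loop bounds (the vector
    length) would further restrict i and j; this sketch ignores them."""
    g = gcd(s1, s2)
    if g == 0:               # both strides zero: same element iff offsets match
        return o1 == o2
    return (o2 - o1) % g == 0

print(may_depend(2, 0, 2, 1))   # False: even subscripts never meet odd ones
print(may_depend(2, 0, 4, 0))   # True: a possible dependence
```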
smryan@garth.UUCP (Steven Ryan) (06/26/88)
>I'm not sure about that. Vectorizers will only rarely need the largest
>dimension since it does not appear in the addressing arithmetic.

It is critical for dependency analysis. Given a loop like

      for i from m to n
         a[xi] := f a[yi]

dependency analysis determines if xi=yj for m<=i<j<=n (which means a value is
computed and the result subsequently used--on a vector machine the results
might still be in flight). In practice, many subscript functions x and y have
solutions for i<j if they are otherwise unbounded. Hence it is critical to
get good values for m and n. They can be taken directly from the loop, but
the resulting expressions may be nasty. If Cyber 205 Fortran is unable to
safely determine recursion with the actual loop bounds, it will try again
with the array bounds. Hence the assumption that the array bounds are valid.
The fact that the largest dimension does not affect the address is
irrelevant--it is the iteration size that is needed.

> Furthermore, unless the bound
>is hardwired as a constant, it won't be very useful anyway.

The vectoriser handles constant bounds as a special case. It uses symbolic
expressions for loop bounds, array dimensions, and subscript expressions.

> If you
>see reduced vectorization it may be due to an assumption that the
>dimension is short and hence vectorization would be unprofitable.

The Cyber 205's breakeven vector length is from 20 to 50 elements. To get
large enough vectors, the compiler has always concentrated on vectorising a
loop nest rather than just the innermost loop. (Cray, Kuck, and the Good
Folks at Rice only worry about the innermost loop, according to the
literature.) So, if you have a loop nest like

      for i to m
         scalar := ....
         a[i] := ....
         for j to n
            b[i,j] := ....
         c[i] := scalar + ....

and everything is otherwise vectorisable, the j loop can be vectorised even
if n > hardware vector length by surrounding it with a scalar stripmining
loop. If m*n <= hardware vector length, the entire nest can be vectorised.
But if m*n > hardware vector length, the i-loop as written cannot be
vectorised. If the loops are split it is possible, but such a split must
correctly handle the promoted scalar, which is defined above the split and
used below it.

Finally, to the point: if m and n are expressions, it is difficult or
impossible to compare m*n to the hardware limit. In this case, FTN200 again
hunts for constant bounds of the array. If it can find an upper bound for
m*n less than 65535, it will vectorise the entire loop nest. If the bound is
greater than 65535, or a constant upper bound is not known, it can only
vectorise the innermost loop.
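(The scalar stripmining loop mentioned above just chops an arbitrarily long
iteration space into strips no longer than the hardware vector length. A
minimal sketch, in Python with invented names:)

```python
def stripmine(n, vlen):
    """Split an n-iteration loop into (start, length) strips of at most
    vlen iterations each -- the scalar loop wrapped around each vector
    operation when n may exceed the hardware vector length."""
    strips = []
    i = 0
    while i < n:
        length = min(vlen, n - i)   # last strip may be short
        strips.append((i, length))
        i += length
    return strips

print(stripmine(10, 4))   # [(0, 4), (4, 4), (8, 2)]
```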
smryan@garth.UUCP (Steven Ryan) (06/28/88)
>The Cyber 205's breakeven vector length is from 20 to 50 elements.
[A person asked where this number came from. I really don't know how to
respond personally (I only learned about *f* and *F* by accident) through
this strange network, so....]
That is the number Arden Hills always gave us. Where did they get it? I'm
not sure, but I think it was murkily derived from benchmark tests.
The vector math library routines are rather arcane. They start by checking the
vector length. If less than 20, they use scalar loops unrolled by a factor
of three (the memory handles up to three concurrent load/stores). Otherwise
they use vector instructions.
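(A sketch of that dispatch, in Python with invented names; the threshold of
20 and the unroll factor of three come from the description above, and
everything else is illustrative:)

```python
BREAKEVEN = 20   # below this, scalar code wins (per the description above)

def vector_add(a, b):
    """Dispatch as described: short vectors take a scalar loop unrolled by
    three (mimicking three concurrent load/stores); long vectors would use
    vector instructions, modelled here by a bulk operation."""
    n = len(a)
    if n < BREAKEVEN:
        out = [0.0] * n
        i = 0
        while i + 3 <= n:            # scalar loop unrolled by a factor of three
            out[i]     = a[i]     + b[i]
            out[i + 1] = a[i + 1] + b[i + 1]
            out[i + 2] = a[i + 2] + b[i + 2]
            i += 3
        while i < n:                 # clean up the leftover iterations
            out[i] = a[i] + b[i]
            i += 1
        return out
    return [x + y for x, y in zip(a, b)]   # stand-in for the vector path

print(vector_add([1, 2, 3, 4], [10, 20, 30, 40]))   # [11, 22, 33, 44]
```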
ags@s.cc.purdue.edu (Dave Seaman) (06/28/88)
>>The Cyber 205's breakeven vector length is from 20 to 50 elements.
I have found the breakeven length to vary from about 5 to 50 elements,
depending on the type of operations being performed. For a simple vector
add, the breakeven length is around 5 or 6.
--
Dave Seaman
ags@j.cc.purdue.edu
ssd@sugar.UUCP (Scott Denham) (07/01/88)
In article <801@garth.UUCP>, smryan@garth.UUCP writes:
> Actually, you want the compiler to know if you want really snazzy dependency
> analysis. (Ah, yes, see, this diophantine equation has a solution for n=xxx.
> But my vectors are only yyy long. Oh, no problem.) Of course, nobody has
> dependency analysis quite that snazzy.

YOW - perhaps it's a good thing that nobody does, too!! I've used those sorts
of tricks when writing AP microcode and have found that though they may yield
impressive performance when done right, they may also lead to strange and
not-so-wonderful things happening when someone gets in there and tweaks a
bit. Still, I wouldn't turn down a compiler with that kind of snazzy analysis
if it were offered!! :}
smryan@garth.UUCP (Steven Ryan) (07/03/88)
>YOW - perhaps it's a good thing that nobody does, too!! I've used those
>sorts of tricks when writing AP microcode and have found that though
>they may yield impressive performance when done right, they may also lead
>to strange and not-so-wonderful things happening when someone gets in
>there and tweaks a bit.

Obviously, the compiler and hardware people have to talk to each other.
Because the engineers are not willing to make guarantees, this trick is not
used. If the vectoriser is done right, it just means stuffing in an upper
bound. That is already done, in principle, but always with +infinity.