dik@cwi.nl (Dik T. Winter) (02/25/90)
In article <14251@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: > In fact, for programs of any really useful size, passing > array arguments is vital. In this respect, at least, C really _DOESN'T_ > have arrays. > But in this aspect Fortran (at least upto and including 77) does not have arrays either. Let's compare some languages (for a matrix-vector product routine*): Algol 68: .proc matvec = (.ref [,] .real m, .ref [] .real v) .ref [] .real: ( [1 .lwb m:1 .upb m] .real result; .co assert correct bounds .co .if 2 .lwb m < .lwb v .or 2 .upb m > .upb v .then .skip .co error detected! .co .fi .for i .from 1 .lwb m .to 1 .upb m .do .ref .real sum = result[i]; sum:= 0.0; .for j .from 2 .lwb m .to 2 .upb m .do sum += m[i,j] * v[j] .od .od; result ) Ada: with LA_PACK; -- defines types MATRIX, VECTOR etc. use LA_PACK; function MATVEC(M: MATRIX; V: VECTOR) return VECTOR is RESULT: VECTOR(M'FIRST(1) .. M'LAST(1)); SUM: FLOAT; begin -- assert correct bounds if M'FIRST(2) < V'FIRST or else M'LAST(2) > V'LAST then raise ARRAY_BOUNDS_ERROR; end if; for I in M'RANGE(1) loop SUM = 0.0; for J in M'RANGE(2) loop SUM = M(I,J) * V(J) + SUM; end loop; RESULT(I) = SUM; end loop; return RESULT; end MATVEC; Fortran: SUBROUTINE MATVEC(M, V, W, L1M, U1M, L2M, U2M) C NO FUNCTION, FORTRAN CANNOT RETURN ARRAYS INTEGER L1M, U1M, L2M, U2M, I, J REAL M(L1M:U1M,L2M:U2M), V(L2M:U2M), W(L1M:U1M), SUM C NO BOUND CHECKING POSSIBLE DO 20 I = L1M, U1M SUM = 0.0 DO 10 J = L2M, U2M SUM = M(I,J) * V(J) + SUM 10 CONTINUE W(I) = SUM 20 CONTINUE RETURN END C (Fortran style (you can write Fortran in any language!)): void matvec(m, v, w, l1m, u1m, l2m, u2m) /* No function, C cannot return arrays. It can return pointers, but then we need malloc, and how to free the stuff? (Garbage collection of course!) */ int l1m, u1m, l2m, u2m; float *m, *v, *w; #define M(i,j) m[(u2m - l2m + 1) * i + j] #define V(i) v[i] #define W(i) w[i] { int i, j; float sum; /* No bound checking possible. */ for(i = l1m; i <= u1m; i++) { sum = 0.0; for(j = l2m; j <= u2m; j++) { sum += M(i,j) * V(j); } W(i) = sum; } } You can write it in C in C style of course; it looks a bit different. Anyhow, I think I made my point. C and Fortran are not very different with respect to arrays. Except of course that Fortran disallows aliasing, while C does not (hence the Fortran compiler can vectorize the above text, while a C compiler can not). Clearly Algol 68 and Ada, that supports arrays in their fulll glory, are superior with the possibility of array bound checking etc.. Both in Fortran and C a routine receives an address and defines its own indexing function, so both in Fortran and C it is possible to give the array a different shape in the callee when compared to the caller. This is impossible in languages that have complete support for arrays. (And note that one of the most important packages in Fortran, the BLAS, makes much use of it!) But I do not think that is a feature, it looks more like a bug. You might argue that the C define for M(i,j) is complicated. I would agree, Fortran has a simple syntactic description for (essentially) the same definition. If you continue to say that in Fortran it is more clear to the compiler that rows of a matrix are accessed, I am going to scream. Everybody will agree that a good optimizing C compiler will compile the C source to essentially the same code as a good optimizing Fortran compiler does with the Fortran source (everybody, except Piercarlo Grandi of course). -- (*) Yes, I know, this is not the best (vectorizing) code for a matrix- vector product. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
jlg@lambda.UUCP (Jim Giles) (02/27/90)
In article <8836@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes: > In article <14251@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: > > In fact, for programs of any really useful size, passing > > array arguments is vital. In this respect, at least, C really _DOESN'T_ > > have arrays. > > > But in this aspect Fortran (at least upto and including 77) does not have > arrays either. Let's compare some languages (for a matrix-vector product > routine*): You are making two errors here. The first is to misrepresent Fortran (see below). The second is to assume that Fortran is a particular favorite of mine. Fortran has many faults. Although none of them are quite as serious as those of C, the language is in need of many improvements. Since you quoted my statement out of context (without even using elipses) you missed an important part of what I said. C converts arrays into pointers AND provides no way to undo the damage!! ** _SOME_ Fortran environments (alright: most) do convert arrays to pointers - unaliased pointers. All fortran environments not only _allow_ you to redeclare your arrays as such in the procedure - but require it! Only the last dimension of an array argument may be left unspecified. **Note: Fortran is deliberately designed so that parameter passing (including arrays) can be carried out as either call-by-reference or as call-by-value/result. So, in a distributed environment, copy-in/out may actually be the _most_ efficient parameter passing mechanism and Fortran is allowed to use it. This dual implementation possibility is why aliasing is prohibited in Fortran (at least, the original reason - better optimization is a _real_ good reason to keep the prohibition). > [...] > Fortran: > SUBROUTINE MATVEC(M, V, W, L1M, U1M, L2M, U2M) > [...] > C NO BOUND CHECKING POSSIBLE This is false. I haven't ever worked with a Fortran environment which didn't do array bounds checking (at least as an option). Of course, the compiler _does_ have to just take the caller's word for it that the array bounds that he passed are correct. In fact, the compiler has to take the caller's word for it that the array really has as many dimensions as the code claims. These problems are corrected in Fortran 90 which passes the whole array descriptor (dimensions, bounds, and all). > [...] > C (Fortran style (you can write Fortran in any language!)): > void matvec(m, v, w, l1m, u1m, l2m, u2m) > [...] > /* No bound checking possible. */ Again false. But this time, the bounds checking must be done 'by hand'. You are still taking the caller's word for it that the bounds were passed correctly and that the number of dimensions is correct. In C, you also have to do all the subscripting 'by hand' - which involves, at least, an extra macro definition for each array argument. (Some C users claim that using macros for this is inefficient and that you should always do the subscript expressions explicitly - presumably because many C compilers miss the strength reduction on the multiplies. I wonder if Piercarlo Grandi ever uses cars or elevators or if his brand of do-it- the-hard-way masochism is limited to programming only :-) Finally, there is no way in C to tell the compiler that your pointers (I mean arrays) are not aliased - here is damage which simply can't be undone. > [...] > and defines its own indexing function, so both in Fortran and C it is > possible to give the array a different shape in the callee when compared > to the caller. This is impossible in languages that have complete > support for arrays. (And note that one of the most important packages > in Fortran, the BLAS, makes much use of it!) But I do not think that is > a feature, it looks more like a bug. I both agree and disagree here. I agree that this a bad thing for the procedure interface to do, but I think that reshaping arrays might turn out to be an important feature. Fortran 90 provides a RESHAPE function for this very purpose. The first public draft of Fortran 8x had a better mechanism using IDENTIFY, but this feature was removed. The new proposal has a much worse mechanism with POINTERs, these are not as useful as IDENTIFY (or even raw subscript expressions) but they do introduce the spectre of aliasing into Fortran in a way which (I think) should have been avoided. > [...] > Everybody will agree that a good optimizing C compiler will compile the > C source to essentially the same code as a good optimizing Fortran compiler > does with the Fortran source (everybody, except Piercarlo Grandi of course). I'm part of 'everybody' and I don't agree. C can never generate code as efficient as Fortran because of aliasing. Period. Aliasing inhibits any optimizations which involve code movement or storage of temporary results. Aliasing inhibits register allocation. I can't, off-hand think of _any_ optimization technique that _isn't_ inhibited in some way by aliasing - on _any_ type of machine. C can't even generate good code as an _option_ since the committee decided to leave 'noalias' out of the standard. I know that I'm beating this aliasing problem to death. But, everytime I think the issue has been resolved, there is another 'C optimizes as well as Fortran' comment. I'll stop as soon as everyone alse does (or as soon as _I'm_ proved wrong - strange things _do_ happen). I'll also stop when I retire, since the issue will no longer matter to me. J. Giles
dik@cwi.nl (Dik T. Winter) (02/28/90)
In article <14255@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: > In article <8836@boring.cwi.nl>, dik@cwi.nl (Dik T. Winter) writes: > > In article <14251@lambda.UUCP> jlg@lambda.UUCP (Jim Giles) writes: > > > In fact, for programs of any really useful size, passing > > > array arguments is vital. In this respect, at least, C really _DOESN'T_ > > > have arrays. > > > > > But in this aspect Fortran (at least upto and including 77) does not have > > arrays either. Let's compare some languages (for a matrix-vector product > > routine*): Note (I will come back to that later) that it is about the support of arrays. > > You are making two errors here. The first is to misrepresent Fortran (see > below). Perhaps; but I do not think so. > The second is to assume that Fortran is a particular favorite of > mine. Fortran has many faults. Although none of them are quite as serious > as those of C, the language is in need of many improvements. I never assumed anything, I just did a comparison. And I really do not know whether Fortran is inherently better than C. > > Since you quoted my statement out of context (without even using elipses) > you missed an important part of what I said. We are in the same boat I think. > C converts arrays into > pointers AND provides no way to undo the damage!! ** _SOME_ Fortran > environments (alright: most) do convert arrays to pointers - unaliased > pointers. All fortran environments not only _allow_ you to redeclare > your arrays as such in the procedure - but require it! Only the last > dimension of an array argument may be left unspecified. C provides a way to undo the damage which is no more proof against abuse as the Fortran way, again, see below. And I know a lot of Fortran environments that do allow you to redeclare your arrays in a procedure as whatever you want. I know this is against language rules, but ... In fact there is no more and no less safety in C than in Fortran. In C, when an array is passed to a procedure, the procedure receives a pointer. You can access through that pointer only elements of the original array; everything else is against the standard (at least ANSI, and disregarding the possibility to calculate the address just beyond the last element). The guarantee of unaliased pointers in Fortran is fine, but is not the essence of support for arrays. There are languages that have fuller support for arrays than Fortran and that disallow aliases (Ada) and others that allow aliases (Algol 60). Note that I hinted in my original article on the possibilities for optimization when aliasing is not allowed. > > **Note: Fortran is deliberately designed so that parameter passing > (including arrays) can be carried out as either call-by-reference or > as call-by-value/result. So, in a distributed environment, copy-in/out > may actually be the _most_ efficient parameter passing mechanism and > Fortran is allowed to use it. This dual implementation possibility is > why aliasing is prohibited in Fortran (at least, the original reason - > better optimization is a _real_ good reason to keep the prohibition). Yup, the same applies to Ada. But they made a clean breast. And they require call-by-value/result for scalars. > > > [...] > > Fortran: > > SUBROUTINE MATVEC(M, V, W, L1M, U1M, L2M, U2M) > > [...] > > C NO BOUND CHECKING POSSIBLE > > This is false. I haven't ever worked with a Fortran environment which > didn't do array bounds checking (at least as an option). Of course, the > compiler _does_ have to just take the caller's word for it that the array > bounds that he passed are correct. In fact, the compiler has to take the > caller's word for it that the array really has as many dimensions as the > code claims. But if the compiler has to assume things, it is clear no checking can be performed! So I stand by my claim, no bound checking is possible, except for local accesses against the locally declared bounds of course, but that is bogus. I know, the routine is correct, but is it called correctly? (ellipses for once) > ... > > C (Fortran style (you can write Fortran in any language!)): > ... > > /* No bound checking possible. */ > > Again false. But this time, the bounds checking must be done 'by hand'. Again it is clear that you think 'bounds checking' is only local; it is not. > ... > In C, you also > have to do all the subscripting 'by hand' - which involves, at least, > an extra macro definition for each array argument. What is the difference with a DIMENSION statement? I know the syntax is a bit less clear; but effectively they are the same. > (Some C users claim > that using macros for this is inefficient and that you should always > do the subscript expressions explicitly - presumably because many C > compilers miss the strength reduction on the multiplies. This is a bogus argument. I know a Fortran compiler that is not able to compile DSQRT correctly. Do I avoid all occurrences of DSQRT? No, I yell at the supplier. (At least, I would yell if it would be worth the hassle, but considering the number of bugs in that compiler...) > I wonder if > Piercarlo Grandi ever uses cars or elevators or if his brand of do-it- > the-hard-way masochism is limited to programming only :-) Let's avoid Piercarlo Grandi. > Finally, > there is no way in C to tell the compiler that your pointers (I mean > arrays) are not aliased - here is damage which simply can't be undone. But that is of course a whole different can of worms. > ... I agree on this. (Oh, it is about RESHAPE and IDENTIFY and so on.) > ... (About generating equivalent code.) Again aliassing is a problem. But the code I presented has no problems with aliasses unless used on multi-processor systems (only innerproducts are used, so vector processors have no problems, unless they are not good at that, and some are). Summarizing: the support for arrays in C is only slightly worse than in Fortran. Problem areas are: 1. Aliassing (as discussed ad nauseum). 2. The pre-ANSI requirement to do everything in double precision, which no -f flag is able to annihilate (strange enough, not yet discussed in this thread). Many C users think that aliassing can be detected by the run time, so that multiple parts of code can be generated depending on the presence of an alias or not. This is not true. If two pointers point to different arrays one may not compere them. If they point to identical arrays one may compare, but there are (generally used) access methods where a complete detection of the absense of aliassing takes as much time as the operation itself. Many Fortran users think that aliassing is allowed and hence do some defensive programming against it (or even state in the documentation that aliassing is allowed). Also this is impossible. Given that two parameters are (against the rules) aliasses of each other, there is no way to detect that. A simple case I encountered was in a long integer package. The routine to subtract numbers 'SUB(A, B, C)' took two input arrays A and B and one output array C. An attempt was made to detect whether A and B where the same. The test failed on at least one system. You may now ask: "what is your favorite language?" I would answer that I do not know. I use the language available to perform the task that has to be done. All languages I used/use have their defects. I think there is nothing to be done about that; I am not in the field of Herman Rubin who proposes a universal language encompassing all current and future hardware instructions directly available. I will do assembler without complaining if need arises (or even if there is no need, but than just for fun). But, for numerical processes, Pascal is inadequate, unless it is an implementation with conformant arrays (Level 2?), Fortran and C are semi-adequate. Algol 68 is very nice, but not readily available (Herman Rubin ought to learn it and get a compiler; he could get all the infix operators he wants), and Algol 60 is pretty good of course. And, need I mention Ada? Yes, she has her good points, but of course also her defects. -- dik t. winter, cwi, amsterdam, nederland dik@cwi.nl
mike@cs.arizona.edu (Mike Coffin) (02/28/90)
From article <8849@boring.cwi.nl>, by dik@cwi.nl (Dik T. Winter): > ... > Many C users think that aliassing can be detected by the run time, so that > multiple parts of code can be generated depending on the presence of an > alias or not. This is not true. If two pointers point to different > arrays one may not compare them. ... The *programmer* is not allowed to compare pointers from different arrays, but it's legal for the *compiler* to generate code to compare them. The compiler is certainly not constrained to generate legal C code. -mike -- Mike Coffin mike@arizona.edu Univ. of Ariz. Dept. of Comp. Sci. {allegra,cmcl2}!arizona!mike Tucson, AZ 85721 (602)621-2858
jhallen@wpi.wpi.edu (Joseph H Allen) (02/28/90)
Here's something else to think about in this discussion... I'm most interested in a language which is close to what you do in assembly language. So as far as arrays... Several points: If the language has dynamic arrays then library functions (possibly inline ones) arn't any slower so the language doesn't need them to be built in. If the language has static arrays, you can simulate them painlessly with macros anyway. In fact, to call fortran library functions from C I just use macros to make arrays which are compatible with fortran's. I don't care about bounds checking. If the language is powerful enough to have bounds checking it should also be powerful enough to have fully dynamic arrays. I'm not considering optimizations you can make by assuming non-aliased arrays (I.E., I know this can be a big improvement, but for most things (and this argument) I don't care) The only advantage of having static arrays built into the language is for type checking purposes. But this isn't valid since you can just make a type 'array' which is really a pointer (or more accurately, an address)... (the nicest array language I've ever used was a version of BASIC which didn't have a DIM statement but which handled arrays of arbitrary dimension and size. I.E., you could just say 'A(1,5,10)=7' and it would allocate the array for you. You could then say 'A(1,5,10,8)=9' and it would make the element A(1,5,10) contain a sub-array... Is there any other language with this power? APL maybe? Or do you have to declare arrays in APL also... (I don't know APL)) (which version BASIC was that you ask? Why, my own... (Joe's slow BASIC) :-) -- "Come on Duke, lets do those crimes" - Debbie "Yeah... Yeah, lets go get sushi... and not pay" - Duke
jlg@lambda.UUCP (Jim Giles) (03/01/90)
From article <8849@boring.cwi.nl>, by dik@cwi.nl (Dik T. Winter): > [...] > > ... > (About generating equivalent code.) > Again aliassing is a problem. But the code I presented has no problems > with aliasses unless used on multi-processor systems (only innerproducts > are used, so vector processors have no problems, unless they are not good > at that, and some are). Actually, not true. At least one form of optimization (loop unrolling) is inhibited by possible aliasing in your example code. Loop unrolling can be effective in speeding up code on purely scalar machines. A friend of mine has an algorithm that runs 15% faster on an IBM/PC when the loops are unrolled once. In your example, the inner loop can be unrolled, but not the outer one. In addition, the vectorization you claim possible won't happen since the operation in the inner loop is a reduction operation (it sums all the elements of a vector into a single scalar). Many vector machines don't have reduction operators in their vector instruction set. For those that do, the instruction is usually nearly as slow as the equivalent scalar loop. The most efficient vector form of your algorithm is to put the summation on the outer loop and vectorize that way. But this would be inhibited by aliasing in the C version. (I reproduce the loops here for anyone who is still trying to follow this argument.) for(i = l1m; i <= u1m; i++) { sum = 0.0; for(j = l2m; j <= u2m; j++) { sum += M(i,j) * V(j); } W(i) = sum; } The faster form is: for(i = l1m; i <= u1m; i++) W(i) = 0.0; for(j = l2m; j <= u2m; j++) { for(i = l1m; i <= u1m; i++) { W(i) += M(i,j) * V(j); } } Here, _both_ the add and the multiply vectorize. But, C can't vectorize this. > [... trying to detect aliasing at run-time ...] > [...] A simple case I encountered was in a long integer package. The > routine to subtract numbers 'SUB(A, B, C)' took two input arrays A and B > and one output array C. An attempt was made to detect whether A and B > where the same. The test failed on at least one system. If A and B are input only, then aliasing in Fortran _IS_ allowed since the subroutine doesn't do anything which would make aliasing cause problems. Aliasing is only a problem if one or the other of the aliased names is the target of an assignment. I can see why they wanted to detect the aliasing - to save time (return zero without actually having to do the subtract). However, failure to detect the aliased inputs will no change the _MEANING_ of the code, only its efficiency. J. Giles
steven@cwi.nl (Steven Pemberton) (03/08/90)
In article <9230@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes: > (the nicest array language I've ever used was a version of BASIC which > didn't have a DIM statement but which handled arrays of arbitrary > dimension and size. I.E., you could just say 'A(1,5,10)=7' and it > would allocate the array for you. You could then say 'A(1,5,10,8)=9' > and it would make the element A(1,5,10) contain a sub-array... Is > there any other language with this power? Sure! ABC, recently posted to the net (but *still* not appeared on binaries.{mac, ibm.pc} and sources.unix - what are those moderators doing?) has the equivalent: - no declaration of arrays, or their sizes - dynamically sized - sparse For instance (>>> is the ABC prompt; it's an interactive language): >>> PUT {} IN tel \ initialises tel to empty >>> PUT 4141 IN tel["Eddy"] >>> PUT 4999 IN tel["Babs"] >>> PUT 7878 IN tel["Lambert"] >>> WRITE tel["Babs"] 4999 >>> WRITE tel \ you can write the whole array {["Babs"]: 4999; ["Eddy"]: 4141; ["Lambert"]: 7878} >>> WRITE keys tel {"Babs"; "Eddy"; "Lambert"} >>> FOR name IN keys tel: WRITE name, ":", tel[name] / Babs: 4999 Eddy: 4141 Lambert: 7878 Similarly with 'multi-dimension' arrays: >>> PUT {} IN value >>> PUT 1 IN value[5, 8] >>> PUT 13 IN value[3, 6] >>> PUT -1 IN value[1, 9] >>> WRITE value[3, 6] 13 >>> WRITE value {[1, 9]: -1; [3, 6]: 13; [5, 8]: 1} >>> WRITE keys value {(1, 9); (3, 6); (5, 8)} Actually the keys here are single values of type 'compound', so you may also say things like >>> PUT 1, 9 IN point >>> WRITE value[point] -1 >>> IF point1 IN keys value: WRITE value[point1] etc. Steven Pemberton, CWI, Amsterdam; steven@cwi.nl "Let us go then you and I/while the night is laid out against the sky/like a smear of mustard on an old pork pie" Nice poem Tom. I have ideas for changes though, why not come over? - Ezra
sommar@enea.se (Erland Sommarskog) (03/10/90)
Joseph H Allen (jhallen@wpi.wpi.edu) writes: > (the nicest array language I've ever used was a version of BASIC which > didn't have a DIM statement but which handled arrays of arbitrary > dimension and size. I.E., you could just say 'A(1,5,10)=7' and it > would allocate the array for you. You could then say 'A(1,5,10,8)=9' > and it would make the element A(1,5,10) contain a sub-array... Is > there any other language with this power? Sounds like you're speaking of MUMPS. Also, in MUMPS you're not restricted to integer indexes. If you want A("This key", "That key") just go for it. And if you put a ^ before the A you have a global variable, which in MUMPS terminolgy means that it is stored on disk. Files is low-level concept never heard of in MUMPS. On the other hand, the language does not allow parameters in subroutine calls... Disclaimer: I worked with MUMPS in 1981. Things may have changed since then. -- Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
bert@gufalet.UUCP (Bert Bos) (03/20/90)
In article <9230@wpi.wpi.edu> jhallen@wpi.wpi.edu (Joseph H Allen) writes: >(the nicest array language I've ever used was a version of BASIC which didn't >have a DIM statement but which handled arrays of arbitrary dimension and size. >I.E., you could just say 'A(1,5,10)=7' and it would allocate the array for >you. You could then say 'A(1,5,10,8)=9' and it would make the element >A(1,5,10) contain a sub-array... Is there any other language with this power? Awk can do this too (and more...)