jlg@lanl.gov (Jim Giles) (10/20/90)
> [... again, even number of '>' is me, odd number is Dan Bernstein ...]
>
>> You're the one who insists that users be forced to explicitly monkey
>> with the implementation level of data structures.
>
> QUESTION 4: Has it gotten through your head that nobody's forcing you to
> abandon your high-level structures just because pointers exist?  See
> question 5.

Well, you say that here.  But you say in your next article:

> Jim, nothing in Nemesis that you've described to me requires any
> extensions to implement in C; [...]

Well?  Which is it?  Am I to be forced through a 'pointers only'
intermediate or not?  The modifications I request are much less useful
if the code generator part of the compiler cannot directly access and
make use of all the semantic constraints appropriate to each data
object.  The advantage of high-level structures is two-fold: 1) it's
easier for the user to write, read, and maintain; 2) it's possible for
the compiler to make use of the semantic properties of these higher
level structures to generate better code.  You want to deprive me of
that second advantage.

> [... your other article follows with ...]
> [...] The programmer can get all the greatness and
> glory of the Giles Gaggle, with whatever typechecking and constraints
> you want, with at most a C++-like preprocessor.  Agreed?

To be sure.  And you _could_ remove all the control flow constructs
from C except a conditional GOTO and force all users of higher-level
control to pass through a preprocessor.  Conditional GOTOs can simulate
_all_ the "Gaggle" of control constructs that Structured programming
fanatics demand (see the sketch below).  However, for efficiency the
compiler would have to be told, or have to rediscover, all the useful
semantic properties of the missing control constructs (like proper
nesting and so forth).  Automatic rediscovery of such properties may be
completely intractable.

Converting data structures to a 'pointers only' form is the same kind
of thing.  But it's a _lot_ worse in practice.  The lost semantic
properties are more numerous, more difficult to recover (even with
hints from directives), and have a more severe impact on the quality of
the generated code than forced switching to GOTOs would have.  I don't
ask you to preprocess all control constructs into GOTOs; why do you
insist that I must preprocess all data structures into pointers?

You also have the following self-contradiction:

> [... requote from above ...]
> Jim, nothing in Nemesis that you've described to me requires any
> extensions to implement in C; [...]
> [... from later in same article ...]
> For the compiler to take advantage of un-aliased variables, it either
> has to compile several copies of the code with different aliasing
> restrictions, or let the programmer talk about aliasing somehow.  Convex
> lets you selectively specify pointer arguments as unaliased, for
> example.  Q has an assertion to express the lack of aliasing.  I admit
> that C as is, while powerful enough to support the Giles Gaggle, may not
> be perfectly efficient without some extensions.  This is an area where
> language design has to advance.

Ok, which is it?  Either C requires no extensions or C does.  The
better efficiency of the constructs I've recommended is an inherent
_part_ of their purpose.  In order to provide this, C _must_ be
extended.  Since you agree that this is an area where language design
has to advance, you must accept the fact that including the structures
I recommend (directly in the language - NOT from a preprocessor) is a
way to accomplish just such an advance.
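The sketch promised above - the kind of thing a 'conditional GOTO only'
preprocessor would emit for one simple structured loop (the code and
names here are purely illustrative, a minimal sketch):

      /* structured form */
      void clear_tail(int *a, int i, int n)
      {
          while (i < n) {
              a[i] = 0;
              i = i + 1;
          }
      }

      /* the same loop after preprocessing to conditional GOTOs */
      void clear_tail_goto(int *a, int i, int n)
      {
      top:
          if (!(i < n)) goto done;
          a[i] = 0;
          i = i + 1;
          goto top;
      done:
          return;
      }

The two are equivalent, but the second no longer announces that it is a
properly nested, single-entry single-exit loop; the compiler has to
rediscover that from the label and branch structure.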
> [... this is one of Dan's statements that I've deliberately lifted ...]
> [... out of context - it is false no matter where in this discussion ...]
> [... it appears ...]
>
> No; I merely see that this is not a human factor issue.

*ALL* language design issues are human factor issues.

End of chapter 2.

J. Giles
djones@megatest.UUCP (Dave Jones) (10/23/90)
From article <66254@lanl.gov>, by jlg@lanl.gov (Jim Giles):

) Conditional GOTOs can
) simulate _all_ the "Gaggle" of control constructs that Structured
) programming fanatics demand.  However, for efficiency the compiler
) would have to be told, or have to rediscover, all the useful semantic
) properties of the missing control constructs (like proper nesting and
) so forth).  Automatic rediscovery of such properties may be completely
) intractable.

Where do you get this?  I've seen a few compilers, and more than a few
compiler books, and unless my memory fails me, every algorithm I've
ever seen that takes advantage of the "useful semantic properties" does
so by first turning all the control statements into gotos, then
analyzing that.  What "rediscovery" are you thinking of that may be
"completely intractable"?  I'm baffled.
gudeman@cs.arizona.edu (David Gudeman) (10/23/90)
In article <66254@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
]... The advantage of high-level structures is two-fold: ...
] 2) it's possible for
]the compiler to make use of the semantic properties of these higher
]level structures to generate better code....
You should qualify this as "the advantage of the high-level structures
I have been discussing". High-level structures in general are no
easier to implement efficiently --and usually a good deal harder--
than low-level structures. For example run-time type checking,
generalized lookup tables, pattern matching, unification, term
rewriting, and automatic storage management are all much higher level
than the features you have been discussing.
In fact I wouldn't even use the term "high-level" to qualify most of
the things you have discussed. I know it is non-standard, but the
terminology for classifying data structures has not kept up with the rest
of the field. I'd call most of your suggestions medium-level.
--
David Gudeman
Department of Computer Science
The University of Arizona          gudeman@cs.arizona.edu
Tucson, AZ 85721                   noao!arizona!gudeman
jlg@lanl.gov (Jim Giles) (10/23/90)
From article <26692@megaron.cs.arizona.edu>, by gudeman@cs.arizona.edu
(David Gudeman):

> [...]
> You should qualify this as "the advantage of the high-level structures
> I have been discussing".  High-level structures in general are no
> easier to implement efficiently --and usually a good deal harder--
> than low-level structures.  For example run-time type checking,
> generalized lookup tables, pattern matching, unification, term
> rewriting, and automatic storage management are all much higher level
> than the features you have been discussing.

Fair enough.  To be sure, one of my criteria for selecting features is
the knowledge of an efficient implementation that has been around long
enough to be solidly reliable.  All the features I've mentioned fit
this criterion - or, like deliberate aliasing, are sometimes necessary
even if inefficient.  However, the less efficient modes are
deliberately _not_ the default ones.

As for the higher-level features you mentioned: when the technology
matures, I'm sure that a number of them will become efficient and
reliable.  When that happens, I'm sure that they too will generally be
more efficiently implemented automatically by the compiler than by all
but the most adept programmers.  And even those _most_ adept will still
prefer to use the high-level features except when absolute efficiency
is the _only_ criterion for the program.  Other than that, the ease of
maintenance, legibility, or the speed at which the code is available
and fully debugged will almost certainly be the overriding factors.

> In fact I wouldn't even use the term "high-level" to qualify most of
> the things you have discussed.  I know it is non-standard, but the
> terminology for classifying data structures has not kept up with the
> rest of the field.  I'd call most of your suggestions medium-level.

Again, fair enough.  The seven data structures I posted are copied
almost exactly from C.A.R. Hoare's chapter in "Structured Programming",
which was published in 1972.  Again: a basic criterion was reliability.
Old features have a longer track record and are more likely to meet the
criterion.  This, by the way, was the criterion Hoare applied to the
selection of features as well - 18 years ago!

The 'new' features that I've mentioned aren't really new either.  The
'aliased' attribute, for example, is just _exactly_ the same semantic
object as Pascal pointers - the difference is mainly syntactic (plus
the fact that aliasing is quite a lot more constrained than Pascal
pointers allow - aliasing is an attribute of specific collections of
variables and not a property of the data type).  However, nomenclature
aside, these features are clearly much higher level than pointers: the
lowest level data structuring tool above the bit.

J. Giles
jlg@lanl.gov (Jim Giles) (10/24/90)
From article <14271@goofy.megatest.UUCP>, by djones@megatest.UUCP (Dave
Jones):

> [...]
> I've seen a few compilers, and more than a few compiler books, and
> unless my memory fails me, every algorithm I've ever seen that takes
> advantage of the "useful semantic properties" does so by first turning
> all the control statements into gotos, then analyzing that.  What
> "rediscovery" are you thinking of that may be "completely intractable"?
> I'm baffled.

I'm sorry I was not clear on this.  Procedure calls are a form of flow
control.  As such, they can also be replaced by GOTOs.  If the
procedure call mechanism is replaced with GOTOs (especially with
recursive procedures present), the rediscovery of the main flow
properties of the program can indeed be very difficult.

Nonetheless, you are right in pointing out that recovering the semantic
properties of flow control is easy compared with the problem of
recovering the semantic properties of data structures once information
about them has been lost.  And indeed, some compilers do throw away
_some_ flow control information in order to process the control
information in a canonical way.  It's hard to imagine doing the same
for data structures though.  For example (and this is one of the more
trivial ones where there _is_ a solution):

      x(1,:) = x(2,:)

This is an array assignment in Fortran Extended (the new name for
Fortran 8x, which became Fortran 90, etc.).  The compiler can easily
determine that this assignment doesn't alias or overlap the source and
destination - so it can optimize (the statement simply copies the
second column into the first).  The same statement done with pointers
might be:

      p = &x[0][0];  q = p + N;
      for (i = 0; i < N; i++)  *p++ = *q++;

Well, clearly the compiler _can_ resolve this; it only requires the
solution of a fairly simple Diophantine problem showing that the 'p's
never overlap any of the 'q's during the execution of the loop.
Notice, in the C version I have switched the array around so the C code
is copying the second _row_ to the first _row_ and is not copying
between columns.  This is the obvious change since C arrays are
row-wise and Fortran's are column-wise.  But suppose, for other
reasons, that the C code _needed_ to do the column move:

      p = &x[0][0];  q = p + 1;
      for (i = 0; i < M; i++, p += M, q += M)  *p = *q;

Now the two pointers interleave each other in the loop - the compiler
will have a much harder analysis to solve in order to determine that
the two are indeed still not overlapped.  In the case with arrays,
going "against the grain" is as simple as the original case was:

      x(:,1) = x(:,2)

Still easy to determine not to be overlapped.  But, if the array
notation were simply _preprocessed_ into the pointer notation (as was
suggested), the compiler would have to solve the analysis I showed
above.  The general problem of array indexing in multiple dimensions is
not completely intractable, but it is more difficult than the specific
case I've shown here.

Why force the compiler to do all that work?  The time the compiler
spends on this could have been used to optimize other parts of the
code.  The time the compiler _writer_ spends on getting this part of
the compiler done could have been spent on developing other language
features - or at least improving the compiler's efficiency.

J. Giles
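[For reference, the two pointer loops above put into compilable form -
a sketch assuming a square N-by-N array of doubles; the declarations
are added here for illustration:]

      #define N 100

      double x[N][N];

      /* Row copy: the two pointer ranges are disjoint contiguous
         blocks, so the no-overlap proof is easy. */
      void copy_row()
      {
          double *p = &x[0][0];
          double *q = p + N;        /* start of the second row */
          int i;

          for (i = 0; i < N; i++)
              *p++ = *q++;
      }

      /* Column copy: the two pointers now interleave, striding N
         apart, and the no-overlap proof is the harder Diophantine
         analysis described above. */
      void copy_col()
      {
          double *p = &x[0][0];
          double *q = p + 1;        /* start of the second column */
          int i;

          for (i = 0; i < N; i++, p += N, q += N)
              *p = *q;
      }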
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/06/90)
In article <3652@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> x(1,:) = x(2,:)

[ easy to see that there's no aliasing ]
[ C version it's much harder, especially with the rows flipped ]

Jim, I sympathize with the point you're trying to make, but this has
absolutely nothing to do with pointers.  If language L has array
slicing and array assignment, then it will understand the above
statement directly, and can optimize just as easily as Fortran.
Pointers never appear.

---Dan
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <1622:Nov606:57:0190@kramden.acf.nyu.edu>, by
brnstnd@kramden.acf.nyu.edu (Dan Bernstein):

| In article <3652@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
|> x(1,:) = x(2,:)
| [ easy to see that there's no aliasing ]
| [ C version it's much harder, especially with the rows flipped ]
|
| Jim, I sympathize with the point you're trying to make, but this has
| absolutely nothing to do with pointers.  If language L has array slicing
| and array assignment, then it will understand the above statement
| directly, and can optimize just as easily as Fortran.  Pointers never
| appear.

However, the context of this discussion is whether to use arrays or to
use pointers _instead_.  If pointers exist _in_addition_, I could care
less, as I would almost never use them.  Certainly not in a case like
this anyway.

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/07/90)
In article <5073@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <1622:Nov606:57:0190@kramden.acf.nyu.edu>, by
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> | In article <3652@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> |> x(1,:) = x(2,:)
> | [ easy to see that there's no aliasing ]
> | [ C version it's much harder, especially with the rows flipped ]
> | Jim, I sympathize with the point you're trying to make, but this has
> | absolutely nothing to do with pointers.  If language L has array slicing
> | and array assignment, then it will understand the above statement
> | directly, and can optimize just as easily as Fortran.  Pointers never
> | appear.
> However, the context of this discussion is whether to use arrays or
> to use pointers _instead_.

Exactly.  That's why your example has nothing to do with this
discussion.  It should convince people that array slicing and array
copying are useful features.  I don't think any programmer is going to
replace a single language builtin (such as x(1,:) = x(2,:)) with a
series of statements, unless he's done some extensive tests showing how
poorly the compiler handles the builtin.

---Dan
jlg@lanl.gov (Jim Giles) (11/07/90)
From article <7006:Nov620:42:3990@kramden.acf.nyu.edu>, by
brnstnd@kramden.acf.nyu.edu (Dan Bernstein):

> In article <5073@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>> |> x(1,:) = x(2,:)
>> | [ easy to see that there's no aliasing ]
>> | [ C version it's much harder, especially with the rows flipped ]
[...]
>> However, the context of this discussion is whether to use arrays or
>> to use pointers _instead_.
>
> Exactly.  That's why your example has nothing to do with this discussion.

It has everything to do with the original context of this thread, as
Dan very well knows.  The issue was whether C-like pointers were more
efficient than Fortran-like arrays.  The above example can be cast in
that form.

> It should convince people that array slicing and array copying are
> useful features.  I don't think any programmer is going to replace a
> single language builtin (such as x(1,:) = x(2,:)) with a series of
> statements, unless he's done some extensive tests showing how poorly the
> compiler handles the builtin.

That's not the point.  The code (in existing languages) would have to
be written as:

      Modula 2                        C

      for i := 1 to N do              p = &x[1][0];  q = p+N;
         x[1,i] := x[2,i]             for (i=0; i<N; i++)
      end;                               *p++ = *q++;

(Modula 2 this time - I did Fortran last time.)  The pointer/C version
is still harder for the compiler to optimize.  To be sure, a smart C
compiler would be able to find the fact that this is a disjoint row
switch (but with your distrust of complicated optimizers, I would think
you would want to avoid forcing this).  It is still easy in the Modula
loop to tell that the rows are disjoint - but it wouldn't be if the
array notation were preprocessed out (as Dan suggested in the article
from which I began this thread).

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/08/90)
In article <5088@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <7006:Nov620:42:3990@kramden.acf.nyu.edu>, by
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <5073@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> >> |> x(1,:) = x(2,:)

[ I claim that this is an argument for array slices and copies, ]
[ and has no bearing upon the pointer issue ]

> That's not the point.  The code (in existing languages) would
> have to be written as:

You are now *changing* what you said.  I agree that your new version is
relevant to this thread.  I emphasize that your original x(1,:) =
x(2,:) example was not.

>       Modula 2                        C
>
>       for i := 1 to N do              p = &x[1][0];  q = p+N;
>          x[1,i] := x[2,i]             for (i=0; i<N; i++)
>       end;                               *p++ = *q++;
>
> (Modula 2 this time - I did Fortran last time.)  The pointer/C version
> is still harder for the compiler to optimize.

I take back what I said about this being relevant to the thread: your C
code is incorrect.  I believe that if you stated it correctly, the
Convex compiler would be able to optimize it (specifically, vectorize
it) the way you want.

---Dan
jlg@lanl.gov (Jim Giles) (11/08/90)
From article <6243:Nov720:29:4090@kramden.acf.nyu.edu>, by
brnstnd@kramden.acf.nyu.edu (Dan Bernstein):

> In article <5088@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> [...]
>>       Modula 2                        C
>>
>>       for i := 1 to N do              p = &x[1][0];  q = p+N;
>>          x[1,i] := x[2,i]             for (i=0; i<N; i++)
>>       end;                               *p++ = *q++;
> [...]
> I believe that if you stated it correctly, the Convex compiler would be
> able to optimize it (specifically, vectorize it) the way you want.

I never claimed that a smart compiler _couldn't_ optimize the above.  I
gave it originally as an example of something for which the compiler
would have to do _extra_ work to recover information if the array code
were preprocessed into pointers (which you _did_ suggest should be
done).  I specifically said that a smart compiler _could_ find the
information again.  With your stand against optimizers though, why do
you claim that it's acceptable to put the compiler through all that
extra work?

J. Giles
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/09/90)
In article <5220@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> From article <6243:Nov720:29:4090@kramden.acf.nyu.edu>, by
> brnstnd@kramden.acf.nyu.edu (Dan Bernstein):
> > In article <5088@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> > [...]
> >>       Modula 2                        C
> >>
> >>       for i := 1 to N do              p = &x[1][0];  q = p+N;
> >>          x[1,i] := x[2,i]             for (i=0; i<N; i++)
> >>       end;                               *p++ = *q++;
> > [...]
> > I believe that if you stated it correctly, the Convex compiler would be
> > able to optimize it (specifically, vectorize it) the way you want.
> I never claimed that a smart compiler _couldn't_ optimize the above.

Fine.  Glad we settled that.

> I gave it originally as an example of something for which the compiler
> would have to do _extra_ work to recover information

In this case, the compiler does little more than reverse that
preprocessing.  Hardly a lot of work.

> if the array
> code were preprocessed into pointers (which you _did_ suggest should
> be done).

Did I really?  Where?  I certainly don't believe that all array code
should be converted into pointer code.

> With your stand against optimizers though, why do you claim that it's
> acceptable to put the compiler through all that extra work?

There are ranges of acceptability.  Given that there isn't any simple
alternative, and given that putting i back is so trivial to do
correctly, I'm not going to argue against this optimization.

What I'd love to have is an automated tool that takes a section of code
and a new variable from me (like ``double *p = &x[1][i]'') and adds
that variable in, keeping track of changes to i at each step.  Also a
tool that reduces the code by eliminating an unused variable (like
``i'').  Note that the original code is kept around for maintenance;
perhaps make or some other tool could ensure consistency.  If the
optimizer's elementary (array reference) strength reduction were
adapted to use this mechanism, programming would be great.

If I could then get a symbolic algebra source-code peephole optimizer
that understood how to keep track of ``x * exp(y - 3 * z)'' as z
changes, programming would be absolutely amazing.  If it understood
range constraints and optimized using them, splitting code into
sections to make convenient constraints come true in one section, I'd
almost believe that automatic optimization could compete with hand
optimization.

---Dan
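[To picture the rewrite Dan describes: a hypothetical before/after
sketch.  The array, the function f, and all the names are invented for
illustration; the tool itself is the one Dan is wishing for, not an
existing program.]

      double x[8][100];

      extern double f(double);

      /* maintained original: index form (n assumed <= 100) */
      void scale_row(int n)
      {
          int i;

          for (i = 0; i < n; i++)
              x[1][i] = f(x[1][i]);
      }

      /* what the tool might emit after being handed
         ``double *p = &x[1][i]'': p is updated wherever i changes,
         preserving the invariant p == &x[1][i], and the indexed
         reference disappears. */
      void scale_row_reduced(int n)
      {
          int i;
          double *p = &x[1][0];

          for (i = 0; i < n; i++, p++)
              *p = f(*p);
      }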
jlg@lanl.gov (Jim Giles) (11/10/90)
From article <24487:Nov906:17:2490@kramden.acf.nyu.edu>, by brnstnd@kramden.acf.nyu.edu (Dan Bernstein): > [... pointers copying one row to another of a two-d array) ...] > In this case, the compiler does little more than reverse that > preprocessing. Hardly a lot of work. Not in this trivial, short, and totally unrealistic example. But, when you copy columns in C, the pointers would interleave each other. If you copy diagonals, it's even worse (yes, array notation makes this easier to see too). If the loop is long with a lot of arrays and a lot of expressions, the task becomes harder still. In general it requires the solution of multiple linear Diophantine equations an proving that none of them have coincident solutions. Not impossible, but not the sort of thing you want to do if you don't _have_ to. > [...] >> if the array >> code were preprocessed into pointers (which you _did_ suggest should >> be done). > > Did I really? Where? I certainly don't believe that all array code > should be converted into pointer code. You did really. If you didn't believe it, you shouldn't have said it: > Jim, nothing in Nemesis that you've described to me requires any > extensions to implement in C; [...] > [...] The programmer can get all the greatness and > glory of the Giles Gaggle, with whatever typechecking and constraints > you want, with at most a C++-like preprocessor. Agreed? If this doesn't imply converting all higher structures into pointers, what does it mean? J. Giles