kent@xanth.cs.odu.edu (Kent Paul Dolan) (02/04/88)
Presley,

Thanks for the long note. I have included it verbatim (except for spelling corrections and reformatting) here with my comments. Since the issues discussed are of interest to the larger community, I have taken the liberty of posting the discussion, long as it is.

I think the major difference we have is that I am less interested in the performance of compilers (I use, rather than sell them), than I am in the performance of programmers (being one). I see the committee's proposal, flawed though it undoubtedly is, as a major, correct step toward decreasing the life cycle cost of writing and maintaining FORTRAN code. In that larger perspective, taking a hit on compiler performance is pretty small potatoes.

Kent, the man from xanth.

>From psmith@convex.uucp Wed Feb 3 15:42:07 1988
>To: kent@xanth.cs.odu.edu
>Subject: Re: FYI - Thanks for the Input

[In response to a previous note of mine - Kent.]

>In reality, I tend to agree with you. In fact, IBM, DEC, and others
>do also. They have said that the committee has produced a new language.

No, the committee has done major surgery on a very ill patient. Not the same thing at all.

>I have no problem with letting FORTRAN go the way of ALGOL and others.

I have no trouble with someone taking a gun to FORTRAN and saying "Bang, you're dead"; I just don't think it will happen. Letting the language linger on, looking like a viable language, continuing to be taught by FORTRAN-using professors to unsuspecting engineering majors, costing probably billions in excessive maintenance costs due to problems which can be remedied only by such radical surgery as proposed by the committee, to this I do object. The problem is not that FORTRAN is dying, the problem is that FORTRAN is not dying, but it is bleeding us white.

>When you try to piece together old portions with new portions many times
>what you get is a mess.

No quarrel, the committee has taken on an ambitious job, and has to do it well to have a successful result.
I can just imagine how much fun it must be to be trying to make major changes to the language with 1/3 of the committee screaming bloody murder and trying to impede the changes at every step. Actually, I don't have to imagine it; I remember it from my X3H3 work, when every vendor's favorite construct _had_ to be added to the language. If the Germans had not gone off in a corner and written GKS by themselves, and threatened to take it to ISO as a DIN proposal if we didn't get in gear, we would be there arguing still. I can easily envision the maintenance responsibility for FORTRAN being wrested from the United States if the committee minority continues to stonewall against needed improvements. FORTRAN is too important to the international community to be long held hostage to the current petty nonsense.

>The committee has held on to all the old constructs and added all the
>new and then trying to encourage people to move...

Not exactly true. The committee seems to have brought in the notation in common use in vectorized FORTRANs, added a lot of maintainability constructs, and used the usual deprecation route to tell FORTRAN users: "Don't use these old constructs any more, they cost more than they are worth, and, on a schedule herewith published, we plan to remove them from the language." They have NOT "held on to all the old constructs", they have used the committee's long standing standard practice for getting rid of them.

>People being people they don't want to go back and change the old dusty
>decks ... they work ... why fix them.

This one I am sure is a red herring over the lifetime of the planned changes; at least 10 years, more likely 20, will pass before the changes that make the "dusty decks" uncompilable will finally take effect. I worked recently in a major programming shop (200 active coders) with a 6.5 million line inventory of production code, about 80:20 COBOL to FORTRAN.
Essentially NONE of it sat for as long as 5 years without change, and with each change, we had to pay for the lack of maintainability constructs in two of the oldest languages in common use. In fact, as part of a mainframe change that changed the word-size of our computers, we had to rewrite ALL of the code in 18 months (we hired expensive help). I think this experience is common; all of the FORTRAN dusty deck has had to undergo maintenance to run on 64 bit machines, then again to run on vector machines, then again to take full advantage of IEEE standard 80 bit floating point. Those decks don't sit still long enough to gather dust, and every time they move, they cost extra because FORTRAN lacks data abstraction, strong typing, records, type checking across compilations, adequate flow control mechanisms, and all else that the committee is offering in the way of maintainability improvements.

>One of our arguments has always been that when people are faced with
>a major rewrite, they will go with a different language which solves
>their problem instead of using one that is full of quirks...

My experience in the above major change was that we had Pascal, Ada, and many fourth generation language "COBOL generators" available for use, but we had a shop full of FORTRAN and COBOL programmers. Inertia and good sense meant that we upgraded our code to the latest standard (COBOL 74, FORTRAN 77) in the process of making the move, but did not change languages for most of the code. I think the argument ignores the inertia of the mass of programmers and the reluctance of their managers to provide expensive training or take unquantified risks.

>We put together the following summary in response to some of the
>other net traffic. I would appreciate your comments on it.

You couldn't hold me back! ;-)

>We look at the issue from two sides. The current customer base
>and the complexity of keeping the old and the new in the same
>compiler.

Unless you also look at the higher cost of "staying put", you have done yourselves and your customers a large disservice. The problem of carrying along deprecated features has always been with the FORTRAN community, but the community has, up to now, chosen to bear that cost because of the smooth transition it allows from the old to the new compilers.

>We also see holes in what is being proposed. Some of the things look
>great on the surface ... but are not complete as in the case of array
>notation or will not allow optimal compiler execution and code
>execution as in derived data types.

Things which are incomplete should be addressed with example language showing how to extend them to be complete, in your comments, not recommended for exclusion because they are not perfect. Again, if you keep this narrow focus, you do a disservice to your customers.

The array notation you show below, and argue against in the cause of "excessive keystrokes", looks like the vector notation I have seen in common use in vectorizing FORTRANs; complying with the committee's mandate to standardize standard practice. The Ada committee was a lot wiser about the question of "extra keystrokes", they explicitly included in their goals a language that was harder to write but thereby easier to maintain.

Lack of record structures in FORTRAN causes nightmarish convolutions in code; either the pieces of a record must be moved and accessed in separate statements, increasing the bulk of the code, or the pieces are forced into a single array of inhomogeneous types of data, making a maintenance headache of immense magnitude.

>My thought is that the committee should reconfirm FORTRAN for
>another 5 years, clean up this new proposed standard removing
>the things they want to deprecate, etc., fix the problems in
>areas of new functionality ... and call it language X. Let
>FORTRAN stay FORTRAN and move the world to language X.

This won't work at all; names have a strong magic: it was not technical superiority or competitive pricing or excellent support that made their microcomputers outsell all others, it was the three letters "IBM". Similarly, if it isn't "FORTRAN", but it still contains FORTRAN's weaknesses, no one will go to your language X. It is the programming language going by the name of FORTRAN that needs improving. Your proposal says "do nothing" for FORTRAN. Yuk! FORTRAN is in desperate need of help; the committee proposal looks like, and is, radical surgery. The alternative is the continuation of the patient's lingering illness.

>Here's the summary. I would appreciate comments...

I took the liberty of reformatting this to ease comments, and correcting a raft of typos. GNUemacs is _so_ nice!

> Summary of Major Additions Proposed in FORTRAN 8x

>This summary that follows is the Foreword in the FORTRAN 8x draft:

>Array Operations:

>Computation involving large arrays is an important part of
>engineering and scientific use computing. Arrays may be used as
>entities in FORTRAN 8x. Operations for processing whole arrays and
>sub arrays (array sections) are included in the language for two
>principal reasons: (1) these features provide a more concise and
>higher level language that will allow programmers more quickly and
>reliably to develop and maintain scientific/engineering
>applications, and (2) these features can significantly facilitate
>optimization of array operations on many computer architectures.

>The FORTRAN 77 arithmetic, logical, and character operations and
>intrinsic functions are extended to operate on array-valued operands.
>These include whole, partial, and masked array assignment,
>array-valued constants and expressions, and facilities to define
>user-supplied array-valued functions.
>New intrinsic functions are
>provided to manipulate and construct arrays, to perform
>gather/scatter operations, and to support extended computational
>capabilities involving arrays.

>Numerical Computation:

>Scientific computation is one of the principal application domains of
>FORTRAN, and a guiding objective for all of the technical work is to
>strengthen FORTRAN as a vehicle for implementing scientific software.
>Though nonnumeric computations are increasing dramatically in
>scientific applications, numeric computation remains dominant.
>Accordingly, the additions include portable control over numeric
>precision specification, inquiry as to the characteristics of numeric
>information representation, and improved control of the performance
>of numerical programs (for example, improved argument range reduction
>and scaling).

>Derived Data Types:

>"Derived data type" is the term given to that set of features in this
>standard that allows the programmer to define arbitrary data
>structures and operations on them. Data structures are user-defined
>aggregations of intrinsic and derived data types. Intrinsic
>operations on structured objects include assignment, input/output,
>and use as procedure arguments. With no additional derived-type
>operations defined by the user, the derived data type facility is a
>simple data structuring mechanism. With additional operation
>definitions, derived data types provide an effective implementation
>mechanism for data abstractions.

>Procedure definitions may be used to define operations on intrinsic
>or derived data types and nonintrinsic assignments for intrinsic and
>derived types. These procedures are essentially the same as external
>procedures, except that they also can be used to define infix
>operators.

I have a problem with this; infix procedures should be importable from modules; this seems to say that they are not; reference in contrast the Ada standard.
If I define a data type in a module, and overload the '+' and '*' operators for that data type, it does me little good if I cannot export that overloading for use by other compilation units.

>Modular Definitions:

>In FORTRAN 77, there is no way to define a global data area in only
>one place and have all the program units in an application use that
>definition. In addition, the ENTRY statement is awkward and
>restrictive for implementing a related set of procedures possibly
>involving common data objects. Finally, there is no means in FORTRAN
>77 by which procedure definitions, especially interface information,
>may be made known locally to a program unit. These and other
>deficiencies are remedied by a new type of program unit that may
>contain any combination of data object declarations, derived data
>type definitions, procedure definitions, and procedure
>information. This program unit, called a MODULE, may be considered
>as a generalization and replacement for the block data program unit.

This is poorly stated. The MODULE may supersede the BLOCK DATA subprogram, but they are not closely enough related to consider it a generalization of the BLOCK DATA. It would be better said that it gives FORTRAN for the first time a data abstraction mechanism.

>A module may be accessed by any program unit, thereby making the
>module contents available to the program unit. Thus, modules provide
>improved facilities for defining global data areas, procedure
>packages, and encapsulated data abstractions.

>Language Evolution:

>With the addition of new facilities, certain old features become
>redundant and may eventually be phased out of the language as use
>declines.
>For example, the numeric facilities alluded to above
>provide the functionality of double precision; with the new array
>facilities, nonconformable argument association (such as associating
>an array element with a dummy array) is unnecessary (and in fact is
>not useful as an array operation);

This last is unclear; it may be language from the FORTRAN 77 standard, but it is compiler writers' terms of art, not programmers'.

>and BLOCK DATA program units are redundant and inferior to modules.
>As part of the evolution of the language, categories of language
>features (deleted, obsolescent, and deprecated) are provided which
>allow unused features of the language to be removed from future
>standards.

Good. This is as it has been done in the past and will be in the future, one hopes.

> Our Comments

>Array Operations:

>Array operations allow the user to write simpler code in some cases.
>There are also cases where the code may be more complicated to write
>or may not be possible to write using the array operation notation.

You have missed the purpose of the notation; it allows the user to indicate in a simple way to a vectorizing compiler the gather and scatter operations desired. The trade off is a little more typing to achieve a regular notation which can be compiled easily to (much) more efficient code.

>Examples:
>     A = B * C      is what people think of as array operations
>
>     DO I = 1,100,2
>       DO J = 1,100,2
>         A(I,J) = B(I,J) * C(I,J)
>         D(I,J) = E(I,J) * A(I,J)
>       ENDDO
>     ENDDO

You can't say this in FORTRAN 77 (by my textbook). Since your proposal is to leave FORTRAN 77 unchanged and create a new language, you must compare against the FORTRAN 77 standard, not the parts of FORTRAN 8X which you most admire.

>     becomes:
>     A(1:100:2,1:100:2) = B(1:100:2,1:100:2) + C(1:100:2,1:100:2)
>     D(1:100:2,1:200:2) = E(1:100:2,1:100:2) + A(1:100:2,1:100:2)

But you are indicating a complex situation where the gather/scatter operations need to be explicated to the compiler by the programmer.
Is not A = B + C equally valid for conforming arrays, and does it not require the same amount of typing to create this loop in the old notation? Surely the gain is in the normal case, and more than makes up for the loss in the more complex case?

>     The DO loop contains 72 characters and the array notation contains
>     112 characters.

But also, the DO loop to add the whole arrays contains (more like 84 characters in FORTRAN 77) the same number of characters, and yet A=B+C needs only 5? You are not giving a fair comparison here.

>     If you used identified arrays to define dynamic aliases for the
>     5 arrays above, then the statements would be simplified, but
>     you'd have 5 IDENTIFY statements and 5 new names to add to the
>     program.

But you would then make this savings everywhere that you used the aliases. If that gather/scatter pattern were important to the problem being computed, it would be unusual indeed to see it used only once in the program. Again, properly analyzed, the gain is to the committee's proposal.

>     DO I = 2,100,2
>       A(I-1) = B(I) + C(I/2) * D(102-I)
>     ENDDO
>     becomes:
>     A(1:100:2) = B(2:100:2) + C( :50) * D(100:2:-2)
>     Which is easier to read and understand in the DO loop form. The
>     bottom example saves 2 characters.

Again, you miss the point that the more regular notation immensely eases the task of a vectorizing compiler. It is nearly an AI problem (and probably provably NP complete), to extract the assignment conflicts from a complex DO LOOP, while the committee's notation, in successful use in vectorizing compilers today, makes the task very easy.

>
>     DO I = 2,100
>       A(I) = A(I-1) + B(I)
>     ENDDO
>     Cannot be represented in array notation.
>     A(2:100) = A(1:99) + B(2:100)  ! answer is different to the DO loop

Certainly, but this is not an array operation; this is an array element operation, and is a classical example of needing the result of the (i-1)st step before computing the (i)th.
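
The difference between the two cases just discussed can be seen in a small sketch. Python is used as a stand-in here, since the draft 8x notation quoted in this note cannot be compiled as written; plain lists play the role of arrays, with slot 0 left unused so that subscripts match the 1-based FORTRAN. The function names, the helper `make`, and the test data are all hypothetical. The sketch also assumes, as the quoted remark "answer is different to the DO loop" implies, that an array assignment evaluates its whole right side before storing anything.

```python
# Plain Python lists stand in for FORTRAN arrays. Slot 0 is left unused so
# that subscripts line up with the 1-based FORTRAN examples quoted above.
# All names and test data here are hypothetical, chosen only for illustration.
N = 100


def make(f):
    """Build a list indexed 1..N with element i equal to f(i); slot 0 is padding."""
    return [0] + [f(i) for i in range(1, N + 1)]


def recurrence_do_loop(a, b):
    """DO I = 2,100: A(I) = A(I-1) + B(I). Each step reads the updated A(I-1)."""
    a = list(a)  # work on a copy so the caller's data is untouched
    for i in range(2, N + 1):
        a[i] = a[i - 1] + b[i]
    return a


def recurrence_as_sections(a, b):
    """A(2:100) = A(1:99) + B(2:100). The right side is computed entirely
    from the old A before any element is stored."""
    a = list(a)
    a[2:N + 1] = [x + y for x, y in zip(a[1:N], b[2:N + 1])]
    return a


def strided_do_loop(a, b, c, d):
    """DO I = 2,100,2: A(I-1) = B(I) + C(I/2) * D(102-I)."""
    a = list(a)
    for i in range(2, N + 1, 2):
        a[i - 1] = b[i] + c[i // 2] * d[102 - i]
    return a


def strided_as_sections(a, b, c, d):
    """A(1:100:2) = B(2:100:2) + C(:50) * D(100:2:-2)."""
    a = list(a)
    a[1:N + 1:2] = [x + y * z
                    for x, y, z in zip(b[2:N + 1:2], c[1:51], d[100:1:-2])]
    return a


if __name__ == "__main__":
    a = make(lambda i: i)
    b = make(lambda i: 2 * i)
    c = make(lambda i: i + 1)
    d = make(lambda i: 3 * i)
    # No loop-carried dependence: both notations give the same array.
    print(strided_do_loop(a, b, c, d) == strided_as_sections(a, b, c, d))
    # Loop-carried dependence: the two notations disagree from element 3 on.
    print(recurrence_do_loop(a, b)[1:5], recurrence_as_sections(a, b)[1:5])
```

With this data the strided loop and its section form agree, while the recurrence and its would-be section form part ways at the third element. That is the dependence a vectorizer has to detect when it is handed only the DO loop.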
This is hardly an argument against the committee's proposal, but is instead just a textbook example of why it is difficult to analyze DO loops for vectorizability. In fact, it argues against you, since, by providing the array notation for use by the programmer where vectorization is appropriate, the compiler writer should be no longer obligated to analyze DO loops for vectorizability, immensely easing the task of writing a vectorizing compiler for FORTRAN.

>     Thus, not all array constructs can be represented in array notation.

Again, that is not an "array construct", it is an elementwise operation on an array, not at all the same thing.

>     Other issues include:
>     1. Array subscripts are not allowed in array notation.

Unclear.

>     2. Implied array temporaries may increase execution time.

If they were needed anyway, there is no loss. If they were not needed, then the compiler writer has not done his job correctly. If they allow vector operation where it was not possible before, and the execution slows, replace the hardware!

>     3. Dope vectors will be required with function invocations increasing
>        call overhead.

Since most usage of FORTRAN today is moving toward vector hardware (why else use such an antique language with all its problems if you don't have big problems requiring fast execution times?), the use of "dope vectors" is already either standard practice or taken care of in hardware. This is a non-problem.

>Numerical Computation:
>     REAL (PRECISION=10, RANGE=50) TEMPERATURE
>     D = DIGITS (TEMPERATURE)
>     E = EFFECTIVE_PRECISION (TEMPERATURE)
>     Allows the program to determine attributes about the underlying
>     implementation and allows support for more than 2 underlying REAL
>     data types. Aids in moving programs to machines with different
>     word size.
>     Issues include:
>     1. May greatly increase the number of specific intrinsic functions

I think standard usage is to use the next bigger type that fits. Poof, no problem, you already have that intrinsic function written.

>     2. "If no method exists that satisfies the specified precision and
>        exponent range, the results are processor dependent" Ada
>        compilers are required to flag such situations as an error.

And, certainly, so should FORTRAN compilers; the purpose is to let the programmer tell the compiler his real needs for numeric precision for this data type; unless he is lying, the program may be expected to fail, so why proceed to link and execute it?

>Derived Data Types:
>     TYPE PERSON
>       INTEGER AGE
>       CHARACTER (LEN=50) NAME
>       REAL, ARRAY (2,2) :: RATES
>     END TYPE PERSON

This is very pretty. However, it fails to discriminate between typing to form an aggregate, and typing to differentiate usages of intrinsic types. To see the problem, consider defining just a person's name to be of TYPE character length 50. What I want to do is:

     TYPE CHARACTER (50) NAME END TYPE

but I doubt this would compile. There needs to be a keyword to indicate a typing declaration, and a separate keyword to indicate an aggregation. Standard practice is TYPE and RECORD in Pascal or Ada or Modula-2, typedef and struct in C. Probably the former would be preferable for FORTRAN.

>     TYPE PERSON CHAIRMAN

This, however, is not pretty. It should be clear in reading that this is a variable of type person being declared. It looks more like a redeclaration of person to use a data element of type chairman. The committee should take cognizance of reality; most future FORTRAN users will have had their computer language training in languages which are not line oriented, and so visual cues relating to line structure, easy for old FORTRAN programmers to grasp, will be imperceptible to students trained in modern, free format languages. A much better choice would be VAR, widely used, instead of TYPE. If I understand the use of "::" in the proposal:

     VAR CHAIRMAN :: PERSON

would be better.

>     CHAIRMAN%AGE = 50

And this is an abomination.
Almost every language that uses a record structure uses the period to indicate a substructural element, why make life hard for the multiple language user? The period provides a visually obvious break, easing reading and thus the maintenance task. The percent sign is non-standard and hard to find in the middle of a word. Yuk!

>     Provides a record data structure for FORTRAN (which is different from
>     the record data structures currently available in VMS FORTRAN.)

No vendor is guaranteed that his particular implementation of a non-standard feature will become the one accepted in the next standard. All he can expect is that, if successful, something with the same functionality will probably make it into a future standard. Sounds like sour grapes to me. ;-)

>     Issues:
>     1. Overloaded operators can make programs harder to read and
>        understand.

Wrong! FORTRAN already massively overloads the arithmetic operators; this does nothing but make the language easier to use and read.

>     2. Generic user functions can cause an explosion in the object code
>        if the function has many arguments.

No worse than a procedure call; what's the problem?

>     3. The compiler cannot optimize operations on derived data types
>        in the same fashion as with intrinsic data types. For example,
>        if the BIT data type is defined in the language, the compiler
>        can generate optimized code to deal with this data type. You
>        can do a BIT data type using derived data types, but the
>        compiler will not have the same amount of information available
>        for optimization since the derived data type is a generalized
>        function.

Only partially true. Operations such as assignment can be done by a move bytes operation, which is usually quite fast, and this in turn can be done mostly as a move words operation if it is long enough to be worth the trouble.
User defined operations must be done using the code in the user's MODULE, but if not done that way, they would instead be done by inline, user-written code in multiple places in the main routines, with concomitant larger code sizes, confusion, and major maintenance problems when the logical object being manipulated changes due to maintenance activity.

>  Modular Definitions:
>     MODULE POOL1
>       INTEGER X(1000)
>       REAL Y(100,100)
>     END MODULE
>     USE POOL1
>     Allows for better modularization of programs. Interface errors between
>     modules will be caught at compile/link time.
>     Issues:
>     1. The dependent compilation model has not been tested in the FORTRAN
>        arena and is not like the Ada dependent compilation.

Nevertheless, dependent compilation is now a well known, smoothly functioning technology which has an incredible payback in independent maintenance activities and reduced maintenance costs. This is the foundation of data abstraction, sorely needed by FORTRAN for many years. Gaining the module construct for FORTRAN is worth a great deal of cost in compiler redesign.

>     2. Increases compilation complexity and requires changes in other
>        areas of the system software such as the linkers and loaders.

Yes, but the payoff is well worth the cost.

>     3. Will cause compilers to be slower. Some argue that faster machines
>        will overcome this, but if I have a FORTRAN 77 and a FORTRAN 8x
>        compiler on the same machine, the FORTRAN 8x compiler will have to,
>        by definition of the standard, do more work to compile the same
>        program.

If the program uses only FORTRAN 77 constructs it will still compile under the old compiler. Poof! If it uses new constructs, it will only compile under the new compiler. Poof! If the old compiler is thrown away, and the old code is compiled under the new compiler, it will exercise only those portions of the compiler pertaining to the old constructs. Unless the compiler is HORRIBLY written, "Poof!" again; not a problem.

>     4. The INCLUDE statement, which performs the function of allowing a
>        single definition of common code, is NOT a part of this proposed
>        standard.

There is no need to repeat code throughout the program: the sole use for that in FORTRAN 77 is to propagate COMMON declarations. COMMON is superseded by the MODULE; incorporating INCLUDE will only delay adoption of the MODULE. (I know for a fact that all the compiler vendors will continue to carry INCLUDE as a non-standard feature, but why put it in, just to deprecate it in the next standard release? Dumb.)

>Everyone has the right to form their own opinion about the proposed
>FORTRAN 8x standard. Remember that if you are going to give your
>opinion in writing to the committee, that letters must be received
>in Washington DC by February 23, 1988. The address is:
>     X3 Secretariat
>     ATTN: Gwendy Phillips
>     Computer and Business Equipment Manufacturers Association
>     Suite 500
>     311 First Street, NW
>     Washington, DC 20001-2178

Kent, the man from xanth.