[comp.std.internat] FORTRAN 8X discussion

kent@xanth.cs.odu.edu (Kent Paul Dolan) (02/04/88)
Presley,

	Thanks for the long note.  I have included it verbatim (except
	for spelling corrections and reformatting) here with my
	comments.  Since the issues discussed are of interest to the
	larger community, I have taken the liberty of posting the
	discussion, long as it is.

	I think the major difference we have is that I am less
	interested in the performance of compilers (I use, rather than
	sell them), than I am in the performance of programmers (being
	one).  I see the committee's proposal, flawed though it
	undoubtedly is, as a major, correct step toward decreasing the
	life cycle cost of writing and maintaining FORTRAN code.  In
	that larger perspective, taking a hit on compiler performance
	is pretty small potatoes.

Kent, the man from xanth.

>From psmith@convex.uucp Wed Feb  3 15:42:07 1988
>To: kent@xanth.cs.odu.edu
>Subject: Re:  FYI - Thanks for the Input

[In response to a previous note of mine - Kent.]

>In reality, I tend to agree with you.  In fact, IBM, DEC, and others
>do also.  They have said that the committee has produced a new language.

	No, the committee has done major surgery on a very ill
	patient.  Not the same thing at all.

>I have no problem with letting FORTRAN go the way of ALGOL and others.

	I have no trouble with someone taking a gun to FORTRAN and
	saying "Bang, you're dead"; I just don't think it will happen.
	Letting the language linger on, looking like a viable
	language, continuing to be taught by FORTRAN-using professors
	to unsuspecting engineering majors, costing probably billions
	in excessive maintenance costs due to problems which can be
	remedied only by such radical surgery as proposed by the
	committee, to this I do object.  The problem is not that
	FORTRAN is dying, the problem is that FORTRAN is not dying,
	but it is bleeding us white.

>When you try to piece together old portions with new portions many times
>what you get is a mess.

	No quarrel, the committee has taken on an ambitious job, and
	has to do it well to have a successful result.  I can just
	imagine how much fun it must be to be trying to make major
	changes to the language with 1/3 of the committee screaming
	bloody murder and trying to impede the changes at every step.

	Actually, I don't have to imagine it; I remember it from my
	X3H3 work, when every vendor's favorite construct _had_ to be
	added to the language.  If the Germans had not gone off in a
	corner and written GKS by themselves, and threatened to take
	it to ISO as a DIN proposal if we didn't get in gear, we would
	be there arguing still.

	I can easily envision the maintenance responsibility for
	FORTRAN being wrested from the United States if the committee
	minority continues to stonewall against needed improvements.
	FORTRAN is too important to the international community to be
	long held hostage to the current petty nonsense.

>The committee has held on to all the old constructs and added all the
>new and then trying to encourage people to move...

	Not exactly true.  The committee seems to have brought in the
	notation in common use in vectorized FORTRANs, added a lot of
	maintainability constructs, and used the usual deprecation
	route to tell FORTRAN users: "Don't use these old constructs
	any more, they cost more than they are worth, and, on a
	schedule herewith published, we plan to remove them from the
	language."  They have NOT "held on to all the old constructs",
	they have used the committee's long-standing standard practice
	for getting rid of them.

>People being people they don't want to go back and change the old dusty
>decks ... they work ... why fix them.

	This one I am sure is a red herring over the lifetime of the
	planned changes; at least 10 years, more likely 20, will pass
	before the changes that make the "dusty decks" uncompilable
	will finally take effect.  I worked recently in a major
	programming shop (200 active coders) with a 6.5 million line
	inventory of production code, about 80:20 COBOL to FORTRAN.
	Essentially NONE of it sat for as long as 5 years without
	change, and with each change, we had to pay for the lack of
	maintainability constructs in two of the oldest languages in
	common use.

	In fact, as part of a mainframe change that changed the
	word-size of our computers, we had to rewrite ALL of the code
	in 18 months (we hired expensive help).  I think this
	experience is common; all of the FORTRAN dusty deck has had to
	undergo maintenance to run on 64 bit machines, then again to
	run on vector machines, then again to take full advantage of
	IEEE standard 80 bit floating point.  Those decks don't sit
	still long enough to gather dust, and every time they move,
	they cost extra because FORTRAN lacks data abstraction, strong
	typing, records, type checking across compilations, adequate
	flow control mechanisms, and all else that the committee is
	offering in the way of maintainability improvements.

>One of our arguments has always been that when people are faced with
>a major rewrite, they will go with a different language which solves
>their problem instead of using one that is full of quirks...

	My experience in the above major change was that we had Pascal,
	Ada, and many fourth generation language "COBOL generators"
	available for use, but we had a shop full of FORTRAN and COBOL
	programmers.  Inertia and good sense meant that we upgraded
	our code to the latest standard (COBOL 74, FORTRAN 77) in the
	process of making the move, but did not change languages for
	most of the code.  I think the argument ignores the inertia of
	the mass of programmers and the reluctance of their managers
	to provide expensive training or take unquantified risks.

>We put together the following summary in response to some of the 
>other net traffic.  I would appreciate your comments on it.  

	You couldn't hold me back!  ;-)

>We look at the issue from two sides.  The current customer base
>and the complexity of keeping the old and the new in the same 
>compiler.

	Unless you also look at the higher cost of "staying put", you
	have done yourselves and your customers a large disservice.
	The problem of carrying along deprecated features has always
	been with the FORTRAN community, but the community has, up to
	now, chosen to bear that cost because of the smooth transition
	it allows from the old to the new compilers.

>We also see holes in what is being proposed.  Some of the things look
>great on the surface ... but are not complete as in the case of array
>notation or will not allow optimal compiler execution and code
>execution as in derived data types.

	Things which are incomplete should be addressed with example
	language showing how to extend them to be complete, in your
	comments, not recommended for exclusion because they are not
	perfect.

	Again, if you keep this narrow focus, you do a disservice to
	your customers.  The array notation you show below, and argue
	against in the cause of "excessive keystrokes", looks like the
	vector notation I have seen in common use in vectorizing
	FORTRANs; complying with the committee's mandate to
	standardize standard practice.  The Ada committee was a lot
	wiser about the question of "extra keystrokes"; they
	explicitly included in their goals a language that was harder
	to write but thereby easier to maintain.

	Lack of record structures in FORTRAN causes nightmarish
	convolutions in code; either the pieces of a record must be
	moved and accessed in separate statements, increasing the bulk
	of the code, or the pieces are forced into a single array of
	inhomogeneous types of data, making a maintenance headache of
	immense magnitude.
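
	A minimal sketch of the record problem described above, written
	in Python rather than FORTRAN so that it runs as-is.  The field
	names and sample values are hypothetical.  It contrasts one
	array per field (every move touches each field separately) with
	a single record object that moves as a unit.

```python
from dataclasses import dataclass

# FORTRAN 77 style: one array per field, kept in step by hand.
ages = [50, 42]
names = ["SMITH", "JONES"]

def swap_parallel(i, j):
    # Every field needs its own statement; forgetting one silently
    # corrupts the logical "record".
    ages[i], ages[j] = ages[j], ages[i]
    names[i], names[j] = names[j], names[i]

# Record style: the fields travel together as one object.
@dataclass
class Person:
    age: int
    name: str

people = [Person(50, "SMITH"), Person(42, "JONES")]

def swap_records(i, j):
    # One statement moves the whole record, whatever fields it has.
    people[i], people[j] = people[j], people[i]

swap_parallel(0, 1)
swap_records(0, 1)
```

	Adding a field to Person leaves swap_records untouched, while
	swap_parallel needs another array and another statement.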

>My thought is that the committee should reconfirm FORTRAN for 
>another 5 years, clean up this new proposed standard removing 
>the things they want to deprecate, etc., fix the problems in
>areas of new functionality ... and call it language X.    Let
>FORTRAN stay FORTRAN and move the world to language X.  

	This won't work at all; names have a strong magic:  it was not
	technical superiority or competitive pricing or excellent
	support that made IBM's microcomputers outsell all others; it
	was the three letters "IBM".  Similarly, if it isn't "FORTRAN",
	but it still contains FORTRAN's weaknesses, no one will go to
	your language X.  It is the programming language going by the
	name of FORTRAN that needs improving.  Your proposal says "do
	nothing" for FORTRAN.  Yuk!  FORTRAN is in desperate need of
	help; the committee proposal looks like, and is, radical
	surgery.  The alternative is the continuation of the patient's
	lingering illness.

>Here's the summary.  I would appreciate comments...

	I took the liberty of reformatting this to ease comments, and
	correcting a raft of typos.  GNUemacs is _so_ nice!

>        Summary of Major Additions Proposed in FORTRAN 8x

>The summary that follows is the Foreword in the FORTRAN 8x draft:

>Array Operations:

>Computation involving large arrays is an important part of
>engineering and scientific computing.  Arrays may be used as
>entities in FORTRAN 8x.  Operations for processing whole arrays and
>sub arrays (array sections) are included in the language for two
>principal reasons: (1) these features provide a more concise and
>higher level language that will allow programmers more quickly and
>reliably to develop and maintain scientific/engineering
>applications, and (2) these features can significantly facilitate
>optimization of array operations on many computer architectures.

>The FORTRAN 77 arithmetic, logical, and character operations and
>intrinsic functions are extended to operate on array-valued operands.
>These include whole, partial, and masked array assignment,
>array-valued constants and expressions, and facilities to define
>user-supplied array-valued functions.  New intrinsic functions are
>provided to manipulate and construct arrays, to perform
>gather/scatter operations, and to support extended computational
>capabilities involving arrays.

  
>Numerical Computation:

>Scientific computation is one of the principal application domains of
>FORTRAN, and a guiding objective for all of the technical work is to
>strengthen FORTRAN as a vehicle for implementing scientific software.
>Though nonnumeric computations are increasing dramatically in
>scientific applications, numeric computation remains dominant.
>Accordingly, the additions include portable control over numeric
>precision specification, inquiry as to the characteristics of numeric
>information representation, and improved control of the performance
>of numerical programs (for example, improved argument range reduction
>and scaling).


>Derived Data Types:

>"Derived data type" is the term given to that set of features in this
>standard that allows the programmer to define arbitrary data
>structures and operations on them.  Data structures are user-defined
>aggregations of intrinsic and derived data types.  Intrinsic
>operations on structured objects include assignment, input/output,
>and use as procedure arguments.  With no additional derived-type
>operations defined by the user, the derived data type facility is a
>simple data structuring mechanism.  With additional operation
>definitions, derived data types provide an effective implementation
>mechanism for data abstractions.

>Procedure definitions may be used to define operations on intrinsic
>or derived data types and nonintrinsic assignments for intrinsic and
>derived types.  These procedures are essentially the same as external
>procedures, except that they also can be used to define infix
>operators.

	I have a problem with this; infix procedures should be
	importable from modules; this seems to say that they are not;
	reference in contrast the Ada standard.  If I define a data
	type in a module, and overload the '+' and '*' operator for
	that data type, it does me little good if I cannot export that
	overloading for use by other compilation units.
 
>Modular Definitions:

>In FORTRAN 77, there is no way to define a global data area in only
>one place and have all the program units in an application use that
>definition.  In addition, the ENTRY statement is awkward and
>restrictive for implementing a related set of procedures possibly
>involving common data objects.  Finally, there is no means in FORTRAN
>77 by which procedure definitions, especially interface information,
>may be made known locally to a program unit.  These and other
>deficiencies are remedied by a new type of program unit that may
>contain any combination of data object declarations, derived data
>type definitions, procedure definitions, and procedure interface
>information.  This program unit, called a MODULE, may be considered
>as a generalization and replacement for the block data program unit.

	This is poorly stated.  The MODULE may supersede the BLOCK
	DATA subprogram, but they are not closely enough related to
	consider it a generalization of the BLOCK DATA.  It would be
	better said that it gives FORTRAN for the first time a data
	abstraction mechanism.

>A module may be accessed by any program unit, thereby making the
>module contents available to the program unit.  Thus, modules provide
>improved facilities for defining global data areas, procedure
>packages, and encapsulated data abstractions.


>Language Evolution:

>With the addition of new facilities, certain old features become
>redundant and may eventually be phased out of the language as use
>declines.  For example, the numeric facilities alluded to above
>provide the functionality of double precision; with the new array
>facilities, nonconformable argument association (such as associating
>an array element with a dummy array) is unnecessary (and in fact is
>not useful as an array operation);

	This last is unclear; it may be language from the FORTRAN 77
	standard, but it is compiler writers' terms of art, not
	programmers'.

>and BLOCK DATA program units are redundant and inferior to modules.

>As part of the evolution of the language, categories of language
>features (deleted, obsolescent, and deprecated) are provided which
>allow unused features of the language to be removed from future
>standards.

	Good.  This is as it has been done in the past and will be in
	the future, one hopes.  

>                       Our Comments


>Array Operations:

>Array operations allow the user to write simpler code in some cases.  
>There are also cases where the code may be more complicated to write
>or may not be possible to write using the array operation notation.

	You have missed the purpose of the notation; it allows the
	user to indicate in a simple way to a vectorizing compiler the
	gather and scatter operations desired.  The trade off is a
	little more typing to achieve a regular notation which can be
	compiled easily to (much) more efficient code.

>Examples:

>  A = B * C       is what people think of as array operations

>  DO I = 1,100,2
>  DO J = 1,100,2
>     A(I,J) = B(I,J) * C(I,J)
>     D(I,J) = E(I,J) * A(I,J)
>  ENDDO
>  ENDDO

	You can't say this in FORTRAN 77 (by my textbook).  Since your
	proposal is to leave FORTRAN 77 unchanged and create a new
	language, you must compare against the FORTRAN 77 standard,
	not the parts of FORTRAN 8X which you most admire.

>    becomes:

>  A(1:100:2,1:100:2) = B(1:100:2,1:100:2) * C(1:100:2,1:100:2)
>  D(1:100:2,1:100:2) = E(1:100:2,1:100:2) * A(1:100:2,1:100:2)

	But you are indicating a complex situation where the
	gather/scatter operations need to be explicated to the
	compiler by the programmer.  Is not A = B + C equally valid
	for conforming arrays, and does it not require the same amount
	of typing to create this loop in the old notation?  Surely the
	gain is in the normal case, and more than makes up for the
	loss in the more complex case?

>     The DO loop contains 72 characters and the array notation contains
>     112 characters.

	But the DO loop that adds the whole arrays contains about the
	same number of characters (more like 84 in FORTRAN 77), and
	yet A=B+C needs only 5.  You are not giving a fair comparison
	here.

>     If you used identified arrays to define dynamic aliases for the
>     5 arrays above, then the statements would be simplified, but
>     you'd have 5 IDENTIFY statements and 5 new names to add to the
>     program.

	But you would then make this saving everywhere that you used
	the aliases.  If that gather/scatter pattern were important to
	the problem being computed, it would be unusual indeed to see
	it used only once in the program.  Again, properly analyzed,
	the gain is to the committee's proposal.

>  DO I = 2,100,2
>  A(I-1) = B(I) + C(I/2) * D(102-I)
>  ENDDO
  
>     becomes:

>  A(1:100:2) = B(2:100:2) + C( :50) * D(100:2:-2)  

>     Which is easier to read and understand in the DO loop form.  The 
>     bottom example saves 2 characters.

	Again, you miss the point that the more regular notation
	immensely eases the task of a vectorizing compiler.  It is
	nearly an AI problem (and probably provably NP-complete), to
	extract the assignment conflicts from a complex DO loop, while
	the committee's notation, in successful use in vectorizing
	compilers today, makes the task very easy.
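
	As a check that the two forms quoted above really do describe
	the same computation, here is a small sketch using Python list
	slices in place of FORTRAN 8x array sections.  The 0-based
	index shifts and the fill values are assumptions made for the
	illustration; Python slices exclude their upper bound, so the
	bounds differ from the FORTRAN text by design.

```python
n = 100
# Index k in each list holds FORTRAN element k+1; values are arbitrary.
b = [float(i) for i in range(1, n + 1)]
c = [0.5 * i for i in range(1, n + 1)]
d = [2.0 * i for i in range(1, n + 1)]

# DO I = 2,100,2
#    A(I-1) = B(I) + C(I/2) * D(102-I)
a_loop = [0.0] * n
for i in range(2, n + 1, 2):
    a_loop[i - 2] = b[i - 1] + c[i // 2 - 1] * d[102 - i - 1]

# A(1:100:2) = B(2:100:2) + C(:50) * D(100:2:-2)
a_sect = [0.0] * n
a_sect[0:100:2] = [
    x + y * z
    for x, y, z in zip(b[1:100:2], c[0:50], d[99:0:-2])
]
```

	Each section on the right names exactly 50 elements, which is
	the conformance a compiler can verify from the notation alone,
	without analyzing subscript expressions such as 102-I.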
>
>  DO I = 2,100
>  A(I) = A(I-1) + B(I)
>  ENDDO

>     Cannot be represented in array notation.  

>  A(2:100) = A(1:99) + B(2:100)  ! answer is different to the DO loop 

	Certainly, but this is not an array operation; this is an
	array element operation, and is a classical example of needing
	the result of the (i-1)st step before computing the (i)th.
	This is hardly an argument against the committee's proposal,
	but is instead just a textbook example of why it is difficult
	to analyze DO loops for vectorizability.  In fact, it argues
	against you, since, by providing the array notation for use by
	the programmer where vectorization is appropriate, the
	compiler writer should be no longer obligated to analyze DO
	loops for vectorizability, immensely easing the task of
	writing a vectorizing compiler for FORTRAN.
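
	The difference between the recurrence and the array assignment
	can be seen in a short sketch, again in Python with lists
	standing in for the FORTRAN arrays (the initial values are
	made up for the illustration).  The array form evaluates the
	whole right-hand side from the old contents of A before any
	element is stored; the loop sees each freshly stored element.

```python
n = 100
b = [1.0] * n
a0 = [1.0] + [0.0] * (n - 1)   # A(1) = 1, the rest 0

# DO I = 2,100
#    A(I) = A(I-1) + B(I)      each step uses the previous result
a_loop = list(a0)
for i in range(1, n):
    a_loop[i] = a_loop[i - 1] + b[i]

# A(2:100) = A(1:99) + B(2:100)
# The right side is computed in full from the OLD A, then assigned.
a_sect = list(a0)
a_sect[1:n] = [x + y for x, y in zip(a_sect[0:n - 1], b[1:n])]
```

	Here the loop produces a running sum, while the array form
	adds B to a shifted copy of the original A; the two agree only
	in the first two elements.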

>     Thus, not all array constructs can be represented in array notation.

	Again, that is not an "array construct", it is an elementwise
	operation on an array, not at all the same thing.

>  Other issues include:

>   1.  Array subscripts are not allowed in array notation.

	Unclear.

>   2.  Implied array temporaries may increase execution time.

	If they were needed anyway, there is no loss.  If they were not
	needed, then the compiler writer has not done his job
	correctly.  If they allow vector operation where it was not
	possible before, and the execution slows, replace the hardware!

>   3.  Dope vectors will be required with function invocations increasing
>	call overhead. 

	Since most usage of FORTRAN today is moving toward vector
	hardware (why else use such an antique language with all its
	problems if you don't have big problems requiring fast
	execution times?), the use of "dope vectors" is already either
	standard practice or taken care of in hardware.  This is a
	non-problem. 

>Numerical Computation:

>   REAL (PRECISION=10, RANGE=50) TEMPERATURE
>   D = DIGITS (TEMPERATURE)
>   E = EFFECTIVE_PRECISION (TEMPERATURE)

>  Allows the program to determine attributes about the underlying
>  implementation and allows support for more than 2 underlying REAL
>  data types.  Aids in moving programs to machines with different
>  word size.

>  Issues include:

>  1.  May greatly increase the number of specific intrinsic functions

	I think standard usage is to use the next bigger type that
	fits.  Poof, no problem, you already have that intrinsic
	function written.

>  2.  "If no method exists that satisfies the specified precision and
>       exponent range, the results are processor dependent"   Ada 
>       compilers are required to flag such situations as an error.

	And, certainly, so should FORTRAN compilers; the purpose is to
	let the programmer tell the compiler his real needs for
	numeric precision for this data type; unless he is lying, the
	program may be expected to fail, so why proceed to link and
	execute it?
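
	Both points can be sketched together: pick the smallest
	available representation that meets the request, and refuse
	outright when none does.  This is Python, not FORTRAN 8x, and
	the table of kinds, the function name select_real, and the
	error message are all hypothetical; the digit and range
	figures are those usually quoted for IEEE single and double
	precision.

```python
# Hypothetical processor: (name, decimal digits, decimal exponent range),
# ordered from smallest to largest representation.
KINDS = [("single", 6, 37), ("double", 15, 307)]

def select_real(precision, exponent_range):
    """Return the smallest kind that satisfies both requests."""
    for name, digits, rng in KINDS:
        if digits >= precision and rng >= exponent_range:
            return name
    # No representation fits: report it rather than leave the
    # result "processor dependent".
    raise ValueError(
        f"no REAL with PRECISION={precision}, RANGE={exponent_range}"
    )

# REAL (PRECISION=10, RANGE=50) TEMPERATURE
temperature_kind = select_real(10, 50)
```

	Because every request maps onto one of the representations
	that already exist, the intrinsic functions written for those
	representations serve for any declared precision.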

>Derived Data Types:

>   TYPE PERSON
>      INTEGER AGE
>      CHARACTER (LEN=50) NAME
>      REAL, ARRAY (2,2) :: RATES
>   END TYPE PERSON

	This is very pretty.  However, it fails to discriminate
	between typing to form an aggregate, and typing to
	differentiate usages of intrinsic types.  To see the problem,
	consider defining just a person's name to be of TYPE character
	length 50.  What I want to do is:

	   TYPE 
		CHARACTER (50) NAME
	   END TYPE

	but I doubt this would compile. There needs to be a keyword to
	indicate a typing declaration, and a separate keyword to
	indicate an aggregation.  Standard practice is TYPE and RECORD
	in Pascal or Ada or Modula-2, typedef and struct in C.
	Probably the former would be preferable for FORTRAN.

>   TYPE PERSON CHAIRMAN 

	This, however, is not pretty.  It should be clear in reading
	that this is a variable of type person being declared.  It
	looks more like a redeclaration of person to use a data
	element of type chairman.  The committee should take
	cognizance of reality; most future FORTRAN users will have had
	their computer language training in languages which are not
	line oriented, and so visual cues relating to line structure,
	easy for old FORTRAN programmers to grasp, will be
	imperceptible to students trained in modern, free format
	languages.  A much better choice would be VAR, widely used,
	instead of TYPE.  If I understand the use of "::" in the
	proposal:

	   VAR CHAIRMAN :: PERSON

	would be better.

>   CHAIRMAN%AGE = 50

	And this is an abomination.  Almost every language that uses a
	record structure uses the period to indicate a substructural
	element, why make life hard for the multiple language user?
	The period provides a visually obvious break, easing reading
	and thus the maintenance task.  The percent sign is
	non-standard and hard to find in the middle of a word.  Yuk!

>   Provides a record data structure for FORTRAN (which is different from
>   the record data structures currently available in VMS FORTRAN.)

	No vendor is guaranteed that his particular implementation of
	a non-standard feature will become the one accepted in the
	next standard.  All he can expect is that, if successful,
	something with the same functionality will probably make it
	into a future standard.  Sounds like sour grapes to me.  ;-)

>   Issues:

>   1.  Overloaded operators can make programs harder to read and
>	understand.

	Wrong!  FORTRAN already massively overloads the arithmetic
	operators; this does nothing but make the language easier to use
	and read.
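
	To illustrate the claim about overloading, a short Python
	sketch (the Vec2 type is hypothetical and not taken from the
	draft).  The "+" operator already means different things for
	integer, real, and complex operands; a user-defined meaning
	for a new type reads the same way at the point of use.

```python
from dataclasses import dataclass

# "+" is already overloaded across the built-in numeric types.
int_sum = 1 + 2
real_sum = 1.5 + 2
complex_sum = (1 + 2j) + 1

@dataclass(frozen=True)
class Vec2:
    x: float
    y: float

    def __add__(self, other):
        # User-supplied meaning of "+" for this type.
        return Vec2(self.x + other.x, self.y + other.y)

# Reads exactly like the intrinsic cases above.
v = Vec2(1.0, 2.0) + Vec2(3.0, 4.0)
```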

>   2.  Generic user functions can cause an explosion in the object code
>	if the function has many arguments. 

	No worse than a procedure call; what's the problem?
	
>   3.  The compiler cannot optimize operations on derived data types
>	in the same fashion as with intrinsic data types.  For example,
>	if the BIT data type is defined in the language, the compiler
>	can generate optimized code to deal with this data type.  You
>	can do a BIT data type using derived data types, but the
>	compiler will not have the same amount of information available
>	for optimization since the derived data type is a generalized
>	function.

	Only partially true.  Operations such as assignment can be
	done by a move bytes operation, which is usually quite fast,
	and this in turn can be done mostly as a move words operation
	if it is long enough to be worth the trouble.  User defined
	operations must be done using the code in the user's MODULE, but
	if not done that way, they would instead be done by inline,
	user-written code in multiple places in the main routines,
	with concomitant larger code sizes, confusion, and
	major maintenance problems when the logical object being
	manipulated changes due to maintenance activity.

>Modular Definitions:

>   MODULE POOL1
>      INTEGER X(1000)
>      REAL Y(100,100)
>   END MODULE


>   USE POOL1


>     Allows for better modularization of programs.  Interface errors between
>     modules will be caught at compile/link time.

>   Issues:

>    1.  The dependent compilation model has not been tested in the FORTRAN
>	 arena and is not like the Ada dependent compilation.

	Nevertheless, dependent compilation is now a well known,
	smoothly functioning technology which has an incredible
	payback in independent maintenance activities and reduced
	maintenance costs.  This is the foundation of data
	abstraction, sorely needed by FORTRAN for many years.  Gaining
	the module construct for FORTRAN is worth a great deal of cost
	in compiler redesign.

>    2.  Increases compilation complexity and requires changes in other
>	 areas of the system software such as the linkers and loaders.

	Yes, but the payoff is well worth the cost.

>    3.  Will cause compilers to be slower.  Some argue that faster machines
>	 will overcome this, but if I have a FORTRAN 77 and a FORTRAN 8x
>	 compiler on the same machine, the FORTRAN 8x compiler will have to,
>	 by definition of the standard, do more work to compile the same   
>	 program.  

	If the program uses only FORTRAN 77 constructs it will still
	compile under the old compiler.  Poof!  If it uses new
	constructs, it will only compile under the new compiler.
	Poof!  If the old compiler is thrown away, and the old code is
	compiled under the new compiler, it will exercise only those
	portions of the compiler pertaining to the old constructs.
	Unless the compiler is HORRIBLY written, "Poof!" again; not a
	problem.

>    4.  The INCLUDE statement, which performs the function of allowing a 
>	 single definition of common code, is NOT a part of this proposed
>	 standard.  

	There is no need to repeat code throughout the program:  the
	sole use for that in FORTRAN 77 is to propagate COMMON
	declarations.  COMMON is superseded by the MODULE;
	incorporating INCLUDE will only delay adoption of the MODULE.
	(I know for a fact that all the compiler vendors will continue
	to carry INCLUDE as a non-standard feature, but why put it in,
	just to deprecate it in the next standard release?  Dumb.)

>Everyone has the right to form their own opinion about the proposed
>FORTRAN 8x standard.  Remember that if you are going to give your
>opinion in writing to the committee, that letters must be received
>in Washington DC by February 23, 1988.  The address is:

>         X3 Secretariat
>	  ATTN:  Gwendy Phillips
>	  Computer and Business Equipment Manufacturers Association
>	  Suite 500
>	  311 First Street, NW
>	  Washington, DC   20001-2178

Kent, the man from xanth.