mlj8e@dale.acc.Virginia.EDU (Michael L. Johnson) (04/16/88)
I am one of the scientists (a biophysicist) who is pigheaded enough to want to use FORTRAN instead of one of the couple of dozen other languages I know, among them FOCAL, ALGOL, C, BASIC, Pascal, Modula-2, SAIL, PL/1, and the assembler on several machines. Since a number of people seem not to understand why anyone uses FORTRAN, I thought I would post an explanation of why I use it even though I know a long series of other languages.

Before proceeding to code a computer program, a scientist must first address a simple question which has very subtle implications. The question is "What computer language should be used for this type of analysis?" There are six primary considerations which need to be addressed before the computer language is chosen. They are:

    1) The computer language must be universal.
    2) Only the "language standard" should be used; i.e. no non-standard
       extensions.
    3) The investigator should consider the language features required for
       the particular problem.
    4) The computer program should be efficient.
    5) Options inherent within certain languages can make the particular
       problem easier.
    6) There is no need to reinvent the wheel by rewriting software which has
       already been written and is available at a lower cost than paying a
       programmer to do it again.

The optimal choice of a language is one which meets all of your absolute requirements and is available on almost every computer system. For example, ten years ago, when I was at the National Institutes of Health, I asked their computer specialists which language should be used. I was told that if I wanted to use their DEC System-10 I should use the SAIL language, and if I wanted to use their IBM-360 I should use PL/1. I chose FORTRAN instead. Later, after moving to the University of Virginia, I discovered that SAIL programs can only run on DEC System 10/20 computers. Furthermore, the University of Virginia's CDC-855 did not have a PL/1 compiler.
However, all three computers had FORTRAN compilers and, consequently, there was no need to translate any programs. Translations can be very time consuming and expensive, especially when the computer programs are long and complex.

Almost every computer currently in use has a FORTRAN-77 compiler available. Most computers now also have Pascal, C and BASIC available. Other languages like ALGOL, BLISS, FOCAL and Modula-2 are not commonly available. Therefore, in this discussion of relative merits we will consider only the most popular languages: FORTRAN-77, Pascal, C, and BASIC.

The reader is cautioned not to use the convenient language extensions which most manufacturers have included in their compilers. For example, some versions of BASIC allow the direct manipulation of matrices, while others, and the language standard, do not. Consequently, if a BASIC program which uses these matrix features on one computer is transferred to another computer whose compiler does not have these extensions, the program will not run. Another example is the number of attractive services provided by the UNIX operating system which are exclusive to particular versions of UNIX. If a computer program is developed which relies on operating system features on one computer, it will be difficult or impossible to transfer it to a different computer system! To make matters more confusing, a number of computer languages, like Pascal and BASIC, have such poorly defined "standards" that they vary substantially among computers from different manufacturers. A computer program should use only those features of a language which are the same on all computers which implement that language, i.e. the minimal language standard.

A number of the language features required for the development of scientific programs are not common to all of FORTRAN-77, Pascal, C, and BASIC. BASIC has a number of unattractive features for this application.
It does not always permit the easy use of subroutines (or subprocedures) and it does not always permit variable names longer than a single character.

A potential problem with Pascal relates to the "Pascal standard", which does not provide for double precision floating point variables. Matrix manipulations, such as dot products, *must* be performed with greater precision than that offered by single precision floating point calculations, which typically store variables as 32-bit numbers. Furthermore, the "Pascal standard" does not allow subprocedures to be compiled separately from the main program. It is inefficient and time consuming to recompile thousands of lines of Pascal code in order to make a simple change in a ten line subprocedure. The Pascal enthusiast can argue that most Pascal compilers are not restricted by the "Pascal standard". However, beware that different compilers implement these features in different ways, and thus translation from one computer system to another may be difficult.

The efficiency of a computer program is important for procedures, such as some of those which we have presented, that require hundreds of evaluations of complex functions. Inefficiencies can be grouped into two categories: a) the inherent slowness of a language, e.g. BASIC, which is usually interpreted rather than compiled, and b) the inherent inefficiencies of some compilers. For example, we have two different Pascal compilers for our DEC PDP-11/73 running the TSX-plus operating system. We used a prime number generation program to compare our Pascal-2 compiler with our NBS-Pascal compiler, and found that the NBS-Pascal compiler required almost five times as much computer time for the same calculation. However, the Pascal-2 compiler was substantially more expensive.

Of less importance, but still worth considering, are the optional features which the languages do not have in common. None of these are required for the least-squares application, but they are needed in some other applications.
FORTRAN "labeled common blocks" are very useful and are not allowed by the Pascal, C, and BASIC standards. FORTRAN "equivalence statements" can also be very useful. Recursion, the ability of a procedure or subprocedure to call itself either directly or indirectly, can also be useful. The FORTRAN-77 and BASIC standards do not allow recursion, but some implementations of these languages do. Another language feature which is attractive, but not essential, is the ability to write "structured" programs. FORTRAN-77 and BASIC allow structured programming, while Pascal and C require structured programming. It should be noted that it is sometimes very convenient not to be required to use structured programs, even though structured programming is usually an excellent way to program.

The last important point is to avoid writing routines which have already been programmed. Subroutine libraries exist to perform most of the linear algebra and eigensystem operations required for least-squares parameter estimation. Examples are the LINPACK, EISPACK, LLSQ, MINPACK and IMSL libraries of FORTRAN subroutines. These are all written in FORTRAN IV and/or FORTRAN-77. Some computers will allow linking a routine written in one language to a program written in another language if certain specific conditions are met, but this is a difficult process at best. A better way to approach the encoding of an analysis procedure is to begin coding in the language of choice, i.e. the FORTRAN-77 standard, so that using preexisting routines to handle complicated but standard operations is straightforward.

There is little justification for ever converting a functional computer program from one language to another unless the computer system being used does not have a compiler for the first language. Again, we recommend that computer programs be written in either the FORTRAN-77 standard or, sometimes, if the computer is running the UNIX operating system, the C language standard.
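As an illustration of the earlier point that dot products need more than single precision, here is a minimal sketch. It is not part of the original post and is written in Python purely for brevity. The helper names (to_single, dot_single, dot_double) and the test data are assumptions made for this example, and single precision arithmetic is emulated by rounding every intermediate result to a 32-bit IEEE value.

```python
import struct


def to_single(x):
    """Round a Python float (a 64-bit double) to the nearest 32-bit IEEE value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def dot_single(xs, ys):
    """Dot product with every product and partial sum rounded to 32 bits."""
    total = 0.0
    for x, y in zip(xs, ys):
        product = to_single(to_single(x) * to_single(y))
        total = to_single(total + product)
    return total


def dot_double(xs, ys):
    """Same 32-bit input data, but products and partial sums kept in 64 bits."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += to_single(x) * to_single(y)
    return total


# Hypothetical test data: two vectors of 100,000 elements, each equal to 0.1.
n = 100_000
xs = [0.1] * n
ys = [0.1] * n

# Reference value: the square of the stored 32-bit element, multiplied by n.
s = to_single(0.1)
reference = n * (s * s)

error_single = abs(dot_single(xs, ys) - reference)
error_double = abs(dot_double(xs, ys) - reference)

print("single precision accumulation error:", error_single)
print("double precision accumulation error:", error_double)
```

On this data the single precision result drifts visibly away from the reference while the double precision result stays close to it. Once the running sum is large, each small product added to it is rounded to the coarse spacing of 32-bit values near the sum, and those rounding errors pile up.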
(804)-924-2496   Michael L. Johnson          mlj8e@virginia.EDU
                 Pharmacology Dept.          uunet!virginia!mlj8e
                 Box 448; Univ. of Va.       mlj8e@virginia.BITNET
                 Charlottesville, Va. 22908
cik@l.cc.purdue.edu (Herman Rubin) (04/17/88)
In article <773@virginia.acc.virginia.edu>, mlj8e@dale.acc.Virginia.EDU (Michael L. Johnson) writes:

...........

> Before proceeding to code a computer program, a scientist must first
> address a simple question which has very subtle implications. The question
> is "What computer language should be used for this type of analysis?" There
> are six primary considerations which need to be addressed before the
> computer language is chosen. They are:
>
> 1) The computer language must be universal.
> 2) Only the "language standard" should be used; i.e. no non-standard
>    extensions.
> 3) The investigator should consider the language features required for
>    the particular problem.
> 4) The computer program should be efficient.
> 5) Options inherent within certain languages can make the particular
>    problem easier.
> 6) There is no need to reinvent the wheel by rewriting software which has
>    already been written and is available at a lower cost than paying a
>    programmer to do it again.

What you are asking for does not exist. I know of no language which recognizes the constructs that a reasonably capable mathematician will automatically use to attack a problem. There are some languages which come somewhat close to being universal, but they are very difficult to read and write.

This problem is not necessary. Most, if not all, languages are compromised by the fact that the language designers took one or more of the following attitudes: you do not need or want to do that; you can accomplish the same thing (with possibly far greater cost) in this way; this occurs so rarely that including it is not worthwhile; you can make a mistake in using it.

I am familiar with many situations in which code which is efficient on one machine is woefully inefficient on another, and vice versa. Ideally, one should be able to write source code so that the compiler can produce efficient code on any machine.
Unfortunately, this is not the case, and will not be the case as long as the language gurus restrict their languages as they do. I would make (4) the requirement which should be foremost for all except short programs. Item (6) is very important, and those features of implementations which interfere with it (such as compiler-affixed underscores to global names) should be eliminated, and appropriate interfaces and editors of object (binary) programs added to make it easy to interface routines written in different languages.

For a language to be universal, it must be able to handle all instructions, data types, operations, pseudo-operations, print statement formats, etc., which are desired. This is impossible, hence the language must allow the addition of objects unforeseen by the designer.

Do not misunderstand me; I am in favor of the goal. We must keep in mind the limitations on what we know how to do reasonably well. But we must not try to allow the users to do only what the designers know how to do well.

--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (ARPA or UUCP) or hrubin@purccvm.bitnet
mlj8e@dale.acc.Virginia.EDU (Michael L. Johnson) (04/18/88)
In a recent article it was suggested that the most important consideration when deciding on a computer language should be efficiency. As a user, rather than a compiler writer, I do not agree with this.

As a scientist I am called upon to use my programs on a number of computers. Currently at the University of Virginia we use the following computers: CDC-855, Prime, AT&T 3B's, Sun's, Silicon Graphics, Convex, VAX-UNIX, VAX-VMS, PDP-11-UNIX, and PDP-11-nonUNIX. In any day or week I may use any or all of these computers to run my programs. I am therefore interested in developing computer programs which will run on all of these computers with little or no change.

Sure, we want a language which will run our programs fast. However, a few percent increase in speed is not worth translating these programs to a different language. If we get a program in BASIC we would never consider translating it to another language; we would find a computer which will run the BASIC program and use it as is.

I would speculate that the only computer language which is available on the above list of computers in a reasonably compatible form is FORTRAN-77.

Please note that we are interested in using the computer as a tool, and as such we just want the programs to run correctly. We do not care, too much, if our program is written in the latest, greatest language. It is the answer, and the minimum amount of our time required to get said answer, which are important to us. We do not care about which language is better for a particular computer.

(804)-924-2496   Michael L. Johnson          mlj8e@virginia.EDU
                 Pharmacology Dept.          uunet!virginia!mlj8e
                 Box 448; Univ. of Va.       mlj8e@virginia.BITNET
                 Charlottesville, Va. 22908
barmar@think.COM (Barry Margolin) (04/19/88)
In article <753@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>In article <773@virginia.acc.virginia.edu>, mlj8e@dale.acc.Virginia.EDU (Michael L. Johnson) writes:
>> 1) The computer language must be universal.
>For a language to be universal, it must be able to handle all instructions,
>data types, operations, pseudo-operations, print statement formats, etc., which
>are desired. This is impossible, hence the language must allow the addition
>of objects unforeseen by the designer.

I think you are using a different definition of "universal" than the original author. I interpreted his criterion to mean that the language must be available on all computers that his application is likely to be run on.

I'm not sure where your definition of "universal" comes from. At first I thought you might be using it as in "Universal Turing Machine", intending it to mean a language that can be used to compute anything that is computable. However, since most popular languages are Turing-equivalent, and a Turing Machine is universal, this must not be what you mean.

Barry Margolin
Thinking Machines Corp.

barmar@think.com
uunet!think!barmar