brainerd@unmvax.unm.edu (Walt Brainerd) (10/24/88)
After climbing to Sandia crest on a beautiful day yesterday and going to an NMSO concert of Mozart and Stravinsky last night, I am ready to forget about all the political garbage that appeared recently and provide some technical information for those interested.

Comments from the public on the draft proposed Fortran 88 included the following:

1. The features for control of numeric precision are too complicated.
2. A facility for manipulation of bits is needed.
3. A "short integer" (and maybe a "long integer") data type is needed.
4. A feature to handle multiple character sets, particularly those with large numbers of characters, is needed.

A single unified scheme proposed and accepted by ISO/WG5 in Paris solves, to a large degree, each of these problems. The solution is not particularly elegant and certainly is not like Pascal or C, but it should be easy to learn and easy to implement. A sketch of this scheme follows.

Each intrinsic data type (REAL, INTEGER, LOGICAL, and CHARACTER) has a parameter, called its KIND, associated with it. A KIND is intended to designate a machine representation for a particular data type. As an example, an implementation might have three REAL kinds, informally known as single, double, and quadruple precision. In Fortran 88 (as proposed), the KIND is an integer, much like an I/O unit number; these numbers are processor dependent, so KIND = 1, 2, and 4 might be single, double, and quadruple precision, or KIND = 4, 8, and 16 could be used instead. The requirements are that there must be two kinds, representing default real and double precision, and that a representation with a larger KIND value must have greater precision. Note that the value of the KIND number has nothing to do with the number of decimal digits of precision, as was the case with the original proposed precision scheme.
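As a concrete sketch of the numbering described above (the kind values shown are purely hypothetical, since they are processor dependent):

```fortran
! Sketch in the draft-proposal syntax.  On this imagined processor,
! KIND = 1, 2, and 4 denote single, double, and quadruple precision;
! a different processor might number the same kinds 4, 8, and 16.
      REAL (KIND = 1) :: S      ! single precision
      REAL (KIND = 2) :: D      ! double precision
      REAL (KIND = 4) :: Q      ! quadruple precision
```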
One reason that this new proposal simplifies the implementation is that it is up to the user to make sure that actual and dummy arguments match; there is no "passed in" precision.

There is an intrinsic function SELECTED_REAL_KIND that produces the smallest kind value whose representation has at least a certain precision and range. For example, SELECTED_REAL_KIND (8, 70) will produce a kind (if there is one) that has at least 8 decimal digits of precision and allows values with magnitudes between 10 ** -70 and 10 ** +70. This permits the programmer to select representations having required precision or range, albeit in a somewhat less direct manner than in the previous proposal. Parameters can be used effectively here:

      PARAMETER (MY_PRECISION = SELECTED_REAL_KIND (8, 70))
      REAL (KIND = MY_PRECISION) :: X, Y, Z

Of course, this can be even more effective when appearing in a module that, in effect, defines a real data type with MY_PRECISION. Those declarations can then be included into any program unit, ensuring that required declarations will be identical.

For the integer data type, things are pretty much the same, except that there is only one argument for the SELECTED_INT_KIND intrinsic. For example, SELECTED_INT_KIND (5) produces an integer kind allowing representation of all integers between -(10 ** 5) and 10 ** +5.

For the CHARACTER data type, the implementation may include as many kinds as there are character sets supported. This is an advantage over the NCHARACTER (national character) proposal previously considered, which allows for only one character set in addition to the default. It may well be that you want to have Chinese, Japanese, Arabic, and chemical symbol character sets all in one program. With this proposal you can, if you can get your vendor to support them all.

Constants present a problem. There must be a way to specify constants of various kinds in order to pass them as actual arguments and have them match the dummy argument.
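The module idea mentioned above can be sketched as follows, in the draft-proposal syntax (the module name PRECISION_DEFS and all variable names are invented for illustration, and none of this has been checked against any compiler):

```fortran
! Hedged sketch in the draft-proposal syntax.
      MODULE PRECISION_DEFS
        ! Smallest kind with at least 8 decimal digits of precision
        ! and a decimal exponent range of 70.
        PARAMETER (MY_PRECISION = SELECTED_REAL_KIND (8, 70))
      END MODULE

      SUBROUTINE SWAP (X, Y)
        USE PRECISION_DEFS
        ! Every program unit that uses the module sees the same kind
        ! value, so actual and dummy arguments always match.
        REAL (KIND = MY_PRECISION) :: X, Y, TEMP
        TEMP = X
        X = Y
        Y = TEMP
      END SUBROUTINE
```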
It has been proposed to put an underscore and a kind value after a constant (all of which is optional, of course). To illustrate this with characters:

      PARAMETER (NAME_LENGTH = 20, KANJI = 3)
      CHARACTER (LEN = NAME_LENGTH, KIND = KANJI) ...
      NAME = '%*(#^&$'_KANJI

(Sorry if that doesn't look like Kanji on your screen.) This proposal would allow some character strings to be stored in standard ASCII and some in EBCDIC if that were desirable:

      PARAMETER (ASCII = 1, EBCDIC = 13)
      CHARACTER (LEN = NAME_LENGTH, KIND = ASCII) :: NAME_A
      CHARACTER (LEN = NAME_LENGTH, KIND = EBCDIC) :: NAME_E
      ...
      NAME_A = 'GEORGE'_ASCII; NAME_E = 'GEORGE'_EBCDIC

For the LOGICAL data type, the default kind and at least one other (KIND=1) must be implemented. There are no storage association requirements for KIND=1, so each value may be stored as one bit, if desired, and as many values as is feasible may be packed into a word. An implementation on a machine that is byte addressable may make available a LOGICAL kind for which each value is stored in a byte, and a bit-addressable machine, or one with excellent shifting and masking instructions, may have a LOGICAL kind for which there is a value stored in each bit of a word or byte. And, of course, all of the above may be available in one implementation.

What is usually thought of as a "word" of bits can be represented as an array of logicals of KIND=1. Of course, the .AND., .OR., and .NOT. operations are already available for such arrays and so can be used to implement "bit" manipulations. The proposal also includes ways of writing arrays of logicals ("bit strings") as hexadecimal, octal, and binary constants. It also includes B, O, and Z edit descriptors.

Well, this is one thing that the awful "gang of five" was doing to respond in a positive vein to the public comments while X3J3 was debating who should be allowed to tell what to whom. Let's have a rational discussion of this and other technical issues.
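The "word of bits" idea above can be sketched in the same draft syntax (the variable names and the 32-bit word size are assumptions for illustration):

```fortran
! Hedged sketch: bit manipulation via arrays of KIND=1 logicals.
      PARAMETER (NBITS = 32)
      LOGICAL (KIND = 1) :: FLAGS(NBITS), MASK(NBITS), BITS(NBITS)
      ! Elementwise array operations play the role of bitwise instructions:
      BITS = FLAGS .AND. MASK      ! bitwise AND
      BITS = FLAGS .OR.  MASK      ! bitwise OR
      BITS = .NOT. FLAGS           ! bitwise complement
```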
Another important thing that was done was to add pointers; if you would like to see a sketch of the pointer feature proposed, say so, and it might be arranged. Most of the members of X3J3 seemed to like the pointer proposal when it was presented.
=============================================================================
Walt Brainerd, Unicomp, Inc., 505/275-0800, brainerd@unmvax.unm.edu
ok@quintus.uucp (Richard A. O'Keefe) (10/25/88)
In article <2066@unmvax.unm.edu> brainerd@unmvax.unm.edu (Walt Brainerd) writes:
>Each intrinsic data type (REAL, INTEGER, LOGICAL, and CHARACTER) has a
>parameter, called its KIND, associated with it. A KIND is intended to
>designate a machine representation for a particular data type.
>There is an intrinsic function SELECTED_REAL_KIND that produces the
>smallest kind value whose representation has at least a certain precision
>and range. For example SELECTED_REAL_KIND (8, 70) will produce a kind
>(if there is one) that has at least 8 decimal digits of precision and
>allows values between 10 ** -70 and 10 ** +70.
>For the integer data type, things are pretty much the same, except that
>there is only one argument for the SELECTED_INT_KIND intrinsic. For
>example, SELECTED_INT_KIND (5) produces an integer kind allowing
>representation of all integers between -(10 ** 5) and 10 ** +5.

This is very PL/I-ish:

    DECLARE I DECIMAL FIXED(5,0);

There are several things I dislike about this proposal. I'll concentrate on the integer case, because that is simpler.

(1) It is too easy to make things machine-dependent when you didn't really mean to. For example,

    INTEGER(KIND=1) I

instead of

    INTEGER(KIND=SELECTED_INT_KIND(6)) I

Let's face it, the latter is so clumsy that people using one specific machine are likely to regard the former as more readable. It would be tempting for a vendor to provide

    INTEGER(KIND=1) == INTEGER*1
    INTEGER(KIND=2) == INTEGER*2
    INTEGER(KIND=4) == INTEGER*4

to make it easy for people to convert programs to the new style, and they are likely to stick with that translation. Having used COBOL, with COMPUTATIONAL-1, COMPUTATIONAL-2, and so on, it is difficult for me to regard this KIND= proposal with anything other than distrust and loathing.

(2) Like PL/I, this notation does not let you say what you really mean. For example, suppose I want an integer which can represent numbers in the range 0..120 (factorial 6).
If I could say

    INTEGER(LOW=0,HIGH=120)

that would convey my intention precisely, and a compiler might notice that 7 or 8 bits will suffice. But if I have to say

    INTEGER(KIND=SELECTED_INT_KIND(3))    -- 2 is too small --

the compiler has to find a type big enough to hold -1000..1000. Now I might want to say more than this; for example I really might want to say

    INTEGER(LOW=0, HIGH=120, DIVISOR=2)

But an interval is a good compromise, and saying

    SUBROUTINE FOO(N, A, B)
    PARAMETER (NMAX = 3000)
    INTEGER (LOW = 0, HIGH = NMAX) N
    INTEGER (LOW = 0, HIGH = NMAX*NMAX) K

looks like a good idea to me. (I can think of one computer where a Fortran compiler would benefit from having a more precise idea of the subscript range of an array than "16 bits".) It cannot be significantly more complex to support LOW=..HIGH=.. parameters for INTEGER than one KIND=..; the compiler has only to determine

    SELECTED_INT_KIND(CEILING(LOG10(MAX(ABS(LOW),ABS(HIGH)))))

or whatever.

(3) Which brings me to my third point, which is that because the KIND= notation requires the programmer to do this calculation (and offhand I'm not sure it's right), it facilitates error. In the example above, we would have to write

    INTEGER (KIND = SELECTED_INT_KIND(4)) N
    INTEGER (KIND = SELECTED_INT_KIND(7)) K

but the temptation is to think that the 4 must be doubled, giving 8. That might be too big for some machines. (3000**2 = 9000000, which fits in 7 decimal digits.) Worse, because the argument of SELECTED_INT_KIND is in terms of the decimal logarithm of the numbers, there is a temptation to leave out that step, as I did. If you want your program to keep on working when the bounds change, you have to write

    INTEGER (KIND = SELECTED_INT_KIND(CEILING(LOG10(NMAX)))) N
    INTEGER (KIND = SELECTED_INT_KIND(CEILING(2*LOG10(NMAX)))) K

It is *much* better to have the compiler do this calculation! And isn't it particularly silly to force the programmer to do base-10 logarithms in his head when the machine is binary?
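The parenthetical doubt above seems warranted: the CEILING(LOG10(...)) formula fails exactly at powers of ten. A hedged sketch of the boundary case, in the draft syntax:

```fortran
      PARAMETER (NMAX = 1000)
      ! LOG10(1000.0) = 3.0 exactly, so CEILING gives 3.  But
      ! SELECTED_INT_KIND(3) guarantees only integers N with
      ! ABS(N) < 10**3, which excludes NMAX itself.  One more digit
      ! is needed at exact powers of ten, e.g. via FLOOR(...) + 1:
      INTEGER (KIND = SELECTED_INT_KIND (FLOOR (LOG10 (REAL (NMAX))) + 1)) :: N
```

(Even this assumes LOG10 returns exactly 3.0 for 1000.0, which floating-point arithmetic does not promise.)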
Summary: The INTEGER(KIND=...) notation is more difficult to use than a Pascal- or Ada-like INTEGER(LOW=...,HIGH=...) notation and is more likely to help programmers introduce errors and unintended machine-dependence.

Perhaps the genuine numerical analysts reading this newsgroup would care to comment on the REAL(...) proposal. My limited experience suggests that it would be easier to specify expressions and say "this type needs to be able to represent numbers this big, and to distinguish numbers with relative difference this small". For example,

    REAL (BOUND = NMAX**6, RELERR = NMAX**-3)

Now it is possible to write expressions which convert this to a number of decimal digits (rather a silly thing to do when computers are base 16 [IBM 370], base 8 [Unisys A-series], or base 2 [most]) and a decimal exponent range, but it is rather hairy, and the compiler can do that more easily than I can.
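The conversion dismissed as "rather hairy" can at least be written down. A hedged sketch, assuming the draft's SELECTED_REAL_KIND (digits, range) interface and ignoring base-conversion slop (NDIGITS and NRANGE are invented names):

```fortran
      PARAMETER (NMAX = 3000)
      ! RELERR = NMAX**(-3)  ->  digits = CEILING(3 * LOG10(3000.0)) = 11
      ! BOUND  = NMAX**6     ->  range  = CEILING(6 * LOG10(3000.0)) = 21
      PARAMETER (NDIGITS = CEILING (3 * LOG10 (REAL (NMAX))))
      PARAMETER (NRANGE  = CEILING (6 * LOG10 (REAL (NMAX))))
      REAL (KIND = SELECTED_REAL_KIND (NDIGITS, NRANGE)) :: X
```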