[comp.lang.fortran] Fortran 88 Parameterized Data Types

brainerd@unmvax.unm.edu (Walt Brainerd) (10/24/88)

After climbing to Sandia crest on a beautiful day yesterday
and going to an NMSO concert of Mozart and Stravinsky last night,
I am ready to forget about all the political garbage that appeared
recently and provide some technical information for those interested.

Comments from the public on the draft proposed Fortran 88 included
the following:

1.  The features for control of numeric precision are too complicated.

2.  A facility for manipulation of bits is needed.

3.  A "short integer" (and maybe a "long integer") data type is needed.

4.  A feature to handle multiple character sets, particularly those
    with large numbers of characters, is needed.

A single unified scheme proposed and accepted by ISO/WG5 in Paris
solves, to a large degree, each of these problems.  The solution is not
particularly elegant and certainly is not like Pascal or C, but should
be easy to learn and easy to implement.  A sketch of this scheme follows.

Each intrinsic data type (REAL, INTEGER, LOGICAL, and CHARACTER) has a
parameter, called its KIND, associated with it.  A KIND is intended to
designate a machine representation for a particular data type.

As an example, an implementation might have three REAL kinds, informally known
as single, double, and quadruple precision.  In Fortran 88 (as proposed),
the KIND is an integer, much like an I/O unit number; these numbers are
processor dependent, so that KIND=1, 2, and 4 might be single, double,
and quadruple precision, or KIND=4, 8, and 16 could serve just as well.
The requirements are that there must be two kinds, representing default
real and double precision, and that a representation with a larger KIND
value must have greater precision.  Note that the value of the KIND number
has nothing to do with the number of decimal digits of precision, as was
the case with the original proposed precision scheme.  One reason that this
new proposal simplifies the implementation is that it is up to the user to
make sure that actual and dummy arguments match; there is no "passed in"
precision.
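
For example (the kind value 2 here is purely hypothetical; each processor
assigns its own numbers), a caller must supply an actual argument whose
kind matches that of the dummy argument:

SUBROUTINE SUB (X)
REAL (KIND = 2) :: X
   ...
END

REAL (KIND = 2) :: Y
   ...
CALL SUB (Y)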

There is an intrinsic function SELECTED_REAL_KIND that produces the
smallest kind value whose representation has at least a certain precision
and range.  For example, SELECTED_REAL_KIND (8, 70) will produce a kind
(if there is one) that has at least 8 decimal digits of precision and
allows values between 10 ** -70 and 10 ** +70.  This permits the programmer
to select representations having required precision or range, albeit in a
somewhat less direct manner than in the previous proposal.  Parameters
can be used effectively here:

PARAMETER (MY_PRECISION = SELECTED_REAL_KIND (8, 70))
REAL (KIND = MY_PRECISION) :: X, Y, Z

Of course, this can be even more effective when appearing in a module
that, in effect, defines a real data type with MY_PRECISION.  Then those
declarations can be made available to any program unit, ensuring that the
required declarations are identical everywhere.
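
A sketch of such a module, using the module facility of the draft (the
module name and MY_PRECISION are, of course, just illustrative):

MODULE WORKING_PRECISION
PARAMETER (MY_PRECISION = SELECTED_REAL_KIND (8, 70))
END MODULE

USE WORKING_PRECISION
REAL (KIND = MY_PRECISION) :: X, Y, Z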

For the integer data type, things are pretty much the same, except that there
is only one argument for the SELECTED_INT_KIND intrinsic.  For example,
SELECTED_INT_KIND (5) produces an integer type allowing representation
of all integers between -(10 ** 5) and +(10 ** 5).
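
For example, a counter that may need as many as seven decimal digits
(the names KBIG and TALLY are just illustrative):

PARAMETER (KBIG = SELECTED_INT_KIND (7))
INTEGER (KIND = KBIG) :: TALLY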

For the CHARACTER data type, the implementation may include as many kinds
as there are character sets supported.  This is an advantage over the
NCHARACTER (national character) proposal previously considered, which
allows for only one character set in addition to the default.  It may
well be that you want to have Chinese, Japanese, Arabic, and chemical symbol
character sets all in one program.  With this proposal you can, if you can
get your vendor to support them all.

Constants present a problem.  There must be a way to specify constants of
various kinds in order to pass them as actual arguments and have them match
the dummy argument.  It has been proposed to put an underscore and a kind
after a constant (all of which is optional, of course).  To illustrate this
with characters:

PARAMETER (NAME_LENGTH = 20, KANJI = 3)
CHARACTER (LEN = NAME_LENGTH, KIND = KANJI) :: NAME
   ...
NAME = '%*(#^&$'_KANJI

(Sorry if that doesn't look like Kanji on your screen.)

This proposal would allow some character strings to be stored in standard
ASCII and some in EBCDIC if that were desirable.

PARAMETER (ASCII = 1, EBCDIC = 13)
CHARACTER (LEN = NAME_LENGTH, KIND = ASCII) :: NAME_A
CHARACTER (LEN = NAME_LENGTH, KIND = EBCDIC) :: NAME_E
   ...
NAME_A = 'GEORGE'_ASCII;  NAME_E = 'GEORGE'_EBCDIC

For the LOGICAL data type, the default kind and at least one other (KIND=1)
must be implemented.  There are no storage association requirements for
KIND=1, so each value may be stored in a single bit, if desired, and as
many values as is feasible may be packed into a word.  An implementation
on a byte-addressable machine may make available a LOGICAL kind for which
each value is stored in a byte, and a bit-addressable machine or one with
excellent shifting and masking instructions may have a LOGICAL kind for
which a value is stored in each bit of a word or byte.  And, of course,
all of the above may be available in one implementation.  What is usually
thought of as a "word" of bits can be represented as an array of logicals
of KIND=1.  Of course, the .AND., .OR., and .NOT. operations are already
available for such arrays and so can be used to implement "bit" manipulations.
The proposal also includes ways of writing arrays of logicals ("bit strings")
as hexadecimal, octal, and binary constants.  It also includes B, O, and Z
edit descriptors.
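
A sketch of what such "bit" manipulation might look like (WORD and MASK
are illustrative names; KIND=1 is the packed kind described above):

LOGICAL (KIND = 1) :: WORD (32), MASK (32)
   ...
WORD = WORD .OR. MASK             ! turn on the bits selected by MASK
WORD = WORD .AND. .NOT. MASK      ! turn those bits off again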

Well, this is one thing that the awful "gang of five" was doing to respond
in a positive vein to the public comments while X3J3 was debating who should
be allowed to tell what to whom.  Let's have a rational discussion of this
and other technical issues.  Another important thing that was done was to add
pointers; if you would like to see a sketch of the pointer feature proposed,
say so, and it might be arranged.  Most of the members of X3J3 seemed to like
the pointer proposal when it was presented.

=============================================================================
Walt Brainerd, Unicomp, Inc., 505/275-0800, brainerd@unmvax.unm.edu

ok@quintus.uucp (Richard A. O'Keefe) (10/25/88)

In article <2066@unmvax.unm.edu> brainerd@unmvax.unm.edu (Walt Brainerd) writes:
>Each intrinsic data type (REAL, INTEGER, LOGICAL, and CHARACTER) has a
>parameter, called its KIND, associated with it.  A KIND is intended to
>designate a machine representation for a particular data type.
>There is an intrinsic function SELECTED_REAL_KIND that produces the
>smallest kind value whose representation has at least a certain precision
>and range.  For example, SELECTED_REAL_KIND (8, 70) will produce a kind
>(if there is one) that has at least 8 decimal digits of precision and
>allows values between 10 ** -70 and 10 ** +70.
>For the integer data type, things are pretty much the same, except that there
>is only one argument for the SELECTED_INT_KIND intrinsic.  For example,
>SELECTED_INT_KIND (5) produces an integer type allowing representation
>of all integers between -(10 ** 5) and +(10 ** 5).

This is very PL/I-ish.
	DECLARE I DECIMAL FIXED(5,0);

There are several things I dislike about this proposal.  I'll concentrate
on the integer case, because that is simpler.

(1) It is too easy to make things machine-dependent when you didn't really
    mean to.  For example,
	INTEGER(KIND=1) I
    instead of
	INTEGER(KIND=SELECTED_INT_KIND(6)) I
    Let's face it, the latter is so clumsy that people using one specific
    machine are likely to regard the former as more readable.  It would
    be tempting for a vendor to provide
	INTEGER(KIND=1)	== INTEGER*1
	INTEGER(KIND=2) == INTEGER*2
	INTEGER(KIND=4) == INTEGER*4
    to make it easy for people to convert programs to the new style, and
    they are likely to stick with that translation.

    Having used COBOL, with COMPUTATIONAL-1, COMPUTATIONAL-2, and so on,
    I find it difficult to regard this KIND= proposal with anything
    other than distrust and loathing.

(2) Like PL/I, this notation does not let you say what you really mean.
    For example, suppose I want an integer which can represent numbers
    in the range 0..120 (factorial 5).  If I could say
	INTEGER(LOW=0,HIGH=120)
    that would convey my intention precisely, and a compiler might notice
    that 7 or 8 bits will suffice.  But if I have to say
	INTEGER(KIND=SELECTED_INT_KIND(3))
    -- 2 is too small -- the compiler has to find a type big enough to
    hold -1000..1000.  Now I might want to say more than this; for
    example I really might want to say
	INTEGER(LOW=0, HIGH=120, DIVISOR=2)
    But an interval is a good compromise, and saying
	SUBROUTINE FOO(N, A, B)
	    PARAMETER (NMAX = 3000)
	    INTEGER (LOW = 0, HIGH = NMAX) N
	    INTEGER (LOW = 0, HIGH = NMAX*NMAX) K
    looks like a good idea to me.  (I can think of one computer where
    a Fortran compiler would benefit from having a more precise idea
    of the subscript range of an array than "16 bits".)

    It cannot be significantly more complex to support LOW=..HIGH=..
    parameters for INTEGER than one KIND=..; the compiler has only
    to determine SELECTED_INT_KIND(CEILING(LOG10(MAX(ABS(LOW),ABS(HIGH)))))
    or whatever.
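
    Worked through for FOO above: MAX(ABS(0), ABS(NMAX)) = 3000, and
    LOG10(3000) is about 3.48, so CEILING gives 4 and the compiler would
    derive SELECTED_INT_KIND(4) for N; for K it has NMAX*NMAX = 9000000,
    whose logarithm is about 6.95, giving SELECTED_INT_KIND(7).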

(3) Which brings me to my third point, which is that because the KIND=
    notation requires the programmer to do this calculation (and offhand
    I'm not sure it's right), it facilitates error.  In the example above,
    we would have to write
	    INTEGER (KIND = SELECTED_INT_KIND(4)) N
	    INTEGER (KIND = SELECTED_INT_KIND(7)) K
    but the temptation is to think that the 4 must be doubled, giving 8.
    That might be too big for some machines.  (3000**2 = 9000000, which
    fits in 7 decimal digits.)  Worse, because the argument of SELECTED_
    INT_KIND is in terms of the decimal logarithm of the numbers, there
    is a temptation to leave out that step, as I did.  If you want your
    program to keep on working when the bounds change, you have to write

	INTEGER (KIND = SELECTED_INT_KIND(CEILING(LOG10(NMAX)))) N
	INTEGER (KIND = SELECTED_INT_KIND(CEILING(2*LOG10(NMAX)))) K

    It is *much* better to have the compiler do this calculation!
    And isn't it particularly silly to force the programmer to do
    base-10 logarithms in his head when the machine is binary?

Summary:
    The INTEGER(KIND=...) notation is more difficult to use than a Pascal-
    or Ada-like INTEGER(LOW=...,HIGH=...) notation and is more likely to
    help programmers introduce errors and unintended machine-dependence.


Perhaps the genuine numerical analysts reading this newsgroup would care
to comment on the REAL(...) proposal.  My limited experience suggests that
it would be easier to specify expressions and say "this type needs to be
able to represent numbers this big, and to distinguish numbers with
relative difference this small".  For example,
	REAL (BOUND = NMAX**6, RELERR = NMAX**(-3))
Now it is possible to write expressions which convert this to a number of
decimal digits (rather a silly thing to do when computers are base 16
[IBM 370], base 8 [Unisys A-series], or base 2 [most]) and decimal
exponent range, but it is rather hairy, and the compiler can do that more
easily than I can.
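
For what it's worth, the conversion I have in mind would run something
like this (a sketch only; NDIGITS and NRANGE are just names for the
intermediate values):

	NDIGITS = CEILING (-LOG10 (RELERR))
	NRANGE  = CEILING ( LOG10 (BOUND))
	REAL (KIND = SELECTED_REAL_KIND (NDIGITS, NRANGE))

Exactly the sort of bookkeeping a compiler can do better than I can.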