[comp.os.vms] FORT vs. zero-length strings

ZSYJKAA@WYOCDC1.BITNET (Jim Kirkpatrick 307 766-5303) (02/08/88)

Regarding the following earlier posting --

     I tripped over the following this week (VMS 4.6, FORTRAN 4.71-271)
     ------------------------------------------------------------------------
     $TYPE TEST.FOR
         CHARACTER*8 RED
         RED = ''
         END
     $ FORTRAN TEST
     %FORT-E-ZERLENSTR, Zero-length string
[and so on]

What the REAL problem is, is that FORTRAN character variables always have
fixed length; if you say RED = 'HI' it really gets padded to RED = 'HI      '
(since you declared RED as CHARACTER*8).  Thus, you cannot have zero-length
character variables.  Somebody at DEC (or was it ANSI?) misinterpreted this
as meaning you cannot have zero-length character strings.  It seems to me
that if I say RED=character-string, and character-string is shorter than 8
characters, it gets padded with (8-size) blanks; why this does not also apply
when character-string is zero characters long, in which case all 8 bytes
should simply be blank-padded, baffles me.  Every computer system I've been on
has some ill-considered feature that doesn't handle some sort of "zero"
condition properly.  I remember a big argument with SDS (long ago!) that
their SORT utility should not abort if the input file contains zero records;
it should simply sort all zero records and build an output file containing
all zero records in proper order!  They insisted sorting zero records was an
error condition.
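A minimal C sketch of the padding rule described above, assuming a fixed-length buffer with no terminator stands in for a CHARACTER*8 variable. `f77_assign` is a hypothetical name, not DEC's run-time routine. The point it illustrates: the zero-length case needs no special handling, because copying nothing and then padding every byte falls out of the same code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modelling assignment to a fixed-length CHARACTER
 * variable: copy as much of the source as fits, drop any excess, and pad
 * the rest with blanks. The destination is NOT NUL-terminated, just like
 * a FORTRAN character variable. */
static void f77_assign(char *dst, size_t dstlen, const char *src, size_t srclen)
{
    size_t n = srclen < dstlen ? srclen : dstlen;   /* characters to copy */

    assert(dst != NULL);
    assert(src != NULL || srclen == 0);
    if (n > 0)
        memcpy(dst, src, n);
    /* Pad with (dstlen - n) blanks; when srclen == 0 this fills every byte. */
    memset(dst + n, ' ', dstlen - n);
}
```

For RED declared CHARACTER*8, f77_assign(red, 8, "", 0) leaves eight blanks, which is what RED = '' would naturally mean under this rule.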

If anybody has the ANSI standard handy, could you look up the formal definition
of a character string and let us (or me) know if zero-length strings are OK?
I'd hate to flame DEC if ANSI deserves it instead.

SCHOMAKE@HNYKUN53.BITNET (Lambert Schomaker) (02/15/88)

To understand the problem, one first has to know how character strings are
passed in Fortran-77. Unlike many other languages, it does not pass them by
mere reference (i.e., a "pointer" or "address"). In VAX Fortran-77, a
character argument is in fact a reference to a special table, the descriptor.
In this table we can find, among other things, the pointer to the location
where the data are actually stored:

Character argument (points to) -----> Descriptor:
     [class...][dtype...][length..........]
     [address.............................]   (points to) ------> [string.....]

The class field (DEC/VMS DSC$B_CLASS) is 8 bits, the type field (DSC$B_DTYPE)
is 8 bits, and the length field (DSC$W_LENGTH) is 16 bits, which explains the
maximum string size of 64 KB (65,535 characters). In fact, a character string
is just one of the data types such a descriptor can point to. To my knowledge,
though, VAX Fortran-77 uses descriptors only for strings.
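For readers more at home in C, the descriptor can be sketched as a struct. This is an illustration only: the comments echo the VMS <descrip.h> field names, the numeric codes 14 (text) and 1 (static class) are included for illustration and should be checked against <descrip.h>, and `make_descriptor` is a hypothetical helper. A native pointer replaces the VAX 32-bit address, so the struct's size differs from the 8-byte VAX original.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Sketch of a fixed-length (class S) string descriptor: a 16-bit length,
 * an 8-bit data type, an 8-bit class, then the address of the data. */
struct str_descriptor {
    uint16_t length;   /* DSC$W_LENGTH: DECLARED length in bytes  */
    uint8_t  dtype;    /* DSC$B_DTYPE: data type code             */
    uint8_t  dclass;   /* DSC$B_CLASS: descriptor class code      */
    char    *pointer;  /* DSC$A_POINTER: address of the first byte */
};

/* Illustrative codes for DSC$K_DTYPE_T (text) and DSC$K_CLASS_S (static). */
enum { DTYPE_T = 14, CLASS_S = 1 };

/* The largest length a 16-bit field can hold: 65535 characters. */
#define MAX_DESC_LEN UINT16_MAX

/* Build a descriptor for a fixed-length character variable. Note that
 * nothing here records how many characters are actually in use. */
static struct str_descriptor make_descriptor(char *buf, size_t len)
{
    struct str_descriptor d;

    assert(len <= MAX_DESC_LEN);   /* longer strings cannot be described */
    d.length = (uint16_t)len;
    d.dtype = DTYPE_T;
    d.dclass = CLASS_S;
    d.pointer = buf;
    return d;
}
```

A CHARACTER*132 variable thus yields a descriptor whose length is 132 no matter what was last stored in it, which is the point made in the next paragraph.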

Now the important part: the length is the DECLARED length. What the designers
(ANSI?) forgot is that in practice you need the USED length most of the time.
We are missing a "DSC$W_USDSTRLEN" field.

      CHARACTER*132 STR
      STR='FOO'         ! in most applications the trailing 129 blanks are a nuisance

There are several solutions to this problem. The dirtiest I have ever seen is
falling back to NUL termination, STR='foo'//CHAR(0), and using
INDEX(STR,CHAR(0))-1 to find the USED string length. Another solution is to
get hold of the used length as early as possible. In a constant assignment,
the programmer knows that 'foo' has three characters. When reading a string,
use the Q format (a VAX extension): READ(*,'(Q,A)') LS,STR. The next step is
to pass the string to a subroutine like this: CALL MAKE_LOWER_CASE(STR(:LS)).
This way we make sure the trailing blanks do not require any processing.
In the subroutine, LEN(STR) will return the value of LS in the caller.
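The NUL-termination trick maps onto C roughly as follows. `used_len_nul` is a hypothetical name, and the buffer is assumed to be a fixed-length, blank-padded variable. Mirroring INDEX(STR,CHAR(0))-1 exactly exposes the trap in this approach: if the CHAR(0) never made it into the variable (for example, because the assignment truncated it), INDEX returns 0 and the computed length comes out as -1.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors INDEX(STR, CHAR(0)) - 1 on a fixed-length buffer: returns the
 * number of characters before the first NUL, or -1 when no NUL is
 * present (INDEX returns 0 in that case). Only the declared length is
 * scanned, as it would be for a FORTRAN variable. */
static long used_len_nul(const char *str, size_t declared_len)
{
    for (size_t i = 0; i < declared_len; i++) {
        if (str[i] == '\0')
            return (long)i;   /* 1-based position i + 1, minus 1 */
    }
    return -1;                /* INDEX found nothing: 0 - 1 */
}
```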
When concatenating strings, we explicitly keep track of the current string
size by updating a separate integer variable alongside:

       STR='foo'//SUBS(2:K)
       LS=3+K-2+1
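The length bookkeeping for STR='foo'//SUBS(2:K) can be checked with a small C sketch. `concat_foo_subs` is a hypothetical helper that hard-codes this one example; it converts FORTRAN's 1-based, inclusive substring bounds to C offsets, blank-pads the result, and, unlike a FORTRAN assignment, stops on an assertion rather than truncating when the result does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch of STR = 'foo' // SUBS(2:K) with the used length kept in a
 * separate integer. SUBS(2:K) has K - 2 + 1 characters; K == 1 gives a
 * zero-length piece. Returns LS = 3 + K - 2 + 1. */
static size_t concat_foo_subs(char *str, size_t str_len,
                              const char *subs, size_t subs_len, size_t k)
{
    assert(k >= 1 && k <= subs_len);   /* keep SUBS(2:K) in range        */
    size_t part = k - 1;               /* K - 2 + 1 characters           */
    size_t ls = 3 + part;              /* LS = 3 + K - 2 + 1             */
    assert(ls <= str_len);             /* this sketch does not truncate  */

    memcpy(str, "foo", 3);
    memcpy(str + 3, subs + 1, part);   /* SUBS(2) is subs[1] in C        */
    memset(str + ls, ' ', str_len - ls);
    return ls;
}
```

With SUBS='ABCDE' and K=4, SUBS(2:K) is 'BCD' and the function returns LS = 3+4-2+1 = 6.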

If this is unwanted or tedious, use a function LENU(STR) that scans the string
backward until it finds a non-blank character and returns the used size. Too
bad if you meant some trailing blanks to be there.
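LENU is the author's own function and its code is not given; the following C version is a guess at its behaviour, assuming blanks are the only trailing filler (tabs and NULs are not stripped). `lenu` is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical LENU: scan a fixed-length, blank-padded buffer backward
 * and return the position of the last non-blank character, i.e. the
 * "used" length. An all-blank buffer yields 0. Blanks that were meant
 * to be part of the text are lost, as the post warns. */
static size_t lenu(const char *str, size_t declared_len)
{
    size_t n = declared_len;

    while (n > 0 && str[n - 1] == ' ')
        n--;
    return n;
}
```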

About zero-length strings:
Using the above rules, a zero-length string is a string passed to a subroutine
as STR(I1:I2) where I2=I1-1 and I1.GT.0 (otherwise you get ugly memory access
violations).

        CHARACTER*5 STR
        LS=0
        CALL SUB(STR(:LS))
        END

        SUBROUTINE SUB(S)
        CHARACTER*(*) S
        WRITE(*,*) LEN(S)              ! prints 0: a zero-length string
        RETURN
        END
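One way to model CALL SUB(STR(:LS)) in C, assuming the callee sees only a pointer and a length (which is all LEN(S) needs from the descriptor). `substring` and `substr` are hypothetical names. The assertions encode the rule above: I1 must be at least 1, and I2 may be as small as I1-1, which yields a zero-length string whose pointer still lies within STR (or just past its end).

```c
#include <assert.h>
#include <stddef.h>

/* Reduced "descriptor": just the address and the length of the piece. */
struct substring {
    const char *pointer;
    size_t length;          /* what LEN(S) reports in the callee */
};

/* Model STR(I1:I2) with 1 <= I1 and I1 - 1 <= I2 <= LEN(STR). */
static struct substring substr(const char *str, size_t len, long i1, long i2)
{
    struct substring s;

    assert(i1 >= 1);                          /* I1 <= 0 would point before STR */
    assert(i2 >= i1 - 1 && i2 <= (long)len);  /* I2 == I1 - 1 means length 0   */
    s.pointer = str + (i1 - 1);               /* 1-based to 0-based offset      */
    s.length = (size_t)(i2 - i1 + 1);
    return s;
}
```

With LS=0, STR(:LS) is STR(1:0): the pointer is the start of STR and the length is 0, matching the WRITE in SUB.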

It is all a bit kludgy. Nevertheless, I wouldn't want to go back to the
old days of FORTRAN-IV string handling. At least f77 allows character functions.

                         Lambert Schomaker,           SCHOMAKER@HNYKUN53.BITNET

PS Does anybody know if the forthcoming F8x is different in this respect?