ZSYJKAA@WYOCDC1.BITNET (Jim Kirkpatrick 307 766-5303) (02/08/88)
Regarding the following, earlier posting --

    I tripped over the following this week (VMS 4.6, FORTRAN 4.71-271)
    ------------------------------------------------------------------------
    $TYPE TEST.FOR
          CHARACTER*8 RED
          RED = ''
          END
    $ FORTRAN TEST
    %FORT-E-ZERLENSTR, Zero-length string
    [and so on]

The REAL problem is that FORTRAN character variables always have a fixed length; if you say RED = 'HI' it really gets padded to RED = 'HI      ' (since you declared RED as CHARACTER*8). Thus, you cannot have zero-length character variables. Somebody at DEC (or would it be ANSI?) misinterpreted this as meaning you cannot have zero-length character strings.

It seems to me that if I say RED=character-string, and character-string is less than 8 characters long, it gets padded with (8-size) blanks. Why this does not apply when character-string is zero characters long, in which case 8 padding bytes should simply be added, baffles me.

Every computer system I've been on has some ill-conceived feature that doesn't handle some sort of "zero" condition properly. I remember a big argument with SDS (long ago!) that their SORT utility should not abort if the input file contains zero records; it should simply sort all zero records and build an output file containing all zero records in proper order! They insisted sorting zero records was an error condition.

If anybody has the ANSI standard handy, could you look up the formal definition of a character string and let us (or me) know if zero-length strings are OK? I'd hate to flame DEC if ANSI deserves it instead.
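A small sketch of the padding rule described above, in fixed-form FORTRAN-77. The program name PADDEM is made up for illustration. It does not attempt the zero-length assignment, since that is exactly what the compiler rejects.

```fortran
      PROGRAM PADDEM
C     Hypothetical demo: a short constant assigned to a CHARACTER*8
C     variable is padded on the right with blanks.
      CHARACTER*8 RED
      RED = 'HI'
C     LEN reports the declared length (8), not the 2 characters
C     that were assigned.
      WRITE(*,*) LEN(RED)
C     Brackets make the trailing blanks visible in the output.
      WRITE(*,*) '[', RED, ']'
C     Character comparison pads the shorter operand with blanks,
C     so this tests that characters 3 through 8 are all blanks.
      IF (RED(3:8) .EQ. ' ') WRITE(*,*) 'TRAILING BLANKS'
      END
```

By the same rule one would expect an empty constant to leave RED holding eight blanks, which is the behaviour the posting argues for.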
SCHOMAKE@HNYKUN53.BITNET (Lambert Schomaker) (02/15/88)
To understand the problem one first has to know how character strings are passed in Fortran-77. Unlike in many other languages, they are not passed by mere reference (i.e., "pointer" or "address"). In f77, a character argument is in fact a reference to a special table, the descriptor. In this table we can find, among other things, the pointer to the location where the data are actually stored.

    Character argument (points to)-----> Descriptor:

        [class...][dtype...][length..........]
        [address.............................] (points to)------> [string.....]

The class field (DEC/VMS DSC$B_CLASS) is 8 bits, the type field (DSC$B_DTYPE) is 8 bits, and the length field (DSC$W_MAXSTRLEN) is 16 bits, which explains the maximum string size of 64 Kbytes. In fact the character string is just one of the data types that such a descriptor can point to. To my knowledge, VAX Fortran-77 only uses descriptors for strings, though.

Now the important part: the length is the DECLARED length. What the designers (ANSI?) forgot is that in practice you need the USED length most of the time. We are missing a "DSC$W_USDSTRLEN" field.

      CHARACTER*132 STR
      STR='FOO'

In most applications the trailing 129 blanks are a nuisance.

There are several solutions to this problem. The dirtiest I have ever seen is falling back to NULL termination, STR='foo'//CHAR(0), and using INDEX(STR,CHAR(0))-1 to find the USED string length.

Another solution is to get hold of the used length as soon as possible. In the constant assignment the programmer knows 'foo' has three characters. When reading a string one should use the Q format:

      READ(*,'(Q,A)') LS,STR

The next step is to pass the obtained string to a subroutine in the following way:

      CALL MAKE_LOWER_CASE(STR(:LS))

This way we make sure the trailing blanks do not require any processing. In the subroutine, LEN(STR) will then return the value that LS has in the caller.
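The substring-passing idea above can be sketched as follows. This is only an illustration: SUBDEM and SHOWLN are made-up names, and the used length is set by hand rather than read with the Q format, which is a VAX extension.

```fortran
      PROGRAM SUBDEM
C     Hypothetical demo of passing only the used part of a string.
      CHARACTER*132 STR
      INTEGER LS
      STR = 'FOO'
C     The programmer knows the constant has three characters.
      LS = 3
C     Pass only the used part: the callee sees 3 characters.
      CALL SHOWLN(STR(:LS))
C     Pass the whole variable: the callee sees all 132 characters.
      CALL SHOWLN(STR)
      END

      SUBROUTINE SHOWLN(S)
C     Hypothetical helper: prints the length of its argument as
C     delivered by the caller.
      CHARACTER*(*) S
      WRITE(*,*) LEN(S)
      RETURN
      END
```

The first call should print 3 and the second 132, since LEN of a CHARACTER*(*) dummy argument reflects whatever length the caller handed over.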
In concatenating strings, we explicitly keep track of the current string size by updating a separate integer variable alongside the string:

      STR='foo'//SUBS(2:K)
      LS=3+K-2+1

If this is unwanted or boring, use a function LENU(str) which scans a string backward until a non-blank is found and returns the used size. Too bad if you meant some blanks to be there at the tail.

About zero-length strings. Using the above rules, a zero-length string is a string passed to a subroutine as STR(I1:I2) where I2=I1-1 and I1.GT.0 (otherwise you get ugly memory access violations).

      CHARACTER*5 STR
      LS=0
      CALL SUB(STR(:LS))
      .
      SUBROUTINE SUB(S)
      CHARACTER*(*) S
C     The WRITE below will report a length of zero.
      WRITE(*,*) LEN(S)
      RETURN
      END

It is all a bit kludgy. Nevertheless, I wouldn't want to go back to the old days of FORTRAN-IV string handling. At least f77 allows character functions.

Lambert Schomaker, SCHOMAKER@HNYKUN53.BITNET

PS Does anybody know if the forthcoming F8x is different in this respect?
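One possible shape for the LENU function mentioned above, as a sketch only: the post gives no implementation, so the body and the driver program LENDEM are assumptions made for illustration.

```fortran
      INTEGER FUNCTION LENU(S)
C     Sketch of LENU: position of the last non-blank character in
C     S, or 0 if S is entirely blank.
      CHARACTER*(*) S
      INTEGER I
C     Scan backward from the declared (passed) length.
      DO 10 I = LEN(S), 1, -1
         IF (S(I:I) .NE. ' ') THEN
            LENU = I
            RETURN
         END IF
   10 CONTINUE
C     No non-blank character was found.
      LENU = 0
      RETURN
      END

      PROGRAM LENDEM
C     Hypothetical driver for the sketch above.
      CHARACTER*132 STR
      INTEGER LENU
      STR = 'FOO'
      WRITE(*,*) LENU(STR)
      STR = ' '
      WRITE(*,*) LENU(STR)
      END
```

This should print 3 and then 0. As the post warns, blanks deliberately left at the tail are not counted. Fortran 90 later added a LEN_TRIM intrinsic that does the same job.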