taylor@sun.soe.clarkson.edu (Ross Taylor) (08/24/90)
I am looking for help with a problem converting character strings
to real numbers.
Here is the background to the problem.
I have a binary data file created by a Turbo Pascal program.  The data
records are roughly 700 bytes each made up of a series of 4 byte reals
and 4 byte integers.  I need to read a data record with a FORTRAN
program (the file is unformatted, direct access) as a char string of
full length (approx 700 bytes), then process the string 4 bytes at a
time.  Each 4 byte substring is either a special sequence of bytes
(ASCII 255) indicating that some particular calculation is to be
performed OR the string contains the IEEE representation of an integer or
real number and must be converted to one of these two formats (I always
know which). 
My problem is that I cannot figure out a way of doing the data
conversion THAT IS ABSOLUTELY STANDARD FORTRAN 77 AND COMPLETELY
PORTABLE and does not involve any physical disk access.  The code will
be running on a variety of machines (PC, VAX, IBM, Sun) under a variety
of OS's (DOS, VMS, CMS, unix) so portability is extremely important.
I tried using an internal file for the data conversion as shown below:
      FUNCTION TOREAL (STRING)
C
C     Function
C     --------
C
C        To convert a character string to a real number
C
C     Input
C     -----
C
C        STRING - Character string
C
C                 STRING should be four bytes in length.
C
C                 If STRING is not four bytes in length, the string
C                 is replaced with a 4 byte character string, each byte
C                 being ASCII character number 255.
C
C     Output
C     ------
C
C        TOREAL - The real number
C
      CHARACTER STRING*(*), STR4*4, FILE*4
C
C     Check string is four bytes
C
      IF (LEN(STRING) .NE. 4) THEN
         WRITE (*,*) 'Function TOREAL called with invalid argument'
C
C        DO SOMETHING HERE
C
      ELSE
C
         STR4(1:4) = STRING(1:4)
C
C        Read from internal file as a real number
C
         READ (UNIT=STR4(1:4), FMT='(A)') X
C
C        Assign X to the function value
C
         TOREAL = X
C
      ENDIF
C
      RETURN
      END
This code compiles and executes perfectly with WATFOR77 on a PC.  It
also compiles without error using Microsoft Fortran 4.01 and FTN77/386.
However, the last two give errors during execution because there is an
ANSI violation in the READ statement (incompatible format and number).
I feel sure that the char to real data conversion problem has been
addressed many times by others.  Can someone please point me in the
right direction?
Many thanks in advance.
Ross Taylor
Department of Chemical Engineering
Clarkson University, Potsdam, NY 13699
email: taylor@sun.soe.clarkson.edu
maine@elxsi.dfrf.nasa.gov (Richard Maine) (08/28/90)
On 24 Aug 90 13:33:37 GMT, taylor@sun.soe.clarkson.edu (Ross Taylor) said:

Ross> I am looking for help with a problem converting character strings
Ross> to real numbers....

Ross> Each 4 byte substring is either a special sequence of bytes
Ross> (ASCII 255) indicating that some particular calculation is to be
Ross> performed OR string contains the IEEE representation of an integer or
Ross> real number and must be converted to one of these two formats (I always
Ross> know which).

Ross> My problem is that I cannot figure out a way of doing the data
Ross> conversion THAT IS ABSOLUTELY STANDARD FORTRAN 77 AND COMPLETELY
Ross> PORTABLE and does not involve any physical disk access.  The code will
Ross> be running on a variety of machines (PC, VAX, IBM, Sun) under a variety
Ross> of OS's (DOS, VMS, CMS, unix) so portability is extremely important.

If you are insistent on the parts about being absolutely standard and
completely portable, I'm afraid you are out of luck.

The standard does not even define the concept of a byte.  You cannot
guarantee that a character is 8 bits.  For instance, old CDC Cyber
systems had 6-bit characters, so you certainly aren't going to store an
8-bit byte in each of them.  All of the systems you mentioned have 8-bit
characters, so maybe you are ok there, but it is neither a standard nor
a completely portable assumption.

Your sample code (omitted) is extremely non-portable.  It has a
reasonable chance of working only on systems that use IEEE floating
point and have the same byte order as the data file.  That is rather
severely restrictive.  It is possible to write reasonably portable code
that can determine the host system byte order (where applicable; forget
it on systems that don't have "bytes").  Handling more general
variations in floating point format is really gory.

The standard does not define any guarantee that you can read a binary
file from one system on another system at all.
The trick of reading the file as direct access does not work on all
systems.  On some systems a direct access file has non-portable special
structure.

Note also, if you are really picky about the standard, that Hollerith
is not technically part of the standard.  It is specified only in
appendix C to the standard.  The standard itself explicitly makes the
distinction that the appendices are not part of the standard.  You are
using Hollerith when you try to use an "A" format for numeric data.

I find the "best" approach to this class of problem to be
modularization.  It is pretty easy to do subroutines that do the job on
various specific systems.  Thus as long as you isolate the problem to a
single small subroutine, you can provide a version of that subroutine
for each supported system.

The only other plausible approach I can think of is to remove the
dependence on the host floating point format by extracting the exponent
and mantissa fields from the data and then using normal host arithmetic
to put them together into a floating point value, expanding out the
IEEE definition.  Roughly something like

      value = isign*imantissa*(2.**(iexponent-ibias))

Fill in the details including the hidden bit handling.  If you assume
that characters are 8-bit quantities, the iChar intrinsic should
reasonably portably get you an integer in the range 0-255 representing
those 8 bits.  Once you have the 4 integers representing the 4 bytes,
it's not too hard to put them together into the sign, exponent, and
mantissa values.

I know the description in the paragraph above is a bit sketchy.  I
don't have actual code handy.  Note that though this can be done
"reasonably" portably, it is neither completely portable nor standard.

--
Richard Maine
maine@elxsi.dfrf.nasa.gov [130.134.64.6]
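
The sign/exponent/mantissa approach described above might be filled in
roughly as follows.  This is only a sketch and is not code from either
poster.  The function name R4IEEE, the IERR argument, and the variable
names are made up for illustration.  It assumes 8-bit characters and
that the least significant byte comes first in the string, as one would
expect for a file written on a PC.  As the reply notes, neither
assumption is guaranteed by the standard.  The bias in the reply's
one-line formula is folded into two scaling steps here so that every
intermediate value stays in range for an IEEE single-precision host.

```fortran
      REAL FUNCTION R4IEEE (STR4, IERR)
C
C     Sketch only: decode a 4-byte IEEE single-precision value
C     held in STR4 using ICHAR and ordinary host arithmetic.
C
C     Assumptions (not guaranteed by the FORTRAN 77 standard):
C       - characters are 8 bits wide;
C       - the least significant byte comes first in STR4;
C       - the host REAL can represent the decoded value.
C
C     IERR = 0 on success, 1 for infinity or NaN (exponent 255),
C     which includes the all-255 marker sequence.
C
      CHARACTER STR4*4
      INTEGER IERR, IB(4), I, ISGN, IEXP, IMANT, IBIAS
      PARAMETER (IBIAS = 127)
C
C     Fetch the four bytes as integers in the range 0-255.  Some
C     compilers return negative values for codes above 127.
C
      DO 10 I = 1, 4
         IB(I) = ICHAR(STR4(I:I))
         IF (IB(I) .LT. 0) IB(I) = IB(I) + 256
   10 CONTINUE
C
C     Sign: top bit of the most significant byte.
C
      ISGN = 1
      IF (IB(4) .GE. 128) THEN
         ISGN = -1
         IB(4) = IB(4) - 128
      ENDIF
C
C     Exponent: low 7 bits of byte 4, then the top bit of byte 3.
C     Mantissa: low 7 bits of byte 3, then bytes 2 and 1.
C
      IEXP  = IB(4)*2 + IB(3)/128
      IMANT = MOD(IB(3),128)*65536 + IB(2)*256 + IB(1)
C
      IERR = 0
      R4IEEE = 0.0
      IF (IEXP .EQ. 255) THEN
C        Infinity or NaN: no portable host equivalent.
         IERR = 1
      ELSE IF (IEXP .EQ. 0) THEN
C        Zero or denormal: no hidden bit, exponent fixed at -126.
         R4IEEE = ISGN*(REAL(IMANT)*2.0**(-23))*2.0**(1 - IBIAS)
      ELSE
C        Normal number: restore the hidden bit (2**23).
         R4IEEE = ISGN*(REAL(IMANT + 8388608)*2.0**(-23))
     &            *2.0**(IEXP - IBIAS)
      ENDIF
C
      RETURN
      END
```

A caller holding the whole record in a character variable would pass
each 4-byte substring in turn and check IERR afterwards; a substring
made of four ASCII 255 bytes comes back with IERR = 1, so the marker
case can be caught at the same point.  If the file turns out to store
the most significant byte first, reversing the order in which the
bytes are fetched should be the only change needed.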