[comp.lang.fortran] Char to real

taylor@sun.soe.clarkson.edu (Ross Taylor) (08/24/90)

I am looking for help with a problem converting character strings
to real numbers.

Here is the background to the problem.

I have a binary data file created by a Turbo Pascal program.  The data
records are roughly 700 bytes each made up of a series of 4 byte reals
and 4 byte integers.  I need to read a data record with a FORTRAN
program (the file is unformatted, direct access) as a char string of
full length (approx 700 bytes), then process the string 4 bytes at a
time.  Each 4 byte substring is either a special sequence of bytes
(ASCII 255) indicating that some particular calculation is to be
performed OR the string contains the IEEE representation of an integer or
real number and must be converted to one of these two formats (I always
know which). 

My problem is that I cannot figure out a way of doing the data
conversion THAT IS ABSOLUTELY STANDARD FORTRAN 77 AND COMPLETELY
PORTABLE and does not involve any physical disk access.  The code will
be running on a variety of machines (PC, VAX, IBM, Sun) under a variety
of OS's (DOS, VMS, CMS, unix) so portability is extremely important.

I tried using an internal file for the data conversion as shown below:

      FUNCTION TOREAL (STRING)
C
C     Function
C     --------
C
C        To convert a character string to a real number
C
C     Input
C     -----
C
C        STRING - Character string
C
C                 STRING should be four bytes in length.
C
C                 If STRING is not four bytes in length, the string
C                 is replaced with a 4 byte character string, each byte
C                 being ASCII character number 255.
C
C     Output
C     ------
C
C        TOREAL - The real number
C
      CHARACTER STRING*(*), STR4*4, FILE*4
C
C     Check string is four bytes
C
      IF (LEN(STRING) .NE. 4) THEN
         WRITE (*,*) 'Function TOREAL called with invalid argument'
C
C        DO SOMETHING HERE
C
      ELSE
C
         STR4(1:4) = STRING(1:4)
C
C        Read from internal file as a real number
C
         READ (UNIT=STR4(1:4), FMT='(A)') X
C
C        Assign X to the function value
C
         TOREAL = X
C
      ENDIF
C
      RETURN
      END


This code compiles and executes perfectly with WATFOR77 on a PC.  It
also compiles without error using Microsoft Fortran 4.01 and FTN77/386.
However, the last two give errors during execution because there is an
ANSI violation in the READ statement (incompatible format and number).

I feel sure that the char to real data conversion problem has been
addressed many times by others.  Can someone please point me in the
right direction?

Many thanks in advance.

Ross Taylor
Department of Chemical Engineering
Clarkson University, Potsdam, NY 13699
email: taylor@sun.soe.clarkson.edu

maine@elxsi.dfrf.nasa.gov (Richard Maine) (08/28/90)

On 24 Aug 90 13:33:37 GMT, taylor@sun.soe.clarkson.edu (Ross Taylor) said:

Ross> I am looking for help with a problem converting character strings
Ross> to real numbers....

Ross> Each 4 byte substring is either a special sequence of bytes
Ross> (ASCII 255) indicating that some particular calculation is to be
Ross> performed OR the string contains the IEEE representation of an integer or
Ross> real number and must be converted to one of these two formats (I always
Ross> know which). 

Ross> My problem is that I cannot figure out a way of doing the data
Ross> conversion THAT IS ABSOLUTELY STANDARD FORTRAN 77 AND COMPLETELY
Ross> PORTABLE and does not involve any physical disk access.  The code will
Ross> be running on a variety of machines (PC, VAX, IBM, Sun) under a variety
Ross> of OS's (DOS, VMS, CMS, unix) so portability is extremely important.

If you are insistent on the parts about being absolutely standard and
completely portable, I'm afraid you are out of luck.  The standard
does not even define the concept of a byte.  You cannot guarantee that
a character is 8 bits.  For instance, old CDC Cyber systems had 6-bit
characters, so you certainly aren't going to store an 8-bit byte in
each of them.  All of the systems you mentioned have 8-bit characters,
so maybe you are ok there, but it is neither a standard nor a completely
portable assumption.

Your sample code (omitted) is extremely non-portable.  It has a reasonable
chance of working only on systems that use IEEE floating point and have
the same byte order as the data file.  That is rather severely
restrictive.  It is possible to write reasonably portable code that
can determine the host system byte order (where applicable; forget it
on systems that don't have "bytes").  Handling more general variations
in floating point format is really gory.

The standard does not define any guarantee that you can read a binary
file from one system on another system at all.  The trick of reading
the file as direct access does not work on all systems.  On some systems
a direct access file has non-portable special structure.

Note also, if you are really picky about the standard, that Hollerith
is not technically part of the standard.  It is specified only in
appendix C to the standard.  The standard itself explicitly makes
the distinction that the appendices are not part of the standard.
You are using Hollerith when you try to use an "A" format for numeric
data.

I find the "best" approach to this class of problem to be modularization.
It is pretty easy to do subroutines that do the job on various specific
systems.  Thus as long as you isolate the problem to a single small
subroutine, you can provide a version of that subroutine for each
supported system.

The only other plausible approach I can think of is to remove the
dependence on the host floating point format by extracting the
exponent and mantissa fields from the data and then using normal
host arithmetic to put them together into a floating point value,
expanding out the IEEE definition.  Roughly something like
   value = isign*imantissa*(2.**(iexponent-ibias))
Fill in the details including the hidden bit handling.
If you assume that characters are 8-bit quantities, the iChar intrinsic
should reasonably portably get you an integer in the range 0-255
representing those 8 bits.  Once you have the 4 integers representing
the 4 bytes, it's not too hard to put them together into the sign,
exponent, and mantissa values.

I know the description in the paragraph above is a bit sketchy.  I don't
have actual code handy.  Note that though this can be done "reasonably"
portably, it is neither completely portable nor standard.
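
The extraction described above might be sketched roughly as follows.
This is an illustration only, not code from either poster.  The names
IEEE4 and INT4LE are hypothetical.  The sketch assumes 8-bit
characters, a data file with the least significant byte first (as a PC
would write it), and a host INTEGER of at least 32 bits.  None of
these is guaranteed by the standard.

```fortran
      REAL FUNCTION IEEE4 (STR4)
C
C     Sketch only: decode a 4-byte IEEE single held in STR4.
C     Assumes 8-bit characters and least significant byte first.
C
      CHARACTER*4 STR4
      INTEGER B(4), I, ISGN, IEXP, IMANT
      REAL FRAC
C
C     Fold ICHAR into 0-255 in case a compiler returns -128..127
C
      DO 10 I = 1, 4
         B(I) = MOD(ICHAR(STR4(I:I)) + 256, 256)
   10 CONTINUE
C
C     B(4) holds the sign bit and the top 7 exponent bits
C
      ISGN = 1
      IF (B(4) .GE. 128) ISGN = -1
      IEXP  = MOD(B(4),128)*2 + B(3)/128
      IMANT = (MOD(B(3),128)*256 + B(2))*256 + B(1)
C
      IF (IEXP .EQ. 255) THEN
C        Infinity, NaN, or the all-255 flag: not decoded here
         IEEE4 = 0.0
      ELSE IF (IEXP .EQ. 0) THEN
C        Zero or denormal: no hidden bit, exponent fixed at -126
         FRAC  = REAL(IMANT)/8388608.0
         IEEE4 = ISGN*FRAC*2.0**(-126)
      ELSE
C        Normal number: add the hidden bit (2**23), bias is 127
         FRAC  = REAL(IMANT + 8388608)/8388608.0
         IEEE4 = ISGN*FRAC*2.0**(IEXP - 127)
      ENDIF
      RETURN
      END
C
      INTEGER FUNCTION INT4LE (STR4)
C
C     Sketch only: decode a 4-byte two's complement integer, least
C     significant byte first.  Assumes a host INTEGER of 32 bits
C     or more.
C
      CHARACTER*4 STR4
      INTEGER B(4), I, N
      DO 10 I = 1, 4
         B(I) = MOD(ICHAR(STR4(I:I)) + 256, 256)
   10 CONTINUE
      IF (B(4) .LT. 128) THEN
         INT4LE = ((B(4)*256 + B(3))*256 + B(2))*256 + B(1)
      ELSE
C        Negative: build the one's complement N, then -N - 1,
C        so that no intermediate value overflows
         N = (((255 - B(4))*256 + (255 - B(3)))*256
     &       + (255 - B(2)))*256 + (255 - B(1))
         INT4LE = -N - 1
      ENDIF
      RETURN
      END
```

A caller would presumably test each 4-byte field against the all-255
flag first, then use X = IEEE4(REC(I:I+3)) or N = INT4LE(REC(I:I+3))
as appropriate, where REC is the record read as a character string.
Values outside the host's floating point range will still overflow or
underflow, and a file with the opposite byte order would need the
byte indices reversed.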
--

Richard Maine
maine@elxsi.dfrf.nasa.gov [130.134.64.6]