[comp.lang.fortran] Fortran File Formats Survey

cdb@hpclcdb.HP.COM (Carl Burch) (07/07/88)

     On systems like UN*X and MS-DOS with byte-stream file systems, the 
Fortran I/O library has to impose a data file format to support Fortran's 
record-oriented file model.  On HP-UX, these take the following forms :

Sequential Formatted file format :
     ASCII files delimited with the newline character (ASCII 10 decimal).

Sequential Unformatted file format :
     Binary data preceded and followed by four bytes holding the record 
     length (in bytes).  The "green word" at the end is necessary to 
     BACKSPACE the file correctly.

Direct Formatted file format :
     ASCII fixed-length records not physically separated.  Unwritten 
     bytes in the record are padded with blanks.

Direct Unformatted file format :
     Binary fixed-length records not physically separated.  Unwritten 
     bytes in the record are padded with zero bytes (ASCII Nulls).

     Bell's f77(1) compiler uses this scheme as well.
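
     The Sequential Unformatted layout described above can be sketched in
Python as follows.  This is only an illustration: the function names are
made up, and the byte order of the four-byte length is an assumption (the
description gives only its size), so it is isolated in one constant.  The
backspace routine shows why the trailing length is needed: it is the only
way to find the start of the previous record without rescanning the file.

```python
import io
import struct
from typing import Optional

# Assumption: the 4-byte record length is a big-endian signed integer.
# Change the format string if the target system stores it differently.
MARKER = struct.Struct(">i")


def write_record(f, payload: bytes) -> None:
    """Append one record: length, data, then the same length again."""
    f.write(MARKER.pack(len(payload)))
    f.write(payload)
    f.write(MARKER.pack(len(payload)))


def read_record(f) -> Optional[bytes]:
    """Read the record at the current position; return None at end of file."""
    head = f.read(MARKER.size)
    if not head:
        return None
    if len(head) < MARKER.size:
        raise ValueError("truncated leading length")
    (n,) = MARKER.unpack(head)
    if n < 0:
        raise ValueError("negative record length")
    payload = f.read(n)
    tail = f.read(MARKER.size)
    if len(payload) != n or len(tail) != MARKER.size:
        raise ValueError("truncated record")
    if MARKER.unpack(tail)[0] != n:
        raise ValueError("leading and trailing lengths disagree")
    return payload


def backspace(f) -> None:
    """Move back one record by reading the trailing length just behind us."""
    pos = f.tell()
    if pos == 0:
        return  # already at the start of the file
    f.seek(pos - MARKER.size)
    (n,) = MARKER.unpack(f.read(MARKER.size))
    # Skip back over the trailing length, the data, and the leading length.
    f.seek(pos - n - 2 * MARKER.size)
```

     For example, writing two records to an io.BytesIO object, calling
backspace once, and then calling read_record returns the second record again.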

     On MS-DOS, (at least my copy of) Microsoft Fortran uses the above 
formats except that Direct Unformatted files are also padded with blanks 
and the Sequential Unformatted format uses only a one-byte "green word" to 
hold the length of each record. In the latter case, there is an escape 
value saying that the following record is full to the max (256?) and there 
will be following records.

     Given this much similarity, I wonder if we may have a de facto standard 
evolving here.  If we could do something about the data format in binary
files (e.g., the IEEE floating point format), it might be possible to use
systems like NFS considerably more transparently than currently possible.

    I'd like examples of other byte-stream file systems' Fortran compilers'
solutions to this problem.  Are they as similar as those around my shop?

							Carl Burch

corbett@beatnix.UUCP (Bob Corbett) (07/12/88)

In article <6690019@hpclcdb.HP.COM> cdb@hpclcdb.HP.COM (Carl Burch) writes:
>
>     Given this much similarity, I wonder if we may have a de facto standard 
>evolving here.  If we could do something about the data format in binary
>files (e.g., the IEEE floating point format), it might be possible to use
>systems like NFS considerably more transparently than currently possible.
>
>    I'd like examples of other byte-stream file systems' Fortran compilers'
>solutions to this problem.  Are they as similar as those around my shop?
>
>							Carl Burch

    I wish that FORTRAN file formats were as similar as Mr. Burch has so far
found them to be.  The fact that AT&T's f77 and HP-UX's f77 use the same file
formats comes as no surprise since HP's implementation was derived from AT&T's.
The AT&T file formats probably are the de facto standard for UNIX FORTRAN
implementations.  However, there are irritating variations even among UNIX
FORTRANs.  One such variation is that the size of the count fields for
sequential unformatted records differs from machine to machine.  For example,
f77 on the PDP-11 uses a 16-bit count field, while f77 on the VAX uses 32-bit
fields.
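
    Because the count width is not recorded anywhere in the file, a program
reading a foreign file has to be told it or has to guess.  The sketch below
shows one possible heuristic: try each width and keep those for which the
whole file splits cleanly into count/data/count records.  It assumes that
each record carries the same count before and after the data, and that the
counts are little-endian unless the caller says otherwise; both are
assumptions, and the function names are hypothetical.

```python
import struct
from typing import List


def records_parse(data: bytes, fmt: str) -> bool:
    """True if data splits exactly into count/data/count records using fmt."""
    width = struct.calcsize(fmt)
    pos = 0
    while pos < len(data):
        if pos + width > len(data):
            return False  # not enough bytes left for a leading count
        (n,) = struct.unpack_from(fmt, data, pos)
        end = pos + width + n  # offset of the trailing count
        if end + width > len(data):
            return False  # record would run past the end of the file
        if struct.unpack_from(fmt, data, end)[0] != n:
            return False  # leading and trailing counts disagree
        pos = end + width
    return True


def guess_count_width(data: bytes, byte_order: str = "<") -> List[int]:
    """Return the count widths (in bytes) that are consistent with data."""
    candidates = {2: "H", 4: "I"}  # 16-bit and 32-bit unsigned counts
    return [
        width
        for width, code in candidates.items()
        if records_parse(data, byte_order + code)
    ]
```

    The result is a list because the guess can be ambiguous; an empty file,
for instance, is consistent with every width.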

    Other annoying differences arise for sequential formatted files.  One
variation I have seen is to use the escape character (ASCII 27) to escape
characters.  In particular, an escape followed by a new-line character is
treated as a new-line character within a record rather than an escape
followed by the end of record.  An escape followed by an escape is treated
as a single escape.  I believe the reason for this convention is that people
used A format to write binary data.  However, a case can be made that it
should be possible to write any character in the processor character set under
an A format.
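
    The escape convention just described can be sketched as a pair of
Python functions.  This is an illustration of the convention as stated, not
code from any particular library.  The names are hypothetical, and the
treatment of an escape followed by any other character (left unchanged here)
is an assumption, since the convention says nothing about that case.

```python
from typing import List

ESC = b"\x1b"  # ASCII 27
NL = b"\n"     # ASCII 10


def encode_record(record: bytes) -> bytes:
    """Escape ESC and new-line inside the record, then terminate it."""
    # Double the escapes first so the escapes added for new-lines
    # are not themselves doubled.
    body = record.replace(ESC, ESC + ESC).replace(NL, ESC + NL)
    return body + NL


def decode_records(stream: bytes) -> List[bytes]:
    """Split a byte stream into records, undoing the escapes."""
    records = []
    current = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i:i + 1]
        nxt = stream[i + 1:i + 2]
        if byte == ESC and nxt in (ESC, NL):
            current += nxt  # escaped character belongs to the record
            i += 2
        elif byte == NL:
            records.append(bytes(current))  # unescaped new-line ends a record
            current = bytearray()
            i += 1
        else:
            current += byte
            i += 1
    if current:
        records.append(bytes(current))  # final record without a terminator
    return records
```

    With this scheme a record written under A format can hold any byte
value, at the cost that the file is no longer plain text to other tools.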

    Another variation is to use a control character to denote end of file.
I used to believe that this variation arose to avoid having to truncate files
on close (a painful operation on System V based UNIX systems).  However, I
now believe it is done to emulate VMS FORTRAN.

    Operating systems other than UNIX use a wide variety of FORTRAN file
formats.  VMS FORTRAN features four major file formats:  fixed, variable,
segmented, and stream.  The file format to be used can be specified in the
OPEN statement at the time the file is created.  An existing file's format
is known to the OS.  A fixed-length record file consists of fixed-length
records with no physical separators.  A variable-length record file stores
each record as a byte count followed by the data.  The byte count is two bytes
for disk files and four bytes for tapes.  A variable-length record file
opened for relative access is stored on disk as a fixed-length record file.
Segmented record files are basically variable-length record files plus
control information that allows a single logical record to be stored as
one or more variable-length records.  Stream-type files use characters to
indicate end of record.  Stream-type files come in three varieties.  One
form uses a carriage return followed by a line feed to terminate records.
The other two use either carriage return alone or line feed alone to terminate
records.  Sequential formatted files and sequential unformatted segmented files
can contain embedded end of file records.  An end of file record consists
of a one-byte record containing a SUB character (ASCII 26).  Sequential
formatted files may use any of the four file formats.  The default format for
sequential formatted files is variable.
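
    As a rough illustration of the variable-length layout with embedded
end of file records, the Python sketch below walks a byte string made of
count-prefixed records and reports a one-byte SUB record as an end of file
marker.  It models only what is described above.  The little-endian byte
order of the count is an assumption, any alignment padding or other
bookkeeping in a real on-disk file is not modeled, and the names are
hypothetical.

```python
import struct
from typing import Iterator, Union

SUB = b"\x1a"          # ASCII 26
EOF_RECORD = object()  # hypothetical sentinel for an embedded end of file


def iter_records(data: bytes, count_fmt: str = "<H") -> Iterator[Union[bytes, object]]:
    """Yield each record's data, or EOF_RECORD for a one-byte SUB record.

    count_fmt is a struct format for the byte count: "<H" for the two-byte
    disk form, "<I" for the four-byte tape form (byte order assumed).
    """
    width = struct.calcsize(count_fmt)
    pos = 0
    while pos < len(data):
        if pos + width > len(data):
            raise ValueError("truncated byte count")
        (n,) = struct.unpack_from(count_fmt, data, pos)
        pos += width
        payload = data[pos:pos + n]
        if len(payload) != n:
            raise ValueError("truncated record")
        pos += n
        yield EOF_RECORD if payload == SUB else payload
```

    Note that under this description a data record consisting of a single
SUB character cannot be told apart from an end of file record.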

    CDC FORTRAN for the CYBER 170 series uses four file formats:  Z, W, U, and
S.  There are four additional file formats (F, R, D, and T) that are not
commonly used.  CDC files consist of 60-bit words.  A Z file indicates end of
record by 12 zero bits in the low-order part of a word.  A record may have to
be padded.  If a record ends with blanks, those blanks may be trimmed.  A W
file precedes each data record with a control word which contains the length
of the data record.  U and S files are record manager files.  The record
manager stores the location and length of the data records apart from the
data records themselves.

    I have a description of IBM's file formats, but it is too complex to be
worth describing in detail.  Suffice it to say that the lengths of variable-
length records of all types are indicated by counts.

						Robert Paul Corbett
						ucbvax!sun!elxsi!corbett
						uunet!elxsi!corbett