[comp.lang.c] Say what!?

levy@ttrdc.UUCP (Daniel R. Levy) (02/20/88)

In article <1988Feb17.171813.15472@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
> > Compiler  apparently does not like eof markers that are at the end of
> > the last line in a file.  Error c1004 results. Putting a cr-lf at the
> > end of the file solved the problem.
> 
> Compiler is doing it right, although the complaint is perhaps undesirably
> cryptic.  X3J11:  "A source file that is not empty shall end in a new-line
> character."  (Page 6, line 38, 11 Jan 1988 draft)

Ahem, what does X3J11 have to say about source files on systems (like VMess)
that support "record" text files?  There need be no new line character in them,
but each record ("line") in the file is defined by some other mechanism,
e.g., a byte count prepended to each record.  (Of course the answer should be
that such source files are treated for the purpose of the standard as if they
ended with a newline character, but is this actually an explicit part of the
standard?  And if not, shouldn't it be?  And for that matter, what if, in
the middle of a record in such a file, an actual newline character is found?
How is the compiler supposed to treat that??)
-- 
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|         an Engihacker @        |  	<most AT&T machines>}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|

gwyn@brl-smoke.ARPA (Doug Gwyn) (02/21/88)

In article <2183@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>Ahem, what does X3J11 have to say about source files on systems (like VMess)
>that support "record" text files?

Why should X3J11 have to say anything about this?  It is the job of
the implementor to meet the specs.  In fact this particular problem
has been solved many times already by C compiler vendors.

Note that the implementor only has to provide one form of text stream
and one form of binary stream; other file formats could be handled as
extensions.  Most vendors are likely to do whatever they can to support
as many file types as possible, because it will make their customers
happier.  On VMS, for example, RMS can be used to help map strange file
types into a smaller number of regular models.

scjones@sdrc.UUCP (Larry Jones) (02/21/88)

In article <2183@ttrdc.UUCP>, levy@ttrdc.UUCP (Daniel R. Levy) writes:
> In article <1988Feb17.171813.15472@utzoo.uucp>, henry@utzoo.uucp (Henry Spencer) writes:
[ every line in a file must end with '\n', including the last line]
> Ahem, what does X3J11 have to say about source files on systems (like VMess)
> that support "record" text files?  There need be no new line character in them,
> but each record ("line") in the file is defined by some other mechanism,
> e.g., a byte count prepended to each record.  (Of course the answer should be
> that such source files are treated for the purpose of the standard as if they
> ended with a newline character, but is this actually an explicit part of the
> standard?  And if not, shouldn't it be?  And for that matter, what if, in
> the middle of a record in such a file, an actual newline character is found?
> How is the compiler supposed to treat that??)

X3J11 says nothing about how to >implement< the standard; that's up to each
individual implementor.  If your system happens to match the model used by
the standard (like Unix and MS-DOS), then the implementation is easy.  If not
(like VMS and MVS), then you have to do more work.  Certainly the most common
solution is to translate record endings into newlines, but this is not always
the right solution (e.g. files with Fortran carriage control).  Some OSs
provide enough information to do this right (VMS), others don't.  Some
implementations allow the user to specify the right interpretation, others
don't.  Embedded newlines are a headache.
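
A minimal sketch of the "translate record endings into newlines" approach
mentioned above.  The record layout here (a 2-byte little-endian byte count
followed by the data) and the name records_to_lines are assumptions made up
for illustration; they are not the actual VMS or MVS formats, and the sketch
ignores carriage-control attributes entirely.

```c
#include <stddef.h>

/* Hypothetical record layout, assumed for illustration only:
 * each record is a 2-byte little-endian length followed by that
 * many data bytes.  Real record formats differ. */

/* Convert the records in `in` to newline-terminated lines in `out`.
 * Returns the number of bytes written, or (size_t)-1 if the input
 * is truncated or `out` is too small. */
size_t records_to_lines(const unsigned char *in, size_t in_len,
                        char *out, size_t out_cap)
{
    size_t ip = 0, op = 0;

    while (ip < in_len) {
        size_t rec_len, i;

        if (in_len - ip < 2)
            return (size_t)-1;          /* truncated length prefix */
        rec_len = (size_t)in[ip] | ((size_t)in[ip + 1] << 8);
        ip += 2;

        if (in_len - ip < rec_len)
            return (size_t)-1;          /* truncated record body */
        if (out_cap - op < rec_len + 1)
            return (size_t)-1;          /* no room for data plus '\n' */

        for (i = 0; i < rec_len; i++)
            out[op++] = (char)in[ip++];
        out[op++] = '\n';               /* record end becomes a newline */
    }
    return op;
}
```

Under this scheme every record, including the last, yields a line that ends
in a newline, and a zero-length record yields an empty line.  Any newline
stored inside a record is copied through unchanged, which is one possible
treatment of embedded newlines, not the only one.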

In any case, the model specified by the standard was carefully chosen so as
to be compatible with existing practice (the Unix model) while avoiding things
which are very hard to get right on some systems.  How do you represent a line
that doesn't end with a newline on a system with record files where record
end is taken as a newline?  How do you represent a zero length line in a
record file when zero length records are invalid?  How do you represent an
empty file when empty files are deleted when closed to avoid "wasting" disk
and directory space?  How do you represent variable length lines in a fixed
length record file?  These are the reasons for the various restrictions in
the standard's file model.
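
The restriction that started this thread ("A source file that is not empty
shall end in a new-line character") could be checked along the following
lines once the source text is in memory.  This is only a sketch; the function
name is invented here, and it assumes the text has already been mapped to the
newline model.

```c
#include <stddef.h>

/* Returns 1 if the buffer is empty or its last character is '\n',
 * 0 otherwise.  `src` need not be NUL-terminated. */
int source_ends_properly(const char *src, size_t len)
{
    return len == 0 || src[len - 1] == '\n';
}
```

A compiler that finds this check failing could issue a clearer diagnostic
than a bare error number.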

----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                MAIL: 2000 Eastman Dr., Milford, OH  45150
                                    AT&T: (513) 576-2070
"When all else fails, read the directions."

dhesi@bsu-cs.UUCP (Rahul Dhesi) (02/22/88)

[ every line in a file must end with '\n', including the last line]

For files read and written from a Pascal program, the sequence-of-
characters-terminated-by-a-newline model has existed for some time.  No
doubt OS implementors have already solved the problem, if they were
going to solve it at all, to conform to ISO Pascal.

The VAX/VMS C manual describes the painful translation that goes on
so that the VMS C runtime system can convert between the newline model
and the internal record structure.  Since the compiler is just a
program, the same translation can be done when it reads a source file.

Newlines embedded in records are easy to deal with:  They simply
terminate a line, so the record will appear as two lines.  Any other
behavior contradicts the term "newline".
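
As a rough illustration of the rule just described, the following sketch
counts how many lines one record contributes when every embedded newline
ends a line and the record boundary ends the last one.  The function name is
hypothetical, and the record is assumed to be already in memory as a plain
byte array.

```c
#include <stddef.h>

/* Count the lines a single record appears as when each embedded
 * '\n' terminates a line and the record boundary terminates the
 * final one. */
size_t lines_in_record(const char *rec, size_t len)
{
    size_t i;
    size_t lines = 1;           /* the record boundary ends one line */

    for (i = 0; i < len; i++)
        if (rec[i] == '\n')
            lines++;            /* each embedded newline ends another */
    return lines;
}
```

A record holding "ab", a newline, and "cd" therefore appears as two lines,
and a record with no embedded newline appears as one.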

Under VMS, if you use the EDT editor, an invisible ^M character appears
to be at the end of each line.  You can force EDT to insert an embedded
carriage return by asking it to insert an arbitrary character by ASCII
code, and then typing 13.  But when you manipulate text (e.g. move text
from one place to another in the file), EDT translates the embedded
carriage return into a record terminator and the result is that you get
two lines and the embedded carriage return is gone.  (Disclaimer:  this
was so when I last checked.  VMS evolves rapidly so your version may
behave differently.)

OS implementors, use record files if you must, but please also allow
the user to create unstructured, newline-terminated files.  Else you
and your users are going to suffer as more and more programming
languages, to be portable, specify this type of file behavior.  You can
fight it just to be different, or you can gracefully give in and
support a universal text format.

Did you all know that DEC *is* giving in?  Its stream-LF format files
not only behave like UNIX-style newline-terminated text files, but
are also freely executable without any protest from the operating
system.  DEC's C compiler is even more forgiving:  it will happily
compile C source files that are in stream-LF format *and* have a
spurious carriage return character at the end of each line of text.
IBM, as always, will take a little longer.
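
The tolerance for a spurious carriage return at the end of each line could
be achieved by something like the sketch below.  This is a guess at one way
a reader might do it, not DEC's actual code; the function name is invented,
and only a '\r' immediately followed by '\n' is dropped.

```c
#include <stddef.h>

/* Remove every '\r' that immediately precedes a '\n'.  Works in
 * place and returns the new length.  Any other '\r' is kept. */
size_t strip_cr_before_lf(char *buf, size_t len)
{
    size_t r, w = 0;

    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue;           /* skip the CR of a CR-LF pair */
        buf[w++] = buf[r];
    }
    return w;
}
```

A carriage return in the middle of a line is left alone, since only the
end-of-line case is described above.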
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi