[comp.text] newline indicator

tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) (02/12/91)

The world's three most important operating systems-- Unix, MS-DOS,
and MacOS-- all employ different methods for separating lines from
each other.  Unix uses linefeed only, MS-DOS uses carriage return
and linefeed, and MacOS uses carriage return only.
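As a rough illustration of the three conventions, here is a minimal Python sketch that guesses which one a byte string uses by counting line-end bytes. The function name `guess_convention` is hypothetical and not from any real tool. Counting by majority is only a heuristic.

```python
CR = b"\r"  # 0x0D, carriage return
LF = b"\n"  # 0x0A, line feed

def guess_convention(data):
    """Guess which end-of-line convention a byte string uses.

    CR-LF pairs are counted first. CRs and LFs that are not part of a
    pair are counted as Mac-style and Unix-style line ends. The most
    frequent kind wins, and "none" means no line ends were found.
    """
    crlf = data.count(CR + LF)
    lone_cr = data.count(CR) - crlf
    lone_lf = data.count(LF) - crlf
    counts = {"MS-DOS (CR-LF)": crlf, "MacOS (CR)": lone_cr, "Unix (LF)": lone_lf}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "none"
```

For example, `guess_convention(b"one\r\ntwo\r\n")` reports the MS-DOS convention, and `b"one\rtwo\r"` is reported as MacOS.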

My question is this: do any standards specify how lines should be
kept apart?  That is, do any of these three operating systems have
any justification (other than space savings in the case of Unix and
MacOS) for doing things the way they did?

henry@zoo.toronto.edu (Henry Spencer) (02/12/91)

In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
>... Unix uses linefeed only, MS-DOS uses carriage return
>and linefeed, and MacOS uses carriage return only.
>
>My question is this: do any standards specify how lines should be
>kept apart?  That is, do any of these three operating systems have
>any justification (other than space savings in the case of Unix and
>MacOS) for doing things the way they did?

ASCII gives you a choice.  Normally, CR signifies move back to left margin
and LF signifies go down to next line, so some combination of those two
is the right choice for an end-of-line sequence (bearing in mind that
line boundaries could also be stored as out-of-band data, e.g. length counts,
in which case there *is* no such sequence).  However, ASCII also specifies
that a single character can be used as a line terminator if all parties
involved agree on this, and that if so, it shall be LF, aka newline.
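To make the motion meanings of CR and LF concrete, here is a minimal Python sketch of a hypothetical printing terminal. The function `render` is illustrative only and does not model any real device driver. It assumes the simple model described above. CR returns to column 0. LF moves down one line and keeps the current column. Any other character prints at the active position and advances one column.

```python
def render(data, width=20):
    """Simulate the active position of a simple printing terminal.

    Returns the printed page as a list of strings, one per line,
    with trailing blanks removed. Characters past `width` are dropped.
    """
    rows = [[" "] * width]
    row = col = 0
    for ch in data:
        if ch == "\r":        # carriage return: back to left margin
            col = 0
        elif ch == "\n":      # line feed: down one line, same column
            row += 1
            if row == len(rows):
                rows.append([" "] * width)
        else:                 # printable: put it down and advance
            if col < width:
                rows[row][col] = ch
            col += 1
    return ["".join(r).rstrip() for r in rows]
```

With this model, `"ab\r\ncd"` prints two clean lines. LF alone produces the familiar "staircase" (`["ab", "  cd"]`), and CR alone overprints the same line.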

So Unix is doing things right :-), MuShDOS is also technically right but
is doing things the hard way -- a single terminator makes life a lot easier
for software -- and MacOS is unequivocally broken.
-- 
"Read the OSI protocol specifications?  | Henry Spencer @ U of Toronto Zoology
I can't even *lift* them!"              |  henry@zoo.toronto.edu  utzoo!henry

williams@nssdcs.gsfc.nasa.gov (Jim Williams) (02/13/91)

In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
>The world's three most important operating systems-- Unix, MS-DOS,
>and MacOS-- all employ different methods for separating lines from
>each other.  Unix uses linefeed only, MS-DOS uses carriage return
>and linefeed, and MacOS uses carriage return only.
>
>My question is this: do any standards specify how lines should be
>kept apart?  That is, do any of these three operating systems have
>any justification (other than space savings in the case of Unix and
>MacOS) for doing things the way they did?

I don't have copies of all the relevant standards, but I have read
most of them in the past, so here's the situation as I understand it.

Prior to ISO Latin-1, there was no single character whose semantics was
that of "line separator", since the ANSI standards did not contain the
concept of a line.  Carriage return was exactly that: move the active
position to the beginning of the line.  Line feed was exactly that:
move the active position to the next line down.  Many systems therefore
used CR-LF as a "newline" indicator.  I seem to recall that use of
Line Feed alone was a permissible option under the original ANSI
standard, but that usage is now deprecated.  This makes MS-DOS(!) the
most correct, Unix correct but outdated, and MacOS completely out of
it.  In defense of the Mac, I'll just point out that it's much easier to
deal with file conversion when a single character is used as a line
separator.  There are some messy situations you can get into with the
MS-DOS to Unix conversion.
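One of those messy situations can be shown in a few lines of Python. Both function names below are hypothetical. `dos_to_unix` rewrites only a CR immediately followed by LF. `naive_dos_to_unix` shows a common mistake: turning every CR into LF doubles each line end.

```python
def dos_to_unix(data):
    """Convert CR-LF line ends to LF only.

    Only a CR immediately followed by LF is rewritten. A lone CR (for
    example one used for overprinting, or a stray Mac-style line end)
    is left alone, because its meaning cannot be known from the bytes.
    """
    return data.replace(b"\r\n", b"\n")

def naive_dos_to_unix(data):
    """Mistaken conversion: every CR becomes LF, doubling CR-LF ends."""
    return data.replace(b"\r", b"\n")
```

The sketch works on the whole byte string at once. If a file were converted in fixed-size chunks instead, a CR-LF pair split across two chunks would be missed.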

ASCII does define some separator characters: File Sep., Group Sep.,
Record Sep., and Unit Sep., but no semantics were given for these.  If
they are used hierarchically, then files contain groups which contain
records which contain units, but beyond that, the meaning is
"application specific".  It would have been reasonable to use Record
Separator as a "newline", but I don't think anyone has.  One terminal
I worked with used Unit Separator as a newline.
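The hierarchical reading of the four separators can be sketched in Python as nested splits. FS, GS, RS and US are the ASCII codes 0x1C through 0x1F. The function `parse_separated` is hypothetical. The file > group > record > unit nesting is the assumed interpretation described above, since ASCII leaves each level's meaning to the application.

```python
# ASCII information separators (codes 0x1C..0x1F).
FS, GS, RS, US = "\x1c", "\x1d", "\x1e", "\x1f"

def parse_separated(text):
    """Split text hierarchically: files > groups > records > units.

    Returns nested lists: one list per file, containing one list per
    group, containing one list per record, containing its units.
    """
    return [
        [
            [record.split(US) for record in group.split(RS)]
            for group in file.split(GS)
        ]
        for file in text.split(FS)
    ]
```

Under this reading, using RS as the line separator would give a single-character newline that cannot be confused with CR or LF motion. As noted above, this is a hypothetical use, not an established convention.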

I should also point out that there are no characters in 7-bit ASCII
whose meaning is "ignore the last character I sent".  Most systems use
Backspace or Delete for this, and neither means that.  Backspace means
"move the active position backward by one", and delete is a no-op,
which should be ignored.  (Originally, delete characters were used to
erase, or "rub-out" mistyped characters in punched tape.  This is why
Delete is all 1s: it punched out all the holes in paper tape,
obliterating the mistyped character!)  The "cancel" character might have
been used, since it has a generalized meaning of "oops!".  Also there
is no line kill character, i.e., one that means "forget everything since
the last newline".  How could there be, when there is no concept of a
line!
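Here is a minimal Python sketch of the common, non-standard practice of treating BS or DEL as "erase the previous character". A terminal line editor might do something like this. The function `cook` is hypothetical. In strict ASCII terms, BS only moves the active position and DEL is to be ignored. The assertion records why DEL is the "rub-out" character: all seven bits are set.

```python
BS = 0x08   # backspace: "move the active position back one"
DEL = 0x7F  # delete: all seven bits set, every hole punched on tape

assert DEL == 0b1111111

def cook(keys):
    """Apply BS/DEL as 'erase previous character' to typed bytes.

    This is the usual convention rather than ASCII's own meaning.
    An erase with nothing left to erase is simply ignored.
    """
    line = bytearray()
    for k in keys:
        if k in (BS, DEL):
            if line:
                line.pop()
        else:
            line.append(k)
    return bytes(line)
```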

Latin-1 has fixed some of that, but I doubt the fixes will ever be
popular.  In the C1 control set there is a character NEL, which means
Next Line, and is a proper newline character.  There is also a
character (CCH, I *think*) which means "ignore the last character
sent".  I don't think there is a "line kill" character, though.
The VT420 on my desk can handle NEL, but not CCH.
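In the C1 control set (as defined in ISO 6429 / ECMA-48), NEL is 0x85 and CCH ("Cancel Character") is 0x94. The following Python sketch shows how software could treat them. It splits Latin-1 text on NEL and applies CCH as "drop the previous character". The function `c1_lines` is hypothetical, and real support for these controls was rare.

```python
NEL = "\x85"  # C1 "Next Line"
CCH = "\x94"  # C1 "Cancel Character"

def c1_lines(data):
    """Split Latin-1 bytes into lines on NEL, applying CCH as an erase.

    Each CCH removes the character before it on the same line, if any.
    """
    text = data.decode("latin-1")  # bytes 0x80..0x9F map to U+0080..U+009F
    lines = []
    for raw in text.split(NEL):
        out = []
        for ch in raw:
            if ch == CCH:
                if out:
                    out.pop()
            else:
                out.append(ch)
        lines.append("".join(out))
    return lines
```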

Things to read are ANSI X3.64, X3.134.* and the various ISO standards.

If I got any of this wrong, my apologies, and please send me mail,
so I don't do it again!

Jim
Spoken: Jim Williams             Domain: williams@nssdcs.gsfc.nasa.gov
Phone: +1 301 286-1131           UUCP:   uunet!mimsy!williams
USPS: NASA/GSFC, Code 633, Greenbelt, MD 20771
Motto: There is no 'd' in "kluge"!  It rhymes with "huge", not "sludge".

npn@cbnewsl.att.com (nils-peter.nelson) (02/13/91)

In article <7813@exodus.Eng.Sun.COM>, tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
> The world's three most important operating systems-- Unix, MS-DOS,
> and MacOS-- all employ different methods for separating lines from
> each other.  Unix uses linefeed only, MS-DOS uses carriage return
> and linefeed, and MacOS uses carriage return only.
> 
> My question is this: do any standards specify how lines should be
> kept apart?  That is, do any of these three operating systems have
> any justification (other than space savings in the case of Unix and
> MacOS) for doing things the way they did?


Oldest operating systems (mostly IBM, but also CDC,
GCOS (GE/Honeywell), Univac, etc) used a card-image
convention-- every line stored as 80 characters
(lots of trailing blanks).  It was quite an innovation
to go to "variable length records." In these, no motion
character was needed, although many included one anyway.
Ken Thompson hated all these conventions, since they
made software very complex-- there were 10 or more different
"record types"-- and when he created Unix he chose the
simplest convention he could think of. He chose newline
(octal 12) because on the Model 37 Teletype that character
caused both flyback (carriage return) and one vertical space.
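Here is a minimal Python sketch of mapping card-image text to newline-terminated lines. It assumes plain text made of fixed 80-character records padded with blanks. Real systems of the time often used other character sets (such as EBCDIC) and record formats. The name `cards_to_lines` is hypothetical.

```python
CARD_WIDTH = 80  # one punched card = one 80-column record

def cards_to_lines(data):
    """Turn fixed-width card images into newline-terminated lines.

    Each record is exactly CARD_WIDTH characters, padded with blanks.
    Trailing blanks are dropped and a single LF ends each line.
    """
    if len(data) % CARD_WIDTH:
        raise ValueError("data is not a whole number of card images")
    records = [data[i:i + CARD_WIDTH] for i in range(0, len(data), CARD_WIDTH)]
    return "".join(r.rstrip(" ") + "\n" for r in records)
```

The reverse mapping (`ljust(CARD_WIDTH)` on each line) loses nothing except the distinction between trailing blanks and none, which is why the two forms can be converted back and forth.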

The convention is arbitrary, since some low-level terminal
driver still had to add delays after the motion character
on mechanical devices. At least the conventions you showed
can be mapped. Binary format is hopeless, and there is
no standard.

tml@tik.vtt.fi (Tor Lillqvist) (02/14/91)

In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
   >The world's three most important operating systems-- Unix, MS-DOS,
   >and MacOS-- all employ different methods for separating lines from
   >each other.  Unix uses linefeed only, MS-DOS uses carriage return
   >and linefeed, and MacOS uses carriage return only.
   >
   >My question is this: do any standards specify how lines should be
   >kept apart?  That is, do any of these three operating systems have
   >any justification (other than space savings in the case of Unix and
   >MacOS) for doing things the way they did?

There are operating systems where text files contain no "line
separator" at all, but files contain "records", which in the case of
"text" files are "lines."  These systems typically have lots of file
types with attributes such as "variable/fixed length records," "record
length," etc.  Examples are VMS and RTE (on the HP1000).

On RTE (the operating system from hell), the actual bytes
corresponding to a line containing the characters "abc" are:

	- a two-byte record length word.  The record length is the
	  length in bytes rotated right one bit (because RTE was
	  traditionally word-orientated, and record lengths were
	  counted in words).  For the 3 bytes of "abc" this gives the
	  16-bit word 100001 (in octal).

	- the bytes 'a', 'b', 'c', plus a padding ' ' to make the
	  length even.

	- the record-length word again.
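The RTE layout above can be sketched in Python. The function names are hypothetical. Big-endian byte order for the length word is an assumption here, not something stated above. The rotation itself follows the description: for 3 bytes, the low bit moves to the top, giving octal 100001.

```python
import struct

def rte_length_word(nbytes):
    """Rotate a 16-bit byte count right by one bit."""
    return ((nbytes >> 1) | ((nbytes & 1) << 15)) & 0xFFFF

def rte_record(line):
    """Encode one line as: length word, data padded to even, length word.

    ASSUMPTION: the length word is stored big-endian (">H").
    """
    word = struct.pack(">H", rte_length_word(len(line)))
    padded = line + b" " * (len(line) % 2)  # pad odd lengths with a blank
    return word + padded + word
```

Repeating the length word at the end lets a program read records backward as well as forward.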

Remember that in Unix it is only the user-level programs that care
about line separators.  Nothing would need changing in the Unix kernel
if one started to use CR-LF pairs for line separation.  (Well, maybe
one thing: #! interpretation.)
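Here is a minimal Python sketch of why `#!` would be affected. It assumes a simplified model of the kernel: read the first line up to LF, then split it on blanks and tabs. The exact rules differ between systems. With CR-LF line ends, the CR stays attached to the interpreter name, so the kernel would look for a program literally named "/bin/sh\r". The function `shebang_interpreter` is hypothetical.

```python
def shebang_interpreter(script):
    """Return the interpreter path from a '#!' line, or None.

    Simplified model: the line ends at the first LF, and only blanks
    and tabs separate fields, so a CR is treated as an ordinary byte.
    """
    if not script.startswith(b"#!"):
        return None
    first_line = script[2:].split(b"\n", 1)[0]
    # Split on blanks/tabs only; bytes.split() with no argument would
    # also strip the CR and hide the problem.
    fields = [f for f in first_line.replace(b"\t", b" ").split(b" ") if f]
    return fields[0] if fields else None
```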
--
Tor Lillqvist,
working, but not speaking, for the Technical Research Centre of Finland