[comp.unix.questions] Historical question: LF vs. CR/LF in text files

tomr@ashtate (Tom Rombouts) (05/30/90)

Forgive the bandwidth, but seeing that others besides myself are
having occasional problems relating to the differences between
UNIX vs. DOS (and CP/M, correct?) in handling end of lines, I am
wondering how this started.  Since UNIX came first, I am going to
guess that at some time, somewhere someone said "Hey - let's add
a carriage return!"  Does anyone know the (possibly amusing?) story
behind this?  What was the essential rationale?

Tom Rombouts, Torrance Techie  Voice: (213) 538-7108

mwarren@mips2.cr.bull.com (Mark Warren) (05/30/90)

In article <952@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
> ... relating to the differences between
>UNIX vs. DOS (and CP/M, correct?) in handling end of lines, I am
>wondering how this started.  Since UNIX came first, I am going to
>guess that at some time, somewhere someone said "Hey - let's add
>a carriage return!"
>
"Unix came first" ????  Ulp!  Makes me feel pretty old.  The simple
history relates to the olden days, when boys were boys, men were men,
and advanced computer terminals were Teletype Corp. ASR33's with cute
little 10 cps paper tape readers on the side.  Quite simply, as in a
standard typewriter (the old fashioned kind that did not have a
computer attached, or even an electrical cord), the real physical
indication of the end of a line was a carriage return, followed by a
new line.
-- 

 == Mark Warren                      Bull HN Information Systems Inc. ==
 == (508) 671-3171 (FAX 671-3020)    300 Concord Road     MS820A      ==
 == mwarren@granite.cr.bull.com      Billerica, MA 01821              ==

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/30/90)

In article <952@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>Forgive the bandwidth, but seeing that others besides myself are
>having occasional problems relating to the differences between
>UNIX vs. DOS (and CP/M, correct?) in handling end of lines, I am
>wondering how this started.  Since UNIX came first, I am going to
>guess that at some time, somewhere someone said "Hey - let's add
>a carriage return!"  Does anyone know the (possibly amusing?) story
>behind this?  What was the essential rationale?

Actually, neither UNIX nor MS-DOS came first.  There have been many
conventions for indicating record sizes/boundaries in disk files,
including per-line byte counts, fixed record size with trailing-
blank padding, CR stream delimiters (e.g. Apple II), etc.  The real
origin of CR/LF pairs can be traced back to teletypewriters,
which required both control characters to perform the "new-line"
function.  It was fairly natural to embed these in disk files so
that a simple dump of the file to a teletypewriter would print
properly.  An alternate interpretation of the LF code allowed by
the ASCII standard was the entire new-line function, and some newer
terminals were designed to support this (if I recall correctly, the
Teletype model 37 may have been one of these).  Apparently the UNIX
designers took advantage of this convention and adopted it as the
standard interpretation for files considered as text; it has the
advantage of simplifying code that processes text files.  Note that
even the universal C standard requires that this convention be
followed for files opened as "text streams", which means that C
implementations on CR/LF-using systems are obliged to watch for
line delimiters and convert the external CR/LF to and from the
internal C character code '\n' (typically same code as ASCII LF),
at least for stdio text streams.
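
A minimal sketch of that text-stream translation, written in Python
rather than C and offered only as an illustration.  The helper names
read_text and write_text are hypothetical; io.TextIOWrapper and its
newline parameter come from the Python standard library.  One caveat:
newline=None also maps a lone CR to '\n', which a C implementation on
a CR/LF system need not do.

```python
import io

def read_text(external: bytes) -> str:
    """Read bytes as a text stream: external CR/LF becomes internal '\n'."""
    stream = io.TextIOWrapper(io.BytesIO(external), encoding="ascii", newline=None)
    return stream.read()

def write_text(internal: str) -> bytes:
    """Write a text stream for a CR/LF system: '\n' becomes CR/LF."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    stream.write(internal)
    stream.flush()  # push buffered text down to the underlying bytes
    return raw.getvalue()

if __name__ == "__main__":
    print(repr(read_text(b"one\r\ntwo\r\n")))
    print(repr(write_text("one\ntwo\n")))
```

The program only ever sees '\n'; the CR/LF pair exists solely in the
external representation.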

My personal opinion is that the people who "designed" MS-DOS did
not pay sufficient attention to lessons that should have been
learned from the UNIX system, but too closely followed CP/M which
in turn was pretty much a direct rip-off of DEC's RT-11 design.

toma@tekgvs.LABS.TEK.COM (Tom Almy) (05/30/90)

In article <952@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>Forgive the bandwidth, but seeing that others besides myself are
>having occasional problems relating to the differences between
>UNIX vs. DOS (and CP/M, correct?) in handling end of lines, I am
>wondering how this started.  Since UNIX came first, I am going to
>guess that at some time, somewhere someone said "Hey - let's add
>a carriage return!"  Does anyone know the (possibly amusing?) story
>behind this?  What was the essential rationale?

Turning back the clock, one dealt with "records" where typically one
record equaled one line. There were no "carriage returns" or "line feeds".
But then (ignoring Flexowriters and other early terminals I never had the
chance to use) along came ASCII and the Teletype Model 33. ASCII defined
separate codes for the carriage return (which actually returned the print
head on the model 33, but at least it was a mechanical motion) and line
feed (which actually fed paper). The early *pre-UNIX* operating systems
tended to have these two characters represent the end of line because that
is what it took to run the printer.

Of course things weren't completely simple because one didn't want to hit
both the carriage return and line feed keys on the Teletype to enter lines
(you had to do this if you were in Off-Line mode) so carriage returns were
converted to cr/lf pairs on input. Also the extra byte did waste precious
storage.

I had used some systems that compressed the eol sequence to a single character
before UNIX existed (more later...).

UNIX didn't say "let's get rid of the carriage return and just use the line
feed". It invented a new code "new line". It just so happens to be internally
represented with the same code as the ASCII line feed. The TTY driver has
the responsibility of translating CR->NL on input and NL->CR/LF on output.
Note that the driver allows turning the translation off. About 10 years ago
I wrote a terminal emulator program that emulated a "UNIX" terminal -- the
carriage return key sent a "New Line", and the receipt of a "New Line" caused
a carriage return/line feed operation. This didn't last long, because all
the programs that switched to RAW mode to bypass the conversion behaved
very poorly!
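
A rough sketch of the two driver translations described above, as a
pure simulation rather than real driver code.  The function names
tty_input and tty_output are hypothetical; the keyword names icrnl and
onlcr are borrowed from the POSIX termios flags ICRNL and ONLCR, which
control the same mappings on current systems.  Passing False stands in
for a program that has turned the translation off.

```python
def tty_input(raw: bytes, icrnl: bool = True) -> bytes:
    """Keyboard to program: map CR to NL unless translation is off."""
    return raw.replace(b"\r", b"\n") if icrnl else raw

def tty_output(data: bytes, onlcr: bool = True) -> bytes:
    """Program to terminal: map NL to CR/LF unless translation is off."""
    return data.replace(b"\n", b"\r\n") if onlcr else data

if __name__ == "__main__":
    print(tty_input(b"ls\r"))                  # what the program reads
    print(tty_output(b"hello\n"))              # what the terminal receives
    print(tty_output(b"hello\n", onlcr=False)) # translation turned off
```

With the output translation off, the terminal gets a bare LF and has
to decide for itself whether that also means "return to column one".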

I have used several systems that went the single character route using the
Carriage Return. This is probably the most sensible because input conversion
is not necessary. Also standard typewriter practice is that the carriage
return operation (either key or lever on a manual typewriter (remember those?))
would also advance the line, but yet line advance could be independently
performed (with the knob on the end of the carriage).

I know that the net is full of UNIX-myopic people, but UNIX was not first nor
did it make the best move on this one.

Tom Almy
toma@tekgvs.labs.tek.com
Standard Disclaimers Apply

ergo@netcom.UUCP (Isaac Rabinovitch) (05/31/90)

tomr@ashtate (Tom Rombouts) writes:

>Forgive the bandwidth, but seeing that others besides myself are
>having occasional problems relating to the differences between
>UNIX vs. DOS (and CP/M, correct?) in handling end of lines, I am
>wondering how this started.  Since UNIX came first, I am going to
>guess that at some time, somewhere someone said "Hey - let's add
>a carriage return!"  Does anyone know the (possibly amusing?) story
>behind this?  What was the essential rationale?

Unix came before micro OSs, but the CR/LF convention is older than
Unix -- in fact it's older than computers, having been used on
electromechanical teletypes.

The developers of Unix didn't merely drop the carriage return character.
They renamed the line feed character "newline".  At the time some people
objected to this, pointing out that there was now no "down one line"
character.

Note that initially CP/M (and MS-DOS, which started out as a CP/M clone)
imitated pre-Unix mini OSs.  MS-DOS didn't start adding Unix-like features
until later.

peter@ficc.ferranti.com (Peter da Silva) (05/31/90)

In article <7581@tekgvs.LABS.TEK.COM> toma@tekgvs.LABS.TEK.COM (Tom Almy) writes:
> I have used several systems that went the single character route using the
> Carriage Return. This is probably the most sensible because input conversion
> is not necessary.

However, the ASCII code set includes the recommendation that *if* a single
character is used for "new-line" it be the "line-feed" character. The reason,
I assume, is that a single carriage-return is more useful for translating
Fortran carriage-control to ASCII.
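
One way to read that point, sketched in Python under stated
assumptions.  The function name fortran_to_ascii is hypothetical, and
the mapping assumes the usual Fortran carriage-control column (blank =
single space, '0' = double space, '1' = new page, '+' = overprint).
Each record is emitted as "vertical motion, text, bare CR", so the '+'
case needs a CR that does not also feed the paper.

```python
def fortran_to_ascii(records):
    """Translate Fortran carriage-control records to an ASCII stream."""
    # Vertical motion performed BEFORE the record's text is printed.
    prefix = {" ": "\n", "0": "\n\n", "1": "\f", "+": ""}
    out = []
    for rec in records:
        ctl, text = rec[:1], rec[1:]
        # Unknown control characters are treated as single spacing.
        out.append(prefix.get(ctl, "\n") + text + "\r")
    return "".join(out)

if __name__ == "__main__":
    # Second record overprints the first to underline it.
    print(repr(fortran_to_ascii([" TOTAL", "+_____"])))
```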

> I know that the net is full of UNIX-myopic people, but UNIX was not first nor
> did it make the best move on this one.

They might not have been the first, but it's more useful to be able to
generate a single CR than a single LF on your output.
-- 
`-_-' Peter da Silva. +1 713 274 5180.  <peter@ficc.ferranti.com>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.

meissner@osf.org (Michael Meissner) (06/01/90)

In article <12661@netcom.UUCP> ergo@netcom.UUCP (Isaac Rabinovitch)
writes:

| The developers of Unix didn't merely drop the carriage return character.
| They renamed the line feed character "newline".  At the time some people
| objected to this, pointing out that there was now no "down one line"
| character.

In the eight bit world which uses ASCII as a subset, there is a Next
Line byte.  We had great fun in the X3J11 committee when it was
discovered that the appropriate standards body for character sets was
trying to obsolete the use of linefeed as a newline character.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

guy@auspex.auspex.com (Guy Harris) (06/01/90)

 >The developers of Unix didn't merely drop the carriage return character.
 >They renamed the line feed character "newline".  At the time some people
 >objected to this, pointing out that there was now no "down one line"
 >character.

FYI, folks, UNIX came *after* Multics, and Multics used the LF character
as "newline".  UNIX didn't invent the idea....

paul@unhtel.uucp (Paul S. Sawyer) (06/01/90)

In article <7581@tekgvs.LABS.TEK.COM> toma@tekgvs.LABS.TEK.COM (Tom Almy) writes:
>...
>I have used several systems that went the single character route using the
>Carriage Return. This is probably the most sensible because input conversion
>is not necessary. Also standard typewriter practice is that the carriage
>return operation (either key or lever on a manual typewriter (remember those?))
>would also advance the line, but yet line advance could be independently
>performed (with the knob on the end of the carriage).
>
>I know that the net is full of UNIX-myopic people, but UNIX was not first nor
>did it make the best move on this one.

In the UNIX convention, LF is output as CR/LF, or the printer is set to
do CR/LF when it receives an LF.  This makes overstriking easy on printers with
a limited set of control codes.  (e.g., "bold <CR>bold <CR>bold" or
"_________ <CR>underline" )
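
The overstrike trick can be sketched as follows, in Python purely for
illustration.  The function names bold and underline are hypothetical;
the only assumption is a printer that returns to column one on a bare
CR without feeding paper.

```python
def bold(text: str, passes: int = 3) -> str:
    """Print the same text several times on one line via bare CRs."""
    return "\r".join([text] * passes)

def underline(text: str) -> str:
    """Print a row of underscores, return the carriage, then the text."""
    return "_" * len(text) + "\r" + text

if __name__ == "__main__":
    # Each line is then ended with a single newline, as usual.
    print(repr(bold("bold") + "\n"))
    print(repr(underline("underline") + "\n"))
```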

But if CR were translated to CR/LF on output, or the printer were to
interpret CR as CR/LF, this flexibility would be lost.  (The printers I'm thinking
of do not handle backspacing or reverse index/upline well if at all.)

Since these translations are usually transparent to the user, I think UNIX
DID do OK on this.  (Although I'll admit to the myopia.  B-)
-- 
Paul S. Sawyer              uunet!unh!unhtel!paul     paul@unhtel.UUCP
UNH Telecommunications        attmail!psawyer       p_sawyer@UNHH.BITNET
Durham, NH  03824-3523      VOX: +1 603 862 3262    FAX: +1 603 862 2030

hunt@dg-rtp.dg.com (Greg Hunt) (06/02/90)

In article <253@samna.UUCP>, jeff@samna.UUCP (Jeff Barber) writes:
|> 
|> UNIX's representation was (IMHO) a real innovation since it
|> simplifies breaking text into lines in software.  The cost is
|> that you need a device driver to interpret control characters
|> on the way in from and out to the terminal.

You're correct in that it makes processing the lines in a file much
easier, but incorrect in assuming that UNIX invented the idea.  It
has been around for a lot longer than UNIX, and was invented by
someone else (I don't know who).  One example I know of is the Data
General AOS/VS series of computers, which have always interpreted LF
or "Newline" as meaning CR/LF if you're not using raw I/O.  

UNIX borrowed lots of ideas from other OS's, and vice-versa.  One thing
UNIX should have (IMHO) learned from other OS's but didn't, is to use
the ASCII FF "Form Feed" character.  It makes document writing so much
easier, since all printers I know of properly handle it.  It causes the
paper to go to the top of the next page.  The standard UNIX tools always
count the number of blank lines and dump them into the file before
printing.  Not only does that waste bytes, but it also means that the
document can only be printed on other printers with exactly the same
number of lines as the printer it was written for has.  Using FF, the
document frequently can be printed on other printers without hassles.
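
A small sketch of the difference, in Python and with hypothetical
function names (paginate_ff, paginate_padded).  It assumes each page
is a list of text lines no longer than the page length; 66 lines per
page is used as the default only as an example.

```python
def paginate_ff(pages):
    """Separate pages with a single form feed; no page length needed."""
    return "\f".join("\n".join(page) + "\n" for page in pages)

def paginate_padded(pages, page_length=66):
    """Pad every page with blank lines up to a fixed page length."""
    out = []
    for page in pages:
        out.extend(page)
        out.extend([""] * (page_length - len(page)))
    return "\n".join(out) + "\n"

if __name__ == "__main__":
    pages = [["a", "b"], ["c"]]
    print(repr(paginate_ff(pages)))
    print(repr(paginate_padded(pages, page_length=4)))
```

The padded form bakes the page length into the file, so it only lines
up on a printer with that many lines per page; the form-feed version
leaves that decision to the printer.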

I run into this all the time.  UNIX documents never print right on my
AOS/VS printer, but AOS/VS documents always print right on my UNIX
printer.  Oh well, nobody said UNIX didn't make some big mistakes.

--
Greg Hunt                        Internet: hunt@dg-rtp.dg.com
Data Management Development      UUCP:     {world}!mcnc!rti!dg-rtp!hunt
Data General Corporation
Research Triangle Park, NC       These opinions are mine, not DG's.

bob@omni.com (Bob Weissman) (06/02/90)

- UNIX's representation was (IMHO) a real innovation since it
- simplifies breaking text into lines in software.  The cost is
- that you need a device driver to interpret control characters
- on the way in from and out to the terminal.

The old TENEX operating system used ^_ (\037) as the end of line
character.  What a pain that was!

-- 
Bob Weissman
Internet:	bob@omni.com
UUCP:		...!{apple,pyramid,sgi,uunet}!omni!bob

peter@ficc.ferranti.com (Peter da Silva) (06/05/90)

In article <2323@borabora.omni.com> bob@omni.com (Bob Weissman) writes:
> The old TENEX operating system used ^_ (\037) as the end of line
> character.  What a pain that was!

That almost makes sense: that's Unit Separator in ASCII. By an odd
coincidence, that's the character generated by the "NEW LINE" key on
my old Intel terminal.
-- 
`-_-' Peter da Silva. +1 713 274 5180.  <peter@ficc.ferranti.com>
 'U`  Have you hugged your wolf today?  <peter@sugar.hackercorp.com>
@FIN  Dirty words: Zhghnyyl erphefvir vayvar shapgvbaf.

goudreau@larrybud.rtp.dg.com (Bob Goudreau) (06/05/90)

In article <1990Jun1.195910.29218@dg-rtp.dg.com>, hunt@dg-rtp.dg.com
(Greg Hunt) writes:
> In article <253@samna.UUCP>, jeff@samna.UUCP (Jeff Barber) writes:
> |> 
> |> UNIX's representation was (IMHO) a real innovation since it
> |> simplifies breaking text into lines in software.  The cost is
> |> that you need a device driver to interpret control characters
> |> on the way in from and out to the terminal.
> 
> You're correct in that it makes processing the lines in a file much
> easier, but incorrect in assuming that UNIX invented the idea.  It
> has been around for a lot longer than UNIX, and was invented by
> someone else (I don't know who).  One example I know of is the Data
> General AOS/VS series of computers, which have always interpreted LF
> or "Newline" as meaning CR/LF if you're not using raw I/O.  

While UNIX's use of LF as a unitary newline character may well have
been borrowed from some other OS, it most certainly was doing it
long before AOS/VS or even its predecessor AOS (which, BTW, are
operating systems, not computers) were.  Remember that UNIX dates
from 1969, which is only a year after DG was even incorporated.

> UNIX borrowed lots of ideas from other OS's, and vice-versa.  One thing
> UNIX should have (IMHO) learned from other OS's but didn't, is to use
> the ASCII FF "Form Feed" character....
> 
> I run into this all the time.  UNIX documents never print right on my
> AOS/VS printer, but AOS/VS documents always print right on my UNIX
> printer.  Oh well, nobody said UNIX didn't make some big mistakes.

I assume that by "UNIX documents" you refer here to the output of
text processors in general, and to nroff in particular.  (After all,
there's nothing preventing the use of ^L characters in human-generated
files; and besides, most UNIX utilities have no notion of page breaks.)
A minor irritant perhaps, but hardly a big mistake.  Certainly not as
big a mistake as having the OS make it difficult to print on all 66
lines of a standard line printer page, which is why those nroff-ed
documents don't print right on AOS/VS printers in the first place....

------------------------------------------------------------------------
Bob Goudreau				+1 919 248 6231
Data General Corporation
62 Alexander Drive			goudreau@dg-rtp.dg.com
Research Triangle Park, NC  27709	...!mcnc!rti!xyzzy!goudreau
USA