tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) (02/12/91)
The world's three most important operating systems-- Unix, MS-DOS, and MacOS-- all employ different methods for separating lines from each other. Unix uses linefeed only, MS-DOS uses carriage return and linefeed, and MacOS uses carriage return only.

My question is this: do any standards specify how lines should be kept apart? That is, do any of these three operating systems have any justification (other than space savings in the case of Unix and MacOS) for doing things the way they did?
henry@zoo.toronto.edu (Henry Spencer) (02/12/91)
In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
>... Unix uses linefeed only, MS-DOS uses carriage return
>and linefeed, and MacOS uses carriage return only.
>
>My question is this: do any standards specify how lines should be
>kept apart? That is, do any of these three operating systems have
>any justification (other than space savings in the case of Unix and
>MacOS) for doing things the way they did?

ASCII gives you a choice. Normally, CR signifies move back to left margin and LF signifies go down to next line, so some combination of those two is the right choice for an end-of-line sequence (bearing in mind that line boundaries could also be stored as out-of-band data, e.g. length counts, in which case there *is* no such sequence). However, ASCII also specifies that a single character can be used as a line terminator if all parties involved agree on this, and that if so, it shall be LF, aka newline.

So Unix is doing things right :-), MuShDOS is also technically right but is doing things the hard way -- a single terminator makes life a lot easier for software -- and MacOS is unequivocally broken.
--
"Read the OSI protocol specifications? | Henry Spencer @ U of Toronto Zoology
I can't even *lift* them!"             | henry@zoo.toronto.edu utzoo!henry
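The three conventions discussed above can be told apart mechanically by counting CR LF pairs, bare LFs, and bare CRs. What follows is a minimal Python sketch of that idea, not anything posted in the thread; the function names `detect_line_ending` and `to_lf` are made up for illustration, and the input is assumed to be raw bytes in an ASCII-compatible encoding.

```python
def detect_line_ending(data: bytes) -> str:
    """Classify the convention in `data` as 'crlf', 'lf', 'cr', 'mixed', or 'none'."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CR LF pair
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CR LF pair
    found = [name for name, n in (("crlf", crlf), ("lf", lf), ("cr", cr)) if n]
    if not found:
        return "none"
    return found[0] if len(found) == 1 else "mixed"


def to_lf(data: bytes) -> bytes:
    """Convert CR LF and bare CR line endings to bare LF."""
    # Replace the two-byte pair first; otherwise each CR LF would become
    # two newlines when the bare-CR replacement runs.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

The order of the two replacements in `to_lf` is the whole point: a single-character terminator needs one pass, while the CR LF pair needs care so that it is not counted twice.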
williams@nssdcs.gsfc.nasa.gov (Jim Williams) (02/13/91)
In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
>The world's three most important operating systems-- Unix, MS-DOS,
>and MacOS-- all employ different methods for separating lines from
>each other. Unix uses linefeed only, MS-DOS uses carriage return
>and linefeed, and MacOS uses carriage return only.
>
>My question is this: do any standards specify how lines should be
>kept apart? That is, do any of these three operating systems have
>any justification (other than space savings in the case of Unix and
>MacOS) for doing things the way they did?

I don't have copies of all the relevant standards, but I have read most of them in the past, so here's the situation as I understand it.

Prior to ISO Latin-1, there was no single character whose semantics was that of "line separator", since the ANSI standards did not contain the concept of a line. Carriage return was exactly that: move the active position to the beginning of the line. Line feed was exactly that: move the active position to the next line down. Many systems therefore used CR-LF as a "newline" indicator. I seem to recall that use of Line Feed alone was a permissible option under the original ANSI standard, but that usage is now deprecated. This makes MS-DOS(!) the most correct, Unix correct but outdated, and MacOS completely out of it.

In defense of the Mac, I'll just point out that it's much easier to deal with file conversion when a single character is used as a line separator. There are some messy situations you can get into with the MS-DOS to Unix conversion.

ASCII does define some separator characters: File Sep., Group Sep., Record Sep., and Unit Sep., but no semantics were given for these. If they are used hierarchically, then files contain groups which contain records which contain units, but beyond that, the meaning is "application specific". It would have been reasonable to use Record Separator as a "newline" but I don't think anyone has.
One terminal I worked with used Unit Separator as a newline.

I should also point out that there are no characters in 7-bit ASCII whose meaning is "ignore the last character I sent". Most systems use Backspace or Delete for this, and neither means that. Backspace means "move the active position backward by one", and Delete is a no-op, which should be ignored. (Originally, delete characters were used to erase, or "rub-out", mistyped characters in punched tape. This is why Delete is all 1s: it punched out all the holes in paper tape, obliterating the mistyped character!) The "cancel" character might have been used, since it has a generalized meaning of "oops!". Also there is no line kill character, i.e., one that means "forget everything since the last newline". How could there be, when there is no concept of a line!

Latin-1 has fixed some of that, but I doubt the fixes will ever be popular. In the C1 control set there is a character NEL, which means Next Line, and is a proper newline character. There is also a character (CCH, I *think*) which means "ignore the last character sent". I don't think there is a "line kill" character, though. The VT420 on my desk can handle NEL, but not CCH.

Things to read are ANSI X3.64, X3.134.* and the various ISO standards. If I got any of this wrong, my apologies, and please send me mail, so I don't do it again!

Jim

Spoken: Jim Williams    Domain: williams@nssdcs.gsfc.nasa.gov
Phone: +1 301 286-1131  UUCP: uunet!mimsy!williams
USPS: NASA/GSFC, Code 633, Greenbelt, MD 20771
Motto: There is no 'd' in "kluge"! It rhymes with "huge", not "sludge".
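For readers who want the code points behind the names above, here is a small Python sketch. The table lists the standard ASCII and C1 values of the characters mentioned in the post; the helper `splits_lines` is a made-up name, and what it reports is only how Python's `str.splitlines()` happens to behave, which is a library design choice and not a statement about what any standard requires.

```python
# Code points of the control characters mentioned in the post above.
CONTROLS = {
    "BS": 0x08,   # Backspace
    "LF": 0x0A,   # Line Feed
    "CR": 0x0D,   # Carriage Return
    "CAN": 0x18,  # Cancel
    "FS": 0x1C,   # File Separator
    "GS": 0x1D,   # Group Separator
    "RS": 0x1E,   # Record Separator
    "US": 0x1F,   # Unit Separator
    "DEL": 0x7F,  # Delete
    "NEL": 0x85,  # Next Line (C1 control set)
}

# Delete is "all 1s" in 7 bits: every hole punched on the tape.
assert CONTROLS["DEL"] == 0b1111111


def splits_lines(code_point: int) -> bool:
    """True if str.splitlines() treats this character as a line boundary."""
    return len(("a" + chr(code_point) + "b").splitlines()) == 2


# Names of the listed characters that Python treats as line boundaries.
line_breakers = sorted(name for name, cp in CONTROLS.items() if splits_lines(cp))
```

On a current Python 3, `line_breakers` comes out as CR, FS, GS, LF, NEL and RS. So at least one widely used library does treat Record Separator (and NEL) as ending a line, while Unit Separator is left alone.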
npn@cbnewsl.att.com (nils-peter.nelson) (02/13/91)
In article <7813@exodus.Eng.Sun.COM>, tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
> The world's three most important operating systems-- Unix, MS-DOS,
> and MacOS-- all employ different methods for separating lines from
> each other. Unix uses linefeed only, MS-DOS uses carriage return
> and linefeed, and MacOS uses carriage return only.
>
> My question is this: do any standards specify how lines should be
> kept apart? That is, do any of these three operating systems have
> any justification (other than space savings in the case of Unix and
> MacOS) for doing things the way they did?

The oldest operating systems (mostly IBM, but also CDC, GCOS (GE/Honeywell), Univac, etc.) used a card-image convention-- every line stored as 80 characters (lots of trailing blanks). It was quite an innovation to go to "variable length records." In these, no motion character was needed, although many included one anyway.

Ken Thompson hated all these conventions, since they made software very complex-- there were 10 or more different "record types"-- and when he created Unix he chose the simplest convention he could think of. He chose newline (octal 12) because on the Model 37 Teletype that character caused both flyback (carriage return) and one vertical space.

The convention is arbitrary, since some low-level terminal driver still had to add delays after the motion character on mechanical devices.

At least the conventions you showed can be mapped. Binary format is hopeless, and there is no standard.
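The remark that these conventions "can be mapped" is easy to illustrate for the card-image case. The Python sketch below is only an illustration under simplifying assumptions: the records are treated as ASCII bytes (real card-image files on those systems often used other character codes), and the function names are invented here.

```python
CARD_WIDTH = 80  # characters per card image, as described above


def cards_to_lines(data: bytes, width: int = CARD_WIDTH) -> bytes:
    """Convert fixed-width records to LF-terminated lines, dropping trailing blanks."""
    if len(data) % width:
        raise ValueError("data is not a whole number of records")
    records = (data[i:i + width] for i in range(0, len(data), width))
    return b"".join(rec.rstrip(b" ") + b"\n" for rec in records)


def lines_to_cards(text: bytes, width: int = CARD_WIDTH) -> bytes:
    """Pad each line with blanks to a fixed-width record; no terminator is stored."""
    cards = []
    for line in text.splitlines():
        if len(line) > width:
            raise ValueError("line longer than one record")
        cards.append(line.ljust(width, b" "))
    return b"".join(cards)


# The newline character chosen for Unix, octal 12, is LF.
assert 0o12 == ord("\n")
```

Three short lines occupy 240 bytes as card images, which is the space cost mentioned in the original question. The mapping is also lossy in one direction: trailing blanks that were part of a line cannot be told apart from padding.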
tml@tik.vtt.fi (Tor Lillqvist) (02/14/91)
In article <7813@exodus.Eng.Sun.COM> tut@cairo.Eng.Sun.COM (Bill "Bill" Tuthill) writes:
>The world's three most important operating systems-- Unix, MS-DOS,
>and MacOS-- all employ different methods for separating lines from
>each other. Unix uses linefeed only, MS-DOS uses carriage return
>and linefeed, and MacOS uses carriage return only.
>
>My question is this: do any standards specify how lines should be
>kept apart? That is, do any of these three operating systems have
>any justification (other than space savings in the case of Unix and
>MacOS) for doing things the way they did?

There are operating systems where text files contain no "line separator" at all, but files contain "records", which in the case of "text" files are "lines." These systems typically have lots of file types with attributes such as "variable/fixed length records," "record length," etc. Examples are VMS and RTE (on the HP1000).

On RTE (the operating system from hell), the actual bytes corresponding to a line containing the characters "abc" are:

- a two-byte record length word. The record length is the length in bytes rotated right one bit. (Because traditionally RTE was word-orientated, and record lengths were counted in words.) Thus the 16-bit word 100001 (in octal).
- the bytes 'a', 'b', 'c' and a padding ' ' to even length.
- the record-length word again.

Remember that in Unix it is only the user-level programs that care about line separators. Nothing would need changing in the Unix kernel if one started to use CR-LF pairs for line separation. (Well, maybe one thing: #! interpretation.)
--
Tor Lillqvist, working, but not speaking, for the Technical Research Centre of Finland
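The RTE record layout described above can be checked with a few lines of Python. This is a sketch built only from the description in the post: the function names are invented, and storing the length word high byte first is an assumption, since the post does not state the byte order.

```python
import struct


def rotate_right_16(value: int) -> int:
    """Rotate a 16-bit value right by one bit."""
    value &= 0xFFFF
    return ((value >> 1) | (value << 15)) & 0xFFFF


def rotate_left_16(value: int) -> int:
    """Rotate a 16-bit value left by one bit (undoes rotate_right_16)."""
    value &= 0xFFFF
    return ((value << 1) | (value >> 15)) & 0xFFFF


def encode_record(line: bytes) -> bytes:
    """Length word, data padded with a blank to even length, length word again."""
    # Assumption: the 16-bit length word is stored big-endian.
    length_word = struct.pack(">H", rotate_right_16(len(line)))
    padded = line + b" " * (len(line) % 2)
    return length_word + padded + length_word


def decode_record(record: bytes) -> bytes:
    """Recover the original bytes of a single encoded record."""
    (word,) = struct.unpack(">H", record[:2])
    length = rotate_left_16(word)
    return record[2:2 + length]
```

For "abc" the byte length is 3, binary 11. Rotating right by one bit moves the low bit to the top of the word, giving 0x8001, which is 100001 in octal and matches the value quoted in the post. An even length such as 4 simply becomes the word count 2.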