[comp.misc] How many different ASCII textfile formats are there?

brad@looking.on.ca (Brad Templeton) (05/27/91)

I'm making a program that maps textfiles, and I would like to hear of
any obscure ASCII textfile formats that might exist out there.

(I'm talking ASCII.  There are other formats such as EBCDIC, but a simple
newline translator is not enough for such.)

Of course there is just the Newline (LF) as on Unix, as well as CR-LF as
found on many micros, and just (CR) on some machines.

I also know of QNX which uses the ASCII RS character (0x1e).

Anybody know of others?
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
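
For a text-mapping program like the one Brad describes, the three stream conventions (LF, CR-LF, bare CR) and QNX's RS can all be folded into Unix newlines in a single pass. Below is a minimal Python sketch under stated assumptions. The input is taken to be single-byte ASCII, and the function name `normalize_newlines` is hypothetical. RS is treated as a terminator only when the caller says the file came from QNX, because RS could be ordinary data in other files.

```python
LF, CR, RS = 0x0A, 0x0D, 0x1E  # RS is the QNX record separator

def normalize_newlines(data: bytes, qnx: bool = False) -> bytes:
    """Rewrite CR-LF, bare CR, and (optionally) QNX RS terminators as LF."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == CR:
            out.append(LF)
            # A CR-LF pair is one line ending, so swallow the LF that follows.
            if i + 1 < len(data) and data[i + 1] == LF:
                i += 1
        elif qnx and b == RS:
            out.append(LF)
        else:
            out.append(b)  # includes a plain LF, which is already Unix style
        i += 1
    return bytes(out)
```

For example, `normalize_newlines(b"a\r\nb\rc\nd")` returns `b"a\nb\nc\nd"`. Deciding which convention a given file uses is left to the caller.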

peter@ficc.ferranti.com (Peter da Silva) (05/28/91)

In article <1991May27.162515.665@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> I'm making a program that maps textfiles, and I would like to hear of
> any obscure ASCII textfile formats that might exist out there.

DEC uses variable-record files, where a line consists of a character count
followed by that many bytes. There are also DEC file modes where a line
number is included as well. Some older systems (MODCOMP, for example) use
ASCII 80-column card images (blank padded, no terminators).

Luckily, it's unlikely that you'll have to deal with 80-column card images,
and DEC's C runtime library translates variable-record files to stream-linefeed
format.
-- 
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX  77487-5012;         `-_-' "Have you hugged your wolf, today?"
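
Neither of Peter's record formats has terminators, so lines have to be split by length instead of by scanning for a newline. Here is a rough Python sketch. The post does not say how wide the DEC character count is. The sketch assumes the VMS RMS on-disk layout: a 2-byte little-endian count, with odd-length records padded to an even byte. Check that assumption against real files before relying on it. The helper names are hypothetical.

```python
def read_variable_records(data: bytes, pad_to_even: bool = True) -> list[bytes]:
    """Split count-prefixed records: a 2-byte little-endian length, then data.

    Assumption: count width, byte order, and even-byte padding follow the
    VMS RMS variable-record layout; other systems may differ.
    """
    records = []
    pos = 0
    while pos + 2 <= len(data):
        length = int.from_bytes(data[pos:pos + 2], "little")
        pos += 2
        if pos + length > len(data):
            raise ValueError("truncated record")
        records.append(data[pos:pos + length])
        pos += length
        if pad_to_even and length % 2 == 1:
            pos += 1  # skip the pad byte that keeps the next record word-aligned
    return records

def read_card_images(data: bytes, width: int = 80) -> list[bytes]:
    """Split unterminated fixed-width card images and drop the blank padding."""
    return [data[i:i + width].rstrip(b" ") for i in range(0, len(data), width)]
```

Joining either result with `b"\n"` gives the stream-linefeed form that, according to Peter, DEC's C runtime produces for variable-record files.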

trumbull@unx.sas.com (Ed Trumbull) (05/30/91)

Brad Templeton asks for obscure ASCII text formats.  Here's Prime's:

   1.  Prime ASCII is "signed" -- extended ASCII characters go in bytes 0 to 
127, and "normal" ASCII characters go in bytes 128 to 255.
   2.  Text lines are terminated by a newline.  (The newline is ignored as 
terminal input...)
   3.  All occurrences of more than two spaces are compressed by replacing them
with a horizontal tab (DC1) and a byte count.

   
-- 
Ed Trumbull     <trumbull@sdcprm.pri.sas.com>   
SAS Institute, Inc. Cary, NC USA                

harrison@csl.dl.nec.com (Mark Harrison) (05/30/91)

In article <1991May27.162515.665@looking.on.ca>
brad@looking.on.ca (Brad Templeton) writes:

>I'm making a program that maps textfiles, and I would like to hear of
>any obscure ASCII textfile formats that might exist out there.

The DEC-20 was a 36-bit machine that encoded text files something
like this:  If the high bit was set, the other bits were the line
number.  Otherwise there were 5 seven-bit characters stuffed in
the word.

There was also a representation where there were 6 six-bit characters
in one word.  This was used in the linker, etc, but it could have
been used for text files.  I think that character set was called
SIXBIT.

The above is rather vague in my mind, and probably not entirely
accurate. :-)
-- 
Mark Harrison           | Note: harrison@ssd.dl.nec.com and
harrison@csl.dl.nec.com | necssd!harrison are not operating at
(214)518-5050           | present.  Please forward mail through the
                        | above address.  Sorry for the inconvenience.

parke@star.enet.dec.com (Bill Parke) (05/30/91)

In article <1991May29.174634.23343@csl.dl.nec.com>, harrison@csl.dl.nec.com (Mark Harrison) writes:
>
>The DEC-20 was a 36-bit machine that encoded text files something
>like this:  If the high bit was set, the other bits were the line
>number.  Otherwise there were 5 seven-bit characters stuffed in
>the word.

DEC-10 and DEC-20 7-bit ASCII was stored from the top (bit 0) of the
word to the bottom.   This left the bottom bit unused, and by convention
it became the "line number" bit.   This led to all sorts of amusing incidents,
such as my program which worked correctly but was almost the death of
a TA in my FORTRAN class.  It included the statement "WRITE I=6".

Hint, it correctly assigned 6 to I and yielded no compile errors.
>
>There was also a representation where there were 6 six-bit characters
>in one word.  This was used in the linker, etc, but it could have
>been used for text files.  I think that character set was called
>SIXBIT.

Sixbit was used for many symbol-table and file-name representations.  There
was also a form known as RAD50, which represented the symbol-name subset
of the alphabet and left 4 bits to encode data about the name for the linker.
This was the linker's internal form.
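
The "4 bits left" follow from the arithmetic. A 40-character alphabet needs 40**6 = 4,096,000,000 codes for six characters, which is less than 2**32 = 4,294,967,296, so six characters fit in 32 bits of a 36-bit word. Below is a Python sketch of that packing. The character order (blank, digits, letters, then `.`, `$`, `%`) is an assumption about the PDP-10 variant, and so is the right-hand blank padding. The meaning of the four flag bits is not modelled, and the function names are hypothetical.

```python
# Assumed PDP-10 RADIX50 alphabet, in code order 0-39.
RADIX50_CHARS = " 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ.$%"

def radix50_encode(symbol: str, flags: int = 0) -> int:
    """Pack up to six symbol characters into 32 bits, with 4 flag bits on top."""
    if len(symbol) > 6:
        raise ValueError("at most six characters fit")
    if not 0 <= flags < 16:
        raise ValueError("only four flag bits are left")
    value = 0
    for c in symbol.upper().ljust(6):  # assumption: blank-padded on the right
        # str.index raises ValueError for characters outside the alphabet.
        value = value * 40 + RADIX50_CHARS.index(c)
    return (flags << 32) | value

def radix50_decode(word: int) -> tuple[str, int]:
    """Split a 36-bit word into its symbol name and its 4 flag bits."""
    value, flags = word & 0xFFFFFFFF, word >> 32
    chars = []
    for _ in range(6):
        value, digit = divmod(value, 40)
        chars.append(RADIX50_CHARS[digit])
    return "".join(reversed(chars)).rstrip(), flags
```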

Sixbit file names could be made amusing with PIP under TOPS-10
by explicitly building the octal.  (Teners might recognize 
#001216120000.#000000 )
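
SIXBIT itself is simple. Each 6-bit code is the ASCII value minus 32 (octal 40), so it covers space through underscore with no lower case, and six codes fill a 36-bit word. Here is a minimal Python sketch (the function names are hypothetical). It can also decode the octal in Bill's PIP example.

```python
def sixbit_encode(text: str, nchars: int = 6) -> int:
    """Encode up to nchars characters as SIXBIT, blank-padded on the right."""
    if len(text) > nchars:
        raise ValueError(f"at most {nchars} characters fit")
    word = 0
    for c in text.upper().ljust(nchars):  # SIXBIT has no lower case
        code = ord(c) - 0o40
        if not 0 <= code < 0o100:
            raise ValueError(f"{c!r} has no SIXBIT code")
        word = (word << 6) | code
    return word

def sixbit_decode(word: int, nchars: int = 6) -> str:
    """Decode nchars 6-bit codes, most significant first, back to ASCII."""
    return "".join(
        chr(((word >> (6 * (nchars - 1 - i))) & 0o77) + 0o40) for i in range(nchars)
    )
```

`sixbit_decode(0o001216120000)` returns `' *.*  '`. That is a file name built from the wildcard and separator characters, which is presumably the joke. The `.#000000` extension decodes to blanks.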

D**n, these wimpy machines with a fixed byte size.

--
Bill Parke
VMS Development			decwrl!star.enet.dec.com!parke
Digital Equipment Corp		parke@star.enet.dec.com
110 Spit Brook Road ZK01-1/F22, Nashua NH 03063

The views expressed are my own.

buckland@cheddar.ucs.ubc.ca (Tony Buckland) (05/31/91)

In article <23015@shlump.lkg.dec.com> parke@star.enet.dec.com (Bill Parke) writes:
> ... my program which worked correctly but was almost the death of
>a TA in my FORTRAN class.  It included the statement "WRITE I=6".
>
>Hint, it correctly assigned 6 to I and yielded no compile errors.

 I'm surprised it didn't assign to WRITEI, since blanks are of no
 significance in FORTRAN outside character constants.  A famous
 example of the consequences is the substitution for
 
      DO 1 I = 1,10
 
 of
 
      DO 1 I = 1 10
 
 which turns a DO-loop control statement into an assignment
 statement (DO1I = 110) because of an omitted comma.
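
Tony's point can be shown with a toy classifier. Squeeze out the blanks, as the compiler does, and the only thing that separates a DO statement from an assignment is the comma after the `=`. This is a hypothetical Python sketch, not a real FORTRAN parser. In particular, it does not look inside parentheses, so a comma inside a function call on the right-hand side would fool it.

```python
import re

def classify(statement: str) -> str:
    """Toy classifier for a fixed-form FORTRAN statement body (hypothetical)."""
    s = statement.replace(" ", "").upper()  # blanks carry no meaning
    # DO <label> <variable> = <start> , <rest>: the comma is what makes it a loop.
    if re.fullmatch(r"DO\d+[A-Z][A-Z0-9]*=[^,]+,.+", s):
        return "DO loop"
    m = re.fullmatch(r"([A-Z][A-Z0-9]*)=(.+)", s)
    if m:
        return f"assignment: {m.group(1)} = {m.group(2)}"
    return "other"
```

`classify("DO 1 I = 1 10")` gives `"assignment: DO1I = 110"`. `classify("WRITE I=6")` gives `"assignment: WRITEI = 6"`, which is what Tony expected, leaving aside the DEC line-number bit.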

parke@star.enet.dec.com (Bill Parke) (05/31/91)

>From: buckland@cheddar.ucs.ubc.ca (Tony Buckland)
>Newsgroups: comp.misc
>In article <23015@shlump.lkg.dec.com> parke@star.enet.dec.com (Bill Parke) writes:
>> ... my program which worked correctly but was almost the death of
>>a TA in my FORTRAN class.  It included the statement "WRITE I=6".
>>
>
> I'm surprised it didn't assign to WRITEI, since blanks are of no

The significant part of the equation is that WRITE is 5 characters, and
the DEC-10 word holds 5 ASCII characters, and I turned on the bottom (36th)
bit, which told the compiler that "WRITE" was a line number.  Therefore
the compiler saw linenumber(WRITE) identifier(I) equals(=) constant(6)
as the tokens from the line.

It was most fun having loops that didn't loop, by having:

	linenumber("DO 10") equation (i=1) linenumber(",3)

ended with:

	linenumber("10   ") tab-character equation(...)

Made it real hard for someone to copy your homework from a listing.
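
Bill's explanation can be checked with a little arithmetic on a 36-bit word. Five 7-bit characters take 35 bits and are packed downward from bit 0 (DEC numbers bits from the most significant end). The leftover bit 35 is the line-number flag. The Python sketch below models a word as a plain integer. The function names are hypothetical, and the sketch says nothing about how a particular compiler or editor then interpreted a flagged word.

```python
def pack_word(text: str, line_number_bit: bool = False) -> int:
    """Pack exactly five 7-bit ASCII characters into a 36-bit word."""
    if len(text) != 5 or any(ord(c) > 0x7F for c in text):
        raise ValueError("need exactly five 7-bit ASCII characters")
    word = 0
    for c in text:  # first character ends up in the most significant bits
        word = (word << 7) | ord(c)
    return (word << 1) | int(line_number_bit)  # bit 35 is the lowest bit

def unpack_word(word: int) -> tuple[str, bool]:
    """Split a 36-bit word into its five characters and the line-number bit."""
    if not 0 <= word < 1 << 36:
        raise ValueError("not a 36-bit word")
    chars = "".join(chr((word >> (29 - 7 * i)) & 0x7F) for i in range(5))
    return chars, bool(word & 1)
```

`pack_word("WRITE")` and `pack_word("WRITE", True)` differ only in the lowest bit, which is presumably why the trick did not show up in a listing.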


blarson@blars (06/02/91)

In article <1991May29.173405.2181@unx.sas.com> trumbull@unx.sas.com (Ed Trumbull) writes:
>Brad Templeton asks for obscure ASCII text formats.  Here's Prime's:

If you are going to try the hopeless task of enumerating all ascii text
file formats, at least get them right.  (The answer to the question in
the subject is: one more than the number of unique ascii text files.)

>   1.  Prime ASCII is "signed" -- extended ASCII characters go in bytes 0 to 
>127, and "normal" ASCII characters go in bytes 128 to 255.

I dislike the "signed" term, but the parity bit is set on normal characters.

>   2.  Text lines are terminated by a newline.
Lines are terminated by a Linefeed (^J) and the following byte is
ignored if the number of characters in the line (including linefeed)
is odd.  Most programs use a NUL in the ignored position, and the file
comparison utility (CMPF) incorrectly pays attention to the ignored byte.

>   3.  All occurrences of more than two spaces are compressed by replacing them
>with a horizontal tab (DC1) and a byte count.

DC1 (^Q, not tab) denotes that the following byte is a count of spaces,
0 to 255.  Most utilities only use this for 3 or more spaces.
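
Putting Bob's corrections together, a decoder has to handle DC1 before anything else. The count byte is raw, so a count of 10 would otherwise look like a linefeed. The decoder also has to count raw bytes per line to know when to skip the pad byte. A Python sketch follows. It assumes the odd/even rule counts the bytes as stored. It also accepts DC1 and LF with or without the high bit set, because the thread does not say which form appears on disk. The function name is hypothetical.

```python
DC1_CODES = {0x11, 0x91}  # assumption: DC1 may be stored with or without the high bit
LF_CODES = {0x0A, 0x8A}   # assumption: likewise for the linefeed terminator

def decode_prime_text(data: bytes) -> list[str]:
    """Decode Prime text into lines (a sketch of the rules described above)."""
    lines: list[str] = []
    current: list[str] = []
    nbytes = 0  # raw bytes stored for the current line, DC1 pairs included
    i = 0
    while i < len(data):
        b = data[i]
        if b in DC1_CODES:
            # The count byte is raw (0-255), so consume it before looking for LF.
            if i + 1 >= len(data):
                raise ValueError("DC1 at end of data with no count byte")
            current.append(" " * data[i + 1])
            i += 2
            nbytes += 2
            continue
        i += 1
        nbytes += 1
        if b in LF_CODES:
            lines.append("".join(current))
            if nbytes % 2 == 1:
                i += 1  # skip the ignored byte after an odd-length line
            current, nbytes = [], 0
        else:
            current.append(chr(b & 0x7F))  # strip the parity bit
    if current:
        lines.append("".join(current))  # final line with no terminator
    return lines
```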

4. Trailing spaces may be trimmed by text processing programs.

5. Printer files have even more strange things than text files.
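
Going the other way, a writer that follows rules 2 to 4 might look like the sketch below. The byte values for DC1 and LF are assumptions (high bit set on both). The pad is a plain NUL, which Bob says most programs use. Runs longer than 255 spaces are split across several DC1 pairs. The function name is hypothetical.

```python
PRIME_DC1 = 0x91  # assumption: DC1 (0x11) with the high bit set
PRIME_LF = 0x8A   # assumption: LF (0x0A) with the high bit set
PRIME_PAD = 0x00  # NUL, which most programs put in the ignored position

def encode_prime_line(line: str) -> bytes:
    """Encode one line of 7-bit ASCII as Prime text (a sketch)."""
    line = line.rstrip(" ")  # rule 4: trailing spaces may be trimmed
    out = bytearray()
    i = 0
    while i < len(line):
        if line[i] == " ":
            j = i
            while j < len(line) and line[j] == " ":
                j += 1
            run = j - i
            # Rule 3: three or more spaces become DC1 + count (at most 255 each).
            while run >= 3:
                n = min(run, 255)
                out += bytes([PRIME_DC1, n])
                run -= n
            out += bytes([ord(" ") | 0x80]) * run  # 0-2 leftover literal spaces
            i = j
        else:
            code = ord(line[i])
            if code > 0x7F:
                raise ValueError(f"{line[i]!r} is not 7-bit ASCII")
            out.append(code | 0x80)  # parity bit set on normal characters
            i += 1
    out.append(PRIME_LF)
    if len(out) % 2 == 1:  # keep the next line starting on an even byte
        out.append(PRIME_PAD)
    return bytes(out)
```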

>  (The newline is ignored as terminal input...)

The standard terminal driver ignores linefeed and converts return to
linefeed on input.  In later versions of PRIMOS, it is possible to
turn off this misfeature (although leaving your terminal that way
confuses the command processor).  This really has nothing to do with
text file formats.

-- 
blarson@usc.edu
		C news and rn for os9/68k!
-- 
Bob Larson (blars)	blarson@usc.edu			usc!blarson
	Hiding differences does not make them go away.
	Accepting differences makes them unimportant.