brad@looking.on.ca (Brad Templeton) (05/27/91)
I'm making a program that maps textfiles, and I would like to hear of
any obscure ASCII textfile formats that might exist out there. (I'm
talking ASCII. There are other formats such as EBCDIC, but a simple
newline translator is not enough for such.)

Of course there is just the Newline (LF) as on Unix, as well as CR-LF
as found on many micros, and just (CR) on some machines. I also know of
QNX which uses the ASCII RS character (0x1e).

Anybody know of others?
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
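A minimal Python sketch of the simple newline translator the post describes, covering only the four terminators named there (LF, CR-LF, CR, and RS 0x1e). The function name `to_unix_newlines` is hypothetical, and the sketch assumes a file may mix conventions, which a real mapper might prefer to detect and reject.

```python
import re

# Terminators named in the post: CR-LF, LF, CR, and QNX's RS (0x1e).
# CR-LF comes first so the pair is consumed as one terminator, not two.
_TERMINATORS = re.compile(rb"\r\n|\n|\r|\x1e")


def to_unix_newlines(data: bytes) -> bytes:
    """Rewrite every recognized line terminator as a single LF."""
    return _TERMINATORS.sub(b"\n", data)


if __name__ == "__main__":
    sample = b"unix\nmicro\r\nbare-cr\rqnx\x1e"
    print(to_unix_newlines(sample))
```

Working on bytes rather than decoded text keeps the translation independent of whatever character set the file uses beyond the terminators themselves.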
peter@ficc.ferranti.com (Peter da Silva) (05/28/91)
In article <1991May27.162515.665@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
> I'm making a program that maps textfiles, and I would like to hear of
> any obscure ASCII textfile formats that might exist out there.

DEC uses variable-record files, where a line consists of a character
count followed by that many bytes. There are also DEC file modes where
a line number is included as well.

Some older systems (MODCOMP, for example) use ASCII 80-column card
images (blank padded, no terminators).

Luckily, it's unlikely that you'll have to deal with 80-column card
images, and DEC's C runtime library translates variable-record files
to stream-linefeed format.
--
Peter da Silva; Ferranti International Controls Corporation; +1 713 274 5180;
Sugar Land, TX 77487-5012; `-_-' "Have you hugged your wolf, today?"
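The two record styles mentioned above can be sketched in Python as follows. Both function names are hypothetical. The card-image reader follows the description directly (fixed 80-byte records, blank padded, no terminators). The counted-record reader is only an illustration: the post does not give the width or byte order of the count, so a 2-byte little-endian count with no padding between records is assumed here.

```python
import struct


def split_card_images(data: bytes, width: int = 80) -> list[bytes]:
    """Cut fixed-width, blank-padded records into lines without the padding."""
    return [data[i:i + width].rstrip(b" ") for i in range(0, len(data), width)]


def split_counted_records(data: bytes) -> list[bytes]:
    """Read records stored as <count><count bytes of text>.

    ASSUMPTION: the count is a 2-byte little-endian integer and records
    follow each other with no padding; the post does not specify this.
    """
    records = []
    pos = 0
    while pos + 2 <= len(data):
        (count,) = struct.unpack_from("<H", data, pos)
        pos += 2
        records.append(data[pos:pos + count])
        pos += count
    return records


if __name__ == "__main__":
    cards = b"HELLO".ljust(80) + b"WORLD".ljust(80)
    print(split_card_images(cards))
    print(split_counted_records(b"\x02\x00hi\x03\x00abc"))
```

Either result can then be joined with LF to get a stream-linefeed style file.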
trumbull@unx.sas.com (Ed Trumbull) (05/30/91)
Brad Templeton asks for obscure ASCII text formats. Here's Prime's:

 1. Prime ASCII is "signed" -- extended ASCII characters go in bytes 0 to
    127, and "normal" ASCII characters go in bytes 128 to 255.
 2. Text lines are terminated by a newline.
    (The newline is ignored as terminal input...)
 3. All occurrences of more than two spaces are compressed by replacing them
    with a horizontal tab (DC1) and a byte count.
--
Ed Trumbull <trumbull@sdcprm.pri.sas.com>
SAS Institute, Inc. Cary, NC USA
harrison@csl.dl.nec.com (Mark Harrison) (05/30/91)
In article <1991May27.162515.665@looking.on.ca> brad@looking.on.ca (Brad Templeton) writes:
>I'm making a program that maps textfiles, and I would like to hear of
>any obscure ASCII textfile formats that might exist out there.

The DEC-20 was a 36-bit machine that encoded text files something
like this: If the high bit was set, the other bits were the line
number. Otherwise there were 5 seven-bit characters stuffed in
the word.

There was also a representation where there were 6 six-bit characters
in one word. This was used in the linker, etc, but it could have
been used for text files. I think that character set was called
SIXBIT.

The above is rather vague in my mind, and probably not entirely
accurate. :-)
--
Mark Harrison           | Note: harrison@ssd.dl.nec.com and
harrison@csl.dl.nec.com | necssd!harrison are not operating at
(214)518-5050           | present. Please forward mail through the
                        | above address. Sorry for the inconvenience.
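The six-characters-per-word representation can be sketched in Python as below. The post does not state the character mapping; this sketch assumes the usual DEC SIXBIT convention (code = ASCII code minus 32, covering space through underscore) with the first character in the high-order bits. The function names are hypothetical.

```python
def pack_sixbit(text: str) -> int:
    """Pack up to six characters into one 36-bit word, blank padded.

    ASSUMPTION: SIXBIT code = ASCII code - 32 (0o40), first character in
    the most significant six bits. Lowercase input is folded to uppercase.
    """
    padded = text.upper().ljust(6)[:6]
    word = 0
    for ch in padded:
        code = ord(ch) - 0o40
        if not 0 <= code <= 0o77:
            raise ValueError(f"{ch!r} has no SIXBIT code")
        word = (word << 6) | code
    return word


def unpack_sixbit(word: int) -> str:
    """Decode one 36-bit word into its six characters."""
    return "".join(
        chr(((word >> shift) & 0o77) + 0o40) for shift in range(30, -1, -6)
    )


if __name__ == "__main__":
    word = pack_sixbit("LINK")
    print(f"{word:012o}", repr(unpack_sixbit(word)))
```

Printing the word as twelve octal digits shows each character as one pair of digits, which is why SIXBIT values were convenient to build by hand.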
parke@star.enet.dec.com (Bill Parke) (05/30/91)
In article <1991May29.174634.23343@csl.dl.nec.com>, harrison@csl.dl.nec.com (Mark Harrison) writes:
>
>The DEC-20 was a 36-bit machine that encoded text files something
>like this: If the high bit was set, the other bits were the line
>number. Otherwise there were 5 seven-bit characters stuffed in
>the word.

On the DEC-10 and DEC-20, 7-bit ASCII was stored from the top (bit 0)
of the word to the bottom. This left the bottom bit unused, and by
convention it became the "line number" bit. This led to all sorts of
amusing incidents, such as my program which worked correctly but was
almost the death of a TA in my FORTRAN class. It included the statement
"WRITE I=6".

Hint, it correctly assigned 6 to I and yielded no compile errors.

>
>There was also a representation where there were 6 six-bit characters
>in one word. This was used in the linker, etc, but it could have
>been used for text files. I think that character set was called
>SIXBIT.

Sixbit was used for many symbol table and file name representations.
There was also a form known as RAD50 which represented the symbol name
subset of the alphabet and left 4 bits to encode data about the name
for the linker. This was the linker's internal form.

Sixbit file names could be made amusing with PIP under TOPS-10 by
explicitly building the octal. (Teners might recognize
#001216120000.#000000 )

D**n, these wimpy machines with a fixed byte size.

>
>The above is rather vague in my mind, and probably not entirely
>accurate. :-)

--
Bill Parke VMS Development decwrl!star.enet.dec.com!parke
Digital Equipment Corp parke@star.enet.dec.com
110 Spit Brook Road ZK01-1/F22, Nashua NH 03063
The views expressed are my own.
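The word layout described above (five 7-bit characters filled from the top of a 36-bit word, with the bottom bit left over as the "line number" flag) can be sketched in Python like this. The names `pack_word` and `unpack_word` are hypothetical, and padding short text with NUL is an assumption, not something stated in the post.

```python
def pack_word(text: str, line_number: bool = False) -> int:
    """Pack up to five 7-bit ASCII characters into a 36-bit word.

    The first character lands in the top seven bits; the lowest bit of
    the word carries the "line number" flag.
    ASSUMPTION: text shorter than five characters is padded with NUL.
    """
    padded = text.ljust(5, "\0")[:5]
    word = 0
    for ch in padded:
        code = ord(ch)
        if code > 0x7F:
            raise ValueError(f"{ch!r} is not 7-bit ASCII")
        word = (word << 7) | code
    return (word << 1) | int(line_number)


def unpack_word(word: int) -> tuple[str, bool]:
    """Return the five characters and the state of the bottom bit."""
    chars = [chr((word >> shift) & 0x7F) for shift in range(29, 0, -7)]
    return "".join(chars), bool(word & 1)


if __name__ == "__main__":
    # "WRITE" is exactly five characters, so it fills one word; setting
    # the bottom bit marks that word as a line number.
    word = pack_word("WRITE", line_number=True)
    print(f"{word:012o}", unpack_word(word))
```

Because the flag sits outside the 35 bits of character data, the same five characters can be stored as ordinary text or as a "line number" without changing how they print.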
buckland@cheddar.ucs.ubc.ca (Tony Buckland) (05/31/91)
In article <23015@shlump.lkg.dec.com> parke@star.enet.dec.com (Bill Parke) writes:
> ... my program which worked correctly but was almost the death of
>a TA in my FORTRAN class. It included the statement "WRITE I=6".
>
>Hint, it correctly assigned 6 to I and yielded no compile errors.

I'm surprised it didn't assign to WRITEI, since blanks are of no
significance in FORTRAN outside character constants. A famous
example of the consequences is the substitution for

    DO 1 I = 1,10

of

    DO 1 I = 1 10

which turns a DO-loop control statement into an assignment statement
because of an omitted comma.
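The effect of insignificant blanks can be illustrated with a toy Python classifier. This is not how any FORTRAN compiler is implemented; it only removes blanks and then checks which statement shape remains. The name `classify` and the two regular expressions are assumptions for illustration, and character constants are ignored entirely.

```python
import re

# Statement shapes after all blanks are removed (toy grammar, not real FORTRAN).
_DO_LOOP = re.compile(r"DO(\d+)([A-Z][A-Z0-9]*)=([^,]+),(.+)")
_ASSIGN = re.compile(r"([A-Z][A-Z0-9]*)=(.+)")


def classify(statement: str) -> tuple:
    """Classify a statement the way a blank-insensitive scanner might."""
    squeezed = statement.replace(" ", "").upper()
    match = _DO_LOOP.fullmatch(squeezed)
    if match:
        # (kind, statement label, control variable)
        return ("do-loop", match.group(1), match.group(2))
    match = _ASSIGN.fullmatch(squeezed)
    if match:
        # (kind, target variable, expression text)
        return ("assignment", match.group(1), match.group(2))
    return ("other", squeezed)


if __name__ == "__main__":
    print(classify("DO 1 I = 1,10"))
    print(classify("DO 1 I = 1 10"))
    print(classify("WRITE I=6"))
```

Without the comma, nothing distinguishes the statement from an assignment to a variable named DO1I, and "WRITE I=6" likewise collapses to an assignment to WRITEI.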
parke@star.enet.dec.com (Bill Parke) (05/31/91)
>From: buckland@cheddar.ucs.ubc.ca (Tony Buckland)
>Newsgroups: comp.misc
>In article <23015@shlump.lkg.dec.com> parke@star.enet.dec.com (Bill Parke) writes:
>> ... my program which worked correctly but was almost the death of
>>a TA in my FORTRAN class. It included the statement "WRITE I=6".
>>
>
> I'm surprised it didn't assign to WRITEI, since blanks are of no

The significant part of the equation is that WRITE is 5 characters,
and the DEC-10 word holds 5 ASCII characters. I turned on the bottom
(36th) bit, which told the compiler that "WRITE" was a line number.
Therefore the compiler saw

    linenumber(WRITE) identifier(i) equals(=) constant(6)

as the tokens from the line.

It was most fun having loops that didn't, by having:

    linenumber("DO 10") equation(i=1) linenumber(",3")

ended with:

    linenumber("10 ") tab-character equation(...)

Made it real hard for someone to copy your homework from a listing.
blarson@blars (06/02/91)
In article <1991May29.173405.2181@unx.sas.com> trumbull@unx.sas.com (Ed Trumbull) writes:
>Brad Templeton asks for obscure ASCII text formats. Here's Prime's:

If you are going to try the hopeless task of enumerating all ASCII
text file formats, at least get them right. (The answer to the
question in the subject is: one more than the number of unique ASCII
text files.)

> 1. Prime ASCII is "signed" -- extended ASCII characters go in bytes 0 to
>127, and "normal" ASCII characters go in bytes 128 to 255.

I dislike the "signed" term, but the parity bit is set on normal
characters.

> 2. Text lines are terminated by a newline.

Lines are terminated by a linefeed (^J), and the following byte is
ignored if the number of characters in the line (including the
linefeed) is odd. Most programs use a NUL in the ignored position, and
the file comparison utility (CMPF) incorrectly pays attention to the
ignored byte.

> 3. All occurrences of more than two spaces are compressed by replacing them
>with a horizontal tab (DC1) and a byte count.

DC1 (^Q, not tab) denotes that the following byte is a number of
spaces, 0 to 255. Most utilities only use this for 3 or more spaces.

4. Trailing spaces may be trimmed by text processing programs.

5. Printer files have even more strange things than text files.

> (The newline is ignored as terminal input...)

The standard terminal driver ignores linefeed and converts return to
linefeed on input. In later versions of PRIMOS, it is possible to
turn off this misfeature. (Although leaving your terminal that way
confuses the command processor.) This really has nothing to do with
text file formats.
--
blarson@usc.edu
C news and rn for os9/68k!
--
Bob Larson (blars) blarson@usc.edu usc!blarson
Hiding differences does not make them go away.
Accepting differences makes them unimportant.
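The corrected Prime rules above can be sketched as a small Python decoder. The name `decode_prime_text` is hypothetical and several details are assumptions: the high (parity) bit is simply masked off every text byte, so extended characters are not distinguished; the count byte after DC1 is taken as a raw value from 0 to 255; and "number of characters in the line" is interpreted as the number of stored bytes, including the DC1 and its count.

```python
DC1 = 0x11  # ^Q: the next byte is a count of spaces
LF = 0x0A   # ^J: line terminator


def decode_prime_text(data: bytes) -> str:
    """Expand a Prime-style text file into plain text with LF line ends.

    ASSUMPTIONS: the parity bit is masked off all text bytes, the count
    byte after DC1 is used unmasked, and line length is measured in
    stored bytes when deciding whether a pad byte follows the linefeed.
    """
    out = []
    pos = 0
    line_start = 0
    while pos < len(data):
        ch = data[pos] & 0x7F
        pos += 1
        if ch == DC1 and pos < len(data):
            out.append(" " * data[pos])  # run of 0 to 255 spaces
            pos += 1
        elif ch == LF:
            out.append("\n")
            # An odd stored length means one ignored pad byte follows.
            if (pos - line_start) % 2 == 1 and pos < len(data):
                pos += 1
            line_start = pos
        else:
            out.append(chr(ch))
    return "".join(out)


if __name__ == "__main__":
    raw = bytes([0xC8, 0xC9, 0x8A, 0x00,               # "HI" LF pad
                 0xC1, 0x91, 0x04, 0xC2, 0x8A, 0x00])  # "A" DC1 4 "B" LF pad
    print(repr(decode_prime_text(raw)))
```

Skipping the pad byte rather than inspecting it avoids the mistake attributed to CMPF above, since whatever value sits in the ignored position never reaches the output.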