spolsky-joel@CS.YALE.EDU (Joel Spolsky) (02/10/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.

Depends how you get it off the net. If you are using Kermit, you should be
able to set a switch to convert LF -> CR/LF on downloading. It's in the
manual somewhere. The trick is to make sure Kermit knows it's a text file.

Otherwise try this:

#include <stdio.h>

int main(void)  /* run this on UNIX, where stdout does no CR/LF translation */
{
    int c;

    while ((c = getchar()) != EOF) {
        if (c == 10)        /* LF ends a UNIX line; DOS wants CR/LF */
            putchar(13);
        putchar(c);
    }
    return 0;
}

/* did I get that right? no guarantees, it's just me rattling off
   untested code again... */

---
Oh, if you're using XENIX doscp, there is a switch to control LF <-> CR/LF
translation.
+----------------+----------------------------------------------------------+
| Joel Spolsky   | bitnet: spolsky@yalecs.bitnet     uucp: ...!yale!spolsky |
|                | internet: spolsky@cs.yale.edu     voicenet: 203-436-1483 |
+----------------+----------------------------------------------------------+
#include <disclaimer.h>
pozar@hoptoad.uucp (Tim Pozar) (02/10/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.
>
>Before writing something to fix this problem... I figure somebody's probably
>already done it.
>
>Can somebody send me a program to do this?
>mark
>

It's on its way. Does anyone else need something like this?

Tim
--
 ...sun!hoptoad!\                                    Tim Pozar
                 >fidogate!pozar              Fido:  1:125/406
  ...lll-winken!/                           PaBell:  (415) 788-3904
      USNail:  KKSF / 77 Maiden Lane / San Francisco CA 94108
hardin@hpindda.HP.COM (John Hardin) (02/11/89)
noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.
>
>Before writing something to fix this problem... I figure somebody's probably
>already done it.
----------

I wrote the following program to do just that:

#include <stdio.h>

int main(void)
{
    int ch;

    ch = getchar();
    while (ch != EOF) {
        putchar(ch);
        ch = getchar();
    }
    return 0;
}

This was written in Turbo C 1.5, but I doubt that other compilers would
have a problem with it (famous last words?). Under MS-DOS, stdout is a
text-mode stream, so for compatibility with Unix the putchar function adds
a CR whenever you output an LF. All you have to do is read the input file
and write the output file, ignoring the Unix/DOS difference. If you compile
and link the above to unix2dos.exe, then to run it just type

    unix2dos <unixfile >dosfile

(substituting your own file names, of course).

John Hardin
hardin%hpindda@hplabs.hp.com
-----------
arya@eros.Berkeley.EDU (Manish Arya) (02/11/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.

Several file transfer protocols, including kermit and zmodem, will perform
the necessary conversions on the fly when sending text files. If you have
access to file transfer software implementing one of these protocols, you
can save yourself a lot of trouble by using it (as opposed to using some
other protocol like xmodem and then "fixing" the files with a separate
program). Furthermore, such protocols will "do the right thing" with other
operating systems also, whose text file storage formats may be different
from both UNIX and MSDOS.

Incidentally, UNIX terminates each line of an ASCII file with a line feed
(10 decimal). MSDOS, however, ends each line with a carriage return (13
decimal) and line feed. UNIX ends files with ^D (4 decimal) while MSDOS
uses ^Z (26 decimal).

- Manish Arya
pozar@hoptoad.uucp (Tim Pozar) (02/11/89)
noworol@eecg.toronto.edu (Mark Noworolski) wrote:
> Very frequently when I get stuff off the net I run into the problem of
> no carriage returns. It appears that UNIX stores text a little differently
> from Messdos.
>
> Before writing something to fix this problem... I figure somebody's probably
> already done it.
>
> Can somebody send me a program to do this?

A number of people asked me for the programme... So here is the source.
If you want the binaries you can download the programme from my BBS at
+1 415 695 0759.

Tim
---
#include <stdio.h>
#include <stdlib.h>

/*
 *  TOTXT
 *
 *  Changes UNIX, MAC, or ASCII newline characters in their text files
 *  into the local environment's newline convention.
 *
 */

/*
 *  REVISION HISTORY
 *
 *  3.2  5.Dec.88
 *       Added ROT13
 *
 *  3.1  9.Dec.87
 *       Added code to handle form-feeds (^L) properly.
 *
 *  3.0  29.Nov.87
 *       Added -p switch for printer margin handling.
 *
 *  2.0  21.Oct.87
 *       Added switches for tab handling and tab handling routines.
 *
 */

int ver = 3;            /* Current version and revision numbers. */
int rev = 2;

#define FALSE 0
#define TRUE 1

#define WIDTH 80        /* output width */
#define PL 66           /* total page length */
#define MT 3            /* top margin */
#define MB 3            /* bottom margin */
#define PO 8            /* page offset */
#define TABSPACE 8      /* tab positions */

/* Globals ... */
int hpos = 1;           /* pointer to cursor position */
int vpos = 1;           /* pointer to line number */
int rot13 = FALSE;      /* Rotate the characters 13 places for encryption
                           or decryption. */
int tabstrip = FALSE;   /* flag to indicate if we are stripping the tabs */
int printer = FALSE;    /* printer mode */

void banner(void);
void filecopy(FILE *fp);
int totab(void);
void eject(void);

int main(int argc, char *argv[])
{
    FILE *fp;
    int i;

    if (argc == 1) {
        banner();
        exit(0);
    }

    /* scan command line arguments, and look for files to work on. */
    for (i = 1; i < argc; i++) {
        if (argv[i][0] != '-') {
            if ((fp = fopen(argv[i], "rb")) == NULL) {
                printf("\rTOTXT: can't open %s \n", argv[i]);
                break;
            } else {
                filecopy(fp);
                fclose(fp);
            }
        } else {
            switch (argv[i][1]) {
            case 'R':
            case 'r':
                rot13 = TRUE;
                break;
            case 'T':
            case 't':
                tabstrip = TRUE;
                break;
            case 'P':
            case 'p':
                tabstrip = TRUE;
                printer = TRUE;
                break;
            default:
                printf(" I don't know the meaning of -%c.\n", argv[i][1]);
                banner();
                exit(1);
            }
        }
    }
    return 0;
}

void banner(void)
{
    printf("TOTXT ver %d.%d Copyright 1987-1988 Timothy Pozar\n", ver, rev);
    printf("USAGE:\n");
    printf("TOTXT [-p] [-r] [-t] filename.ext [filename.ext ...]\n");
    printf("\n");
    printf("   Changes UNIX, MAC, IBM-PC, or CP/M newlines in their text files\n");
    printf("into the local environment's newline convention.  Also strips the high\n");
    printf("bit off of the characters for word processed files, such as the output\n");
    printf("files from WordStar(TM) MicroPro.\n");
    printf("   Output is via the standard output device.\n");
    printf("   Non-printing control characters will show as '^c', where c is the\n");
    printf("control character's name in upper-case (eg. ^G = bell).  ^@ and ^Z are\n");
    printf("not displayed, but are tossed.\n");
    printf("   Currently does not support wild cards.\n");
    printf("\n");
    printf("SWITCHES:\n");
    printf("  -p = Printer mode:\n");
    printf("       Strip tabs and replace with appropriate number of spaces.\n");
    printf("       Inserts a top, bottom, and left hand margin.\n");
    printf("  -t = strip tabs and replace with appropriate number of spaces.\n");
    printf("  -r = ROT13 encryption or decryption.\n");
    printf("\n");
    printf("   Files will be processed in the order of the command line.  If an\n");
    printf("action switch is put after a file name, the file will not be processed\n");
    printf("with the action specified by the switch.\n");
    printf("eg.  totxt FOO -t BOZO\n");
    printf("   The file named FOO will not have its tabs stripped out, but the\n");
    printf("file BOZO will.\n");
}

void filecopy(FILE *fp)
{
    int c;                  /* char to test and output */
    int i, n;               /* All around variable */
    int CRFLAG = FALSE;     /* 'NEWLINE' flags ... */
    int LFFLAG = FALSE;

    while ((c = getc(fp)) != EOF) {
        if ((vpos == PL - MB + 1) && (printer)) {
            for (i = 0; i < MB; i++) {      /* Do bottom margin */
                printf("\n");
            }
            vpos = 1;                       /* At top of page again */
            hpos = 1;
        }
        if ((vpos == 1) && (printer)) {
            for (i = 0; i < MT; i++) {      /* Create top margin */
                printf("\n");
            }
            vpos = MT + 1;                  /* At start of text vertical position */
            hpos = 1;
        }
        if ((hpos == 1) && (printer)) {
            for (i = 0; i < PO; i++) {      /* Do page offset */
                printf(" ");
            }
            hpos = PO + 1;                  /* At start of text cursor position */
            /* printf("%d",vpos); */        /* Start of line number hack */
        }

        c = c & 0x7F;

        switch (c) {
        case 0:                             /* ^@ and */
        case 26:                            /* ^Z Throw these guys out... */
            break;
        case 9:                             /* Tab test */
            n = totab();
            if (!tabstrip)      /* if not stripping tabs then put out a real tab... */
                putchar(c);
            while (n > 0) {
                if (tabstrip)               /* ... and forget putting out spaces */
                    putchar(' ');
                ++hpos;
                --n;
            }
            break;
        case 10:                            /* Line feed test */
            if (!CRFLAG) {
                printf("\n");
                LFFLAG = TRUE;
                hpos = 1;
                vpos++;
            }
            break;
        case 12:                            /* Form feed test */
            if (printer) {                  /* Eject the page */
                eject();
            } else {                        /* Just stick it out there */
                putchar(c);
                CRFLAG = FALSE;
                LFFLAG = FALSE;
                ++hpos;
            }
            break;
        case 13:                            /* Carriage return test */
            if (!LFFLAG) {
                printf("\n");
                CRFLAG = TRUE;
                hpos = 1;
                vpos++;
            }
            break;
        default:
            /* control chars other than TAB, LF, FF and CR: show as ^c */
            if (((c >= 0) && (c <= 0x8)) || ((c >= 0xB) && (c <= 0x1F))) {
                putchar('^');
                putchar(c + 0x40);
                ++hpos;
            } else {
                if (rot13) {
                    if (((c >= 'A') && (c <= 'M')) || ((c >= 'a') && (c <= 'm'))) {
                        putchar(c + 0xd);
                    }
                    if (((c >= 'N') && (c <= 'Z')) || ((c >= 'n') && (c <= 'z'))) {
                        putchar(c - 0xd);
                    }
                    if (!(((c >= 'A') && (c <= 'Z')) || ((c >= 'a') && (c <= 'z')))) {
                        putchar(c);
                    }
                } else {
                    putchar(c);
                }
                CRFLAG = FALSE;
                LFFLAG = FALSE;
                ++hpos;
            }
            break;
        }
    }
    if (printer) {                          /* Eject paper from printer */
        eject();
    }
}

int totab(void)
{
    int i;

    i = hpos - 1;
    while (i >= TABSPACE) {
        i = i - TABSPACE;
    }
    return (TABSPACE - i);
}

/*
 *  Ejects the page via newlines.
 *
 */
void eject(void)
{
    int i;

    for (i = vpos; i < PL + 1; i++) {
        printf("\n");
    }
    vpos = 1;
    hpos = 1;
}
--
 ...sun!hoptoad!\                                    Tim Pozar
                 >fidogate!pozar              Fido:  1:125/406
  ...lll-winken!/                           PaBell:  (415) 788-3904
      USNail:  KKSF / 77 Maiden Lane / San Francisco CA 94108
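TOTXT folds three newline conventions into one with its CRFLAG/LFFLAG pair. The same idea can be written as a small buffer-to-buffer function; the sketch below is only an illustration, not part of Tim's program, and the name `normalize_newlines` is hypothetical. Unlike TOTXT, it pairs a CR only with an LF that follows it, so an LF followed by a CR counts as two line ends.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: copy `len` bytes from `in` to `out`, turning each
 * CR/LF pair (MS-DOS), lone CR (Mac) or lone LF (UNIX) into a single LF.
 * The output is never longer than the input, so `out` needs `len` bytes.
 * Works on one whole buffer: a CR/LF split across two calls would be
 * counted as two line ends.  Returns the number of bytes written.
 */
size_t normalize_newlines(const char *in, size_t len, char *out)
{
    size_t i, j = 0;

    assert((in != NULL && out != NULL) || len == 0);
    for (i = 0; i < len; i++) {
        if (in[i] == '\r') {
            out[j++] = '\n';                   /* a CR always ends a line ... */
            if (i + 1 < len && in[i + 1] == '\n')
                i++;                           /* ... and swallows a following LF */
        } else {
            out[j++] = in[i];                  /* LF and ordinary bytes pass through */
        }
    }
    return j;
}
```

For example, the 8 bytes `"a\r\nb\rc\nd"` come out as the 7 bytes `"a\nb\nc\nd"`.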
bkbarret@sactoh0.UUCP (Brent K. Barrett) (02/11/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu>, noworol@eecg.toronto.edu (Mark Noworolski) writes:
> Can somebody send me a program to do this?

Ok, you didn't say what language it was to be in, or what form, so here's
a C program to do it for you.
----
/*
 * addcr.c
 * Copyright 1989 by Brent Barrett
 */

#define VERSION "1.0"

#include <stdio.h>
#include <stdlib.h>
#include <io.h>
#include <fcntl.h>
#include <sys/stat.h>

#define IN_SIZE 2048

void can_not_open(char *fn);

int main(int argc, char *argv[])
{
    int in, ou;
    int j, l, i;
    unsigned long bytes_in = 0L, bytes_out = 0L;
    char in_buf[IN_SIZE], ou_buf[2 * IN_SIZE];  /* worst case: every byte is an LF */

    if (argc < 3) {
        printf("\n Usage: C>addcr <input_file> <output_file>\n");
        exit(0);
    }

    printf("ADDCR version %s\nCopyright 1989 by Brent Barrett\n", VERSION);

    if ((in = open(argv[1], O_RDONLY | O_BINARY)) == -1)
        can_not_open(argv[1]);

    if ((ou = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC | O_BINARY,
                   S_IREAD | S_IWRITE)) == -1)
        can_not_open(argv[2]);

    printf("\nWorking...");

    while ((l = _read(in, in_buf, IN_SIZE)) > 0) {  /* stop at EOF or error */
        bytes_in += l;
        for (i = 0, j = 0; i < l; i++) {
            if (in_buf[i] == 10) {
                ou_buf[j++] = 13;
                ou_buf[j++] = 10;
            } else {
                ou_buf[j++] = in_buf[i];
            }
        }
        bytes_out += j;
        if (_write(ou, ou_buf, j) != j) {
            printf("\n\nError on output! Disk full?\n");
            exit(1);
        }
    }

    printf("done.\n\n");
    printf("%lu bytes read\n", bytes_in);
    printf("%lu bytes written\n", bytes_out);
    close(in);
    close(ou);
    return 0;
}

void can_not_open(char *fn)
{
    printf("\n ERROR opening %s\n", fn);
    exit(1);
}
----
VI isn't nice to the upload regarding formatting, but you get the idea.
--
"Somebody help me!  I'm trapped in this computer!"
Brent Barrett ..pacbell!sactoh0!bkbarret GEMAIL: B.K.BARRETT
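ADDCR puts a CR in front of every LF, so running it on a file that already has CR/LF endings produces CR CR LF. A minimal sketch of an idempotent variant, assuming a hypothetical function `lf_to_crlf` that is fed one buffer at a time (as ADDCR's `_read` loop does), remembers whether the previous byte was a CR, even across buffer boundaries:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical idempotent LF -> CR/LF converter.  A CR is inserted before
 * an LF only if the byte just before that LF was not already a CR, so a
 * second pass over the output changes nothing.  *prev_cr carries that
 * state from one buffer to the next; start it at 0.  `out` must hold
 * 2 * len bytes (the worst case is a buffer full of LFs).
 * Returns the number of bytes written to `out`.
 */
size_t lf_to_crlf(const char *in, size_t len, char *out, int *prev_cr)
{
    size_t i, j = 0;

    assert(prev_cr != NULL && ((in != NULL && out != NULL) || len == 0));
    for (i = 0; i < len; i++) {
        if (in[i] == '\n' && !*prev_cr)
            out[j++] = '\r';            /* bare LF: supply the missing CR */
        out[j++] = in[i];
        *prev_cr = (in[i] == '\r');
    }
    return j;
}
```

In a loop like ADDCR's, `prev_cr` would be set to 0 once before the first read and the returned count passed to the write call.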
wales@valeria.cs.ucla.edu (Rich Wales) (02/11/89)
In article <9717@pasteur.Berkeley.EDU> arya@eros.Berkeley.EDU (Manish Arya) writes:

   Incidentally, UNIX terminates each line of an ASCII file with a line
   feed (10 decimal). MSDOS, however, ends each line with a carriage
   return (13 decimal) and line feed. UNIX ends files with ^D (4
   decimal) while MSDOS uses ^Z (26 decimal).

A minor correction: UNIX text files *DO NOT* end in a ^D. UNIX text files
(unlike DOS text files) have *NO* special end-of-file character.
--
Rich Wales // UCLA Computer Science Department // +1 (213) 825-5683
	3531 Boelter Hall // Los Angeles, California 90024-1596 // USA
	wales@CS.UCLA.EDU    ...!(uunet,ucbvax,rutgers)!cs.ucla.edu!wales
"The best diplomat I know is a fully charged phaser bank."
les@chinet.chi.il.us (Leslie Mikesell) (02/11/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.

A lot of DOS programs don't really care about carriage returns, which is
handy when you run a unix machine as a network DOS file server. MKS vi,
WP 5.0, Jove, LIST, Microsoft 'C' and (I think) Microsoft Word will take
such files with no complaints and will add the CR's if you write the file
back out. I'm sure many other programs work the same way, although
TYPE'ing or PRINTing a file from DOS will not work without the CR's.

Several communications programs will adjust 'on-the-fly': kermit has the
'set file type text' command and sb/rb have the -a option for text files.

Les Mikesell
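The tolerance described above comes down to treating a CR just before the LF as part of the line ending. A sketch of that idea for a C program that reads with `fgets`; the helper name `strip_eol` is hypothetical and not taken from any of the programs mentioned:

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical helper: remove the line terminator that fgets() leaves at
 * the end of `line`, whether it is a UNIX LF or an MS-DOS CR/LF.  A CR at
 * the very end of a line with no LF is removed as well.  After this call
 * the program no longer cares which convention the file used.
 */
void strip_eol(char *line)
{
    size_t n;

    assert(line != NULL);
    n = strlen(line);
    if (n > 0 && line[n - 1] == '\n')
        line[--n] = '\0';           /* drop the LF ... */
    if (n > 0 && line[n - 1] == '\r')
        line[--n] = '\0';           /* ... and a CR in front of it, if any */
}
```

A reading loop would call `strip_eol(buf)` right after each successful `fgets(buf, sizeof buf, fp)`.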
arwillms@crocus.waterloo.edu (Allan Willms) (02/11/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.

On UNIX use:

	sed 's/$/^M/' unixfile > dosfile

where ^M is a literal carriage return (Control-M), which you enter by
typing ^V^M.
spolsky-joel@CS.YALE.EDU (Joel Spolsky) (02/11/89)
In article <9717@pasteur.Berkeley.EDU> arya@eros.Berkeley.EDU.UUCP (Manish Arya) writes:
> UNIX ends files with ^D (4
> decimal) while MSDOS uses ^Z (26 decimal).
>
>- Manish Arya

False. Unix doesn't terminate files with anything. ^D is just the keyboard
command to send an "EOF" to a program.

By the way, the ^Z in DOS is not necessary either. As DOS knows the exact
length of files, there is no need for an explicit EOF character.
+----------------+----------------------------------------------------------+
| Joel Spolsky   | bitnet: spolsky@yalecs.bitnet     uucp: ...!yale!spolsky |
|                | internet: spolsky@cs.yale.edu     voicenet: 203-436-1483 |
+----------------+----------------------------------------------------------+
#include <disclaimer.h>
bkbarret@sactoh0.UUCP (Brent K. Barrett) (02/12/89)
In article <9717@pasteur.Berkeley.EDU>, arya@eros.Berkeley.EDU (Manish Arya) writes:
- *CRUNCH* -
> Incidentally, UNIX terminates each line of an ASCII file with a line
> feed (10 decimal). MSDOS, however, ends each line with a carriage
> return (13 decimal) and line feed. UNIX ends files with ^D (4
> decimal) while MSDOS uses ^Z (26 decimal).

More accurately, MSDOS optionally uses ^Z to end a file. That's something
it kept from CP/M to maintain compatibility. MSDOS text files do not need
to be ^Z terminated.
--
"Somebody help me!  I'm trapped in this computer!"
Brent Barrett ..pacbell!sactoh0!bkbarret GEMAIL: B.K.BARRETT
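Because MS-DOS records each file's exact length, a reader can rely on that length and treat ^Z only as an optional CP/M-style marker. A sketch of that rule on an in-memory copy of a file, assuming a hypothetical helper `text_length` that stops at the first ^Z if there is one (as many MS-DOS programs do) and otherwise uses the full length:

```c
#include <assert.h>
#include <stddef.h>

#define CTRL_Z 26   /* CP/M-style end-of-file marker, ^Z */

/*
 * Hypothetical helper: return how many of the `len` bytes in `buf` are
 * text.  The first ^Z, if present, marks the end of the text; without
 * one, the recorded file length is the end.
 */
size_t text_length(const char *buf, size_t len)
{
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)
        if (buf[i] == CTRL_Z)
            return i;
    return len;
}
```

The design choice here is that the length is authoritative and ^Z merely shortens it, which matches the point that a ^Z terminator is optional.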
dhesi@bsu-cs.UUCP (Rahul Dhesi) (02/12/89)
In article <20387@shemp.CS.UCLA.EDU> wales@CS.UCLA.EDU (Rich Wales) writes:
>UNIX text files *DO NOT* end in a ^D.  UNIX text files (unlike DOS
>text files) have *NO* special end-of-file character.
...And it's just as well; a file ought to end where it ends, not
somewhere else.
--
Rahul Dhesi UUCP: <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi
ARPA: bsu-cs!dhesi@iuvax.cs.indiana.edu
rogers@orion.SRC.Honeywell.COM (Brynn Rogers) (02/14/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu> noworol@eecg.toronto.edu (Mark Noworolski) writes:
>Very frequently when I get stuff off the net I run into the problem of
>no carriage returns. It appears that UNIX stores text a little differently
>from Messdos.
>
>Before writing something to fix this problem... I figure somebody's probably
>already done it.
>
>Can somebody send me a program to do this?
>mark
>

I wrote a couple of shell scripts that use sed to make the conversion back
and forth. Also I have a couple to convert filenames from lower case to
UPPER and back again. Here they are. (Note: they are all executable and
work on whatever arguments they are given; wildcards are okay.)

==> mkdos <==
#! /bin/csh -f
# this one goes from unix to dos files
# the sed command is 's/$/^M/g', where ^M is a literal carriage return;
# you get the ^M by typing backslash return
foreach filename ($*)
	echo processing $filename
	cat $filename | sed 's/$/^M/g' > ~/.mkdos.tmp
	mv ~/.mkdos.tmp $filename
end
echo "Done"

==> mkunix <==
#! /bin/csh -f
# this one goes from dos to unix files (^M is a literal carriage return)
foreach filename ($*)
	echo processing $filename
	cat $filename | sed "s/^M//g" > ~/.mkunix.tmp
	mv ~/.mkunix.tmp $filename
end
echo "done "

==> mklc <==
#! /bin/csh -f
# make lower case
foreach name ($*)
	mv $name `echo $name | tr '[A-Z]' '[a-z]'`
end

==> mkuc <==
#! /bin/csh -f
# make upper case
foreach name ($*)
	mv $name `echo $name | tr '[a-z]' '[A-Z]'`
end

'Seek out new life and civilizations' | Brynn Rogers   Honeywell S&RC
"Honey, come see what I               | UUCP: rogers@orion.uucp
  found in the refrigerator!"         | !: {umn-cs,ems,bthpyb}!srcsip!rogers
                                      | Internet: rogers@src.honeywell.com
cdold@starfish.Convergent.COM (Clarence Dold) (02/15/89)
From article <6489@hoptoad.uucp>, by pozar@hoptoad.uucp (Tim Pozar):
> noworol@eecg.toronto.edu (Mark Noworolski) wrote:
>> Very frequently when I get stuff off the net I run into the problem of
>> no carriage returns. It appears that UNIX stores text a little differently
>> from Messdos.

With Microsoft C on MSDOS, their concept of a text file allows for rather
simple translation from UNIX to MSDOS: open the file, fgets a line, fputs
a line. The output is an MSDOS CR/LF file. If you run the program again,
magic!, no change. Apparently, in 'text' mode, Microsoft C accepts
LF-delimited lines, but always writes CR/LF-delimited lines.

A snippet of code:

	/* buffer named line: inline is a keyword in later C */
	while (fgets(line, LINESZ, input) != NULL) {
		fputs(line, stdout);
	}
--
Clarence A Dold - cdold@starfish.Convergent.COM		(408) 434-2083
		...pyramid!ctnews!professo!dold
		MailStop 18-011
		P.O.Box 6685, San Jose, CA 95150-6685
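Clarence's observation (LF-only lines are accepted on input, CR/LF is always written on output) is what makes his fgets/fputs loop safe to run twice. The sketch below models that translation on byte buffers to show the round trip; the function names are hypothetical, and this is an approximation of text-mode behaviour, not Microsoft's library code (real runtimes also treat ^Z specially, which is left out here).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Model of a text-mode read: a CR immediately followed by an LF is
 * dropped, so both "\n" and "\r\n" arrive as "\n".  A lone CR is kept.
 * `out` needs `len` bytes.  Returns the number of bytes produced.
 */
size_t text_read(const char *in, size_t len, char *out)
{
    size_t i, j = 0;

    assert((in != NULL && out != NULL) || len == 0);
    for (i = 0; i < len; i++)
        if (!(in[i] == '\r' && i + 1 < len && in[i + 1] == '\n'))
            out[j++] = in[i];
    return j;
}

/*
 * Model of a text-mode write: every "\n" goes out as "\r\n".
 * `out` needs 2 * len bytes.  Returns the number of bytes produced.
 */
size_t text_write(const char *in, size_t len, char *out)
{
    size_t i, j = 0;

    assert((in != NULL && out != NULL) || len == 0);
    for (i = 0; i < len; i++) {
        if (in[i] == '\n')
            out[j++] = '\r';
        out[j++] = in[i];
    }
    return j;
}
```

Passing a UNIX file through `text_read` and then `text_write` yields CR/LF lines; passing that result through both again yields exactly the same bytes.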
rzh@lll-lcc.llnl.gov (Roger Hanscom) (02/16/89)
Regarding the end-of-line problem when moving between UNIX and MSDOS, try
pulling the file into your favorite text editor and writing it back out.
MicroEmacs sees the UNIX newline (LF) as an end-of-line and converts it to
MSDOS (CR/LF) when it writes the file back out. Will other editors do this
as well?? Not very high-tech, but hey, it works! (maybe?)
==================================================================
roger                   rzh%freedom.llnl.gov@lll-lcc.llnl.gov
Seen on a PYT's car near Walnut Creek, CA.: "sit on a happy face."
==================================================================
vail@tegra.UUCP (Johnathan Vail) (02/17/89)
>editor and writing it back out. MicroEmacs sees the UNIX newline (LF)
>as an end-of-line and converts it to MSDOS (CR/LF) when it writes the
>file back out. Will other editors do this as well?? Not very
>high-tech, but hey,

JOVE does this as well. Freemacs doesn't. Although it is very convenient,
sometimes a *real* character editor probably shouldn't do this. I just
have a MINT function that will do it for me. That way, in case I really
want a ^J in a line, my editor won't make these assumptions for me.

"Don't try this at home kids"
 _____
|     | Johnathan Vail | tegra!N1DXG@ulowell.edu
|Tegra| (508) 663-7435 | N1DXG @ 145.110-, 444.2+, 448.625-
 -----
gary@dvnspc1.Dev.Unisys.COM (Gary Barrett) (02/18/89)
In article <89Feb9.123853est.2662@godzilla.eecg.toronto.edu>, noworol@eecg.toronto.edu (Mark Noworolski) writes:
> Very frequently when I get stuff off the net I run into the problem of
> no carriage returns. It appears that UNIX stores text a little differently
> from Messdos.
>
> Before writing something to fix this problem... I figure somebody's probably
> already done it.
>
> Can somebody send me a program to do this?
> mark
>
> --
> There's a really fine line between stupid and clever.
>
>               Nigel - Lead Guitar, Spinal Tap
>
> noworol@ecf.toronto.edu

There is indeed a fine line.

This problem(?) is not restricted to MS-DOS versus UNIX. Nor is it
restricted to CR/LF versus LF. There are a number of machines still in
existence which identify end-of-line as a single CR.

The problem is historic. It is also related to how teletypes and printers
used to (and still do) interpret the ASCII control characters during a
print operation.

Nonetheless, you are in luck with regard to CR/LF <-> LF translation.
Most file transfer utilities allow you to customize ASCII transfer
sessions so that end-of-lines will be magically converted just fine.
Check the program you are using; you may already have the capability.
Otherwise, check into getting a shareware version of Procomm. I think
that it provides that capability.
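Since a single-CR convention is also in circulation, a converter may want to guess which convention a file uses before translating it. A sketch that simply counts the three kinds of line ending; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: how many line ends of each kind were seen. */
struct eol_counts {
    size_t crlf;    /* CR immediately followed by LF (MS-DOS, CP/M) */
    size_t lf;      /* LF with no CR before it (UNIX)               */
    size_t cr;      /* CR with no LF after it (e.g. Macintosh)      */
};

/* Count each kind of line ending in the `len` bytes of `buf`. */
struct eol_counts count_eols(const char *buf, size_t len)
{
    struct eol_counts c = {0, 0, 0};
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        if (buf[i] == '\r') {
            if (i + 1 < len && buf[i + 1] == '\n') {
                c.crlf++;
                i++;                /* skip the LF just paired with this CR */
            } else {
                c.cr++;
            }
        } else if (buf[i] == '\n') {
            c.lf++;
        }
    }
    return c;
}
```

If only one count is non-zero, the file uses that convention; a mix of counts usually means the file has already been through a partial conversion.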
malpass@vlsi.ll.mit.edu (Don Malpass) (02/21/89)
>> Can somebody send me a program to do this?
Like everyone else, I've been annoyed (and our unix laser printer gets
annoyed) by the DOS <CR> which tags along with the <LF> in
dos-originated files. Unix's "tr" (translate character) program
provides the easiest solution when in unix. I put a script called
"strip_cr" in my bin directory because I've needed to use it so often.
It discards any <CR> in the called file, parks the stripped copy in
/tmp, and renames it back to the original file. If you need wildcard
expansion or multiple files, write a "foreach" script.
# strip_cr: discards <CR> leaving only <LF>.
echo 'NOT a pipe. Renames output back to input.'
tr -d '\015' < "$1" >! /usr/tmp/dm.temp
mv /usr/tmp/dm.temp "$1"
--
Don Malpass [malpass@LL-vlsi.arpa], [malpass@gandalf.ll.mit.edu]
A flush beats a full house.
Plumber's Motto
fritz@friday.UUCP (Fritz Whittington) (02/25/89)
In article <273@vlsi.ll.mit.edu> malpass@ll-vlsi.arpa.UUCP (Don Malpass) writes:
>It discards any <CR> in the called file, parks the stripped copy in
>/tmp, and renames it back to the original file. If you need wildcard
	. . .
># strip_cr: discards <CR> leaving only <LF>.
>echo 'NOT a pipe. Renames output back to input.'
>tr -d '\015' < "$1" >! /usr/tmp/dm.temp
>mv /usr/tmp/dm.temp "$1"
>Don Malpass [malpass@LL-vlsi.arpa], [malpass@gandalf.ll.mit.edu]

A useful addition is:

tr -d '\032\015' < "$1" >! /usr/tmp/dm.temp

to also delete those annoying <ctl-Z>'s that often are at the end of
files. I also suggest

tr -d '\032\015' < "$1" >! /usr/tmp/dm.temp$$
mv /usr/tmp/dm.temp$$ "$1"

to generate temp names unique to your PID, and keep two users from
clobbering each other's files.
----
Fritz Whittington                          Texas Instruments, Incorporated
I don't even claim these opinions myself!  MS 3105
UUCP: killer!ernest!friday!fritz           8505 Forest Lane
AT&T: (214)480-6302                        Dallas, Texas 75243
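Fritz's `$$` suffix makes the temporary name unique per process. In C on a POSIX system, `mkstemp()` does the same job and also creates the file atomically, so two processes can never pick the same name. A minimal sketch, assuming POSIX and a hypothetical function name `strip_cr_in_place`, that deletes the same bytes as `tr -d '\032\015'` and renames the result over the original:

```c
#define _POSIX_C_SOURCE 200809L     /* for mkstemp() and fdopen() */
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical C version of strip_cr: delete every CR (octal 015) and
 * ^Z (octal 032) from the file at `path`.  The output goes to a uniquely
 * named temporary file next to the original (so rename() stays on one
 * filesystem), which then replaces the original.
 * Returns 0 on success, -1 on error; on error the original is untouched.
 */
int strip_cr_in_place(const char *path)
{
    char tmpname[4096];
    FILE *in, *out;
    int fd, c, n, err;

    assert(path != NULL);
    n = snprintf(tmpname, sizeof tmpname, "%s.XXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmpname)
        return -1;
    if ((in = fopen(path, "rb")) == NULL)
        return -1;
    if ((fd = mkstemp(tmpname)) == -1) {        /* unique name, like $$ */
        fclose(in);
        return -1;
    }
    if ((out = fdopen(fd, "wb")) == NULL) {
        close(fd);
        unlink(tmpname);
        fclose(in);
        return -1;
    }
    while ((c = getc(in)) != EOF)
        if (c != '\r' && c != 032)              /* drop CR and ^Z */
            putc(c, out);
    err = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0 || err || rename(tmpname, path) != 0) {
        unlink(tmpname);
        return -1;
    }
    return 0;
}
```

One difference from the shell version: `mkstemp()` creates the temporary with mode 0600, so after the rename the file has those permissions rather than its original ones.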