erc@pai.UUCP (Eric Johnson) (10/15/90)
This is for those of you who have SCO's OpenDesktop with a DOS
under UNIX, or any other DOS under UNIX that has this problem.

The problem is this: when you use a DOS-based copy command to copy a text
file onto your system (from a PC floppy, say), that DOS text file
is full of CR/LFs (instead of the UNIX line feed) and has a trailing
Ctrl-Z. On SCO, there is a program to take care of this, called
dtox. Unfortunately, dtox is a filter. That is, you call it
with something like:

        dtox dosfile > unixfile

This is nice, but I have a big problem. I have 30 to 40 files I
want to un-DOS at a time. I want to be able to type something
like

        undos *.txt

and have a program go to work stripping all the extra DOS characters
out of the files. In addition, dtox didn't seem to deal with the
trailing Ctrl-Z properly.

So, here is my (hacked) solution, undos.c

undos takes all the files on its command line and converts the
format to a UNIX text format (from a DOS text format). It's simple,
dumb, and I'm sure you can come up with a better, more efficient
method. Oh well, it works.

Please note that this is NOT in the public domain. It is copyrighted
in my name, but I used essentially the very liberal terms of the
X Window copyrights. (More liberal than the GNU public license.)
You can freely distribute this program so long as you keep my copyright
message intact. There is absolutely no warranty of any kind with this
software--you are on your own.

You should be able to compile undos with a UNIX command like:

        cc -o undos undos.c

I'm posting this in hope it helps save time for others out there. If
it doesn't save you any time, it's not worth your bother.

-Eric

----------------cut here for undos.c---------------------------------------
/*
 *      undos.c
 *
 *      Copyright 1990 Eric F. Johnson
 *
 * Permission to use, copy, modify, distribute, and sell this software and its
 * documentation for any purpose is hereby granted without fee, provided that
 * the above copyright notice appear in all copies and that both that
 * copyright notice and this permission notice appear in supporting
 * documentation, and that the name of E F Johnson not be used in advertising
 * or publicity pertaining to distribution of the software without specific,
 * written prior permission. I (Eric Johnson) make no representations about the
 * suitability of this software for any purpose. It is provided "as is"
 * without express or implied warranty.
 *
 * I DISCLAIM ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL I
 * BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
 * OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
 * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 *
 * Author: Eric F. Johnson
 *
 *      undos.c
 *
 *      Program to strip carriage returns and control-Z's
 *      from a DOS-based text file. This program acts
 *      like the SCO program dtox, but it acts on the
 *      file in place and also handles the trailing
 *      control-Z from the DOS file. By saying this program
 *      acts on a text file in place, I mean that it will
 *      overwrite the source file.
 *
 *      Usage is:
 *              undos file1 file2 file3 ...
 *
 *      Where each file is an ASCII text file in DOS format.
 *
 *      12 October 90
 *
 */

#include <stdio.h>
#include <stdlib.h>     /* exit(), mktemp() */
#include <unistd.h>     /* unlink() */

int undos_file( char source_file[], char temp_file[] );

int main( int argc, char *argv[] )

{       /* main */
        int     i;
        char    template[] = "dosXXXXXX";  /* writable: mktemp() edits it in place */
        char    *temp_file;

        /*
         * Get a temporary file name
         * to use for storing the
         * un-DOS-ed file until
         * we're done.
         */
        temp_file = mktemp( template );

        if ( argc < 2 )
                {
                fprintf( stderr, "Error: Usage is undos dosfile1 dosfile2...\n" );
                exit( 1 );
                }

        for( i = 1; i < argc; i++ )
                {
                printf( "Converting %s to a UNIX text file.\n", argv[i] );

                /*
                 * Drop CRs; Ctrl-Zs become newlines
                 */
                undos_file( argv[i], temp_file );

                /*
                 * Delete temp_file when done
                 */
                unlink( temp_file );
                }

        exit( 0 );

}       /* main */

int undos_file( char source_file[], char temp_file[] )

{       /* undos_file */
        FILE    *in_file, *outfile;
        int     c;

        in_file = fopen( source_file, "r" );
        outfile = fopen( temp_file, "w" );

        if ( ( in_file == (FILE *) NULL ) || ( outfile == (FILE *) NULL ) )
                {
                fprintf( stderr, "Error in opening files %s or %s\n",
                        source_file, temp_file );
                return( -1 );
                }

        while( ( c = fgetc( in_file ) ) != EOF )
                {
                if ( ( c == 26 ) ||     /* Ctrl-Z */
                     ( c > '~' ) )
                        {
                        c = '\n';
                        }

                if ( c != '\r' )
                        {
                        fputc( c, outfile );
                        }
                }

        fclose( in_file );
        fclose( outfile );

        /*
         * Now, copy the file back
         */
        in_file = fopen( temp_file, "r" );
        outfile = fopen( source_file, "w" );

        if ( ( in_file == (FILE *) NULL ) || ( outfile == (FILE *) NULL ) )
                {
                fprintf( stderr, "Error in opening files %s or %s\n",
                        source_file, temp_file );
                unlink( temp_file );
                return( -1 );
                }

        while( ( c = fgetc( in_file ) ) != EOF )
                {
                fputc( c, outfile );
                }

        fclose( in_file );
        fclose( outfile );

        return( 0 );

}       /* undos_file */

/*
 *      end of file
 */
----------------cut here ---------------------------------------

--
Eric F. Johnson               phone: +1 612 894 0313    BTI: Industrial
Boulware Technologies, Inc.   fax:   +1 612 894 0316    automation systems
415 W. Travelers Trail        email: erc@pai.mn.org     and services
Burnsville, MN 55337 USA
tneff@bfmny0.BFM.COM (Tom Neff) (10/16/90)
In article <1477@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
>dtox. Unfortunately, dtox is a filter. That is, you call it
>with something like:
>
>       dtox dosfile > unixfile
>
>This is nice, but I have a big problem. I have 30 to 40 files I
>want to un-DOS at a time.

The solution is to learn how to use the shell.

        for f in *.txt
        do
                g=`echo $f | sed -e 's/txt$/out/'`      # sample.txt -> sample.out
                dtox < $f > $g
        done

I bet even SCO supports this construct. ;-)

--
War is like love; it always       \%\%\%    Tom Neff
finds a way. -- Bertold Brecht    %\%\%\    tneff@bfmny0.BFM.COM
johnl@esegue.segue.boston.ma.us (John R. Levine) (10/16/90)
In article <1477@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
>The problem is this: when you use a DOS-based copy command to copy a text
>file onto your system (from a PC floppy, say), that DOS text file
>is full of CR/LFs (instead of the UNIX line feed) and has a trailing
>Ctrl-Z. [172 line program follows]

Here's a six-line shell script that does the same thing. I call it uncr.

#!/bin/sh
# Get rid of carriage returns in files
# Dedicated to the public domain, do anything with it you want. -jrl

for i
do
        echo $i:
        qfile=`dirname $i`/QQ`basename $i`
        mv $i $qfile && tr -d \\015\\032 <$qfile >$i && rm $qfile
done

--
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|spdcc|world}!esegue!johnl
Atlantic City gamblers lose $8200 per minute. -NY Times
jgd@rsiatl.UUCP (John G. DeArmond) (10/16/90)
erc@pai.UUCP (Eric Johnson) writes:

>This is for those of you who have SCO's OpenDesktop with a DOS
>under UNIX, or any other DOS under UNIX that has this problem.

>The problem is this: when you use a DOS-based copy command to copy a text
>file onto your system (from a PC floppy, say), that DOS text file
>is full of CR/LFs (instead of the UNIX line feed) and has a trailing
>Ctrl-Z. On SCO, there is a program to take care of this, called
>dtox. Unfortunately, dtox is a filter. That is, you call it
>with something like:

[program with BIG copyright deleted.]

Please don't take this wrong, but your approach, while probably necessary
in a DOS tool-less environment, is terrible for Unix. Here's how you
do it without any programming. Get to know Mr. Shell. He is your
friend. Here's how:

for i in `ls *.txt`
do
        # takes care of read-only temp file name collisions
        rm -f /tmp/$i >/dev/null 2>&1
        tr -d '\032''\015' <$i >/tmp/$i
        if [ $? -eq 0 -a -f /tmp/$i ]
        then
                mv -f /tmp/$i $i
        else
                rm -f /tmp/$i >/dev/null 2>&1   # just in case
                echo "tr returned an error on file $i"
                exit
        fi
done

If you want to put this in a shell script, simply substitute this for the
first line:

for i in `ls $*`

What this script does is first execute the command in back-ticks
("ls *.txt") and then step through the list of files via the shell
variable "i". Each file is run through tr (translate) invoked in its
"delete" mode (-d). Tr is told to delete ^M (octal 015) and ^Z (octal 032).
The return code from tr is stored in the shell intrinsic "$?". If tr
is successful, this value will be 0. The "if" statement checks to see
if tr ran ok AND if the temporary file was created ok, and if so moves
the temporary file back on top of the original.

There are even simpler ways to do this, but this is what popped out of my
head when reading your post.
There are several unaddressed error conditions in this
script, such as when a temp file name collision occurs and the temp file
is not owned by you, but these problems are left as an exercise to the
reader :-)

You could, of course, use dtox in place of tr, but this solution is
unix vendor-independent. You could also use sed, awk, Perl (if installed)
and who knows what else. In other words, get with the Unix tools show,
man :-)

Minor programming note. I don't usually critique coding practices on
the net, but in this case I gotta. Your approach is terribly inefficient,
requiring twice as much system resource as necessary. Namely, you first
process the input file a character at a time (which is OK for a quick hack)
and then you copy the temp file back onto the input file a character at a
time (NO NO). The easiest way to move the temp file back onto the
original is to use a system() call with mv. Example:

        sprintf(tmpstr, "mv %s %s", tmpname, filename);
        system(tmpstr);

For a bit of error checking, you could fork() and exec() mv and look at
the return code from wait(). Or, assuming the files are both on the
same file system, you could simply unlink() the old file, link() the old
name to the temp file and unlink() the temp file. That is the most
efficient way of doing it.

While one could (successfully) argue that a system() or fork() system
call would be more expensive than processing small files a byte at a time,
for typical files, this would not be the case. And for machines that
process I/O system calls slowly (NCR towers come to mind), even small
files would seriously degrade performance, especially if you are doing
a lot of them.

John

John De Armond, WD4OQC     | "The truly ignorant in our society are those people
Radiation Systems, Inc.    | who would throw away the parts of the Constitution
Atlanta, Ga                | they find inconvenient." -me   Defend the 2nd
{emory,uunet}!rsiatl!jgd   | with the same fervor as you do the 1st.
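A minimal sketch of the same-filesystem replacement described above,
assuming a C library that provides the standard rename() call (ANSI C
and POSIX both define it; the thread does not say whether the 1990 SCO
library did). The name install_temp and its parameter names are made up
for illustration and appear in nobody's posted code.

```c
#include <stdio.h>

/*
 * Move the finished temp file on top of the original.
 * Hypothetical helper: rename() stands in for the
 * unlink()/link()/unlink() sequence and, like it, only works
 * when both names live on the same file system.
 * Returns 0 on success; on failure it reports the error,
 * removes the temp file and returns -1.
 */
int install_temp(const char *tmpname, const char *filename)
{
    if (rename(tmpname, filename) != 0) {
        perror(filename);
        remove(tmpname);
        return -1;
    }
    return 0;
}
```

On POSIX systems rename() replaces an existing target in one step, so
there is no moment at which the original name is missing. If the two
names turn out to be on different file systems the call fails and
somebody still has to do the copy.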
rcd@ico.isc.com (Dick Dunn) (10/17/90)
erc@pai.UUCP (Eric Johnson) writes about converting DOS text files (with
CR-LF line terminators and final ^Z) for UNIX.

> ...On SCO, there is a program to take care of this, called
> dtox...

Much to my surprise, he's right that SCO really did make it a program.
Back in the days of UNIX, we would have used one of the little filter
programs (like tr or sed) that came with the system. Oh well, "forward
into the past" and "programmer's full employment" and all that.

>...Unfortunately, dtox is a filter. That is, you call it
> with something like:
>
>       dtox dosfile > unixfile

Why is that a problem? A filter is just slightly more general: You can
apply a filter to files; a program written to handle only files can't be
used in a pipe sequence. But let's forge ahead...

> This is nice, but I have a big problem. I have 30 to 40 files I
> want to un-DOS at a time. I want to be able to type something
> like
>       undos *.txt

A big problem? Why not (in sh notation):

        for f in *.txt
        do
                cp $f /tmp/d$$
                dtox /tmp/d$$ >$f
        done
        rm /tmp/d$$

or go all the way to a UNIX approach and replace the dtox line with

                tr -d '\015\032' </tmp/d$$ >$f

--
Dick Dunn     rcd@ico.isc.com -or- ico!rcd       Boulder, CO   (303)449-2870
   ...Never offend with style when you can offend with substance.
rpeglar@csinc.UUCP (Rob Peglar) (10/17/90)
In article <1990Oct16.134008.22319@esegue.segue.boston.ma.us>, johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
> In article <1477@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
> >The problem is this: when you use a DOS-based copy command to copy a text
> >file onto your system (from a PC floppy, say), that DOS text file
> >is full of CR/LFs (instead of the UNIX line feed) and has a trailing
> >Ctrl-Z. [172 line program follows]

I give up.

        doscp -m a:file.ext dir/file

Rob
--
Rob Peglar      Comtrol Corp.   2675 Patton Rd., St. Paul MN 55113
                A Control Systems Company       (800) 926-6876

...uunet!csinc!rpeglar
ken@metaware.metaware.com (ken) (10/17/90)
In article <1990Oct16.134008.22319@esegue.segue.boston.ma.us> johnl@esegue.segue.boston.ma.us (John R. Levine) writes:
>In article <1477@pai.UUCP> erc@pai.UUCP (Eric Johnson) writes:
>>The problem is this: when you use a DOS-based copy command to copy a text
>>file onto your system (from a PC floppy, say), that DOS text file
>>is full of CR/LFs (instead of the UNIX line feed) and has a trailing
>>Ctrl-Z. [172 line program follows]

Here's yet another solution. This closely emulates Sun's dos2unix program.

#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <errno.h>

int main(int argc, char *argv[])
{
    int c;
    FILE *ifp=NULL, *ofp=NULL;

    if (argc != 3 && argc != 2 && argc != 1)
        printf("\n\tUsage: %s [infile [outfile]]\n\n", argv[0]);
    else {
        switch (argc) {
        case 1:
            ifp = stdin;
            ofp = stdout;
            break;
        case 2:
            if ((ifp = fopen(argv[1], "r")) == NULL) {
                perror(argv[1]);
                exit(errno);
            }
            ofp = stdout;
            break;
        case 3:
            if ((ifp = fopen(argv[1], "r")) == NULL) {
                perror(argv[1]);
                exit(errno);
            }
            if ((ofp = fopen(argv[2], "w")) == NULL) {
                perror(argv[2]);
                exit(errno);
            }
            break;
        }
        while ((c = getc(ifp)) != EOF) {
            /* This test is garbled in the archived copy; skipping
               CR and Ctrl-Z is an assumption from the thread's context. */
            if ((c != '\r') && (c != '\032'))
                putc(c, ofp);
        }
        if (ifp != NULL)
            fclose(ifp);
        if (ofp != NULL)
            fclose(ofp);
        exit(0);
    }
    exit(-1);
}
shore@mtxinu.COM (Melinda Shore) (10/19/90)
In article <4339@rsiatl.UUCP> jgd@rsiatl.UUCP (John G. DeArmond) writes:
>While one could (successfully) argue that a system() or fork() system
>call would be more expensive than processing small files a byte at a time,
>for typical files, this would not be the case. And for machines that
>process I/O system calls slowly (NCR towers come to mind), even small
>files would seriously degrade performance, especially if you are doing
>a lot of them.

I'm not going to get into whether or not the program was great code,
but it's worth pointing out that using stdio *is* a reasonably
efficient general-case approach. Remember that the library is
doing i/o buffering for you in BUFSIZ chunks, which allows you
to do what looks like single-character processing on top of
buffered i/o. Also, underneath it all, the OS is not going to
be doing a disk read for every read() - that's what the buffer
cache is all about. (It may do a memory/memory copy, but that's
another matter.) Anyway, the point is that you shouldn't be
afraid to use stdio if you're worried about efficiency.

I never use the system() library routine on SCO. Well, I never
use it anyway (it gets the shell involved and does more than I
usually want done), but it seems to me that it's particularly
to be avoided with SCO because of the way it resets certain
signal handlers, in particular SIGCLD. You can avoid doing
the copy yourself if the files are on the same filesystem by doing
something like

        link(oldfile, newfile);
        unlink(oldfile);

If the files are on different filesystems somebody is going to
have to do the copy, whether it's mv or you do it yourself. Again,
stdio will handle the buffering for you and doing getc()/putc()
kinds of things isn't inherently inefficient.
--
Melinda Shore                   shore@mtxinu.com
mt Xinu                         ..!uunet!mtxinu.com!shore
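The getc()/putc() approach can be sketched as a small stdio filter.
This is an illustration under stated assumptions, not anyone's posted
code: strip_dos_bytes is a hypothetical name, and it drops Ctrl-Z
outright, whereas undos.c as posted turns it into a newline.

```c
#include <stdio.h>

/*
 * Copy 'in' to 'out', dropping CR (octal 015) and Ctrl-Z (octal 032).
 * The code reads and writes one character at a time, but stdio moves
 * the data in buffered chunks underneath, so this is not one system
 * call per byte.
 * Returns the number of bytes dropped, or -1 on an I/O error.
 */
long strip_dos_bytes(FILE *in, FILE *out)
{
    long dropped = 0;
    int c;

    while ((c = getc(in)) != EOF) {
        if (c == '\r' || c == 26) {     /* CR or Ctrl-Z */
            dropped++;
            continue;
        }
        if (putc(c, out) == EOF)
            return -1;
    }
    if (ferror(in) || fflush(out) == EOF)
        return -1;
    return dropped;
}
```

Called as strip_dos_bytes(stdin, stdout) it behaves as a filter; called
on two fopen()ed streams it serves the file-to-file case, so one loop
covers both uses argued over in this thread.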
emanuele@overlf.UUCP (Mark A. Emanuele) (10/23/90)
In article <232@csinc.UUCP>, rpeglar@csinc.UUCP (Rob Peglar) writes:
>
> I give up.
>
>       doscp -m a:file.ext dir/file

Just try doing

        doscp -m a:*.* dir

and see what happens. doscp can't expand wildcards on the DOS drive.

What you have to do is this:

        for i in `dosls a:`
        do
                doscp -m a:${i} dir
        done

--
Mark A. Emanuele
V.P. Engineering  Overleaf, Inc.
500 Route 10  Ledgewood, NJ 07852-9639      attmail!overlf!emanuele
(201) 927-3785 Voice  (201) 927-5781 fax    emanuele@overlf.UUCP