gisle@ifi.uio.no (Gisle Hannemyr) (12/30/89)
Posting-number: Volume 9, Issue 93
Submitted-by: gisle@ifi.uio.no (Gisle Hannemyr)
Archive-name: pep/part02
# This is a shell archive [ part 2 of 5 ]
# Remove everything above and including the cut line.
# Then run the rest of the file through /bin/sh (not csh).
#--cut here-----cut here-----cut here-----cut here-----cut here-----cut here--#
#!/bin/sh
# shar: Shell Archiver
# Execute the following text with /bin/sh to create the file(s):
# Doc/pep.1l
# This archive created: Fri Dec 29 14:42:42 1989
# Wrapped by: Gisle Hannemyr (gisle@ifi.uio.no)
echo shar: extracting pep.1l
sed 's/^XX//' << \SHAR_EOF > pep.1l
XX.\" @(#)pep.1l 2.0 89/12/10 [gh]
XX.\" Usage:
XX.\" nroff -man pep.1l
XX.TH PEP 1L "28 December 1989" "Version 2.1"
XX.SH NAME
XXpep \- a file detergent
XX.SH SYNOPSIS
XX.B pep
XX[
XX.B \-a
XX]
XX[
XX.B \-b
XX]
XX[
XX.B \-c
XX[
XX.I size
XX]]
XX[
XX.B \-d + | \-
XX]
XX.if n .ti +5
XX[
XX.B "\-e [ 0 | 1 | 2"
XX]]
XX[
XX.B \-g
XX.I file
XX]
XX[
XX.B \-h
XX]
XX[
XX.B \-i + | \-
XX]
XX.if n .ti +5
XX[
XX.B \-k + | \-
XX]
XX[
XX.B \-m + | \-
XX]
XX[
XX.B \-o
XX[
XX.B b
XX]]
XX[
XX.B \-p
XX]
XX.if n .ti +5
XX[
XX.B \-s
XX[
XX.I size
XX]]
XX[
XX.B \-t
XX[
XX.I size
XX]]
XX[
XX.B \-u
XX.I terminator
XX]
XX[
XX.B \-v
XX]
XX.if n .ti +5
XX[
XX.B \-w + | \-
XX]
XX[
XX.B \-x
XX]
XX[
XX.B \-z
XX]
XX[
XX.I filename
XX.B .\|.\|.
XX]
XX.SH DESCRIPTION
XX.LP
XX.B Pep
XXis a filter program to "clean" files. It is named after a
XXpopular Norwegian detergent.
XX.PP
XX.B Pep
XXmay be used to remove control characters, strip parity bits,
XXinterpret ANSI escape sequences, compress tabulation,
XXextract strings and convert character sets. Nine out of ten hackers
XXprefer "pep" to soap (which may very well explain why some of
XXthem smell the way they do).
XX.PP
XX.B Pep
XXis a filter. Its default operation is to read from standard input
XX(the keyboard) and write on standard output (the terminal).
XX.PP
XXYou may also specify the name of one or more files as the last
XXargument on the command line. Most versions of
XX.B pep
XX(not the version compiled for the DEC VMS operating system)
XXallow ambiguous filename arguments, where a single
XX.I filename
XXargument may specify several files.
XX.PP
XXYou may instruct
XX.B pep
XXto write the result back onto the original input file with the
XX.B \-o
XXoption. If you use this option, the original file will be lost.
XXIf you want to keep the original file (something that usually will
XXbe the case when you do things like extracting strings from an
XXexecutable file), you should make a copy of the file before applying
XX.B
XXpep,
XXand filter the copy rather than the original.
XXSome of the functions in
XX.B
XXpep
XX(in particular those selected with the
XX.B \-b
XXand
XX.B \-s
XXoptions) may remove a lot of material from files, and it may be unfortunate if
XXthis happens to the wrong file. It is probably a good idea to always use
XX.B pep
XXon copies until you have some experience with the various
XX.BR pep \-options.
XXYou may also use the
XX.B b
XXargument to the
XX.B \-o
XXoption to save the original in a .BAK-file.
XX.PP
XXTo get a brief summary of the command line syntax and all the options,
XXyou need to specify the
XX.B \-h
XXoption. Just type the command:
XX.sp 0.5
XX.RS
XX.B pep \-h
XX.RE
XX.PP
XXfollowed by the RETURN key. Note that just
XX.B pep
XXwill not give you this summary. The command:
XX.sp 0.5
XX.RS
XX.B pep
XX.RE
XX.PP
XXwill start
XX.B pep
XXas a filter, and it will just echo back whatever you type, until you
XXtype the end of file character (usually CTRL-D or CTRL-Z).
XX.PP
XXWhen
XX.B pep
XXis running as a filter, it is reading from the standard input and
XXwriting to the standard output. In this state,
XX.B pep
XXwill be much less verbose than it usually is. It will still
XXprint error messages, but very little else.
XXNote that while:
XX.sp 0.5
XX.RS
XX.nf
XX.B pep < foobar.in > foobar.out
XX.B pep \-ob foobar.txt
XX.fi
XX.RE
XX.PP
XXwill do more or less the same job, the first will do it quietly,
XXin the tradition of Unix filters; the latter will print the
XXcopyright notice, a detailed list of the things it will do,
XXand finally a list and line count
XXof all the files it processes as it plods along.
XX.PP
XX.B Pep
XXwill remove some "noise" from files, even if no options are specified.
XXThe following is the default behavior:
XX.RS
XX.TP 3
XX\(bu
XXremove trailing spaces;
XX.TP 3
XX\(bu
XXterminate each line with the canonical line terminator (usually LF, CR or both);
XX.TP 3
XX\(bu
XXremove underlining intended for backspacing printers;
XX.TP 3
XX\(bu
XXremove control characters (character codes < 32) except the canonical line
XXterminator, FF and TAB;
XX.TP 3
XX\(bu
XXbreak the line before the FF if a line contains an FF anywhere except in the
XXfirst column.
XX.RE
XX.PP
XXIf you want to check what
XX.B pep
XXactually intends to do to your file before it does it, you may make it
XXpause with the
XX.B \-p
XXoption. For example:
XX.sp 0.5
XX.RS
XX.B pep \-p foobar.txt
XX.RE
XX.PP
XXwill make
XX.B pep
XXstop after displaying a list of the conversions it will apply to the
XXfile. The user is prompted and may choose to proceed
XX(hitting the RETURN key), or abort
XXthe program without doing anything (hitting CTRL-C).
XX.PP
XXThe user may want other conversions than the default action described
XXabove. A number of conversion functions may be selected by specifying one or
XXmore options on the command line.
XX.PP
XXSome of the options require an additional argument switch, and must be
XXfollowed by a "+" or a "\-"; other options
XXrequire a number or a filename argument.
XXMost of the options may be combined with other options, but a few are
XXmutually exclusive.
XXIf the user specifies invalid options or option
XXarguments, then
XX.B pep
XXwill abort with an error message and return an error exit code on
XXoperating systems that support exit codes.
XX.SH OPTIONS
XX.TP
XX.B \-a
XXWrite out information about
XX.B
XXpep.
XX.TP
XX.B \-b
XXRemove all characters not in the original 7-bit character set (ISO 646).
XXI.e. remove the characters which are encoded from 128 to 255.
XX(If this option is combined with the
XX.B \-x
XXoption, it will print the codes for these characters in hexadecimal
XXinstead of removing them.)
XXThe
XX.B \-b
XXoption is powerful, and may remove a lot of bytes if you use it
XXon the wrong file. Only use it if you know exactly how the eighth bit is
XXused in the file you intend to filter. Also note that the options
XX.B i, d, k, g, m, w
XXor
XX.B z
XXin most cases are better suited to
XXprocess files where the eighth bit is set.
XX.TP
XX\fB\-c \fR[ \fIsize \fR]
XXCompress space into tabulation. I.e. insert TAB characters when
XXreplacing a run of two or more SPACE characters would produce a
XXsmaller output file.
XXThis function is the opposite of the function invoked with the
XX.B \-t
XXoption.
XX.IP
XXThe default tabulation size is 8,
XXbut you may specify any other tabulation with the optional numeric
XXargument.
XX.TP
XX.B \-d + | \-
XXConvert between the ISO 8859/1 8 bit character set and the Norwegian
XXversion of the ISO 646 7 bit character set. If the argument is "+",
XXthe file is converted
XX.I to
XXISO 8859/1. If the argument is "\-",
XXthe file is converted
XX.I from
XXISO 8859/1. The ISO 8859/1 character set is also
XXknown as the "DEC Multinational Character Set".
XX.TP
XX\fB\-e \fR[ \fB0 | 1 | 2 \fR]
XXInterpret ANSI screen control sequences (also known as ANSI ESCAPE
XXsequences). This function makes
XX.B pep
XXemulate cursor positioning and other functions on an ANSI-terminal.
XX.IP
XX.B Pep
XXwill complain about "strange" (i.e. implementation dependent) use of
XXANSI escape sequences.
XX.IP
XX.B Pep
XXwill normally save a screen image on the output file when one of
XXtwo events occurs: 1) when the screen is full and scrolls up;
XXor 2) just before a screen image is erased with the "erase screen"
XXANSI screen control sequence. In some cases important fields
XXon the screen will be overwritten or erased. There
XXis no good solution to this
XXproblem, but
XX.B pep
XXprovides the user with some opportunity to guard against overwriting
XXand erasure. This is done by specifying an additional numeric argument
XXto the
XX.B \-e
XXoption. This number indicates the level of protection
XXand is interpreted as follows:
XX.sp 0.5
XX.RS
XX.RS
XX.TP 3
XX0:
XXno protection \(em fields may be erased and overwritten
XX(this is the default);
XX.TP
XX1:
XXsequences that erase fields are ignored;
XX.TP
XX2:
XXsequences that erase or overwrite fields are ignored.
XX.RE
XX.RE
XX.TP
XX\fB\-g \fIfile \fR
XXRead the conversion table from a file. The name of the file must be
XXappended as the argument to this option.
XX.IP
XXThe file itself is a standard ASCII text file where each line should
XXcontain two decimal numbers. The first number is the character code
XXto convert
XX.I from,
XXand the second number is the character code to convert
XX.I to.
XXA "#" character and all the following characters up to a NEWLINE are
XXconsidered a comment, and are ignored. Comments are however echoed
XXon the screen along with the other comments
XX.B pep
XXmakes, unless the comment line starts with a "##".
XX.IP
XXBelow is an example of how such a conversion file may look:
XX.sp 0.5
XX.PP
XX.ft B
XX.nf
XX.RS
XX.RS
XX# Convert from Macintosh to IBM-PC
XX##This line is not echoed on the screen.
XX# MAC IBM
XX 174 146
XX 175 157
XX 129 143
XX 190 145
XX 191 155
XX 140 134
XX# EOF
XX.RE
XX.RE
XX.fi
XX.ft R
XX.TP
XX.B \-h
XXWrite a brief summary of
XX.B pep
XXoptions, and exit.
XX.TP
XX.B \-i + | \-
XXConvert between the IBM 8 bit character set (Code Page 850 Multilingual)
XXand the Norwegian
XXversion of the ISO 646 7 bit character set. If the argument is "+",
XXthe file is converted
XX.I to
XXCP 850. If the argument is "\-",
XXthe file is converted
XX.I from
XXCP 850. The CP 850 character set (or a subset of it)
XXis what is used in the IBM PC, AT, and PS/2 series of
XXcomputers and their clones. Note that some machines with
XXAmerican PROMs have a yen and a cent character in
XXthe positions rightfully belonging to the upper and lower case
XXversions of the Norwegian character
XXwritten as an "o" with a slash across it (often referred to as
XX.IR oslash ).
XX.TP
XX.B \-k + | \-
XXConvert between an 8 bit character set and the
XXISO 646 7 bit character set. This is a modified version
XXof the
XX.B \-i
XXfunction, hacked to preserve both the
XX.I backslash
XXcharacter and the upper case
XX.I oslash
XXcharacter as required by, among others, the "KnowledgeMan" package. These
XXcharacters share the same code (92 decimal) in 7 bit ISO 646,
XXbut use different codes (92 is backslash, 157 is oslash) in
XX8 bit CP 850. To get around this, two backslashes in ISO 646
XXwill be converted to the upper case oslash character in CP 850, while
XXa single backslash will be preserved \(em and vice versa.
XX.IP
XXIf this option is combined with the
XX.B \-d
XXor
XX.B \-m
XXoption, the DEC/ISO or the Macintosh character set is used as base
XXinstead of CP 850.
XX.TP
XX.B \-m + | \-
XXConvert between the Apple Macintosh 8 bit character set and the Norwegian
XXversion of the ISO 646 7 bit character set. If the argument is "+",
XXthe file is converted
XX.I to
XXthe Macintosh character set; if the argument is "\-",
XXthe file is converted
XX.I from
XXthe Macintosh character set.
XXSee the description of the
XX.B \-v
XXoption below and the
XXnote in the "bugs" section below about treatment of "end-of-line" and
XX"end-of-paragraph".
XX.TP
XX\fB\-o \fR[ \fBb \fR]
XX.B Pep
XXwill usually write the result of conversions on the standard output
XX.I (stdout).
XXThis option instead instructs
XX.B pep
XXto replace each named input file with a file containing the result
XXof filtering the file through
XX.B pep.
XXIf the option is augmented with the argument
XX.B b
XX(i.e.
XX.BR \-ob ),
XXthen
XX.B pep
XXwill create a backup copy of the original input file on a file
XXwith extension .BAK. If you just specify
XX.B \-o
XXthe original file is deleted.
XX.IP
XXThe VMS version of
XX.B pep
XXwill always run as if this option was specified. This is because
XXVMS does not support useful redirection or pipes. Therefore, it is never
XXnecessary to specify the
XX.B \-o
XXoption under VMS, but users should still specify
XX.B \-ob
XXif they want a backup copy of the original input file.
XX.TP
XX.B \-p
XXWrite out a brief description of the conversion functions that
XXwill be activated by the current
XXset of options, and pause. The user may review the list of
XXconversion functions and abort (by hitting CTRL-C) if they do not have
XXthe intended effect.
XX.TP
XX\fB\-s \fR[ \fIsize \fR]
XXFind strings in extremely "noisy" files.
XX.IP
XX.BR Pep 's
XXconcept of a string is that it is a sequence of "printable" characters
XXof a certain length. The default minimum length of this sequence is
XX4, but this may be changed by the user by supplying an optional
XXnumeric argument that becomes the minimum length of the sequence.
XX.IP
XXThe default definition of a "printable" character is a symbol with
XXencoding above 31 decimal (i.e. 32 to 255) plus certain
XXcommon control characters (TAB, CR and LF). This definition
XXis almost always too liberal, and will include a lot of "noise" in
XXthe output. One or more of the options
XX.B \-b, \-d, \-i, \-m
XXor
XX.B \-z
XXshould be specified in addition to
XX.B \-s
XXin order to narrow the definition and the search space.
XXIn my experience, the
XX.B \-b
XXoption is a particularly
XXuseful additional filter when searching for strings.
XX.TP
XX\fB\-t \fR[ \fIsize \fR]
XXExpand tabulation, replacing the TAB character with a suitable number
XXof spaces. The default tabulation size is 8, but the optional
XXnumeric argument
XX.I size
XXmay be used to set tabulation to any desired size.
XX.TP
XX\fB\-u r | n | s | - | # | \fInumber \fR
XX.BR Pep 's
XXdefault behaviour is to terminate lines with whatever is the
XXcanonical line terminator (the standard way to terminate
XXa text line) on the assumed target system for the output file.
XXThis means CR/LF on a microcomputer system, LF on a UNIX system,
XXand CR if the target is a Macintosh. The assumed target system
XXis usually the system
XX.B pep
XXis running on, unless you request folding to the character set
XXof another computer system. Then, that computer system becomes
XXthe assumed target.
XX.IP
XXThe
XX.B \-u
XXoption allows you to override this assumption.
XXYou do this by specifying explicitly (in decimal) the numeric ASCII
XXvalue of the end of line character you want in your output file.
XXFor example, to make sure
XXlines are terminated by LF (the standard for UNIX text files),
XXyou may use
XX.BR \-u10 ,
XXbecause 10 is the ASCII value of the newline (LF) control character.
XXInstead of a numeric argument, you may specify
XX.BR r ,
XXfor carriage return (CR),
XX.BR n ,
XXfor newline (LF),
XX.BR s ,
XXfor record separator (RS), the symbol
XX.BR - ,
XXfor no line terminator, or the symbol
XX.B #
XXto get carriage return followed by a newline (CR/LF).
XX.TP
XX.B \-v
XXNormally,
XX.B pep
XXwill terminate each line with the canonical line terminator.
XXSome typesetting programs and word processors, however, require
XXthat no hard line terminator is present within a paragraph, and
XXthat only paragraphs are hard terminated.
XXIf you want to
XXimport a file to such a typesetting program or word processor,
XXyou may instruct
XX.B pep
XXto terminate paragraphs
XX.I only
XXwith this option.
XX.IP
XXSee the note in the "bugs" section below about treatment of "end-of-line" and
XX"end-of-paragraph".
XX.TP
XX.B \-w + | \-
XXThis slightly obsolete option converts files to and from the
XXWordStar version 3.2 "document" mode. If the argument is "+",
XXthe file is converted
XX.I to
XXWordStar document mode; if the argument is "\-",
XXthe file is converted
XX.I from
XXWordStar document mode into plain ASCII text.
XX.TP
XX.B \-x
XXExpand unprintable characters. This option
XXwill make
XX.B pep
XXexpand the characters it would otherwise remove from the file by
XXprinting the character encoding of these characters in
XXhexadecimal between angle brackets.
XX.TP
XX.B \-z
XXZero the eighth bit (a.k.a. the parity bit) on all characters in the file.
XX.SH ENVIRONMENT
XX.PP
XX.B Pep
XXknows a single environment variable:
XX.BR PEP ,
XXwhich may be
XXused to indicate the lookup path for files with conversion
XXtables. Below are some examples of how to set this in some
XXoperating systems:
XX.sp 0.5
XX.RS
XX.nf
XX\fBset PEP=c:\eusr\elib \fR(MS-DOS)
XX\fBsetenv PEP /usr/local/lib \fR(UNIX)
XX\fBdefine PEP "DISK_USR:<LOCAL.LIB>" \fR(VMS)
XX.fi
XX.RE
XX.PP
XXThe command to set this environment variable should usually be
XXpart of the command file that is read during login (this may
XXbe named
XX.B "AUTOEXEC.BAT, LOGIN.COM, .profile"
XXor
XX.B .login
XXdepending upon your choice of operating system). Please note
XXthat environment variables do not exist under CP/M.
XX.SH EXAMPLES
XXSome of the examples below use i/o redirection and pipes,
XXas indicated with the symbols ">" and "<" (redirection)
XXand "|" (pipe symbol). These examples
XXonly apply to operating systems that support
XXredirection and pipes.
XX.PP
XX.TP 3
XX.B pep \-h
XXPrint a quick summary of all available options, and exit.
XX.TP
XX.B "pep"
XXRead input from standard input (the keyboard), and write
XXthe result on standard output (the screen) until the user
XXtypes the end of file character (usually CTRL-D (UNIX) or
XXCTRL-Z (MS-DOS)). This is of limited practical use by
XXitself; usually this command is inserted into the middle of a
XXcommand where the standard input and standard output are pipes.
XX.TP
XX.B "pep < foo.bar"
XXDisplay a slightly cleaned-up version of the file
XX.I foo.bar
XXon the screen.
XX.TP
XX.B "pep < foo.bar > foo.txt"
XXRead the file
XX.I foo.bar,
XXclean it, and write the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep foo.bar > foo.txt"
XXRead the file
XX.I foo.bar,
XXclean it, and write the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep foo1.bar foo2.bar > foo.txt"
XXRead the files
XX.I "foo1.bar"
XXand
XX.I foo2.bar,
XXclean them, and
XXcatenate the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep \-o foo.fil bar.fil"
XXClean the files
XX.I foo.fil
XXand
XX.I bar.fil,
XXreplacing the
XXoriginal files with the cleaned-up versions.
XX.TP
XX.B "pep \-ob foo.fil bar.fil"
XXClean the files
XX.I foo.fil
XXand
XX.I bar.fil,
XXreplacing the
XXoriginal files with the cleaned-up versions. The original
XXfiles are preserved as
XX.I foo.bak
XXand
XX.I bar.bak.
XX.TP
XX.B "pep \-i+ \-o program.dok"
XXConvert the Norwegian text in the file
XX.I "program.dok"
XXto use
XXthe IBM-PC 8 bit character set. Please note that this
XXconversion may not be 100 percent correct. For instance,
XXthe pipe symbol "|" will be converted to the lower case Norwegian
XX.I oslash
XXcharacter.
XXThis is because the pipe symbol and the character share the
XXsame ASCII code (124) in the Norwegian version of the 7-bit character
XXset, but they have different codes when
XXusing 8-bit character sets.
XX.TP
XX.B "pep \-e2 \-o kermit.log"
XXInterpret ANSI screen control sequences in the file
XX.I kermit.log.
XXSet guard to level 2 (no deletion or overwriting).
XX.IP
XXIn this example, it is assumed that the file
XX.I kermit.log
XXis a log record of an on-line session with some Bulletin Board System (BBS).
XXSuch files may be created with the command "log session" in the popular
XX.I kermit
XXcommunication program. Most other communication programs have
XXsimilar commands. Many BBSs
XXuse ANSI sequences for simple graphics, highlighting and
XXother special effects, and you will get a much
XXmore readable session log if you run it through
XX.B pep
XXwith the
XX.B \-e
XXoption turned on.
XX.TP
XX.B "test | pep \-e > test.scr"
XXRun the program
XX.I test,
XXand pipe its output to
XX.B pep,
XXwhich interprets any ANSI sequences and stores the resulting screen
XXimages in the file
XX.I test.scr.
XXNote that this is only
XXpossible on operating systems that support pipes (e.g. UNIX and MS-DOS).
XX.IP
XXThe screen images will now be in standard text files which have the same
XXgeneral layout as the original screen images. This may be useful if
XXyou need text versions of the screen images for inclusion in manuals or
XXfor prototypes.
XX.TP
XX.B "nroff \-man \-Tlpr pep.1l | pep > pep.doc"
XXGenerate a plain text version of this manual, without
XXbackspaces or double strikes
XX.RB ( nroff
XXis the standard Unix text formatter).
XX.TP
XX.B "pep \-d- \-o *.txt"
XXConvert all files with extension
XX.B .txt
XXfrom DEC/ISO character set to Norwegian 7-bit ASCII characters.
XX.TP
XX.B "pep \-gibm2mac \-ur < foo.ibm > foo.mac"
XXUse the conversion table in the file
XX.I "ibm2mac"
XXto convert
XXthe character set in the file
XX.I foo.ibm.
XXStore the result on the file
XX.I foo.mac,
XXwhere each line should be terminated by a single CR character.
XX.TP
XX.B "pep \-m\- < foo.mac | pep \-i+ > foo.ibm"
XXConvert Apple Macintosh encoded Norwegian characters in the file
XX.I "foo.mac"
XXto IBM-PC (Code Page 850) encoding. This is an alternative way to
XXaccomplish the same thing as the conversion done in the previous
XXexample.
XX.TP
XX.B "pep \-w- \-o *.*"
XXConvert all files in the current directory from WordStar document
XXmode to 7-bit ASCII.
XX.TP
XX.B "pep \-w+ \-t4 < foo.txt > foo.ws"
XXConvert the file
XX.I "foo.txt"
XXto WordStar document mode format, also expanding tabulation (tabstop = 4)
XXto space characters. The result is stored on a file named
XX.I foo.ws.
XX.B Pep
XXuses a simple pattern recognition mechanism to recognize pages,
XXparagraphs, soft white space and soft hyphens. It will probably
XXnot do a 100% conversion, but the file will be much easier to
XXedit in WordStar than the original.
XX.TP
XX.B "pep \-z \-x < foo.dat > foo.dmp"
XXStrip the 8th bit and expand control characters to hex
XXdigits in the file
XX.I foo.dat,
XXand store the result on the file
XX.I foo.dmp.
XX.IP
XXExpanding the unprintable characters to hexadecimal makes it easier to
XXinspect a file in an ordinary text editor, and to post-process it
XXby a customized filter you may create yourself
XXwith the search/replace and macro
XXfacilities found in many editors today.
XX.TP
XX.B "pep \-s6 \-b < pep.exe"
XXExtract "strings" from the file
XX.I pep.exe.
XXThe strings are just listed on standard output (the screen).
XX"Strings" are in this context assumed to be any sequence of printable characters
XXthat is at least 6 characters long. The
XX.B \-b
XXoption excludes characters with codes in the range 128 to 255 from
XXthe search. It is almost always a good idea to combine the
XX.B \-b
XXoption with the
XX.B \-s
XXoption, otherwise too much garbage is picked up by the filter.
XX.TP
XX.B "pep \-t4 \-c8 \-o foo.c"
XXIf both tab expansion
XX.B \-t
XXand tab compression
XX.B \-c
XXare specified, then
XX.B pep
XXwill repack the tabulation. This is useful if you want to convert
XXa file from one tab-size to another (e.g. to convert non-standard
XX4 character tabulation into standard 8 character tabulation).
XXIn this example, two TAB characters in the file
XX.I foo.c
XXare replaced by a single TAB character, and any TAB character that cannot be
XXpaired up is replaced by the appropriate number of spaces.
XX.TP
XX.B "pep \-t \-c \-o foo.c"
XXRemove redundant space characters in existing tabulation in the file
XX.I foo.c.
XXWhat happens is that tabulation on each line is first expanded and
XXthen compressed again, which effectively
XXremoves any space characters "inside" a tabulation.
XX.SH DIAGNOSTICS
XX.PP
XXIf you specify an option that
XX.B pep
XXdoes not recognize, then
XX.B pep
XXwill
XXwrite a summary of usage and abort. Other errors on the
XXcommand line will result in
XX.B pep
XXwriting an error message
XXbefore aborting.
XX.PP
XXOn operating systems that support exit codes,
XX.B pep
XXwill return an exit code upon termination.
XX.PP
XXIf
XX.B pep
XXis interpreting ANSI escape sequences and notices
XXsyntactic or semantic errors in the way they are used, a
XXwarning is printed on the screen, prefixed with the string
XX"ansi:". This means that it is also possible to use
XX.B pep
XXto check if programs use ANSI sequences in a portable way.
XX.SH FILES
XX.TP 10
XX.B pep, pep.exe, pep.cmd
XXexecutable file (actual name depends upon which operating system you use).
XX.TP
XX.B mac2ibm
XXsmall example of a user supplied conversion table
XXto convert from the Macintosh character set to that used on
XXthe Norwegian version of the original IBM-PC (the sample file
XXonly covers the Norwegian characters \(em to complete it is
XXleft as an exercise to the reader :-) ).
XX.TP
XX.B ibm2mac
XXinverse of
XX.B mac2ibm:
XXconversion table from a small subset of
XXIBM CP 850 to Macintosh character set.
XX.TP
XX.B ebc2ns7
XXconversion table from the IBM EBCDIC character set to the Norwegian
XXversion of the ASCII 7-bit character set (ISO646 NS4551).
XX.TP
XX.B ibm2ro8
XXconversion table from the IBM-PC 8-bit character
XXset to Hewlett-Packard ROMAN8.
XX.TP
XX.B ro82ibm
XXinverse of
XX.B ibm2ro8:
XXconversion table from ROMAN8
XXto IBM-PC character set.
XX.TP
XX.B ibm2iso
XXconversion table from the IBM-PC CP 850 8-bit character
XXset to ISO 8859/1.
XX.TP
XX.B iso2ibm
XXinverse of
XX.B ibm2iso:
XXconversion table from ISO 8859/1 to CP 850.
XX.SH AUTHOR
XX.PP
XXCopyright \(co 1989 Gisle Hannemyr.
XX.PP
XX.B Pep
XXmay be freely distributed and copied, as long as this file
XXis included in the distribution and that these statements
XXabout authorship and copyright is not altered or removed.
XX.PP
XXBug reports, improvements, comments, suggestions and flames to:
XX.ti +0.2i
XXSnail: Gisle Hannemyr, Brageveien 3A, 0452 Oslo, Norway.
XX.ti +0.2i
XXEmail: gisle@nr.uninett (EAN);
XX.ti +0.9i
XXgisle@ifi.uio.no (Internet);
XX.ti +0.9i
XX\|.\|.\|.\|!mcvax!ifi!gisle (UUCP);
XX.ti +0.9i
XX(and several BBS mailboxes).
XX.SH ACKNOWLEDGMENTS
XX.PP
XXThanks to Robert Andersson, for the SYS-V
XX.I "rename"
XXfunction; and to
XXKnut Borge, Bjoern Larsen, Knut Omang and Geir-Harald Strand,
XXfor elucidation of the unspeakable mysteries of VMS.
XXSpecial thanks are due Inge Arnesen for finding and fixing a bug
XX(and to Nils-Eivind Naas for bringing it to my attention).
XX
XXSeveral people have contributed ideas and/or bug reports.
XXIn addition to those mentioned above,
XXOla Garstad, Ottar Grimstad,
XXTor Sjoewall, and Jens-Henrik Soerensen
XXshould be mentioned. My apologies if anyone
XXis forgotten.
XX.SH SEE ALSO
XX.LP
XX.BR dd (1),
XX.BR detex (1L),
XX.BR convert (VMS),
XX.BR expand (1),
XX.BR od (1V),
XX.BR strings (1),
XX.BR tr (1),
XX.BR unexpand (1).
XX.PP
XX.BR Detex (1L)
XXis a lex-based program to convert LaTeX and TeX manuscripts into plain
XXASCII text. It is available from the author upon request. Those marked
XXVMS are standard VMS utilities. The others are standard UNIX utilities.
XX.SH BUGS
XX.PP
XXThere is a very strong Norwegian bias in
XX.B pep.
XXIn particular,
XXthere exist several national versions of the ISO 646 7-bit
XXcharacter set; but all built-in functions to convert between this
XXand various 8-bit character sets (i.e.
XX.B \-d, \-i, \-k
XXand
XX.BR \-m )
XXbluntly assume the standard Norwegian version of ISO 646. For
XX.B pep
XXto work with other national 7-bit character sets, the
XXcompiled-in conversion tables (type FOLDMATRIX for those who read the
XXsource code) need to be extended.
XX.PP
XXThe VMS version of
XX.B pep
XXruns with the
XX.B \-o
XXoption permanently enabled. This is because VMS does not support a
XXuseful i/o redirection or pipe mechanism.
XX.PP
XXThe VMS Record Management Service (RMS) knows of several record formats.
XXYou can see what record format a file has by using the VMS DCL command
XX.I "DIRECTORY/FULL"
XXand examining the field "Record format".
XXOn VMS systems,
XX.B Pep
XXwill always generate output files with record format set to "Stream_LF",
XXbut some programs may require that the output file is in other
XXformats. To fix this, it might be necessary to run the output of
XX.B pep
XXthrough the VMS
XX.B CONVERT
XXutility. Please see the DEC VMS manuals for details.
XX.PP
XXThe Macintosh "text only" format uses the carriage return (CR) character
XX(ASCII 13) as terminator. Most text processors (e.g. MacWrite)
XXseem capable of handling two conventions:
XXOne is to use CR to terminate each line (and two or more
XXconsecutive CRs between paragraphs); the other is to use CR between
XXparagraphs only.
XX.B Pep
XXis also capable of handling both conventions. The default behaviour
XXis to terminate each line, but the
XX.B \-v
XXoption may be used to terminate paragraphs only.
XXPlease note that
XX.B pep
XXuses a rather simplistic heuristic to identify the end of a paragraph:
XXit bluntly assumes that paragraphs are separated by blank lines.
XX.PP
XXIf you use the
XX.B \-o
XXoption, then the original input file will
XXbe overwritten.
XXBefore you are familiar with
XX.B pep,
XXyou may
XXfind that it sometimes removes more material than you expect
XXfrom a file. It may be a good idea to always make a copy
XXof the original file before you start experimenting with
XX.B pep,
XXor you may add the
XX.B
XX"b"
XXargument to the
XX.B
XX\-o
XXoption
XX.B
XX(\-ob).
XX.PP
XXThe built-in IBM-PC, DEC and Macintosh conversion tables
XXconvert to and from the Norwegian version of 7-bit "ASCII"
XXcharacters. You should use the
XX.B \-g
XXoption and "general" conversion tables for all other purposes.
XX.PP
XX.B Pep
XXonly knows the ANSI sequences implemented in the
XXstandard MS-DOS console driver
XX.I
XXANSI.SYS.
XX.PP
XXThere cannot be a space character between an option and the
XXoption's argument (e.g. you'll have to use
XX.B
XX"\-gfoo.bar",
XXnot
XX.B
XX"\-g foo.bar").
XX.PP
XXPep will only filter "regular" files. It will skip directories, sockets
XXand "special" files.
XX.PP
XXLinks are the GOTOs of file systems. If you run a hard linked file
XXthrough pep using the
XX.B \-o
XXoption, the link will not be preserved. Pep will just skip soft
XXlinked files.
XX.PP
XX.B Pep
XXsearches for the conversion tables requested with the
XX.B
XX\-g
XXoption in the following order: first the current directory,
XXthen the directory of the file
XX.I PEP.EXE
XX(MS-DOS only), and finally the directory pointed to by the
XX.B PEP
XXenvironment
XXvariable.
XX.PP
XX.B Pep
XXknows nothing about the COFF-format and the
XX.B \-s
XXoption is
XXprimitive compared to the UNIX command
XX.IR strings (1).
XXSo if you are on a UNIX-system \(em forget about the
XX.B \-s
XXoption and use
XX.IR strings (1)
XXinstead.
XX.PP
XX.B Pep
XXwill not convert Word Perfect documents into plain ASCII.
XXThis much requested function is, however, built into Word Perfect.
XXIt is named "store as DOS-text" and is activated by pressing
XXCTRL-F5 (at least in Word Perfect 4.2).
XX.\" EOF
SHAR_EOF
if test 28373 -ne "`wc -c pep.1l`"
then
echo shar: error transmitting pep.1l '(should have been 28373 characters)'
fi
# End of shell archive
exit 0
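Illustration (not part of the archive; nothing after "exit 0" is read by the shell). The conversion-table format described under the \-g option is simple enough to sketch in a few lines. What follows is a minimal Python sketch of a reader for that format, not pep's actual source. The function names parse_conversion_table and convert are hypothetical. Two behaviours are assumptions the manual page does not settle: character codes not listed in the table are left unchanged, and a comment that trails a pair of numbers on the same line is dropped without being echoed.

```python
def parse_conversion_table(text):
    """Parse a pep-style conversion table (hypothetical helper).

    Returns (table, echoed): a 256-byte translation table and the list of
    comments that would be echoed on the screen.
    """
    # Assumption: codes not mentioned in the file map to themselves.
    table = list(range(256))
    echoed = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("##"):
            continue                      # "##" comment lines are not echoed
        if line.startswith("#"):
            echoed.append(line[1:].strip())
            continue
        # Assumption: a trailing "# ..." after the numbers is silently dropped.
        fields = line.split("#", 1)[0].split()
        if not fields:
            continue                      # blank line
        if len(fields) != 2:
            raise ValueError(f"line {lineno}: expected two decimal numbers")
        src, dst = (int(field, 10) for field in fields)
        if not (0 <= src <= 255 and 0 <= dst <= 255):
            raise ValueError(f"line {lineno}: character code out of range")
        table[src] = dst                  # first number: from, second: to
    return bytes(table), echoed


def convert(data, table):
    """Apply the translation table to a bytes object."""
    return data.translate(table)


# The sample table from the manual page.
SAMPLE = """\
# Convert from Macintosh to IBM-PC
##This line is not echoed on the screen.
# MAC IBM
 174 146
 175 157
 129 143
 190 145
 191 155
 140 134
# EOF
"""

table, echoed = parse_conversion_table(SAMPLE)
converted = convert(bytes([174, 65, 140]), table)
```

A flat 256-entry table keeps the per-byte work to a single lookup, which suits a filter that streams whole files; whether pep itself is organised this way is not stated in the manual page.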