[comp.sources.misc] PEP filter program part 2 of 5

gisle@ifi.uio.no (Gisle Hannemyr) (12/30/89)

Posting-number: Volume 9, Issue 93
Submitted-by: gisle@ifi.uio.no (Gisle Hannemyr)
Archive-name: pep/part02

# This is a shell archive [ part 2 of 5 ]
# Remove everything above and including the cut line.
# Then run the rest of the file through /bin/sh (not csh).
#--cut here-----cut here-----cut here-----cut here-----cut here-----cut here--#
#!/bin/sh
# shar: Shell Archiver
# Execute the following text with /bin/sh to create the file(s):
#	Doc/pep.1l
# This archive created: Fri Dec 29 14:42:42 1989
# Wrapped by: Gisle Hannemyr (gisle@ifi.uio.no)
echo shar: extracting pep.1l
sed 's/^XX//' << \SHAR_EOF > pep.1l
XX.\" @(#)pep.1l 2.0 89/12/10 [gh]
XX.\" Usage:
XX.\"    nroff -man pep.1l
XX.TH PEP 1L "28 December 1989" "Version 2.1"
XX.SH NAME
XXpep \- a file detergent
XX.SH SYNOPSIS
XX.B pep
XX[
XX.B \-a
XX]
XX[
XX.B \-b
XX]
XX[
XX.B \-c
XX[
XX.I size
XX]]
XX[
XX.B \-d + | \-
XX]
XX.if n .ti +5
XX[
XX.B "\-e [ 0 | 1 | 2"
XX]]
XX[
XX.B \-g
XX.I file
XX]
XX[
XX.B \-h
XX]
XX[
XX.B \-i + | \-
XX]
XX.if n .ti +5
XX[
XX.B \-k + | \-
XX]
XX[
XX.B \-m + | \-
XX]
XX[
XX.B \-o
XX[
XX.B b
XX]]
XX[
XX.B \-p
XX]
XX.if n .ti +5
XX[
XX.B \-s
XX[
XX.I size
XX]]
XX[
XX.B \-t
XX[
XX.I size
XX]]
XX[
XX.B \-u
XX.I terminator
XX]
XX[
XX.B \-v
XX]
XX.if n .ti +5
XX[
XX.B \-w + | \-
XX]
XX[
XX.B \-x
XX]
XX[
XX.B \-z
XX]
XX[
XX.I filename
XX.B .\|.\|.
XX]
XX.SH DESCRIPTION
XX.LP
XX.B Pep
XXis a filter program to "clean" files.  It is named after a
XXpopular Norwegian detergent.
XX.PP
XX.B Pep
XXmay be used to remove control characters, strip parity bits,
XXinterpret ANSI escape sequences, compress tabulation,
XXextract strings and convert character sets.  Nine out of ten hackers
XXprefer "pep" to soap (which may very well explain why some of
XXthem smell the way they do).
XX.PP
XX.B Pep
XXis a filter.  Its default operation is to read from standard input
XX(the keyboard) and write on standard output (the terminal).
XX.PP
XXYou may also specify the name of one or more files as the last
XXargument on the command line.  Most versions of
XX.B pep
XX(not the version compiled for the DEC VMS operating system)
XXallow ambiguous filename arguments, where a single
XX.I filename
XXargument may specify several files.
XX.PP
XXYou may instruct
XX.B pep
XXto write the result back onto the original input file with the
XX.B \-o
XXoption.  If you use this option, the original file will be lost.
XXIf you want to keep the original file (something that usually will
XXbe the case when you do things like extracting strings from an
XXexecutable file), you should make a copy of the file before applying
XX.B
XXpep,
XXand filter the copy rather than the original.
XXSome of the functions in
XX.B
XXpep
XX(in particular those selected with the
XX.B \-b
XXand
XX.B \-s
XXoptions) may remove a lot of material from files, and it may be unfortunate if
XXthis happens to the wrong file.  It is probably a good idea to always use
XX.B pep
XXon copies until you have some experience with the various
XX.BR pep \-options.
XXYou may also use the
XX.B b
XXargument on the
XX.B \-o
XXoption to save the original in a .BAK-file.
XX.PP
XXTo get a brief summary of the command line syntax and all the options,
XXyou need to specify the
XX.B \-h
XXoption.  Just type the command:
XX.sp 0.5
XX.RS
XX.B pep \-h
XX.RE
XX.PP
XXfollowed by the RETURN key.  Note that just
XX.B pep
XXwill not give you this summary.  The command:
XX.sp 0.5
XX.RS
XX.B pep
XX.RE
XX.PP
XXwill start
XX.B pep
XXas a filter, and it will just echo back whatever you type, until you
XXtype the end of file character (usually CTRL-D or CTRL-Z).
XX.PP
XXWhen
XX.B pep
XXis running as a filter, it is reading from the standard input and
XXwriting to the standard output.  In this state,
XX.B pep
XXwill be very much less verbose than it usually is.  It will still
XXprint error messages, but very little else.  Note that while:
XX.sp 0.5
XX.RS
XX.nf
XX.B pep < foobar.in > foobar.out
XX.B pep \-ob foobar.txt
XX.fi
XX.RE
XX.PP
XXwill do more or less the same job, the first will do it quietly,
XXin the tradition of Unix filters; the latter will print the
XXcopyright notice, a detailed list of the things it will do,
XXand finally a list and line count
XXof all the files it processes as it plods along.
XX.PP
XX.B Pep
XXwill remove some "noise" from files, even if no options are specified.
XXThe following is the default behavior:
XX.RS
XX.TP 3
XX\(bu
XXremove trailing spaces;
XX.TP 3
XX\(bu
XXterminate each line with the canonical line terminator (usually LF, CR or both);
XX.TP 3
XX\(bu
XXremove underlining intended for backspacing printers;
XX.TP 3
XX\(bu
XXremove control characters (character codes < 32) except canonical line
XXterminator, FF and TAB;
XX.TP 3
XX\(bu
XXbreak the line before the FF if a line contains an FF anywhere except in the
XXfirst column.
XX.RE
XX.PP
XXIf you want to check what
XX.B pep
XXactually intends to do to your file before it does it, you may make it
XXpause with the
XX.B \-p
XXoption.  For example:
XX.sp 0.5
XX.RS
XX.B pep \-p foobar.txt
XX.RE
XX.PP
XXwill make
XX.B pep
XXstop after displaying a list of the conversions it will apply to the
XXfile.  The user is prompted and may choose to proceed
XX(hitting the RETURN key), or abort
XXthe program without doing anything (hitting CTRL-C).
XX.PP
XXThe user may want other conversions than the default action described
XXabove.  A number of conversion functions may be selected by specifying one or
XXmore options on the command line.
XX.PP
XXSome of the options require an additional argument switch, and must be
XXfollowed by a "+" or a "\-", other options
XXrequire a number or a filename argument.
XXMost of the options may be combined with other options, but a few are
XXmutually exclusive.  If the user specifies invalid options or option
XXarguments, then
XX.B pep
XXwill abort with an error message and return an error exit code on
XXoperating systems that support exit codes.
XX.SH OPTIONS
XX.TP
XX.B \-a
XXWrite out information about
XX.B
XXpep.
XX.TP
XX.B \-b
XXRemove all characters not in the original 7-bit character set (ISO 646).
XXI.e. remove the characters which are encoded from 128 to 255.
XX(If this option is combined with the
XX.B \-x
XXoption, it will print the codes for these characters in hexadecimal
XXinstead of removing them.)
XXThe
XX.B \-b
XXoption is powerful, and may remove a lot of bytes if you use it
XXon the wrong file.  Only use it if you know exactly how the eighth bit is
XXused in the file you intend to filter.  Also note that the options
XX.B i, d, k, g, m, w
XXor
XX.B z
XXin most cases are better suited to
XXprocess files where the eighth bit is set.
XX.TP
XX\fB\-c \fR[ \fIsize \fR]
XXCompress space into tabulation.  I.e. insert TAB characters when
XXreplacing a run of two or more SPACE characters would produce a
XXsmaller output file.
XXThis function is the opposite of the function invoked with the
XX.B \-t
XXoption.
XX.IP
XXThe default tabulation size is 8,
XXbut you may specify any other tabulation with the optional numeric
XXargument.
XX.TP
XX.B \-d + | \-
XXConvert to or from the ISO 8859/1 8 bit character set and the Norwegian
XXversion of the ISO 646 7 bit character set.  If the argument is "+",
XXthe file is converted
XX.I to
XXISO 8859/1.  If the argument is "\-",
XXthe file is converted
XX.I from
XXISO 8859/1.  The ISO 8859/1 character set is also
XXknown as the  "DEC Multinational Character Set".
XX.TP
XX\fB\-e \fR[ \fB0 | 1 | 2 \fR]
XXInterpret ANSI screen control sequences (also known as ANSI ESCAPE
XXsequences).  This function makes
XX.B pep
XXemulate cursor positioning and other functions on an ANSI-terminal.
XX.IP
XX.B Pep
XXwill complain about "strange" (i.e. implementation dependent) use of
XXANSI escape sequences.
XX.IP
XX.B Pep
XXwill normally save a screen image on the output file when one of
XXtwo events occur:  1) When the screen is full and scrolls up;
XXor 2) just before a screen image is erased with the "erase screen"
XXANSI screen control sequence.  In some cases important fields
XXon the screen will be overwritten or erased.  There
XXis no good solution to this
XXproblem, but
XX.B pep
XXprovides the user with some opportunity to guard against overwriting
XXand erasure.  This is done by specifying an additional numeric argument
XXto the
XX.B \-e
XXoption.  This number indicates the level of protection
XXand is interpreted as follows:
XX.sp 0.5
XX.RS
XX.RS
XX.TP 3
XX0:
XXno protection \(em fields may be erased and overwritten
XX(this is the default);
XX.TP
XX1:
XXsequences that erase fields are ignored;
XX.TP
XX2:
XXsequences that erase or overwrite fields are ignored.
XX.RE
XX.RE
XX.TP
XX\fB\-g \fIfile \fR
XXRead the conversion table from a file.  The name of the file must be
XXappended as the argument to this option.
XX.IP
XXThe file itself is a standard ASCII text file where each line should
XXcontain two decimal numbers.  The first number is the character code
XXto convert
XX.I from,
XXand the second number is the character code to convert
XX.I to.
XXA "#" character and all the following characters up to a NEWLINE are
XXconsidered a comment, and are ignored.  Comments are however echoed
XXon the screen along with the other comments
XX.B pep
XXmakes, unless the comment line starts with a "##".
XX.IP
XXBelow is an example of how such a conversion file may look:
XX.sp 0.5
XX.PP
XX.ft B
XX.nf
XX.RS
XX.RS
XX# Convert from Macintosh to IBM-PC
XX##This line is not echoed on the screen.
XX# MAC IBM
XX  174 146
XX  175 157
XX  129 143
XX  190 145
XX  191 155
XX  140 134
XX# EOF
XX.RE
XX.RE
XX.fi
XX.ft R
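XX.IP
XXAs a further sketch of the format, a table along the following lines
XXwould map the upper and lower case
XX.I oslash
XXfrom the Norwegian version of 7 bit ISO 646 to CP 850.
XXThe codes are the ones quoted elsewhere in this manual (92 and 124 in
XXISO 646, 157 and 155 in CP 850); the table itself is only an
XXillustration, and is not one of the distributed files:
XX.sp 0.5
XX.ft B
XX.nf
XX.RS
XX.RS
XX# Norwegian ISO 646 to CP 850 (oslash only)
XX##Illustration only, not a distributed table.
XX# ISO IBM
XX   92 157
XX  124 155
XX# EOF
XX.RE
XX.RE
XX.fi
XX.ft R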
XX.TP
XX.B \-h
XXWrite a brief summary of
XX.B pep
XXoptions, and exit.
XX.TP
XX.B \-i + | \-
XXConvert to or from the IBM 8 bit character set (Code Page 850 Multilingual)
XXand the Norwegian
XXversion of the ISO 646 7 bit character set.  If the argument is "+",
XXthe file is converted
XX.I to
XXCP 850.  If the argument is "\-",
XXthe file is converted
XX.I from
XXCP 850.  The CP 850 character set (or a subset of it)
XXis what is used in the IBM PC, AT, and PS/2 series of
XXcomputers and their clones.  Note that some machines with
XXAmerican PROMs have a yen- and cent character in
XXthe position rightfully belonging to upper and lower case
XXversions of the Norwegian character
XXwritten as an "o" with a slash across it (often referred to as
XX.IR oslash ).
XX.TP
XX.B \-k + | \-
XXConvert to or from an 8 bit character set and the
XXISO 646 7 bit character set.  This is a modified version
XXof the
XX.B \-i
XXfunction, hacked to preserve both the
XX.I backslash
XXcharacter and the upper case
XX.I oslash
XXcharacter as required by, among others, the "KnowledgeMan" package.  These
XXcharacters share the same code (92 decimal) in 7 bit ISO 646,
XXbut use different codes (92 is backslash, 157 is oslash) in
XX8 bit CP 850.  To get around this, two backslashes in ISO 646
XXwill be converted to the upper case oslash character in CP 850, while
XXa single backslash will be preserved \(em and vice versa.
XX.IP
XXIf this option is combined with the
XX.B \-d
XXor
XX.B \-m
XXoption, the DEC/ISO or the Macintosh character set is used as base
XXinstead of CP 850.
XX.TP
XX.B \-m + | \-
XXConvert to or from the Apple Macintosh 8 bit character set and the Norwegian
XXversion of the ISO 646 7 bit character set.  If the argument is "+",
XXthe file is converted
XX.I to
XXthe Macintosh character set; if the argument is "\-",
XXthe file is converted
XX.I from
XXthe Macintosh character set.
XXSee description of
XX.B \-v
XXoption below and
XXnote in "bugs" section below about treatment of "end-of-line" and
XX"end-of-paragraph".
XX.TP
XX\fB\-o \fR[ \fBb \fR]
XX.B Pep
XXwill usually write the result of conversions on the standard output
XX.I (stdout).
XXThis option instead instructs
XX.B pep
XXto replace each named input file with a file containing the result
XXof filtering the file through
XX.B pep.
XXIf the option is augmented with the argument
XX.B b
XX(i.e.
XX.BR \-ob ),
XXthen
XX.B pep
XXwill create a backup copy of the original input file on a file
XXwith extension .BAK.  If you just specify
XX.B \-o
XXthe original file is deleted.
XX.IP
XXThe VMS version of
XX.B pep
XXwill always run as if this option was specified.  This is because
XXVMS does not support useful redirection or pipes.  Therefore, it is never
XXnecessary to specify the
XX.B \-o
XXoption under VMS, but users should still specify
XX.B \-ob
XXif they want a backup copy of the original input file.
XX.TP
XX.B \-p
XXWrite out a brief description of the conversion functions that
XXwill be activated by the current
XXset of options, and pause.  The user may review the list of
XXconversion functions and abort (by hitting CTRL-C) if they do not have
XXthe intended effect.
XX.TP
XX\fB\-s \fR[ \fIsize \fR]
XXFind strings in extremely "noisy" files.
XX.IP
XX.BR Pep 's
XXconcept of a string is that it is a sequence of "printable" characters
XXof a certain length.  The default minimum length of this sequence is
XX4, but this may be changed by the user by supplying an optional
XXnumeric argument that becomes the minimum length of the sequence.
XX.IP
XXThe default definition of a "printable" character is a symbol with
XXencoding above 31 decimal (i.e. 32 to 255) plus certain
XXcommon control characters (TAB, CR and LF).  This definition
XXis almost always too liberal, and will include a lot of "noise" in
XXthe output.  One or more of the options
XX.B \-b, \-d, \-i, \-m
XXor
XX.B \-z
XXshould be specified in addition to
XX.B \-s
XXin order to narrow the definition and the search space.
XXIn my experience, the
XX.B \-b
XXoption is a particularly
XXuseful additional filter when searching for strings.
XX.TP
XX\fB\-t \fR[ \fIsize \fR]
XXExpand tabulation, replacing the TAB character with a suitable number
XXof spaces.  The default tabulation size is 8, but the optional
XXnumeric argument
XX.I size
XXmay be used to set tabulation to any desired size.
XX.TP
XX\fB\-u r | n | s | - | # | \fInumber \fR
XX.BR Pep 's
XXdefault behaviour is to terminate lines with whatever is the
XXcanonical line terminator (the standard way to terminate
XXa text line) on the assumed target system for the output file.
XXThis means CR/LF on a microcomputer system, LF on a UNIX system,
XXand CR if the target is a Macintosh.  The assumed target system
XXis usually the system
XX.B pep
XXis running on, unless you request folding to the character set
XXof another computer system.  Then, that computer system becomes
XXthe assumed target.
XX.IP
XXThe
XX.B \-u
XXoption allows you to override this assumption.
XXYou do this by explicitly specifying (in decimal) the numeric ASCII
XXvalue of the end of line character you want in your output file.
XXFor example, to make sure
XXlines are terminated by LF (the standard for UNIX text files),
XXyou may use
XX.BR \-u10 ,
XXbecause 10 is the ASCII value of the newline (LF) control character.
XXInstead of a numeric argument, you may specify
XX.BR r ,
XXfor carriage return (CR),
XX.BR n ,
XXfor newline (LF),
XX.BR s ,
XXfor record separator (RS), the symbol
XX.BR - ,
XXfor no line terminator, or the symbol
XX.B #
XXto get carriage return followed by a newline (CR/LF).
XX.TP
XX.B \-v
XXNormally,
XX.B pep
XXwill terminate each line with the canonical line terminator.
XXSome typesetting programs and word processors, however, require
XXthat no hard line terminator is present within a paragraph, and
XXthat only paragraphs are hard terminated.  If you want to
XXimport a file to such a typesetting program or word processor,
XXyou may instruct
XX.B pep
XXto terminate paragraphs
XX.I only
XXwith this option.
XX.IP
XXSee note in "bugs" section below about treatment of "end-of-line" and
XX"end-of-paragraph".
XX.TP
XX.B \-w + | \-
XXThis slightly obsolete option converts files to and from the
XXWordStar version 3.2 "document" mode.  If the argument is "+",
XXthe file is converted
XX.I to
XXWordStar document mode; if the argument is "\-",
XXthe file is converted
XX.I from
XXWordStar document mode into plain ASCII text.
XX.TP
XX.B \-x
XXExpand unprintable characters.  This option
XXwill make
XX.B pep
XXexpand the characters it would otherwise remove from the file by
XXprinting the character encoding of these characters in
XXhexadecimal between angle brackets.
XX.TP
XX.B \-z
XXZero the eighth bit (a.k.a. the parity bit) on all characters in the file.
XX.SH ENVIRONMENT
XX.PP
XX.B Pep
XXknows a single environment variable:
XX.BR PEP ,
XXwhich may be
XXused to indicate the lookup path for files with conversion
XXtables.  Below are some examples of how to set this in some
XXoperating systems:
XX.sp 0.5
XX.RS
XX.nf
XX\fBset PEP=c:\eusr\elib				\fR(MS-DOS)
XX\fBsetenv PEP /usr/local/lib				\fR(UNIX)
XX\fBdefine PEP "DISK_USR:<LOCAL.LIB>"		\fR(VMS)
XX.fi
XX.RE
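XX.PP
XXThe UNIX line above is written in C shell syntax.  In a Bourne shell
XXstart-up file such as
XX.BR .profile ,
XXthe equivalent setting would be along these lines (the directory is
XXthe same placeholder as above):
XX.sp 0.5
XX.RS
XX.nf
XX\fBPEP=/usr/local/lib; export PEP\fR
XX.fi
XX.RE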
XX.PP
XXThe command to set this environment variable should usually be
XXpart of the command file that is read during login (this may
XXbe named
XX.B "AUTOEXEC.BAT, LOGIN.COM, .profile"
XXor
XX.B .login
XXdepending upon your choice of operating system).  Please note
XXthat environment variables do not exist under CP/M.
XX.SH EXAMPLES
XXSome of the examples below use i/o redirection and pipes,
XXas indicated with the symbols ">" and "<" (redirection)
XXand "|" (pipe symbol).  These examples
XXonly apply to operating systems that support
XXredirection and pipes.
XX.PP
XX.TP 3
XX.B pep \-h
XXPrint a quick summary of all available options, and exit.
XX.TP
XX.B "pep"
XXRead input from standard input (the keyboard), and write
XXthe result on standard output (the screen) until the user
XXtypes the end of file character (usually CTRL-D (UNIX) or
XXCTRL-Z (MS-DOS)).  This is of limited practical use by
XXitself; usually this command is inserted into the middle of a
XXcommand where the standard input and standard output are pipes.
XX.TP
XX.B "pep < foo.bar"
XXDisplay a slightly cleaned-up version of the file
XX.I foo.bar
XXon the screen.
XX.TP
XX.B "pep < foo.bar > foo.txt"
XXRead the file
XX.I foo.bar,
XXclean it, and write the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep foo.bar > foo.txt"
XXRead the file
XX.I foo.bar,
XXclean it, and write the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep foo1.bar foo2.bar > foo.txt"
XXRead the files
XX.I "foo1.bar"
XXand
XX.I foo2.bar,
XXclean them, and
XXcatenate the result on the file
XX.I foo.txt.
XX.TP
XX.B "pep \-o foo.fil bar.fil"
XXClean the files
XX.I foo.fil
XXand
XX.I bar.fil,
XXreplacing the
XXoriginal files with the cleaned-up versions.
XX.TP
XX.B "pep \-ob foo.fil bar.fil"
XXClean the files
XX.I foo.fil
XXand
XX.I bar.fil,
XXreplacing the
XXoriginal files with the cleaned-up versions.  The original
XXfiles are preserved as
XX.I foo.bak
XXand
XX.I bar.bak.
XX.TP
XX.B "pep \-i+ \-o program.dok"
XXConvert the Norwegian text in the file
XX.I "program.dok"
XXto use
XXthe IBM-PC 8 bit character set.  Please note that this
XXconversion may not be 100 percent correct.  For instance,
XXthe pipe symbol "|" will be converted to the lower case Norwegian
XX.I oslash
XXcharacter.
XXThis is because the pipe symbol and the character share the
XXsame ASCII code (124) in the Norwegian version of the 7-bit character
XXset, but they have different codes when
XXusing 8-bit character sets.
XX.TP
XX.B "pep \-e2 \-o kermit.log"
XXInterpret ANSI screen control sequences in the file
XX.I kermit.log.
XXSet guard to level 2 (no deletion or overwriting).
XX.IP
XXIn this example, it is assumed that the file
XX.I kermit.log
XXis a log record of an on-line session with some Bulletin Board System (BBS).
XXSuch files may be created with the command "log session" in the popular
XX.I kermit
XXcommunication program.  Most other communication programs have
XXsimilar commands.  Many BBSs
XXuse ANSI sequences for simple graphics, highlighting and
XXother special effects, and you will get a much
XXmore readable session log if you run it through
XX.B pep
XXwith the
XX.B \-e
XXoption turned on.
XX.TP
XX.B "test | pep \-e > test.scr"
XXRun the program
XX.I test,
XXand pipe its output to
XX.B pep,
XXwhich interprets any ANSI sequences and stores the resulting screen
XXimages in the file
XX.I test.scr.
XXNote that this is only
XXpossible on operating systems that support pipes (i.e. UNIX and MS-DOS).
XX.IP
XXThe screen images will now be on standard text files which have the same
XXgeneral layout as the original screen images.  This may be useful if
XXyou need text versions of the screen images for inclusion in manuals or
XXfor prototypes.
XX.TP
XX.B "nroff \-man \-Tlpr pep.1l | pep > pep.doc"
XXGenerate a plain text version of this manual, without
XXbackspaces or double strikes
XX.RB ( nroff
XXis the standard Unix text formatter).
XX.TP
XX.B "pep \-d- \-o *.txt"
XXConvert all files with extension
XX.B .txt
XXfrom DEC/ISO character set to Norwegian 7-bit ASCII characters.
XX.TP
XX.B "pep \-gibm2mac \-ur < foo.ibm > foo.mac"
XXUse the conversion table in the file
XX.I "ibm2mac"
XXto convert
XXthe character set in the file
XX.I foo.ibm.
XXStore the result on the file
XX.I foo.mac,
XXwhere each line should be terminated by a single CR character.
XX.TP
XX.B "pep \-m\- < foo.mac | pep \-i+ > foo.ibm"
XXConvert Apple Macintosh encoded Norwegian characters in the file
XX.I "foo.mac"
XXto IBM-PC (Code Page 850) encoding.  This is an alternative way to
XXaccomplish the same thing as the conversion done in the previous
XXexample.
XX.TP
XX.B "pep \-w- \-o *.*"
XXConvert all files in the current directory from WordStar document
XXmode to 7-bit ASCII.
XX.TP
XX.B "pep \-w+ \-t4 < foo.txt > foo.ws"
XXConvert the file
XX.I "foo.txt"
XXto WordStar document mode format, also expanding tabulation (tabstop = 4)
XXto space characters.  The result is stored on a file named
XX.I foo.ws.
XX.B Pep
XXuses a simple pattern recognition mechanism to recognize pages,
XXparagraphs, soft white space and soft hyphens.  It will probably
XXnot do a 100% conversion, but the file will be much easier to
XXedit in WordStar than the original.
XX.TP
XX.B "pep \-z \-x < foo.dat > foo.dmp"
XXStrip the 8th bit and expand control characters to hex
XXdigits in the file
XX.I foo.dat,
XXand store the result on the file
XX.I foo.dmp.
XX.IP
XXExpanding the unprintable characters to hexadecimal makes it easier to
XXinspect a file in an ordinary text editor, and to post-process it
XXby a customized filter you may create yourself
XXwith the search/replace and macro
XXfacilities found in many editors today.
XX.TP
XX.B "pep \-s6 \-b < pep.exe"
XXExtract "strings" from the file
XX.I pep.exe.
XXThe strings are just listed on standard output (the screen).
XX"Strings" are in this context assumed to be any sequence of characters
XXthat are at least 6 characters long.  The
XX.B \-b
XXoption excludes characters with codes in the range 128 to 255 from
XXthe search.  It is almost always a good idea to combine the
XX.B \-b
XXoption with
XX.B \-s
XXoption, otherwise too much garbage is picked up by the filter.
XX.TP
XX.B "pep \-t4 \-c8 \-o foo.c"
XXIf both tab expansion
XX.B \-t
XXand tab compression
XX.B \-c
XXare specified, then
XX.B pep
XXwill repack the tabulation.  This is useful if you want to convert
XXa file from one tab-size to another (e.g. to convert non-standard
XX4 character tabulation into standard 8 character tabulation).
XXIn this example, two TAB characters in the file
XX.I foo.c
XXare replaced by a single tab character, and any TAB character that cannot be
XXpaired up is replaced by the appropriate number of spaces.
XX.TP
XX.B "pep \-t \-c \-o foo.c"
XXRemove redundant space characters in existing tabulation in the file
XX.I foo.c.
XXWhat happens is that tabulation on each line is first expanded and
XXthen compressed again, which effectively
XXremoves any space characters "inside" a tabulation.
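XX.TP
XX.B "pep \-u10 < foo.txt > foo.unx"
XXA sketch based on the description of the
XX.B \-u
XXoption (the file names are placeholders): clean the file
XX.I foo.txt
XXand terminate each line of the result with a single LF (ASCII 10),
XXwhatever the canonical line terminator is on the system
XX.B pep
XXis running on.  Going by the same description,
XX.B \-un
XXshould have the same effect.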
XX.SH DIAGNOSTICS
XX.PP
XXIf you specify an option that
XX.B pep
XXdoes not recognize, then
XX.B pep
XXwill
XXwrite a summary of usage and abort.  Other errors on the
XXcommand line will result in
XX.B pep
XXwriting an error message
XXbefore aborting.
XX.PP
XXOn operating systems that support exit codes,
XX.B pep
XXwill return an exit code upon termination.
XX.PP
XXIf
XX.B pep
XXis interpreting ANSI escape sequences and notices
XXsyntactical or semantical errors in the way they are used, a
XXwarning is printed on the screen, prefixed with the string
XX"ansi:".  This means that it is also possible to use
XX.B pep
XXto check if programs use ANSI sequences in a portable way.
XX.SH FILES
XX.TP 10
XX.B pep, pep.exe, pep.cmd
XXexecutable file (actual name depends upon which operating system you use).
XX.TP
XX.B mac2ibm
XXsmall example of a user supplied conversion table
XXto convert from the Macintosh character set to that used on
XXthe Norwegian version of the original IBM-PC (the sample file
XXonly covers the Norwegian characters \(em to complete it is
XXleft as an exercise to the reader :-) ).
XX.TP
XX.B ibm2mac
XXinverse of
XX.B mac2ibm:
XXconversion table from a small subset of
XXIBM CP 850 to Macintosh character set.
XX.TP
XX.B ebc2ns7
XXconversion table from the IBM EBCDIC character set to the Norwegian
XXversion of the ASCII 7-bit character set (ISO646 NS4551).
XX.TP
XX.B ibm2ro8
XXconversion table from the IBM-PC 8-bit character
XXset to Hewlett-Packard ROMAN8.
XX.TP
XX.B ro82ibm
XXinverse of
XX.B ibm2ro8:
XXconversion table from ROMAN8
XXto IBM-PC character set.
XX.TP
XX.B ibm2iso
XXconversion table from the IBM-PC CP 850 8-bit character
XXset to ISO 8859/1.
XX.TP
XX.B iso2ibm
XXinverse of
XX.B ibm2iso:
XXconversion table from ISO 8859/1 to CP 850.
XX.SH AUTHOR
XX.PP
XXCopyright \(co 1989 Gisle Hannemyr.
XX.PP
XX.B Pep
XXmay be freely distributed and copied, as long as this file
XXis included in the distribution and that these statements
XXabout authorship and copyright is not altered or removed.
XX.PP
XXBug reports, improvements, comments, suggestions and flames to:
XX.ti +0.2i
XXSnail: Gisle Hannemyr, Brageveien 3A, 0452 Oslo, Norway.
XX.ti +0.2i
XXEmail: gisle@nr.uninett (EAN);
XX.ti +0.9i
XXgisle@ifi.uio.no (Internet);
XX.ti +0.9i
XX\|.\|.\|.\|!mcvax!ifi!gisle (UUCP);
XX.ti +0.9i
XX(and several BBS mailboxes).
XX.SH ACKNOWLEDGMENTS
XX.PP
XXThanks to Robert Andersson, for the SYS-V
XX.I "rename"
XXfunction; and to
XXKnut Borge, Bjoern Larsen, Knut Omang and Geir-Harald Strand,
XXfor elucidation of the unspeakable mysteries of VMS.
XXSpecial thanks are due Inge Arnesen for finding and fixing a bug,
XX(and to Nils-Eivind Naas for bringing it to my attention).
XX
XXSeveral people have contributed ideas and/or bug reports.
XXIn addition to those mentioned above,
XXOla Garstad, Ottar Grimstad,
XXTor Sjoewall, and Jens-Henrik Soerensen
XXshould be mentioned.  My apologies if anyone
XXis forgotten.
XX.SH SEE ALSO
XX.LP
XX.BR dd (1),
XX.BR detex (1L),
XX.BR convert (VMS),
XX.BR expand (1),
XX.BR od (1V),
XX.BR strings (1),
XX.BR tr (1),
XX.BR unexpand (1).
XX.PP
XX.BR Detex (1L)
XXis a lex-based program to convert LaTeX and TeX manuscripts into plain
XXASCII text.  It is available from the author upon request.  Those marked
XXVMS are standard VMS utilities.  The others are standard UNIX utilities.
XX.SH BUGS
XX.PP
XXThere is a very strong Norwegian bias in
XX.B pep.
XXIn particular,
XXthere exist several national versions of the ISO 646 7-bit
XXcharacter set; but all built-in functions to convert between this
XXand various 8-bit character sets (i.e.
XX.B \-d, \-i, \-k
XXand
XX.BR \-m )
XXbluntly assume the standard Norwegian version of the ISO 646. For
XX.B pep
XXto work with other national 7-bit character sets, the
XXcompiled in conversion tables (type FOLDMATRIX for those who read the
XXsource code) need to be extended.
XX.PP
XXThe VMS version of
XX.B pep
XXruns with the
XX.B \-o
XXoption permanently enabled.  This is because VMS does not support a
XXuseful i/o redirection or pipe mechanism.
XX.PP
XXThe VMS Record Management Service (RMS) knows of several record formats.
XXYou can see what record format a file has by using the VMS DCL command
XX.I "DIRECTORY/FULL"
XXand examining the field "Record format".
XXOn VMS systems,
XX.B Pep
XXwill always generate output files with record format set to "Stream_LF",
XXbut some programs may require that the output file is in other
XXformats.  To fix this, it might be necessary to run the output of
XX.B pep
XXthrough the VMS
XX.B CONVERT
XXutility.  Please see the DEC VMS manuals for details.
XX.PP
XXThe Macintosh "text only" format uses the carriage return (CR) character
XX(ASCII 13) as terminator.  Most text processors (e.g. MacWrite)
XXseem capable of handling two conventions:
XXOne is to use CR to terminate each line (and two or more
XXconsecutive CR's between paragraphs); the other is to use CR between
XXparagraphs only.
XX.B Pep
XXis also capable of handling both conventions.  The default behaviour
XXis to terminate each line, but the
XX.B \-v
XXoption may be used to terminate paragraphs only.
XXPlease note that
XX.B pep
XXuses a rather simplistic heuristic to identify the end of a paragraph:
XXit bluntly assumes that paragraphs are separated by blank lines.
XX.PP
XXIf you use the
XX.B \-o
XXoption, then the original input file will
XXbe overwritten.  Before you are familiar with
XX.B pep,
XXyou may
XXfind that it sometimes removes more material than you expect
XXfrom a file.  It may be a good idea to always make a copy
XXof the original file before you start experimenting with
XX.B pep,
XXor you may add the
XX.B
XX"b"
XXargument to the
XX.B
XX\-o
XXoption
XX.B
XX(\-ob).
XX.PP
XXThe built-in IBM-PC, DEC and Macintosh conversion tables
XXconvert to and from the Norwegian version of 7-bit "ASCII"
XXcharacters.  You should use the
XX.B \-g
XXoption and "general" conversion tables for all other purposes.
XX.PP
XX.B Pep
XXonly knows the ANSI sequences implemented in the
XXstandard MS-DOS console driver
XX.I
XXANSI.SYS.
XX.PP
XXThere cannot be a space character between an option and the
XXoption's argument (e.g. you'll have to use
XX.B
XX"\-gfoo.bar",
XXnot
XX.B
XX"\-g foo.bar").
XX.PP
XXPep will only filter "regular" files.  It will skip directories, sockets
XXand "special" files.
XX.PP
XXLinks are the GOTOs of file systems.  If you run a hard linked file
XXthrough pep using the
XX.B \-o
XXoption, the link will not be preserved.  Pep will just skip soft
XXlinked files.
XX.PP
XX.B Pep
XXsearches for the conversion tables requested with the
XX.B
XX\-g
XXoption in the following order: first the current directory,
XXthen the directory of the file
XX.I PEP.EXE
XX(MS-DOS only), and finally the directory pointed to by the
XX.B PEP
XXenvironment
XXvariable.
XX.PP
XX.B Pep
XXknows nothing about the COFF-format and the
XX.B \-s
XXoption is
XXprimitive compared to the UNIX command
XX.IR strings (1).
XXSo if you are on a UNIX-system \(em forget about the
XX.B \-s
XXoption and use
XX.IR strings (1)
XXinstead.
XX.PP
XX.B Pep
XXwill not convert Word Perfect documents into plain ASCII.
XXThis much requested function is, however, built into Word Perfect.
XXIt is named "store as DOS-text" and is activated by pressing
XXCTRL-F5 (at least in Word Perfect 4.2).
XX.\" EOF
SHAR_EOF
if test 28373 -ne "`wc -c pep.1l`"
then
echo shar: error transmitting pep.1l '(should have been 28373 characters)'
fi
#	End of shell archive
exit 0