[comp.os.minix] egrep doc

dono@killer.DALLAS.TX.US (Don OConnell) (02/26/89)

This has the readme from V 1.1 of GNU e/grep, and the man page from v1.2
(haven't compiled it yet). It has more features than the one supplied with
minix.

Don O'Connell					killer!dono
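
For anyone unpacking by hand: each file in the archive below is stored as
a here-document whose lines carry a leading X, which sed strips on
extraction. A minimal sketch of that mechanism on a made-up two-line
payload (demo.txt, END_OF_demo and the use of mktemp are assumptions for
illustration, not part of the archive):

```shell
# Work in a scratch directory so nothing real is overwritten.
# (mktemp -d is assumed to be available; it is not part of the archive.)
scratch=$(mktemp -d)
cd "$scratch" || exit 1

# Quoting the delimiter ('END_OF_demo') keeps the shell from expanding
# $ and ` inside the here-document; sed then drops the leading X.
sed "s/^X//" >demo.txt <<'END_OF_demo'
XFirst line, with a $literal dollar sign
X    second line, leading blanks kept
END_OF_demo

# Inspect what was extracted.
first_line=$(sed -n 1p demo.txt)
line_count=$(wc -l <demo.txt | tr -d ' ')
echo "$line_count lines; first: $first_line"
```

The real archive follows each extraction with a wc -c size check, which
this sketch leaves out.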
-------------------------Cut Here-----------------------------
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file".  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  egrep.1 readme
# Wrapped by dono@killer on Sat Feb 25 23:25:58 1989
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f egrep.1 -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"egrep.1\"
else
echo shar: Extracting \"egrep.1\" \(11445 characters\)
sed "s/^X//" >egrep.1 <<'END_OF_egrep.1'
X
X
X
X     GREP(1)          GNU Project (1988 December 13)           GREP(1)
X
X
X
X     NAME
X          grep, egrep - print lines matching a regular expression
X
X     SYNOPSIS
X          grep [ -CVbchilnsvwx ] [ -num ] [ -AB num ] [ [ -e ] expr |
X          -f file ] [ files ... ]
X
X     DESCRIPTION
X          Grep searches the files listed in the arguments (or standard
X          input if no files are given) for all lines that contain a
X          match for the given expr.  If any lines match, they are
X          printed.
X
X          Also, if any matches were found, grep will exit with a
X          status of 0, but if no matches were found it will exit with
X          a status of 1.  This is useful for building shell scripts
X          that use grep as a condition in, for example, an if
X          statement.
X
X          When invoked as egrep the syntax of the expr is slightly
X          different; see below.
X
X     REGULAR EXPRESSIONS
X               (grep)    (egrep)   (explanation)
X
X               c         c         a single (non-meta) character
X                                   matches itself.
X
X               .         .         matches any single character except
X                                   newline.
X
X               \?        ?         postfix operator; preceding item
X                                   is optional.
X
X               *         *         postfix operator; preceding item 0
X                                   or more times.
X
X               \+        +         postfix operator; preceding item 1
X                                   or more times.
X
X               \|        |         infix operator; matches either
X                                   argument.
X
X               ^         ^         matches the empty string at the
X                                   beginning of a line.
X
X               $         $         matches the empty string at the end
X                                   of a line.
X
X               \<        \<        matches the empty string at the
X                                   beginning of a word.
X
X               \>        \>        matches the empty string at the end
X                                   of a word.
X
X               [chars]   [chars]   match any character in the given
X                                   class; if the first character after
X                                   [ is ^, match any character not in
X                                   the given class; a range of
X                                   characters may be specified by
X                                   first-last; for example, \W (below)
X                                   is equivalent to the class
X                                   [^A-Za-z0-9]
X
X               \( \)     ( )       parentheses are used to override
X                                   operator precedence.
X
X               \digit    \digit    \n matches a repeat of the text
X                                   matched earlier in the regexp by
X                                   the subexpression inside the nth
X                                   opening parenthesis.
X
X               \         \         any special character may be
X                                   preceded by a backslash to match it
X                                   literally.
X
X               (the following are for compatibility with GNU Emacs)
X
X               \b        \b        matches the empty string at the
X                                   edge of a word.
X
X               \B        \B        matches the empty string if not at
X                                   the edge of a word.
X
X               \w        \w        matches word-constituent characters
X                                   (letters & digits).
X
X               \W        \W        matches characters that are not
X                                   word-constituent.
X
X          Operator precedence is (highest to lowest) ?, *, and +,
X          concatenation, and finally |.  All other constructs are
X          syntactically identical to normal characters.  For the truly
X          interested, the file dfa.c describes (and implements) the
X          exact grammar understood by the parser.
X
X     OPTIONS
X          -A num
X               print num lines of context after every matching line
X
X          -B num
X               print num lines of context before every matching line
X
X          -C   print 2 lines of context on each side of every match
X
X          -num print num lines of context on each side of every match
X
X          -V   print the version number on the diagnostic output
X
X          -b   print every match preceded by its byte offset
X
X          -c   print a total count of matching lines only
X
X          -e expr
X               search for expr; useful if expr begins with -
X
X          -f file
X               search for the expression contained in file
X
X          -h   don't display filenames on matches
X
X          -i   ignore case difference when comparing strings
X
X          -l   list files containing matches only
X
X          -n   print each match preceded by its line number
X
X          -s   run silently producing no output except error messages
X
X          -v   print only lines that contain no matches for the expr
X
X          -w   print only lines where the match is a complete word
X
X          -x   print only lines where the match is a whole line
X
X     SEE ALSO
X          emacs(1), ed(1), sh(1), GNU Emacs Manual
X
X     INCOMPATIBILITIES
X          The following incompatibilities with UNIX grep exist:
X
X               The context-dependent meaning of * is not quite the
X               same (grep only).
X
X               -b prints a byte offset instead of a block offset.
X
X               The \{m,n\} construct of System V grep is not
X               implemented.
X
X     BUGS
X          GNU e?grep has been thoroughly debugged and tested by
X          several people over a period of several months; we think
X          it's a reliable beast or we wouldn't distribute it.  If by
X          some fluke of the universe you discover a bug, send a
X          detailed description (including options, regular
X          expressions, and a copy of an input file that can reproduce
X          it) to me, mike@wheaties.ai.mit.edu.
X
X          There is also a newsgroup, gnu.utils.bug, for reporting FSF
X          utility programs' bugs and fixes; but before reporting
X          something as a bug, please try to be sure that it really is
X          a bug, not a misunderstanding or a deliberate feature.
X          Also, include the version number of the utility program you
X          are running in every bug report that you send in.  Please do
X          not send anything but bug reports to this newsgroup.
X
X     AVAILABILITY
X          GNU grep is free; anyone may redistribute copies of grep to
X          anyone under the terms stated in the GNU General Public
X          License, a copy of which may be found in each copy of GNU
X          Emacs.  See also the comment at the beginning of the source
X          code file grep.c.
X
X          Copies of GNU grep may sometimes be received packaged with
X          distributions of Unix systems, but it is never included in
X          the scope of any license covering those systems.  Such
X          inclusion violates the terms on which distribution is
X          permitted.  In fact, the primary purpose of the General
X          Public License is to prohibit anyone from attaching any
X          other restrictions to redistribution of any of the Free
X          Software Foundation programs.
X
X     AUTHORS
X          Mike Haertel wrote the deterministic regexp code and the
X          bulk of the program.
X
X          James A. Woods is responsible for the hybridized search
X          strategy of using Boyer-Moore-Gosper fixed-string search as
X          a filter before calling the general regexp matcher.
X
X          Arthur David Olson contributed code that finds fixed strings
X          for the aforementioned BMG search for a large class of
X          regexps.
X
X          Richard Stallman wrote the backtracking regexp matcher that
X          is used for \digit backreferences, as well as the getopt
X          that is provided for 4.2BSD sites.  The backtracking matcher
X          was originally written for GNU Emacs.
X
X          D. A. Gwyn wrote the C alloca emulation that is provided so
X          System V machines can run this program.  (Alloca is used
X          only by RMS' backtracking matcher, and then only rarely, so
X          there is no loss if your machine doesn't have a "real"
X          alloca.)
X
X          Scott Anderson and Henry Spencer designed the regression
X          tests used in the "regress" script.
X
X          Paul Placeway wrote the original version of this manual
X          page.
X
X
X
X     Page 4                                          (printed 2/26/89)
X
X
X
END_OF_egrep.1
echo shar: 1108 control characters may be missing from \"egrep.1\"
if test 11445 -ne `wc -c <egrep.1`; then
    echo shar: \"egrep.1\" unpacked with wrong size!
fi
# end of overwriting check
fi
if test -f readme -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"readme\"
else
echo shar: Extracting \"readme\" \(7366 characters\)
sed "s/^X//" >readme <<'END_OF_readme'
XThis README documents GNU e?grep version 1.1.
X
XChanges needed to the makefile under various perversions of Unix are
Xdescribed therein.
X
XIf the type "char" is unsigned on your machine, you will have to fix
Xthe definition of the macro SIGN_EXTEND_CHAR() in regex.c.  A reasonable
Xdefinition might be:
X	#define SIGN_EXTEND_CHAR(c) ((c)>(char)127?(c)-256:(c))
X
XGNU e?grep is provided "as is" with no warranty.  The exact terms
Xunder which you may use and (re)distribute this program are detailed
Xin a comment at the top of grep.c.
X
XGNU e?grep is based on a fast lazy-state deterministic matcher (about
Xtwice as fast as stock Unix egrep) hybridized with a Boyer-Moore-Gosper
Xsearch for a fixed string that eliminates impossible text from being
Xconsidered by the full regexp matcher without necessarily having to
Xlook at every character.  The result is typically many times faster
Xthan Unix grep or egrep.  (Regular expressions containing backreferencing
Xmay run more slowly, however.)
X
XGNU e?grep attempts to be as compatible as possible with the regexp
Xsyntaxes of the Unix programs it replaces.  The following table
Xdetails the various special characters understood in both the grep and
Xegrep incarnations:
X
X(grep)	(egrep)		(explanation)
X  .	   .		matches any single character except newline
X  \?	   ?		postfix operator; preceding item is optional
X  *	   *		postfix operator; preceding item 0 or more times
X  \+	   +		postfix operator; preceding item 1 or more times
X  \|	   |		infix operator; matches either argument
X  ^	   ^		matches the empty string at the beginning of a line
X  $	   $		matches the empty string at the end of a line
X  \<	   \<		matches the empty string at the beginning of a word
X  \>	   \>		matches the empty string at the end of a word
X [chars] [chars]	match any character in the given class; if the
X			first character after [ is ^, match any character
X			not in the given class; a range of characters may
X			be specified by <first>-<last>; for example, \W
X			(below) is equivalent to the class [^A-Za-z0-9]
X \( \)	  ( )		parentheses are used to override operator precedence
X \<1-9>	  \<1-9>	\<n> matches a repeat of the text matched earlier
X			in the regexp by the subexpression inside the
X			nth opening parenthesis
X  \	   \		any special character may be preceded by a backslash
X			to match it literally
X
X(the following are for compatibility with GNU Emacs)
X  \b	   \b		matches the empty string at the edge of a word
X  \B	   \B		matches the empty string if not at the edge of a word
X  \w	   \w		matches word-constituent characters (letters & digits)
X  \W	   \W		matches characters that are not word-constituent
X
XOperator precedence is (highest to lowest) ?, *, and +, concatenation,
Xand finally |.  All other constructs are syntactically identical to
Xnormal characters.  For the truly interested, a comment in dfa.c describes
Xthe exact grammar understood by the parser.
X
XGNU e?grep understands the following command line options:
X	-A <num>	print <num> lines of context after every matching line
X	-B <num>	print <num> lines of context before every matching line
X	-C		print 2 lines of context on each side of every match
X	-<num>		print <num> lines of context on each side
X	-V		print the version number on stderr
X	-b		print every match preceded by its byte offset
X	-c		print a total count of matching lines only
X	-e <expr>	search for <expr>; useful if <expr> begins with -
X	-f <file>	take <expr> from the given <file>
X	-h		don't display filenames on matches
X	-i		ignore case difference when comparing strings
X	-l		list files containing matches only
X	-n		print each match preceded by its line number
X	-s		run silently producing no output except error messages
X	-v		print only lines that contain no matches for the <expr>
X	-w		print only lines where the match is a complete word
X	-x		print only lines where the match is a whole line
X
XThe options understood by GNU e?grep are meant to be (nearly) compatible
Xwith both the BSD and System V versions of grep and egrep.
X
XThe following incompatibilities with other versions of grep exist:
X	the context-dependent meaning of * is not quite the same (grep only)
X	-b prints a byte offset instead of a block offset
X	the \{m,n\} construct of System V grep is not implemented
X
XGNU e?grep has been thoroughly debugged and tested by several people
Xover a period of several months; we think it's a reliable beast or we
Xwouldn't distribute it.  If by some fluke of the universe you discover
Xa bug, send a detailed description (including options, regular
Xexpressions, and a copy of an input file that can reproduce it) to me,
Xmike@wheaties.ai.mit.edu.
X
XGNU e?grep is brought to you by the efforts of several people:
X
X	Mike Haertel wrote the deterministic regexp code and the bulk
X	of the program.
X
X	James A. Woods is responsible for the hybridized search strategy
X	of using Boyer-Moore-Gosper fixed-string search as a filter
X	before calling the general regexp matcher.
X
X	Arthur David Olson contributed code that finds fixed strings for
X	the aforementioned BMG search for a large class of regexps.
X
X	Richard Stallman wrote the backtracking regexp matcher that is
X	used for \<digit> backreferences, as well as the getopt that
X	is provided for 4.2BSD sites.  The backtracking matcher was
X	originally written for GNU Emacs.
X
X	D. A. Gwyn wrote the C alloca emulation that is provided so
X	System V machines can run this program.  (Alloca is used only
X	by RMS' backtracking matcher, and then only rarely, so there
X	is no loss if your machine doesn't have a "real" alloca.)
X
X	Scott Anderson and Henry Spencer designed the regression tests
X	used in the "regress" script.
X
XIf you are interested in improving this program, you may wish to try
Xany of the following:
X
X1.  Make backreferencing \<digit> faster.  Right now, backreferencing is
X    handled by calling the Emacs backtracking matcher to verify the partial
X    match.  This is slow; if the DFA routines could handle backreferencing
X    themselves a speedup on the order of three to four times might occur
X    in those cases where the backtracking matcher is called to verify nearly
X    every line.  Also, some portability problems due to the inclusion of the
X    emacs matcher would be solved because it could then be eliminated.
X    Note that expressions with backreferencing are not true regular
X    expressions, and thus are not equivalent to any DFA.  So this is hard.
X
X2.  There is a bug in the backtracking matcher, regex.c, such that the |
X    operator is not properly commutative.  Let x and y be arbitrary
X    regular expressions, and suppose both x and y have matches at
X    some point in the target text.  Then the regexp x|y should select
X    the longer of the two matches.  With the backtracking matcher, if the
X    first match succeeds it does not even try the second, even though
X    the second may be a longer match.  This is obviously of no concern
X    for grep, which does not care exactly where or how long a match is,
X    so long as it knows it is there.  On the other hand, the backtracking
X    matcher is used in GNU AWK, wherein its behavior can only be considered
X    a bug.
X
X3.  Handle POSIX style regexps.  I'm not sure if this could be called an
X    improvement; some of the things on regexps in the POSIX draft I have
X    seen are pretty sickening.  But it would be useful in the interests of
X    conforming to the standard.
END_OF_readme
if test 7366 -ne `wc -c <readme`; then
    echo shar: \"readme\" unpacked with wrong size!
fi
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0
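
The man page's note about exit status, and the grep/egrep difference for
alternation, can be tried out with a short script. This is only a sketch,
assuming a current GNU grep, where "grep -E" selects the egrep syntax (the
version documented above is invoked as egrep instead); the sample file and
the variable names are made up for illustration.

```shell
# Hypothetical three-line sample input.
sample=$(mktemp)
printf 'alpha\nbeta\ngamma\n' >"$sample"

# Exit status 0 (a line matched) makes the if-condition true.
if grep beta "$sample" >/dev/null; then
    found_beta=yes
else
    found_beta=no
fi

# No line matches, so grep exits with status 1; $? in the else
# branch still holds the status of the if-condition.
if grep delta "$sample" >/dev/null; then
    status_delta=0
else
    status_delta=$?
fi

# Alternation: the grep syntax spells it \|, the egrep syntax a bare |.
# -c prints only the count of matching lines.
grep_count=$(grep -c 'alpha\|gamma' "$sample")
egrep_count=$(grep -E -c 'alpha|gamma' "$sample")

rm -f "$sample"
echo "$found_beta $status_delta $grep_count $egrep_count"
```

Both counts should agree, since the two patterns describe the same set of
lines in the two syntaxes.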