[net.unix-wizards] the many greps

edhall%rand-unix@sri-unix.UUCP (11/11/83)

There are definitely still enough differences to require 3 greps.
For example, try comparing:

  egrep '....................................................................'

with:

  grep '....................................................................'

The grep will be much faster than the egrep.  This seems to be true for any
regular expression with a lot of wildcard characters (i.e., `.' characters).

Also, when you are looking for matches with a long list of words, you'll
find:

  fgrep -f list

much faster than:

  egrep -f list

(Alas, in both cases the maximum number of strings in `list' has a fairly
small limit--about 400 for fgrep, and substantially less for egrep.)
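
For a list of plain words the two commands select the same lines, which is
why only their speed is being compared. A minimal sketch, assuming a POSIX
shell with mktemp available, and spelling the commands as `grep -F` and
`grep -E` (the current names for fgrep and egrep). The file names and sample
words are made up for illustration:

```sh
# Work in a scratch directory (hypothetical sample data throughout).
dir=$(mktemp -d)
printf '%s\n' apple cherry > "$dir/list"
printf '%s\n' 'apple pie' 'banana split' 'cherry tart' > "$dir/text"

# Fixed-string matching, one string per line of the list (fgrep -f list).
fixed=$(grep -F -f "$dir/list" "$dir/text")

# The same list read as extended regular expressions (egrep -f list).
extended=$(grep -E -f "$dir/list" "$dir/text")

rm -r "$dir"
```

Both variables should end up holding the same matching lines. The sketch says
nothing about timing, which depends on the grep implementation at hand.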

So, though I use egrep in about 85% of cases, I still find some use for
fgrep and plain ol' grep.

		-Ed Hall
		edhall@rand-unix        (ARPA)
		decvax!randvax!edhall   (UUCP)

thomas@utah-gr.UUCP (Spencer W. Thomas) (11/15/83)

The other day, I tried
	egrep '^.......?.?.?.?.?$'
After a couple of minutes (!) it told me "regular expression too big"!?!?

Anybody know why this is?  I finally did
	egrep '......' | egrep -v '............'
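
The two-step version selects the same lines as the single pattern: six
required characters plus five optional ones means a length of 6 through 11,
i.e. at least 6 characters and fewer than 12. A small check, assuming a POSIX
shell with `grep -E` standing in for egrep. The function names and sample
lines are invented for illustration:

```sh
# Sample lines of length 5, 6, 11, and 12.
sample() {
  printf '%s\n' 12345 123456 12345678901 123456789012
}

# The single anchored pattern: 6 required characters, then 5 optional ones.
one_pass() {
  grep -E '^.......?.?.?.?.?$'
}

# The two-step version: keep lines with at least 6 characters,
# then drop those with 12 or more.
two_pass() {
  grep -E '......' | grep -E -v '............'
}
```

Running `sample | one_pass` and `sample | two_pass` should both keep only the
6-character and 11-character lines.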

=Spencer

dan%bbncd@sri-unix.UUCP (11/16/83)

From:  Dan Franklin <dan@bbncd>

Each time the 3 greps are discussed, and people point out that they use
different algorithms, each best for different kinds of regular expressions, I
am puzzled by the leap to the conclusion that they must therefore be different
programs.  Some UNIX C compilers have several different algorithms for the
'switch' statement, choosing either an indexed table, a hashed table with
linear rehash, or the obvious if/then/else structure for the output, depending
on the properties of the input.  These compilers do not provide 'switch1',
'switch2', and 'switch3' statements; the compiler examines the properties of
the case list and chooses the best representation.  If the only difference
between the three greps were the space-time performance of each algorithm, the
sensible thing to do would be to have one 'grep' which chose the most efficient
algorithm for the regular expression--with, perhaps, a switch so the user could
override grep's choice on special occasions (no heuristic can be perfect).

So why doesn't somebody do just that?  Consider how much new-user puzzlement
(and excess unix-wizards mail) would be eliminated!  There is a reason: the
three greps interpret three different forms of regular expression.  You can't
take an arbitrary shell script which uses, say, 'grep' and substitute 'egrep'
everywhere without first scrutinizing each regular expression to make sure it
doesn't have parentheses, vertical bars, etc.  So even if 'egrep' could use a
variant of the 'grep' algorithm in the right circumstances, you couldn't throw
away 'grep'.  (Each command also accepts a different subset of options, but
that problem could be solved.) Too bad.

	Dan Franklin

JTW@MIT-XX.ARPA (11/22/83)

From:  John T. Wroclawski <JTW@MIT-XX.ARPA>

	So why doesn't somebody do just that?  There is a reason: the three
	greps interpret three different forms of regular expression.  You
	can't take an arbitrary shell script which uses, say, 'grep' and
	substitute 'egrep' everywhere without first scrutinizing each regular
	expression to make sure it doesn't have parentheses, vertical bars,
	etc.

Now wait a minute. What's to prevent the (one) unified grep from basing
its choice of algorithm partly on whether that algorithm can handle the
particular regular expression grep was given as an argument?
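
A rough sketch of such a choice, in POSIX shell. Everything here is
hypothetical: `pick_matcher` is not part of any real grep, and it assumes the
caller states which syntax the pattern is written in ("basic" or "extended"),
since the pattern text alone cannot reveal whether a parenthesis is meant
literally. Corner cases where the two syntaxes disagree on shared characters
are ignored:

```sh
# Hypothetical helper, not part of any real grep.
# $1 = syntax the pattern is written in: "basic" or "extended"
# $2 = the pattern
# Prints the name of the matcher a unified grep might run.
pick_matcher() {
  syntax=$1
  pattern=$2

  # No regular-expression metacharacter at all: the pattern is a plain
  # string in either syntax, so the fixed-string matcher can handle it.
  if ! printf '%s\n' "$pattern" | grep -q '[][\.*^$()|+?{}]'; then
    echo fgrep
    return
  fi

  # Extended-only operators in an extended pattern need egrep's matcher.
  if [ "$syntax" = extended ] &&
     printf '%s\n' "$pattern" | grep -q '[()|+?{]'; then
    echo egrep
    return
  fi

  # Everything else is within reach of plain grep's matcher.
  echo grep
}
```

For instance, `pick_matcher extended 'a(b|c)'` prints egrep, while
`pick_matcher basic 'a(b|c)'` prints grep, because in the basic syntax the
parentheses and the bar are ordinary characters.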
-------