[comp.sources.d] The fastest "gmatch"

alter@ttidca.TTI.COM (Steve Alter) (12/21/88)

     A couple of months ago I posted a request for public-domain (or
at least legally copyable) versions of the "gmatch" or "glob" routine
and got a very satisfactory number of responses.  So many, in fact, that
I decided to compare them all and pick the fastest for use in a program
of my own, which will be posted.

     I've completed my tests of all those gmatch-lookalikes.  As it
turns out, only one routine had really crippling bugs in it; the rest
only differed in handling of marginal or degenerate cases such as when
the pattern and/or the subject-string are null.  They also differed on
the character used to indicate the negation of a character-class, but
that's no big deal to change.

     As far as the time-tests are concerned, I really put them through
their paces.  I wrote a program that feeds a specified pattern and
subject-string to each routine and repeating for a specified number of
iterations.  I then wrote a shell-script that runs this test program
using 69 different pattern/string pairs, and 100000 (one hundred
thousand) iterations of each.  I then ran this shell script through 50
complete passes (that's a total of five million iterations) on each of
two different varieties of Vax processors: 11/785 (4.3 BSD) and a
VaxStation 3200 (Ultrix 1.2).  The 3200 ran the whole set in two days
whereas the 11/785 needed three WEEKS (it actually stayed up the whole
time!)  Disclaimer: I don't claim that my 69 test-pairs represent any
kind of "normal" situation; I just tried to catch lots of the syntax
combinations.

     Why five million iterations?  To completely flatten out any
discrepancies caused by fluctuations in system-load and obtain a
statistically significant average.  I was aiming for a very high
signal-to-noise ratio.  It worked because in each test, the standard
deviation turned out to be a very tiny fraction of the mean (all less than
2% on the VaxStation, most around 0.7%) so I believe that I got very
accurate overall timings.

So who is the fastest?

WE HAVE A WINNER!  THE WINNER IS... (the envelope, please)...
	The Free Software Foundation
	with the "glob_match" function!

Now I don't feel right in posting this routine because it's not really
public-domain (we don't see source to gcc on the net, do we?) but I'll
send it to anyone who wants it, as allowed by the FSF license.

If anybody wants a copy of the test environment (I can't imagine why,
but who knows?) then I'll send that out too.

Great thanks to David MacKenzie <edf@rocky2.rockefeller.edu> for
sending me the gnuglob package in the first place.
-- 
Steve Alter        <alter@tti.com>
{csun,philabs,psivax,pyramid,quad1,rdlvax,retix}!ttidca!alter
Citicorp/TTI, Santa Monica CA  (213) 452-9191 x2541