alter@ttidca.TTI.COM (Steve Alter) (12/21/88)
A couple of months ago I posted a request for public-domain (or
at least legally copyable) versions of the "gmatch" or "glob" routine
and got a very satisfactory number of responses. So many, in fact, that
I decided to compare them all and pick the fastest for use in a program
of my own, which will be posted.
I've completed my tests of all those gmatch-lookalikes. As it
turns out, only one routine had really crippling bugs in it; the rest
only differed in handling of marginal or degenerate cases such as when
the pattern and/or the subject-string are null. They also differed on
the character used to indicate the negation of a character-class, but
that's no big deal to change.
As far as the time-tests are concerned, I really put them through
their paces. I wrote a program that feeds a specified pattern and
subject-string to each routine and repeating for a specified number of
iterations. I then wrote a shell-script that runs this test program
using 69 different pattern/string pairs, and 100000 (one hundred
thousand) iterations of each. I then ran this shell script through 50
complete passes (that's a total of five million iterations) on each of
two different varieties of Vax processors: 11/785 (4.3 BSD) and a
VaxStation 3200 (Ultrix 1.2). The 3200 ran the whole set in two days
whereas the 11/785 needed three WEEKS (it actually stayed up the whole
time!) Disclaimer: I don't claim that my 69 test-pairs represent any
kind of "normal" situation; I just tried to catch lots of the syntax
combinations.
Why five million iterations? To completely flatten out any
discrepancies caused by fluctuations in system-load and obtain a
statistically significant average. I was aiming for a very high
signal-to-noise ratio. It worked because in each test, the standard
deviation turned out to be a very tiny fraction of the mean (all less than
2% on the VaxStation, most around 0.7%) so I believe that I got very
accurate overall timings.
So who is the fastest?
WE HAVE A WINNER! THE WINNER IS... (the envelope, please)...
The Free Software Foundation
with the "glob_match" function!
Now I don't feel right in posting this routine because it's not really
public-domain (we don't see source to gcc on the net, do we?) but I'll
send it to anyone who wants it, as allowed by the FSF license.
If anybody wants a copy of the test environment (I can't imagine why,
but who knows?) then I'll send that out too.
Great thanks to David MacKenzie <edf@rocky2.rockefeller.edu> for
sending me the gnuglob package in the first place.
--
Steve Alter <alter@tti.com>
{csun,philabs,psivax,pyramid,quad1,rdlvax,retix}!ttidca!alter
Citicorp/TTI, Santa Monica CA (213) 452-9191 x2541