[comp.lang.perl] pattern matching performance

rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/06/91)

I was wondering if anybody could show me the fastest way to match
a pattern in perl. I have a big flatfile (around 17000 lines).
Using UNIX grep takes around 1 or 2 seconds, but perl's pattern
matching takes 12 secs.
Need advice ...

-- budi

Here is a simplified benchmark that I use 

----- cut here ----
#!/usr/local/bin/perl
# Benchmarking pattern matching
# Sun4 - Sun OS 4.1.1
#
print "Enter a pattern to grep : ";
$pat = <STDIN>; chop $pat;

# Using UNIX grep
$start = time;
open (FLAT,"grep $pat flatfile|");
while (<FLAT>) { print; }
$elapse = time - $start;
print ">>>> UNIX grep takes $elapse sec.\n";
close(FLAT);

# Using perl pattern matching
open(FLAT,"flatfile");
$start = time;
while (<FLAT>) {
    if (/$pat/) { print;}
    }
$elapse = time - $start;
print ">>>> Perl's pattern matching takes $elapse sec.\n";
close(FLAT);

tchrist@convex.COM (Tom Christiansen) (05/07/91)

From the keyboard of rahardj@ccu.umanitoba.ca (Budi Rahardjo):
:I was wondering if anybody could show me the fastest way to match
:a pattern in perl. I have a big flatfile (around 17000 lines).
:Using UNIX grep takes around 1 or 2 second, but perl's pattern
:matching takes 12 secs. 
:Need advice ...
:
:-- budi
:
:Here is a simplified benchmark that I use 
:
:----- cut here ----
:#!/usr/local/bin/perl
:# Benchmarking pattern matching
:# Sun4 - Sun OS 4.1.1
:#
:print "Enter a pattern to grep : ";
:$pat = <STDIN>; chop $pat;
:
:# Using UNIX grep
:$start = time;
:open (FLAT,"grep $pat flatfile|");
:while (<FLAT>) { print; }
:$elapse = time - $start;
:print ">>>> UNIX grep takes $elapse sec.\n";
:close(FLAT);
:
:# Using perl pattern matching
:open(FLAT,"flatfile");
:$start = time;
:while (<FLAT>) {
:    if (/$pat/) { print;}
:    }
:$elapse = time - $start;
:print ">>>> Perl's pattern matching takes $elapse sec.\n";
:close(FLAT);

First, don't use a block on your if.  It just slows
you down.  Perl has to go through a bit of overhead
to set up the block -- I'm sure Larry can explain far
better than I can the magical contortions he goes
through to optimize some of this stuff.  The short
story is that you want to do one of these:

    print if /$pat/;
    /$pat/ && print;
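
A minimal sketch, not part of the original post, showing that the block
form and the two shorter forms select exactly the same lines. The sample
lines, the pattern, and the three array names are made up for
illustration, and push stands in for print so the result can be checked.
It says nothing about speed, only about equivalence.

```perl
#!/usr/bin/perl
# Sketch: three spellings of "keep the line if it matches".
# @lines, $pat and the three result arrays are illustrative only.
use strict;
use warnings;

my @lines = ("foo bar\n", "baz\n", "food\n");
my $pat   = "foo";
my (@block, @modifier, @and);

for (@lines) {
    if (/$pat/) { push @block, $_; }     # original form, with a block
    push @modifier, $_ if /$pat/;        # statement modifier
    /$pat/ && push(@and, $_);            # short-circuit &&
}

print scalar(@block), " lines kept by each form\n";
```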

Second, your pattern never changes once it has been read, so
tell perl to compile it only once by adding a /o modifier to
your pattern match.

    print if /$pat/o;
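
To see what /o actually does, here is a small sketch (the data and the
pattern values are invented for the example). The pattern is
interpolated and compiled the first time the match runs, and later
changes to $pat are ignored, which is exactly why /o is only safe when
the pattern is invariant.

```perl
#!/usr/bin/perl
# Sketch: /o freezes the pattern the first time the match executes.
# The sample lines and patterns are made up for illustration.
use strict;
use warnings;

my @lines = ("alpha\n", "beta\n", "gamma\n");
my $pat   = "alpha";
my @hits;

for my $line (@lines) {
    # $pat is interpolated and compiled on the first pass only.
    push @hits, $line if $line =~ /$pat/o;

    # Changing $pat now has no effect on the match above.
    $pat = "gamma";
}

print @hits;    # only the "alpha" line is kept
```

Without the /o, the third line would match as well, because the match
would pick up the new value of $pat.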

Finally, you could wrap the loop in an eval of a
double-quoted string.  $pat gets interpolated into the
string before the loop is compiled, so you trick the perl
compiler into seeing a fixed expression and thus calling
a better checker.  Why Larry doesn't do this on a /o as
well, I do not know, but he doesn't, and this will run
faster than the preceding example:

    open(FLAT,"flatfile");
    $start = time;
    eval "while (<FLAT>) { print if /$pat/; }";
    $elapse = time - $start;
    print ">>>> Perl's pattern matching takes $elapse sec.\n";
    close(FLAT);
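
The same trick as a self-contained sketch, under a few assumptions that
are not in the post: the file is replaced by an in-memory string, the
matches are collected in a hypothetical @hits array instead of printed
inside the loop, and the generated source is kept in $code so it can be
inspected. It also checks $@, since a pattern that does not compile
would otherwise fail silently.

```perl
#!/usr/bin/perl
# Sketch of the eval trick: build the loop source with the pattern
# already spliced in, so the compiler sees a literal regex.
# The sample data, $code and @hits are illustrative assumptions.
use strict;
use warnings;

my $data = "apple pie\nbanana split\napple tart\n";
my $pat  = "apple";
my @hits;

# Read from an in-memory string so the example needs no flatfile.
open(FLAT, '<', \$data) or die "cannot open in-memory file: $!";

# Double quotes interpolate $pat now; \@hits and \$_ are escaped so
# they reach the generated code as plain @hits and $_.
my $code = "while (<FLAT>) { push \@hits, \$_ if /$pat/; }";
eval $code;
die "generated loop failed: $@" if $@;
close(FLAT);

print @hits;
```

Because the pattern is pasted into program text, a pattern containing a
slash will break the generated loop, and a pattern typed by someone you
don't trust can run arbitrary code.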

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time." 

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/07/91)

As quoted from <1991May6.161906.21161@ccu.umanitoba.ca> by rahardj@ccu.umanitoba.ca (Budi Rahardjo):
+---------------
| a pattern in perl. I have a big flatfile (around 17000 lines).
| Using UNIX grep takes around 1 or 2 second, but perl's pattern
| matching takes 12 secs. 
| 
| while (<FLAT>) {
|     if (/$pat/) { print;}
|     }
+---------------

The problem is that /$pat/ gets recompiled for every line.  If you never
change the value of $pat, the sequence /$pat/o will be compiled only once.
If you plan to change it occasionally, you need to be a bit sneakier:

eval 'sub match { print if /$pat/o; }';

Run this every time you change the value of $pat, then call &match to do
the comparison.  Each eval defines a fresh copy of the sub, so the /o
compiles the new value of $pat once.  (Since the line is already in $_,
just say "&match;" to make it even faster.)  (I think.)
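
A rough sketch of that approach, with some assumptions added for the
sake of a runnable example: the set_pattern helper, the @hits array and
the sample lines are hypothetical, and the sub pushes onto @hits rather
than printing so the effect of each pattern change is visible.

```perl
#!/usr/bin/perl
# Sketch: re-eval the sub whenever the pattern changes, so /o
# compiles each new pattern exactly once.
# set_pattern, @hits and @lines are illustrative assumptions.
use strict;
use warnings;

our $pat;     # package variables, so the eval'd sub can see them
our @hits;

# Hypothetical helper: call it every time the pattern changes.
sub set_pattern {
    ($pat) = @_;
    no warnings 'redefine';    # match is redefined on purpose
    # Single quotes: $pat is NOT interpolated here.  The new sub gets
    # a fresh /o match that compiles $pat on its first call.
    eval 'sub match { push @hits, $_ if /$pat/o; }';
    die "could not define match: $@" if $@;
}

my @lines = ("red fish\n", "blue fish\n", "red bird\n");

set_pattern("red");
for (@lines) { &match; }    # the line is already in $_

set_pattern("bird");
for (@lines) { &match; }

print @hits;
```

The first pass keeps the two "red" lines and the second pass keeps the
"bird" line, which shows that the second eval really did pick up the new
pattern despite the /o.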

++Brandon
-- 
Me: Brandon S. Allbery			  Ham: KB8JRR/AA  10m,6m,2m,220,440,1.2
Internet: allbery@NCoast.ORG		       (restricted HF at present)
Delphi: ALLBERY				 AMPR: kb8jrr.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery       KB8JRR @ WA8BXN.OH