rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/06/91)
I was wondering if anybody could show me the fastest way to match
a pattern in perl. I have a big flatfile (around 17000 lines).
Using UNIX grep takes around 1 or 2 seconds, but perl's pattern
matching takes 12 secs.
Need advice ...
-- budi
Here is a simplified benchmark that I use
----- cut here ----
#!/usr/local/bin/perl
# Benchmarking pattern matching
# Sun4 - Sun OS 4.1.1
#
print "Enter a pattern to grep : ";
$pat = <STDIN>; chop $pat;
# Using UNIX grep
$start = time;
open (FLAT,"grep $pat flatfile|");
while (<FLAT>) { print; }
$elapse = time - $start;
print ">>>> UNIX grep takes $elapse sec.\n";
close(FLAT);
# Using perl pattern matching
open(FLAT,"flatfile");
$start = time;
while (<FLAT>) {
if (/$pat/) { print;}
}
$elapse = time - $start;
print ">>>> Perl's pattern matching takes $elapse sec.\n";
close(FLAT);

tchrist@convex.COM (Tom Christiansen) (05/07/91)
From the keyboard of rahardj@ccu.umanitoba.ca (Budi Rahardjo):
:I was wondering if anybody could show me the fastest way to match
:a pattern in perl. I have a big flatfile (around 17000 lines).
:Using UNIX grep takes around 1 or 2 seconds, but perl's pattern
:matching takes 12 secs.
:Need advice ...
:
:-- budi
:
:Here is a simplified benchmark that I use
:
:----- cut here ----
:#!/usr/local/bin/perl
:# Benchmarking pattern matching
:# Sun4 - Sun OS 4.1.1
:#
:print "Enter a pattern to grep : ";
:$pat = <STDIN>; chop $pat;
:
:# Using UNIX grep
:$start = time;
:open (FLAT,"grep $pat flatfile|");
:while (<FLAT>) { print; }
:$elapse = time - $start;
:print ">>>> UNIX grep takes $elapse sec.\n";
:close(FLAT);
:
:# Using perl pattern matching
:open(FLAT,"flatfile");
:$start = time;
:while (<FLAT>) {
: if (/$pat/) { print;}
: }
:$elapse = time - $start;
:print ">>>> Perl's pattern matching takes $elapse sec.\n";
:close(FLAT);
First, don't use a block on your if. It just slows
you down. Perl has to go through a bit of overhead
to set up the block -- I'm sure Larry can explain much
better than I can the magical contortions he goes
through to optimize some of this stuff. The short
story is that you want to do one of these:
    print if /$pat/;
    /$pat/ && print;
Second, your pattern is invariant, so tell perl this
by adding a /o modifier to your pattern match.
    print if /$pat/o;
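One consequence of /o is worth seeing in isolation: once the match has run, later changes to the variable are ignored. The following is a minimal sketch, not part of the original benchmark; the sub name grep_once and the sample lines are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the lines that match $pat.
# Because of /o, $pat is interpolated and compiled only the first
# time this match operator executes; later values are ignored.
sub grep_once {
    my ($pat, @lines) = @_;
    my @hits;
    for (@lines) {
        push @hits, $_ if /$pat/o;
    }
    return @hits;
}

my @lines = ("alpha\n", "beta\n", "gamma\n");
print grep_once('al', @lines);   # compiles /al/, prints "alpha"
print grep_once('mm', @lines);   # still uses /al/, prints "alpha" again
```

So /o is only appropriate when the pattern really is fixed for the life of the program, as it is in the benchmark.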
Finally, you could change the loop part to an eval so
that you trick the perl compiler into seeing a fixed
expression and thus calling a better checker. Why Larry
doesn't do this on a /o as well, I do not know, but he
doesn't, and this will run faster than the preceding example:
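    open(FLAT,"flatfile");
    $start = time;
    # $pat is interpolated into the string, so perl compiles a constant pattern
    eval "while (<FLAT>) { print if /$pat/; }";
    die $@ if $@;    # a bad pattern shows up here as a compile error
    $elapse = time - $start;
    print ">>>> Perl's pattern matching takes $elapse sec.\n";
    close(FLAT);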
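The same eval idea can be packaged so that the generated loop is reusable and compile errors are reported. This is only a sketch under stated assumptions: make_matcher is a hypothetical name, the pattern is assumed to be trusted and to contain no "/", and an in-memory filehandle stands in for the flatfile.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build the matching loop as source text so the
# pattern is a literal by the time perl compiles it, then eval that text.
# Assumes $pat is trusted and contains no "/" (it is pasted into code).
sub make_matcher {
    my ($pat) = @_;
    my $code = 'sub { my ($fh) = @_; my @hits; local $_; '
             . 'while (<$fh>) { push @hits, $_ if /' . $pat . '/; } '
             . 'return @hits; }';
    my $matcher = eval $code;
    die "could not compile matcher for '$pat': $@" if $@;
    return $matcher;
}

# Usage: read from a string instead of a real file.
my $data = "foo bar\nbaz\nfood\n";
open(my $fh, '<', \$data) or die "cannot open in-memory file: $!";
print make_matcher('foo')->($fh);   # prints the two lines containing "foo"
close($fh);
```

Since the pattern becomes part of the program text, anything the user types is compiled as Perl, so this is suitable only when the input can be trusted.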
--tom
--
Tom Christiansen tchrist@convex.com convex!tchrist
"So much mail, so little time."

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/07/91)
As quoted from <1991May6.161906.21161@ccu.umanitoba.ca> by rahardj@ccu.umanitoba.ca (Budi Rahardjo):
+---------------
| a pattern in perl. I have a big flatfile (around 17000 lines).
| Using UNIX grep takes around 1 or 2 second, but perl's pattern
| matching takes 12 secs.
|
| while (<FLAT>) {
| if (/$pat/) { print;}
| }
+---------------
The problem is that /$pat/ gets recompiled for every line. If you never
change the value of $pat, the sequence /$pat/o will be compiled only once.
If you plan to change it occasionally, you need to be a bit sneakier:

    eval 'sub match { print if /$pat/o; }';

Run this every time you change the value of $pat, then call &match to do
the comparison. (Since the line is already in $_, just say "&match;" to
make it even faster.) (I think.)

++Brandon
--
Me: Brandon S. Allbery Ham: KB8JRR/AA 10m,6m,2m,220,440,1.2
Internet: allbery@NCoast.ORG (restricted HF at present)
Delphi: ALLBERY AMPR: kb8jrr.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery KB8JRR @ WA8BXN.OH
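
The redefine-on-change trick can be sketched as follows. The names set_pattern and matching are hypothetical, and match here returns a true/false value rather than printing, which is a small change from the one-liner above so that the result is easy to check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $pat;    # package variable, visible to the eval'd sub

# Hypothetical helper: re-create &match so that its /o pattern is
# compiled afresh from the current value of $pat.
sub set_pattern {
    ($pat) = @_;
    no warnings 'redefine';    # redefining &match is intentional
    eval 'sub match { return /$pat/o ? 1 : 0; }';
    die "could not define match: $@" if $@;
}

# Hypothetical helper: return the items for which &match succeeds.
# Each item is placed in $_, which is what &match tests.
sub matching {
    my @hits;
    for (@_) {
        push @hits, $_ if &match;
    }
    return @hits;
}

set_pattern('^a');
print join(' ', matching(qw(apple banana avocado))), "\n";   # apple avocado
set_pattern('an');
print join(' ', matching(qw(apple banana avocado))), "\n";   # banana
```

Each eval compiles a new sub, so the /o "compile once" state starts over every time set_pattern runs, while calls in between reuse the compiled pattern.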