rahardj@ccu.umanitoba.ca (Budi Rahardjo) (05/06/91)
I was wondering if anybody could show me the fastest way to match
a pattern in perl. I have a big flatfile (around 17000 lines).
Using UNIX grep takes around 1 or 2 seconds, but perl's pattern
matching takes 12 secs.
Need advice ...
-- budi
Here is a simplified benchmark that I use
----- cut here ----
#!/usr/local/bin/perl
# Benchmarking pattern matching
# Sun4 - Sun OS 4.1.1
#
print "Enter a pattern to grep : ";
$pat = <STDIN>; chop $pat;
# Using UNIX grep
$start = time;
open (FLAT,"grep $pat flatfile|");
while (<FLAT>) { print; }
$elapse = time - $start;
print ">>>> UNIX grep takes $elapse sec.\n";
close(FLAT);
# Using perl pattern matching
open(FLAT,"flatfile");
$start = time;
while (<FLAT>) {
if (/$pat/) { print;}
}
$elapse = time - $start;
print ">>>> Perl's pattern matching takes $elapse sec.\n";
close(FLAT);
tchrist@convex.COM (Tom Christiansen) (05/07/91)
From the keyboard of rahardj@ccu.umanitoba.ca (Budi Rahardjo):
:I was wondering if anybody could show me the fastest way to match
:a pattern in perl. I have a big flatfile (around 17000 lines).
:Using UNIX grep takes around 1 or 2 second, but perl's pattern
:matching takes 12 secs.
:Need advice ...
:
:-- budi
:
:Here is a simplified benchmark that I use
:
:----- cut here ----
:#!/usr/local/bin/perl
:# Benchmarking pattern matching
:# Sun4 - Sun OS 4.1.1
:#
:print "Enter a pattern to grep : ";
:$pat = <STDIN>; chop $pat;
:
:# Using UNIX grep
:$start = time;
:open (FLAT,"grep $pat flatfile|");
:while (<FLAT>) { print; }
:$elapse = time - $start;
:print ">>>> UNIX grep takes $elapse sec.\n";
:close(FLAT);
:
:# Using perl pattern matching
:open(FLAT,"flatfile");
:$start = time;
:while (<FLAT>) {
:   if (/$pat/) { print;}
:   }
:$elapse = time - $start;
:print ">>>> Perl's pattern matching takes $elapse sec.\n";
:close(FLAT);

First, don't use a block on your if. It just slows you down. Perl has
to go through a bit of overhead because of your block declaration -- I'm
sure Larry can much better explain than I the magical contortions he goes
through to optimize some of this stuff. The short story is that you want
to do one of these:

    print if /$pat/;
    /$pat/ && print;

Second, your pattern is invariant, so tell perl this by adding a /o
modifier to your pattern match.

    print if /$pat/o;

Finally, you could change the loop part to an eval so that you trick the
perl compiler into seeing a fixed expression and thus calling a better
checker. Why Larry doesn't do this on a /o as well, I do not know, but
he doesn't, and this will run faster than the preceding example:

    open(FLAT,"flatfile");
    $start = time;
    eval "while (<FLAT>) { print if /$pat/; }";
    $elapse = time - $start;
    print ">>>> Perl's pattern matching takes $elapse sec.\n";
    close(FLAT);

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
		"So much mail, so little time."
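
The suggestions in the reply above can be tried side by side without a real flatfile. The following is only a sketch under stated assumptions: the in-memory @records list, the sample pattern, and the helper names filter_once and build_filter are all hypothetical and do not come from the thread. It collects the matching lines instead of printing them so that the two approaches can be compared.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the flatfile: every seventh record contains "budi".
my @records = map { "record $_ " . ($_ % 7 ? "other" : "budi") . "\n" } 1 .. 17000;

# Statement-modifier form plus /o: the interpolated pattern is compiled
# the first time this match runs and is reused afterwards, even if a
# later call passes a different $pat.
sub filter_once {
    my ($pat, @lines) = @_;
    my @hits;
    for (@lines) { push @hits, $_ if /$pat/o; }
    return @hits;
}

# eval form: the pattern is interpolated into the source text before
# compilation, so perl compiles the loop with a constant pattern.
# A "/" inside $pat would break the generated source; this sketch
# does not escape it.
sub build_filter {
    my ($pat) = @_;
    my $src = "sub { my \@hits; for (\@_) { push \@hits, \$_ if /$pat/; } return \@hits; }";
    my $filter = eval $src;
    die "could not compile filter: $@" unless $filter;
    return $filter;
}

my @a = filter_once('budi', @records);
my @b = build_filter('budi')->(@records);
print scalar(@a), " lines via /o, ", scalar(@b), " lines via eval\n";
```

Both calls should select the same lines. Which one is faster depends on the perl version and the machine, so it is worth timing each on the real data rather than assuming the ordering reported in this thread still holds.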
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR/AA) (05/07/91)
As quoted from <1991May6.161906.21161@ccu.umanitoba.ca> by rahardj@ccu.umanitoba.ca (Budi Rahardjo):
+---------------
| a pattern in perl. I have a big flatfile (around 17000 lines).
| Using UNIX grep takes around 1 or 2 second, but perl's pattern
| matching takes 12 secs.
|
| while (<FLAT>) {
|    if (/$pat/) { print;}
|    }
+---------------

The problem is that /$pat/ gets recompiled for every line. If you never
change the value of $pat, the sequence /$pat/o will be compiled only once.
If you plan to change it occasionally, you need to be a bit sneakier:

    eval 'sub match { print if /$pat/o; }';

Run this every time you change the value of $pat, then call &match to do
the comparison. (Since the line is already in $_, just say "&match;" to
make it even faster.) (I think.)

++Brandon
--
Me: Brandon S. Allbery			Ham: KB8JRR/AA 10m,6m,2m,220,440,1.2
Internet: allbery@NCoast.ORG		(restricted HF at present)
Delphi: ALLBERY				AMPR: kb8jrr.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery	KB8JRR @ WA8BXN.OH
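
The "redefine the sub whenever $pat changes" idea from the reply above might look like the sketch below. Several things here are assumptions for illustration rather than part of the original post: the set_pattern wrapper, the sample lines, and the choice to have &match return true or false instead of printing. The "no warnings" line only silences the redefinition warning that newer perls emit.

```perl
#!/usr/bin/perl
use strict;
use warnings;

our $pat;    # package variable, visible inside the eval'd sub

# Rebuild &match after every change to $pat. Each eval compiles a fresh
# match operator, so /o freezes whatever $pat holds when the new sub
# first runs.
sub set_pattern {
    ($pat) = @_;
    eval 'no warnings "redefine"; sub match { return /$pat/o ? 1 : 0; } 1'
        or die "eval failed: $@";
}

# Hypothetical sample input; &match tests the current line in $_.
my @lines = ("alpha one\n", "beta two\n", "alpha three\n");

for my $p ('alpha', 'beta') {
    set_pattern($p);
    my @hits = grep { match() } @lines;
    print "$p: ", scalar(@hits), " hit(s)\n";
}
```

Without the re-eval, the /o in &match would keep using the first pattern it saw, which is why set_pattern has to run on every change.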