[comp.lang.perl] speed: V2 versus V3

schwartz@psuvax1.cs.psu.edu (Scott Schwartz) (12/14/89)

Just for fun I compared the speed of perl 2.0 with perl 3.0
using the pi computing demo by W. Kebsch <nixpbe!kebsch>
(minus the reporting of intermediate results.)

The results, on a Sun4/280:

psuvax1% perl-2 pi.pl 200
pi.pl-1.2, digits:  200, terms:  333, elements:   53

  3. 1415 9265 3589 7932 3846 2643 3832 7950 2884 1971 6939 9375 1058 2097 4944 5923 0781 6406 2862 0899 8628 0348 2534 2117 0679 8214 8086 5132 8230 6647 0938 4460 9550 5822 3172 5359 4081 2848 1117 4502 8410 2701 9385 2110 5559 6446 2294 8954 9303 8196

[u=15.7833  s=.8  cu=0.0166667  cs=.116667]

psuvax1% perl-3 pi.pl 200
pi.pl-1.2, digits:  200, terms:  333, elements:   53

  3. 1415 9265 3589 7932 3846 2643 3832 7950 2884 1971 6939 9375 1058 2097 4944 5923 0781 6406 2862 0899 8628 0348 2534 2117 0679 8214 8086 5132 8230 6647 0938 4460 9550 5822 3172 5359 4081 2848 1117 4502 8410 2701 9385 2110 5559 6446 2294 8954 9303 8196

[u=22.6333  s=1.05  cu=0.0166667  cs=0.0666667]


-- Scott

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (12/16/89)

In article <1808@uvaarpa.virginia.edu> schwartz@psuvax1.cs.psu.edu writes:
: Just for fun I compared the speed of perl 2.0 with perl 3.0
: using the pi computing demo by W. Kebsch <nixpbe!kebsch>
: (minus the reporting of intermediate results.)
: 
: psuvax1% perl-2 pi.pl 200
: [u=15.7833  s=.8  cu=0.0166667  cs=.116667]
: psuvax1% perl-3 pi.pl 200
: [u=22.6333  s=1.05  cu=0.0166667  cs=0.0666667]

This doesn't surprise me at all.  I've done absolutely no optimization
on math operations, and the changes to the run-time system to allow
arrays to be passed around more freely (it's more of a stack machine
now) could certainly adversely affect some of the operations done in pi.pl.
However, I'm better poised now to be able to make a perl-to-C translator.

In fact, I've worried very little about performance for 3.0.  My intent
was to get the interface into a stable configuration, and then worry
about performance.  Patches which enhance performance cause few problems,
but patches which change the interface can bring on lots of headaches.

I'd be more interested in comparisons of text processing performance.
You'll probably find that some tasks are a lot faster, some are a little
faster, and some are a little slower.  Hopefully a net gain.

Larry

tchrist@convex.COM (Tom Christiansen) (12/18/89)

>I'd be more interested in comparisons of text processing performance.
>You'll probably find that some tasks are a lot faster, some are a little
>faster, and some are a little slower.  Hopefully a net gain.

Here are some timings on text handling.  The program is a quickie to
extract all termcap entries that match the command arguments, inspired
by Larry's lib/termcap.pl Tgetent() routine.

Here are the relevant data:

    % wc /etc/termcap
	2235    8179  102598 /etc/termcap

    % grep -n 'wyse50[:|]' /etc/termcap
    2060:ye|w50|wyse50|Wyse 50:\

    % cat gent
    $| = 1; $\ = "\n";
    for $arg (@ARGV) { do gentry($arg); } 

    sub gentry {
	local ($entry) = @_;
	$TERMCAP = '/etc/termcap';
	open TERMCAP || die "can't open $TERMCAP: $!\n";
	while (<TERMCAP>) {
	    next if /^#/;
	    next if /^\t/;
	    next unless /^(\S*\|)?${entry}[|:]/;
	    chop;
	    while (chop eq '\\') {
		$_ .= <TERMCAP>;
		chop;
	    } 
	    $_ .= ':';
	    s/:\t*:/:/g;
	    print;
	} 
    }

C1 timings: (32 meg)

    c-120% time perl2 gent wyse50 wyse50 wyse50 > /dev/null
    10.5u 1.9s 0:13 89% 0+0k 33+0io 60pf+0w
    c-120% time perl3 gent wyse50 wyse50 wyse50 > /dev/null
    6.9u 0.6s 0:08 89% 0+8k 0+1io 67pf+0w

C2 timings: (128 meg)

    c-220% time perl2 gent wyse50 wyse50 wyse50 > /dev/null
    2.4u 0.7s 0:03 82% 0+0k 25+0io 46pf+0w
    c-220% time perl3 gent wyse50 wyse50 wyse50 > /dev/null
    1.6u 0.1s 0:01 92% 0+0k 7+0io 51pf+0w

Extended precision C2 timings:

    c-220% /bin/time -e perl2 gent wyse50 wyse50 wyse50 > /dev/null
	 3.884330 real        2.448687 user        0.760707 sys
    c-220% /bin/time -e perl3 gent wyse50 wyse50 wyse50 > /dev/null
	 2.737149 real        1.709143 user        0.164598 sys


That means that for THIS application on THESE architectures and THESE
configurations, perl3 runs in 2/3 the user time that perl2 does on both
architectures, and just 1/3 and 1/5 system time respectively on a c1 and a
c2.  (The difference in system time ratios MAY be because the c1 was running
ConvexOS 7.1 while the c2 had version 8.0.)  This is sure a pretty nice
overall performance increase in my book.

Oddly enough, on a diskless sun/350 with 4 meg of memory, there was little
variance in user time (5%) but a 50% speedup in system time between perl2
and perl3.

Another interesting note is that if you combine these two statements:
    next if /^#/;
    next if /^\t/;
into 
    next if /^#/ || /^\t/;
then your user time goes from 1.7 to 2.0 on the c2 for perl3, but if 
you instead make them:
    next if /^[#\t]/;
it doesn't change.   Interesting optimizations going on here somewhere.
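As a sanity check before timing, the one-pattern and two-pattern skip-tests can be verified equivalent. This is a sketch in modern Perl 5 syntax (which postdates this thread; the sample lines are made up):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Check that the alternation and the character class accept and
# reject exactly the same lines:
my @lines = ("# comment\n", "\tcontinuation\n", "vt100|dec vt100:\\\n");
for my $l (@lines) {
    my $two   = ($l =~ /^#/ || $l =~ /^\t/) ? 1 : 0;
    my $class = ($l =~ /^[#\t]/)            ? 1 : 0;
    die "patterns disagree on: $l" unless $two == $class;
}
print "equivalent\n";

# To time them directly, Benchmark::cmpthese (standard in Perl 5) works:
#   use Benchmark qw(cmpthese);
#   cmpthese(-1, {
#       alternation => sub { () = grep { /^#/ || /^\t/ } @lines },
#       charclass   => sub { () = grep { /^[#\t]/ }      @lines },
#   });
```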

What kinds of ratios do people get for other machines?


--tom


DISCLAIMER: These timings should not be construed to be official
benchmarks from my employer, whom I do not represent in this capacity.  
They are presented only to illustrate ratios between perl2 and perl3.


    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

schwartz@psuvax1.cs.psu.edu (Scott Schwartz) (12/18/89)

In article <4047@convex.UUCP> tchrist@convex.COM (Tom Christiansen) writes:
>Oddly enough, on a diskless sun/350 with 4 meg of memory, there was little
>variance in user time (5%) but a 50% speedup in system time between perl2
>and perl3.

On my machine, a Sun4/280 w/ 32M, the termcap entry for wyse50 is
matched by an adds entry close to the beginning.  I ran it for a tvi955
instead, which is near the end.  Here's four trials of each...

psuvax1% time perl-3 xxx.pl tvi955 >/dev/null
1.160u 0.240s 0:01.63 85% 0+560k 0+0io 0pf+0w
1.060u 0.330s 0:01.46 95% 0+554k 0+0io 0pf+0w
1.120u 0.330s 0:01.75 82% 0+561k 1+0io 0pf+0w
1.120u 0.270s 0:01.55 89% 0+555k 2+0io 0pf+0w

psuvax1% time perl-2 xxx.pl tvi955 > /dev/null
1.040u 0.530s 0:01.74 90% 0+473k 1+0io 0pf+0w
1.230u 0.390s 0:01.79 90% 0+473k 0+0io 0pf+0w
1.190u 0.420s 0:01.74 92% 0+469k 0+0io 0pf+0w
1.160u 0.460s 0:02.98 54% 0+463k 0+0io 0pf+0w

I think this benchmark is not computationally expensive enough
to give good results.  One second of runtime tells nothing, really.

-- 
Scott Schwartz		<schwartz@shire.cs.psu.edu>
"More mips; cheaper mips; never too many." -- John Mashey

tchrist@convex.COM (Tom Christiansen) (12/18/89)

In article <1989Dec18.032836.16434@psuvax1.cs.psu.edu> schwartz@psuvax1.cs.psu.edu (Scott Schwartz) writes:

>I think this benchmark is not computationally expensive enough
>to give good results.  One second of runtime tells nothing, really.

Basically all true.  That's why I picked an entry 2000 lines into
the file and then passed it the same argument 3 times, which you didn't do
-- you made it only look once.  I chose it because it did a variety of
text things, like regular expression matching and substitutions and
concatenation.  I made it run on a big file and go through several passes
of the same file to make it run long enough to make the results useful.
I think a bigger problem with it is that we don't all have the same
termcap files.  

Here's another termcap benchmark, which exercises split() and associative
arrays.  It also shows about a 1/3 speedup going from perl2 to perl3.

All runs produce this output:
    saw 1365 entries on 2235 lines, 15 duplicates

Here are the timings (both machines had 128meg):
    c1% time perl2 tcount.pl < /etc/termcap  > /dev/null
    4.7u 0.7s 0:05 96% 0+6k 0+1io 150pf+0w
    c1% time perl3 tcount.pl < /etc/termcap  > /dev/null
    3.2u 0.4s 0:03 96% 0+8k 0+2io 132pf+0w
    c2% time perl2 tcount.pl < /etc/termcap  > /dev/null
    1.4u 0.3s 0:01 94% 0+0k 1+1io 137pf+0w
    c2% time perl3 tcount.pl < /etc/termcap  > /dev/null
    0.9u 0.2s 0:01 94% 0+0k 0+1io 116pf+0w

And this was the program:
    #!/usr/bin/perl
    while (<>) {
	$lines++;
	next if /^[#\s]/;
	chop;
	s/:.*//;
	split(/\|/);
	for (@_) {
	    $count++;
	    $seen{$_}++;
	} 
    } 
    @keys = keys(seen);
    printf "saw %d entries on %d lines, %d duplicates\n",
	    $count, $lines, $count - $#keys;

Scott may not like it either because it also runs too quickly.  Anybody want
to post a better benchmark?  I'm having trouble finding something that'll
actually run for a long time.  My cfman program does, but it's totally 
unsuitable as a benchmark because everyone has different man pages.

--tom

    Tom Christiansen                       {uunet,uiucdcs,sun}!convex!tchrist 
    Convex Computer Corporation                            tchrist@convex.COM
		 "EMACS belongs in <sys/errno.h>: Editor too big!"

flee@shire.cs.psu.edu (Felix Lee) (12/18/89)

Tom Christiansen <tchrist@convex.COM> wrote:
> Anybody want to post a better benchmark?  I'm having trouble finding
> something that'll actually run for a long time.

You guys aren't really seriously into text processing, are you. :-)

Here's timings for a perl script that counts word frequencies.
	% time perl-2 wf.pl /etc/termcap >/dev/null
	13.3u + 0.9s = 0:15 (95%); (0k+864k)/92k (0+0)io (0f+80r)pg+0sw
	% !!
	13.4u + 0.7s = 0:14 (98%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
	% !!
	13.3u + 0.8s = 0:14 (100%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
	% time perl-3 wf.pl /etc/termcap >/dev/null
	18.6u + 1.0s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw
	% !!
	18.7u + 0.9s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+72r)pg+0sw
	% !!
	18.7u + 0.9s = 0:20 (94%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw


This is on a Sun-4.  /etc/termcap is 146k, about 32000 total words,
about 2000 different words, average word length is 3 chars.

If you want worse behavior, try /usr/dict/words.  About 24000 words,
every one unique, average length 7 chars.  I get 103.0u for perl-2 and
158.2u for perl-3.

If you eliminate the simple arithmetic in the script, perl-3 performs
a little better, but still worse than perl-2.

Here's the script.

#!/usr/bin/perl
# Count word frequency.
while (<>) {
	foreach $k (split(/[^a-zA-Z]+/)) {
		$k =~ tr/A-Z/a-z/, ++$freq{$k} if ($k);
	}
}
foreach $k (sort downfreq keys(freq)) {
	printf "%5d %s\n", $freq{$k}, $k;
}
sub downfreq {
	($freq{$b} - $freq{$a}) || ($a gt $b);
}
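Worth noting: downfreq's tie-break, `$a gt $b`, returns 1 or "" rather than the -1/0/1 a comparator is meant to produce. A hypothetical rework in modern Perl 5 syntax (not what perl 2 or 3 accepted) would use `<=>` and `cmp`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %freq = (the => 3, a => 3, zebra => 1);

# Higher counts first; break ties alphabetically.  <=> and cmp return
# -1/0/1, which is what sort expects from a comparator.
my @sorted = sort {
    $freq{$b} <=> $freq{$a}
        ||
    $a cmp $b
} keys %freq;

print "@sorted\n";    # a the zebra
```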
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (12/19/89)

In article <1989Dec18.112735.4443@psuvax1.cs.psu.edu> flee@shire.cs.psu.edu (Felix Lee) writes:
: Here's timings for a perl script that counts word frequencies.
: 	% time perl-2 wf.pl /etc/termcap >/dev/null
: 	13.3u + 0.9s = 0:15 (95%); (0k+864k)/92k (0+0)io (0f+80r)pg+0sw
: 	% !!
: 	13.4u + 0.7s = 0:14 (98%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
: 	% !!
: 	13.3u + 0.8s = 0:14 (100%); (0k+872k)/92k (0+0)io (0f+79r)pg+0sw
: 	% time perl-3 wf.pl /etc/termcap >/dev/null
: 	18.6u + 1.0s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw
: 	% !!
: 	18.7u + 0.9s = 0:20 (95%); (0k+944k)/84k (0+0)io (0f+72r)pg+0sw
: 	% !!
: 	18.7u + 0.9s = 0:20 (94%); (0k+944k)/84k (0+0)io (0f+73r)pg+0sw
: 
: 
: This is on a Sun-4.  /etc/termcap is 146k, about 32000 total words,
: about 2000 different words, average word length is 3 chars.
: 
: If you want worse behavior, try /usr/dict/words.  About 24000 words,
: every one unique, average length 7 chars.  I get 103.0u for perl-2 and
: 158.2u for perl-3.
: 
: Here's the script.
: 
: #!/usr/bin/perl
: # Count word frequency.
: while (<>) {
: 	foreach $k (split(/[^a-zA-Z]+/)) {
: 		$k =~ tr/A-Z/a-z/, ++$freq{$k} if ($k);
: 	}
: }
: foreach $k (sort downfreq keys(freq)) {
: 	printf "%5d %s\n", $freq{$k}, $k;
: }
: sub downfreq {
: 	($freq{$b} - $freq{$a}) || ($a gt $b);
: }

This particular script is exercising almost none of the constructs that
were sped up in perl 3, and several of the constructs that were slowed
down.

In particular, the sorting is probably a little slower for a couple of
reasons.  First, subroutine calls run a little slower due to the code
to handle array returns.  Second, associative array references are a
bit slower due to the check for dbm arrays, and making sure associative
arrays don't create themselves when checked by the "defined" function.

The foreach is also a bit slower due to allowing for nested references
to the same array.

Disclaimer: the above is merely well-informed speculation.  Profiling might
well pinpoint some other culprit.

Larry

flee@shire.cs.psu.edu (Felix Lee) (12/19/89)

Larry Wall <lwall@jpl-devvax.JPL.NASA.GOV> wrote:
> This particular script is exercising almost none of the constructs
> that were sped up in perl 3, and several of the constructs that were
> slowed down.

Yes, I suspected as much, especially the associative array references.
I offered that script as an example of a typically expensive text
application.  Most perl-as-a-report-generator applications aren't
going to be nearly as expensive, but will do much the same thing.
Assocs are more useful than normal arrays for report generation;
they're natural for tabulating data.  (That's why the reverse-cat in
awk is so slow: awk has assocs, not regular arrays.)
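A minimal sketch of that tabulation idiom, in modern Perl 5 syntax with made-up data:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Tally a field with an associative array; keys spring into existence
# as the data arrives, no pre-declared size needed (the property awk's
# arrays share).
my %sales;
while (my $line = <DATA>) {
    my ($region, $amount) = split ' ', $line;
    $sales{$region} += $amount;
}
printf "%-6s %d\n", $_, $sales{$_} for sort keys %sales;

__DATA__
east 10
west 5
east 7
```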

But anyway, the enhancements in version 3 are great for all those
writing serious applications (like newsreaders:-) in Perl.
--
Felix Lee	flee@shire.cs.psu.edu	*!psuvax1!flee