[comp.misc] Awk Benchmarks

ajayshah@alhena.usc.edu (Ajay Shah) (06/13/91)

Results of benchmarking awk programs and awk implementations.

Executive Summary First.

The program

-------------------------
BEGIN {words = 0;}

{words = words + NF;}

END {print "Words = " words " and lines = " NR}
-------------------------

is around 25% slower (in user time) than the program

-------------------------
{words+=NF}
END {print "Words = " words " and lines = " NR}
-------------------------
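
The two versions should print the same thing on any non-empty input, because awk treats an unset variable as 0 in arithmetic, so the BEGIN initialization is not needed for correctness. A minimal sketch to check that, assuming a POSIX awk on the PATH; the function names and the sample text are made up for illustration and are not the 4Meg file used for the timings:

```sh
# Version 1: explicit initialization and "words = words + NF".
count_v1() {
    awk 'BEGIN {words = 0;}
         {words = words + NF;}
         END {print "Words = " words " and lines = " NR}' "$1"
}

# Version 2: no BEGIN block, "+=" instead of the long form.
count_v2() {
    awk '{words+=NF}
         END {print "Words = " words " and lines = " NR}' "$1"
}

# Hypothetical four-line sample (one line is empty): 9 words in total.
sample=$(mktemp)
printf 'the quick brown fox\njumps over\n\nthe lazy dog\n' > "$sample"

count_v1 "$sample"   # prints: Words = 9 and lines = 4
count_v2 "$sample"   # prints: Words = 9 and lines = 4
rm -f "$sample"
```

One difference worth knowing: on an empty file version 2 never assigns words, so it prints an empty string where version 1 prints 0.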

As for awk implementations, the options are

	awk (old awk)

	nawk, gawk and mawk (new awk).

awk and nawk come with SunOS.  gawk is GNU Awk.  mawk is another
free awk, released recently on the net.

Amazingly, mawk is the fastest.  It's around 1% faster than awk
in user time (a near tie), but it gives you full nawk programming;
you're not crippled on the language.

gawk is VERY slow.  It takes around 4x as long as the others.
BUT it has VERY LARGE physical limits.  When crunching huge files,
if the other awks blow up, fall back upon gawk.

nawk is reliable -- a commercial product.

Look at the times yourself, on a wc program acting upon a 4Meg
file:

/bin/awk:
10.9u 0.8s 0:12 92% 0+276k 56+0io 56pf+0w

/bin/nawk:
11.9u 0.6s 0:16 78% 0+376k 14+0io 14pf+0w

/max/a/bin/gawk:
46.1u 0.9s 0:53 88% 0+336k 11+0io 11pf+0w

/max/a/bin/mawk:
10.8u 1.1s 0:13 87% 0+388k 0+0io 0pf+0w

For a frame of reference, a C program.
/bin/wc:
4.0u 0.7s 0:05 92% 0+188k 75+0io 75pf+0w

Stop here if you don't want more details.
---------------------------------------------------------------------------


PROBLEM I: BENCHMARKING AWK PROGRAMS

This was done using /bin/awk (old awk) only.  1st version of awk program:

-------------------------
BEGIN {words = 0;}

{words = words + NF;}

END {print "Words = " words " and lines = " NR}
-------------------------

Speed data: (repeated trials)
13.6u 0.7s 0:15 95% 0+296k 52+0io 53pf+0w
13.7u 0.7s 0:15 94% 0+296k 53+0io 53pf+0w
13.9u 0.7s 0:15 94% 0+296k 45+0io 45pf+0w

2nd version of same program:

-------------------------
{words+=NF}
END {print "Words = " words " and lines = " NR}
-------------------------

Speed data: (repeated trials)
10.9u 0.8s 0:12 92% 0+276k 56+0io 56pf+0w
10.9u 0.7s 0:12 93% 0+276k 49+0io 49pf+0w
10.7u 0.9s 0:12 93% 0+276k 51+0io 51pf+0w
10.9u 0.7s 0:11 97% 0+276k 44+3io 44pf+0w

Amazing, ain't it??
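
The gap between the two versions can be put into a single number by averaging the user-time field of each set of trials. A rough sketch, assuming a POSIX awk; mean_user is a hypothetical helper name, and the timing lines are copied verbatim from the two lists above:

```sh
# Average the leading user-time field ("13.6u") of csh-style time lines
# read from stdin. Hypothetical helper, for illustration only.
mean_user() {
    awk '{ t = $1; sub(/u$/, "", t); sum += t; n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Version 1 trials.
v1_times='13.6u 0.7s 0:15 95% 0+296k 52+0io 53pf+0w
13.7u 0.7s 0:15 94% 0+296k 53+0io 53pf+0w
13.9u 0.7s 0:15 94% 0+296k 45+0io 45pf+0w'

# Version 2 trials.
v2_times='10.9u 0.8s 0:12 92% 0+276k 56+0io 56pf+0w
10.9u 0.7s 0:12 93% 0+276k 49+0io 49pf+0w
10.7u 0.9s 0:12 93% 0+276k 51+0io 51pf+0w
10.9u 0.7s 0:11 97% 0+276k 44+3io 44pf+0w'

v1=$(printf '%s\n' "$v1_times" | mean_user)
v2=$(printf '%s\n' "$v2_times" | mean_user)

# Relative slowdown of version 1 against version 2, in percent.
awk -v a="$v1" -v b="$v2" 'BEGIN {
    printf "v1 = %su, v2 = %su, v1 is %.1f%% slower\n", a, b, (a / b - 1) * 100
}'
# prints: v1 = 13.73u, v2 = 10.85u, v1 is 26.5% slower
```

With only three and four trials per version this is a point estimate, not a statistically careful comparison.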
---------------------------------------------------------------------------


PROBLEM II: COMPARING AWK IMPLEMENTATIONS

Comparing our stable of awks, using the 2nd version only.

/bin/awk:
10.9u 0.8s 0:12 92% 0+276k 56+0io 56pf+0w
10.9u 0.7s 0:12 93% 0+276k 49+0io 49pf+0w
10.7u 0.9s 0:12 93% 0+276k 51+0io 51pf+0w
10.9u 0.7s 0:11 97% 0+276k 44+3io 44pf+0w

/bin/nawk:
11.9u 0.6s 0:15 83% 0+376k 34+0io 46pf+0w
11.9u 0.6s 0:16 78% 0+376k 14+0io 14pf+0w
11.9u 0.7s 0:16 77% 0+372k 0+0io 0pf+0w
12.0u 0.7s 0:15 82% 0+376k 0+0io 0pf+0w

/max/a/bin/gawk:
45.6u 0.8s 0:52 88% 0+344k 44+0io 47pf+0w
46.1u 0.9s 0:53 88% 0+336k 11+0io 11pf+0w
46.3u 0.9s 0:52 90% 0+340k 4+0io 3pf+0w
45.2u 1.1s 0:53 86% 0+340k 0+1io 0pf+0w

/max/a/bin/mawk:
10.7u 1.0s 0:12 91% 0+384k 48+0io 47pf+0w
10.8u 1.1s 0:13 87% 0+388k 0+0io 0pf+0w
10.8u 1.1s 0:13 87% 0+388k 0+0io 0pf+0w
10.8u 1.2s 0:15 80% 0+384k 0+0io 0pf+0w

(So mawk gives you nawk/gawk programming at an awk price.)
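
The per-implementation trials can be reduced to mean user, system, and total CPU seconds with awk itself. The following is only a sketch under a few assumptions: summarize is a hypothetical helper name, a line ending in a colon is taken to name the implementation, and the rel column is total CPU relative to the first implementation listed. The timings are copied from the lists above.

```sh
# Hypothetical helper: group csh-style time lines under the most recent
# "name:" header line and print per-name means (seconds).
summarize() {
    awk '
        /:$/ {                      # header line such as "/bin/awk:"
            name = $0; sub(/:$/, "", name)
            if (!(name in n)) order[++k] = name
            next
        }
        $1 ~ /^[0-9.]+u$/ && $2 ~ /^[0-9.]+s$/ {
            u = $1; s = $2
            sub(/u$/, "", u); sub(/s$/, "", s)
            ut[name] += u; st[name] += s; n[name]++
        }
        END {
            for (i = 1; i <= k; i++) {
                nm = order[i]
                if (!n[nm]) continue
                cpu = (ut[nm] + st[nm]) / n[nm]
                if (base == "") base = cpu   # first entry is the baseline
                printf "%s user=%.3f sys=%.3f cpu=%.3f rel=%.2f\n", nm,
                       ut[nm] / n[nm], st[nm] / n[nm], cpu,
                       (base > 0 ? cpu / base : 0)
            }
        }'
}

timings='/bin/awk:
10.9u 0.8s 0:12 92% 0+276k 56+0io 56pf+0w
10.9u 0.7s 0:12 93% 0+276k 49+0io 49pf+0w
10.7u 0.9s 0:12 93% 0+276k 51+0io 51pf+0w
10.9u 0.7s 0:11 97% 0+276k 44+3io 44pf+0w
/bin/nawk:
11.9u 0.6s 0:15 83% 0+376k 34+0io 46pf+0w
11.9u 0.6s 0:16 78% 0+376k 14+0io 14pf+0w
11.9u 0.7s 0:16 77% 0+372k 0+0io 0pf+0w
12.0u 0.7s 0:15 82% 0+376k 0+0io 0pf+0w
/max/a/bin/gawk:
45.6u 0.8s 0:52 88% 0+344k 44+0io 47pf+0w
46.1u 0.9s 0:53 88% 0+336k 11+0io 11pf+0w
46.3u 0.9s 0:52 90% 0+340k 4+0io 3pf+0w
45.2u 1.1s 0:53 86% 0+340k 0+1io 0pf+0w
/max/a/bin/mawk:
10.7u 1.0s 0:12 91% 0+384k 48+0io 47pf+0w
10.8u 1.1s 0:13 87% 0+388k 0+0io 0pf+0w
10.8u 1.1s 0:13 87% 0+388k 0+0io 0pf+0w
10.8u 1.2s 0:15 80% 0+384k 0+0io 0pf+0w'

printf '%s\n' "$timings" | summarize
# prints:
# /bin/awk user=10.850 sys=0.775 cpu=11.625 rel=1.00
# /bin/nawk user=11.925 sys=0.650 cpu=12.575 rel=1.08
# /max/a/bin/gawk user=45.800 sys=0.925 cpu=46.725 rel=4.02
# /max/a/bin/mawk user=10.775 sys=1.100 cpu=11.875 rel=1.02
```

On these figures mawk is marginally ahead of awk in user time but marginally behind once system time is added; with four trials each, either difference is small enough to be noise.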

-- 
_______________________________________________________________________________
Ajay Shah, (213)734-3930, ajayshah@usc.edu
                             The more things change, the more they stay insane.
_______________________________________________________________________________

david@cs.dal.ca (David Trueman) (06/14/91)

As the principal author of gawk, I felt I had to defend "my baby".


First, there is a new version that would have been released a week ago,
but I thought it better not to release it just before going away for a
week to Usenix.

The new version is at least twice as fast as 2.11.1 and is more robust and
more portable and almost completely compliant with the current POSIX draft.

On the benchmark suite that came with the mawk distribution, this version
of gawk is only about 25% slower than mawk.  On some typical applications,
gawk is significantly faster than mawk (on others it is significantly slower).

I will give a more complete report in a week or two.

-- 
{uunet watmath}!dalcs!david  or  david@cs.dal.ca