[comp.sources.d] Mawk, an implementation of new awk

brennan@ssc-vax.UUCP (Mike Brennan) (05/11/91)

Mawk 0.97, an implementation of awk as defined in
Aho, Kernighan and Weinberger, The AWK Programming Language,
Addison-Wesley, 1988,
can be obtained by anonymous ftp from oxy.edu (134.69.1.2).
If you don't have ftp resources, check alt.sources in the
next few days or I could email it to you.

The 1988 definition of awk adds several features that
significantly enhance the language, including expanded
regular expression features, built-in substitution functions,
more I/O control, and user-defined functions.

Advantages of mawk:
 1) Mawk is fast.  Benchmarks against 3 other new awks follow.
    The test programs are included in the distribution.

 2) The record separator, RS, is interpreted as a regular expression.
    This makes operations that are basically not line-oriented easier
    to program and improves performance in some cases.
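
To illustrate point 2, here is a minimal Python sketch of what a
regular-expression record separator buys you. It is not mawk code; the
helper name split_records and the sample input are made up for this
example. Records end at a semicolon followed by any amount of white
space, so they can span or share lines.

```python
import re

def split_records(text, rs_pattern):
    """Split text into records wherever rs_pattern matches (a regex RS).

    Hypothetical helper for illustration only. rs_pattern should not
    contain capture groups, since re.split would return them as well.
    """
    records = re.split(rs_pattern, text)
    # Mirror the usual awk behaviour: a separator at the very end of the
    # input does not produce an extra empty record.
    if records and records[-1] == "":
        records.pop()
    return records

# Three records, two of them on the same line.
sample = "a=1; b=2;\nc=3;"
for number, record in enumerate(split_records(sample, r";\s*"), start=1):
    print(number, record)
```

With a fixed single-character separator, the same job would need a
line-by-line loop that glues pieces back together by hand.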


The following are some timing tests of Mawk versus three
other (new) awks.  Times are user + sys in seconds.  The first
column is the mawk time, the second is the other awk's time, and
the last is the ratio of the second to the first, so a ratio
above 1 means mawk was faster.


Mawk vs.  Awk  on Stardent 3000, SysV 3.0
    cat           4.0    4.8   1.20
    wc            8.1    6.1   0.75
    fields       20.8   26.3   1.26
    reg0          4.7    6.0   1.28
    reg1          5.6    6.0   1.07
    reg2         18.1    6.0   0.33
    loops *       6.0   12.6   2.10
    words        18.1   18.4   1.02
    newton *      0.9    1.7   1.89
    concat       14.1   15.0   1.06
    primes *      2.6    3.1   1.19
    squeeze       5.3    2.9   0.55
    qsort         6.8   21.3   3.13
    wfrq          8.9   10.0   1.12
    deps **       3.1    5.2   1.68
			       1.15 #

Mawk vs.  Gawk 2.11.1  on Sun3, SunOS 4.0
    cat           6.1    8.1   1.33
    wc           12.6   40.6   3.22
    fields       35.7  117.6   3.29
    reg0          6.7   11.0   1.64
    reg1          8.5   12.6   1.48
    reg2         34.3   55.5   1.62
    loops        40.4  214.6   5.31
    words        31.3  110.8   3.54
    newton        6.7   25.4   3.79
    concat       20.9   65.7   3.14
    primes       38.8   28.3   0.73
    squeeze       2.2    4.6   2.09
    qsort        36.1   42.3   1.17
    wfrq         66.5  199.4   3.00
    deps         16.2   42.9   2.65
			       2.24 #

Mawk vs.  Nawk  on  VAX  3600  Ultrix 4.1
    cat           5.7    7.7   1.35
    wc           12.8   12.4   0.97
    fields       34.1   58.9   1.73
    reg0          7.1    8.6   1.21
    reg1          8.9   21.8   2.45
    reg2         36.7   58.4   1.59
    loops        30.5  117.7   3.86
    words        31.0   58.7   1.89
    newton        5.6   11.9   2.12
    concat       21.3   28.9   1.36
    primes       36.3   17.2   0.47
    squeeze       2.2    3.1   1.41
    qsort        39.3   29.5   0.75
    wfrq         76.2  173.9   2.28
    deps         18.1   32.4   1.79
			       1.50 #

* newton, primes and loops take no input.
Newton computes the square roots of 1 to 1000 by Newton's method
and primes is a sieve for primes < 5000.  Loops is three nested
loops, 100 x 50 x 50, with a sum on the inside.
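
For readers who have not seen them, the two numeric tests can be
sketched roughly as follows. This is Python written for illustration,
not the awk programs shipped in the distribution; the function names,
the starting guess and the stopping tolerance are all assumptions.

```python
def newton_sqrt(x, tol=1e-12):
    """Approximate sqrt(x) for x >= 1 by Newton's iteration on r*r - x."""
    r = float(x)  # assumed starting guess; the real test may differ
    while abs(r * r - x) > tol * x:
        r = 0.5 * (r + x / r)  # Newton step: average r and x/r
    return r

def primes_below(limit):
    """Sieve of Eratosthenes: list every prime p with p < limit (limit >= 2)."""
    is_prime = [True] * limit
    is_prime[0:2] = [False, False]  # 0 and 1 are not prime
    for n in range(2, int(limit ** 0.5) + 1):
        if is_prime[n]:
            # Cross out multiples of n, starting at n*n.
            for m in range(n * n, limit, n):
                is_prime[m] = False
    return [n for n, flag in enumerate(is_prime) if flag]

roots = [newton_sqrt(x) for x in range(1, 1001)]  # same range as newton
primes = primes_below(5000)                       # same bound as primes
```

Both are pure computation with no input to read, which is why these
tests mostly exercise the interpreter's arithmetic and loop overhead.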

** The input to deps was the *.c files of the mawk source.

The other programs read a file of 20000+ C source lines.
The input files were enlarged by a factor of 4 (80000+ lines) on the Stardent.

# geometric mean of col 3 --  (a1 * a2 * ... * an) ^ (1/n);
  this is a more appropriate average for ratios than the
  arithmetic mean.
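
As a quick check of the formula, here is a small Python sketch that
recomputes the geometric mean of the Stardent ratios listed above. The
function name is mine, and working through logarithms instead of
multiplying the ratios directly is just one convenient way to do it.

```python
import math

def geometric_mean(ratios):
    """Return (a1 * a2 * ... * an) ** (1/n) for positive ratios."""
    # Averaging logarithms is equivalent to taking the n-th root of the
    # product, and avoids overflow or underflow for long lists.
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))

# Column 3 of the Stardent table (other awk time / mawk time).
stardent_ratios = [1.20, 0.75, 1.26, 1.28, 1.07, 0.33, 2.10, 1.02,
                   1.89, 1.06, 1.19, 0.55, 3.13, 1.12, 1.68]

print(round(geometric_mean(stardent_ratios), 2))
```

The result should agree with the 1.15 shown under the Stardent table,
up to rounding.  The reason for preferring this average: a ratio of 2
on one test and 0.5 on another have a geometric mean of 1 (a wash),
while their arithmetic mean is 1.25.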


Mike Brennan
brennan@bcsaic.boeing.com
206-773-4425