[net.lang.prolog] BENCHMARKING PROLOG SYSTEMS, PART 1 of 3

jclaude@ecrcvax.UUCP (Jean Claude Syre) (07/02/86)

Expires:
References:
Sender:
Reply-To: jclaude@ecrcvax.UUCP (Jean Claude Syre)
Followup-To:
Distribution:
Organization: European Computer-Industry Research Centre, Munchen, W. Germany
Keywords:  long file


         ***********************************************
         *** BENCHMARK PROGRAMS FOR PROLOG SYSTEMS   ***
         ***    (FINAL VERSION)                      ***
         ***********************************************
                       Part 1 (of 3)
J.C. SYRE
ECRC (European Computer-industry Research Center)
Arabellastr. 17
D-8000 MUNICH 81
WEST GERMANY

mcvax!unido!ecrcvax!jclaude
jclaude%ecrcvax.UUCP@Germany.CSNET

This set of benchmark programs is a collective work done
by the Logic Programming Group and the Computer Architecture
Group of ECRC, the European Computer-Industry Research Centre,
in Munich, AS WELL AS BY OTHER PEOPLE HAVING RESPONDED TO OUR
FIRST PROPOSAL SOME TIME AGO. Many thanks to all people who
have helped us improving this benchmark.

For convenience, you can send messages to Hans Benker
(replace "jclaude" by "hans" in this net address), or
myself.
The designers of the benchmark programs are: H. Benker, J. Noye,
Micha Meier, S. Schmitz, J.C. Syre, and ****many others**** from ECRC
and from other places in the world who brought many useful
comments.

The first section deals with simple programs whose single
purpose is to evaluate a single feature of prolog execution.
The times we give correspond to an interpretation using Cprolog
on a VAX11/785 under UNIX BSD4.2, in a "quiet" environment
(which may be obtained on a sunny Sunday with all other logins
prohibited). They are subject to a 10 percent inaccuracy due to
paging and timecounts.

The second section presents more complex programs, which may
still run on small Prolog systems. We are open to suggestions
to improve the significance of the benchmark. The programs
include many of the programs taken by the University of
Berkeley for their evaluation of the PLM1 prolog machine.

There should be a third section, that we have not fully built yet,
and we count on you to build it. Those
programs should be representative of real scale prototypes of
large Prolog applications. CHAT80 is an example of such
programs for this section. If you feel embarassed to propagate
one of your programs having properties that you are reluctant to
make public, you may think of modifying it or truncate it so
that it becomes hardly readable, or useless for actual use by
others.

THIS IS OUR FINAL VERSION. We have taken into account a lot of
remarks made by those of you who responded to our proposal of
Prolog benchmark made two months ago. Not all of your
suggestions have been included (mainly to keep the benchmark
within a reasonable size), but many of them.

We are expecting a large number of answers from you. We can
also send the benchmark by normal mail if necessary.

IN ORDER TO KEEP THE POSSIBILITY OF COMPARING RESULTS FROM
VARIOUS SOURCES AND SYSTEMS, YOU ARE KINDLY ASKED NOT TO CHANGE
THE SOURCE PROGRAMS. SEND YOUR RESULTS TO OUR ADDRESS. WE WILL
MAKE A COMPILATION OF THEM, AND SEND THEM TO THE NET, UNLESS
YOU DO NOT WISH IT (IN THAT CASE, PLEASE MENTION IT
EXPLICITELY).

Have fun !



1. Simple Benchmark programs.

These simple (or simplistic) programs aim at evaluating a
single feature of the Prolog System. Here a Prolog System is
understood as either a pair <Prolog software, Host machine>,
where the Prolog software is an interpreter, a compiler, a
combination of both, and the Host machine is a conventional
machine (with its operating system and workload),
a simulator of a prolog processor, or a real piece of
Prolog hardware (direct interpreter, or PLM processor, or
anything else).

The "single feature" mentioned above means that the performance
results will show how well the Prolog System can handle a
particular characteristic of the language.

The phenomena we measure are:

 o   calls (section 1.1)
         program: boresea(N))
 o   non-determinism (section 1.2)
         programs: choice_point(N),
         choice_point0ar(N), baktrak1(N), baktrak2(N)
 o   handling of environments (section 1.3)
         programs: envir(N), envir0ar(N)
 o   indexing (section 1.4)
         program: index(N)
 o   unification (section 1.5)
         programs: construct_list(N), match_list(N),
         construct_structure(N), match_structure(N),
         match_nested_structure(N), general_unification(N)
 o   dereferencing (section 1.6)
         program: deref(N)
 o   cut (section 1.7)
         program: cuttest(N)

There are many more which would be interesting to measure, e.g.
efficiency of built-ins, "assert" and "retract", I/O and tail
recursion optimisation. However for now the above 7 criteria
seemed to be the most interesting and maybe somebody on
the net can design benchmarks for the remaining features.

Measuring a single feature of a language is difficult. One single
execution of a tiny program testing a particular feature takes
not enough time to measure it precisely. To get a better precision
one has to execute the test program hundreds of times. There are two
ways to do this: Write down the test program as often as one wants
to execute it or include it in a loop. The first solution
implies that one has to write programs with hundreds of lines
of code, where each line does the same job.
This is not convenient and it is desirable to use loops. In the
case of our benchmark programs however, the time spent executing
the loop is not negligible, due to the very small size of our test
programs. Therefore we used a combination of both methods, i.e.
sequences of repeated code surrounded with a loop. In order to
minimise the effect of the loop, we actually run as well an "empty"
loop, without the benchmark program. We call this "compensation loop"
and subtract its execution time  from the execution time of
the loop including the benchmark program. This increases of course
the relative error on the time measurement, but we have decreased the
influence of the unavoidable loop.

The repeated code can be generated by your favorite editor.
How much repeated code you need to get a sufficient precision of
course depends on the implementation of your particular Prolog system.
However we put as much repeated code into each benchmark program
as we think is apropriate to most Prolog implementations.
So we think you should get sufficient precision without modifying
our programs in that respect.
The listings of the programs follow below. For each simple program
we try to give the characteristics of it and some remarks about what
it measures.
Note that "cputime" in C-Prolog on the VAX gives you the possibility
to measure runtime. This may be different in other Prolog systems.
All the rest of the programs should be portable to any other system
without any problem.


1.1. Program to test calls  (boresea).

This is the one you always dreamed to test! Like all benchmarks,
it uses a loop calling the actual benchmark program. The benchmark
program consists of a sequence of 200 predicates having no arguments,
no choice points, NOTHING. 200 is chosen to have sufficient accuracy
in measuring the execution time.

The results show the effect of pure calls, and the Klips performance
can be called the peak performance of the prolog system. Note
that the peak performance has very little significance to classify
the overall performance of a Prolog system.

---------------- cut here - beginning of program listing ----------

/* This program is called with the query "?-boresea(X)."         */
/* X is the number of loop iterations executed. It should be big */
/* enough to give significant results.                           */
/* suggested value for X: 100 for interpreted code*/
/*                       1000 for compiled code   */
/* average values for C-prolog interpreter:       */
/*       X=1000, Tloop=27.1 T.comp=1.0 Tnet=26.1 Klips=7.7 */

boresea(X)
     :- T1 is cputime,
        do_max_KLips(X),       /* calls the loop to execute the  */
        T2 is cputime,         /* sequence of 200 predicates     */
        compens_loop(X),       /* compensation loop              */
        T3 is cputime,
        print_times(T1,T2,T3,X,200)./*compute and print results  */


compens_loop(0).                    /* compensation loop         */
compens_loop(X) :- Y is X - 1, compens_loop(Y).

print_times(T1,T2,T3,X,I) :-        /* prints the results        */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

do_max_KLips(0).            /* loop calling the actual benchmark */
do_max_KLips(X) :- lips1, Y is X - 1, do_max_KLips(Y).

/* predicates to test call */

lips1 :- lips2.
lips2 :- lips3.
lips3 :- lips4.
lips4 :- lips5.
lips5 :- lips6.
lips6 :- lips7.
lips7 :- lips8.
lips8 :- lips9.
lips9 :- lips10.
lips10 :- lips11.
lips11 :- lips12.
lips12 :- lips13.
lips13 :- lips14.
lips14 :- lips15.
lips15 :- lips16.
lips16 :- lips17.
lips17 :- lips18.
lips18 :- lips19.
lips19 :- lips20.
lips20 :- lips21.
lips21 :- lips22.
lips22 :- lips23.
lips23 :- lips24.
lips24 :- lips25.
lips25 :- lips26.
lips26 :- lips27.
lips27 :- lips28.
lips28 :- lips29.
lips29 :- lips30.
lips30 :- lips31.
lips31 :- lips32.
lips32 :- lips33.
lips33 :- lips34.
lips34 :- lips35.
lips35 :- lips36.
lips36 :- lips37.
lips37 :- lips38.
lips38 :- lips39.
lips39 :- lips40.
lips40 :- lips41.
lips41 :- lips42.
lips42 :- lips43.
lips43 :- lips44.
lips44 :- lips45.
lips45 :- lips46.
lips46 :- lips47.
lips47 :- lips48.
lips48 :- lips49.
lips49 :- lips50.
lips50 :- lips51.
lips51 :- lips52.
lips52 :- lips53.
lips53 :- lips54.
lips54 :- lips55.
lips55 :- lips56.
lips56 :- lips57.
lips57 :- lips58.
lips58 :- lips59.
lips59 :- lips60.
lips60 :- lips61.
lips61 :- lips62.
lips62 :- lips63.
lips63 :- lips64.
lips64 :- lips65.
lips65 :- lips66.
lips66 :- lips67.
lips67 :- lips68.
lips68 :- lips69.
lips69 :- lips70.
lips70 :- lips71.
lips71 :- lips72.
lips72 :- lips73.
lips73 :- lips74.
lips74 :- lips75.
lips75 :- lips76.
lips76 :- lips77.
lips77 :- lips78.
lips78 :- lips79.
lips79 :- lips80.
lips80 :- lips81.
lips81 :- lips82.
lips82 :- lips83.
lips83 :- lips84.
lips84 :- lips85.
lips85 :- lips86.
lips86 :- lips87.
lips87 :- lips88.
lips88 :- lips89.
lips89 :- lips90.
lips90 :- lips91.
lips91 :- lips92.
lips92 :- lips93.
lips93 :- lips94.
lips94 :- lips95.
lips95 :- lips96.
lips96 :- lips97.
lips97 :- lips98.
lips98 :- lips99.
lips99 :- lips100.
lips100:- lips101.
lips101 :- lips102.
lips102 :- lips103.
lips103 :- lips104.
lips104 :- lips105.
lips105 :- lips106.
lips106 :- lips107.
lips107 :- lips108.
lips108 :- lips109.
lips109 :- lips110.
lips110 :- lips111.
lips111 :- lips112.
lips112 :- lips113.
lips113 :- lips114.
lips114 :- lips115.
lips115 :- lips116.
lips116 :- lips117.
lips117 :- lips118.
lips118 :- lips119.
lips119 :- lips120.
lips120 :- lips121.
lips121 :- lips122.
lips122 :- lips123.
lips123 :- lips124.
lips124 :- lips125.
lips125 :- lips126.
lips126 :- lips127.
lips127 :- lips128.
lips128 :- lips129.
lips129 :- lips130.
lips130 :- lips131.
lips131 :- lips132.
lips132 :- lips133.
lips133 :- lips134.
lips134 :- lips135.
lips135 :- lips136.
lips136 :- lips137.
lips137 :- lips138.
lips138 :- lips139.
lips139 :- lips140.
lips140 :- lips141.
lips141 :- lips142.
lips142 :- lips143.
lips143 :- lips144.
lips144 :- lips145.
lips145 :- lips146.
lips146 :- lips147.
lips147 :- lips148.
lips148 :- lips149.
lips149 :- lips150.
lips150 :- lips151.
lips151 :- lips152.
lips152 :- lips153.
lips153 :- lips154.
lips154 :- lips155.
lips155 :- lips156.
lips156 :- lips157.
lips157 :- lips158.
lips158 :- lips159.
lips159 :- lips160.
lips160 :- lips161.
lips161 :- lips162.
lips162 :- lips163.
lips163 :- lips164.
lips164 :- lips165.
lips165 :- lips166.
lips166 :- lips167.
lips167 :- lips168.
lips168 :- lips169.
lips169 :- lips170.
lips170 :- lips171.
lips171 :- lips172.
lips172 :- lips173.
lips173 :- lips174.
lips174 :- lips175.
lips175 :- lips176.
lips176 :- lips177.
lips177 :- lips178.
lips178 :- lips179.
lips179 :- lips180.
lips180 :- lips181.
lips181 :- lips182.
lips182 :- lips183.
lips183 :- lips184.
lips184 :- lips185.
lips185 :- lips186.
lips186 :- lips187.
lips187 :- lips188.
lips188 :- lips189.
lips189 :- lips190.
lips190 :- lips191.
lips191 :- lips192.
lips192 :- lips193.
lips193 :- lips194.
lips194 :- lips195.
lips195 :- lips196.
lips196 :- lips197.
lips197 :- lips198.
lips198 :- lips199.
lips199 :- lips200.
lips200.

--------------------cut here - end of program listing------------


1.2. Program to test non deterministic behaviour

This program contains a series of 3 different benchmark predicates.

The predicate "choice_point(N)" tests calls invoking the creation
of a choice point, i.e. a branch point where the execution
will possibly come back to in case of backtracking. It
does NOT backtrack. Two versions are proposed, one with and the other
without arguments.

We then present two predicates to evaluate the mechanism of
backtracking during execution. Both predicates create one
choice_point and then backtrack 20 times on every loop iteration step.
"baktrak1(N)" exhibits a kind of backtracking called "deep",
while "baktrak2(N)" deals with "shallow" backtracking.
Both are worth being tried, whatever your particular Prolog
System is.

------------cut here - beginning of program listing----------------
/* The predicates are called:                             */

/*    o  "choice_point(N)"    - creation of choice points    */
/*    o  "choice_point0ar(N)  - same, with 0 arg             */
/*    o  "baktrak1(N)"        - deep backtracking            */
/*    o  "baktrak2(N)"        - shallow backtracking         */

/*  N is the number of loop iterations executed  */


/* predicate to test creation of choice points without backtracking */
/* suggested value for N: 1000 */
/* results for  Cprolog N=1000 */
/* Tloop=5.95 Tcompens=0.98 Tnet=4.97 Klips=4.02 */

choice_point(N):-T1 is cputime,
        cre_CP(N), T2 is cputime,
        compens_loop(N), T3 is cputime,
        print_times(T1,T2,T3,N,20).

/* predicate choice_point, but with zero argument  */
/* suggested value for N: 1000 */
/* results for Cprolog: N=1000 */
/* Tloop=3.55 Tcompens=0.98 Tnet=2.57 Klips=7.7  */


choice_point0ar(N):-T1 is cputime,
        cre_CP0ar(N), T2 is cputime,
        compens_loop(N), T3 is cputime,
        print_times(T1,T2,T3,N,20).
/* Predicate to test the (deep) backtracking mechanism. */
/* suggested value for N: 1000 (interp), 2000(comp) */
/* results for Cprolog: N=1000  */
/* Tloop=9.63 Tcomp=1 Tnet=8.63 Klips=2.32  */

baktrak1(N)
     :- T1 is cputime,
        deep_back(N),
        T2 is cputime,
        compens_loop(N),
        T3 is cputime,
        print_times(T1,T2,T3,N,20).


/* Predicate to test the (shallow) backtracking mechanism */
/* suggested value for N: 1000 (interp), 2000 (comp) */
/* results for Cprolog: N=1000  */
/* Tloop=3.63  Tcomp=0.95 Tnet=2.68 Klips=7.45 */

baktrak2(X)
     :- T1 is cputime,
        shallow_back(X), T2 is cputime,
        compens_loop(X), T3 is cputime,
        print_times(T1,T2,T3,X,20).


/* compensation loop, used to measure the time spent in the loop  */
compens_loop(0).
compens_loop(X) :- Y is X - 1, compens_loop(Y).

/* loop to test choice point creation   */
cre_CP(0).
cre_CP(N):-M is N-1, ccp1(0,0,0), cre_CP(M).

cre_CP0ar(0).
cre_CP0ar(N):-M is N-1, ccp1, cre_CP0ar(M).

/* loop to test deep backtracking       */
deep_back(0).
deep_back(X) :- pd(_,_,_), Y is X - 1, deep_back(Y).

/* loop to test shallow backtracking */
shallow_back(0).
shallow_back(X) :- ps(_,a,b), Y is X - 1, shallow_back(Y).


print_times(T1,T2,T3,X,I) :-        /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

/*  ccp1 creates 20 choice points */
/* ccp1 is the beginning of a set of predicates  */
/* composed of 2 clauses each. Every invokation of nd0 will create */
/* a sequence of 20 choice points. The body of the clauses are     */
/* limited to one goal, thus avoiding a creation of environment    */
/* when the clause is activated. nd0, and its successors, have     */
/*   three arguments to comply with our average static analysis    */
/*   results made on more than 30 real Prolog programs.            */
/* ccpXX exists with 3 arguments, and 0 args. */

ccp1(X,Y,Z):-ccp2(X,Y,Z).
ccp1(X,Y,Z).
ccp2(X,Y,Z):-ccp3(X,Y,Z).
ccp2(X,Y,Z).
ccp3(X,Y,Z):-ccp4(X,Y,Z).
ccp3(X,Y,Z).
ccp4(X,Y,Z):-ccp5(X,Y,Z).
ccp4(X,Y,Z).
ccp5(X,Y,Z):-ccp6(X,Y,Z).
ccp5(X,Y,Z).
ccp6(X,Y,Z):-ccp7(X,Y,Z).
ccp6(X,Y,Z).
ccp7(X,Y,Z):-ccp8(X,Y,Z).
ccp7(X,Y,Z).
ccp8(X,Y,Z):-ccp9(X,Y,Z).
ccp8(X,Y,Z).
ccp9(X,Y,Z):-ccp10(X,Y,Z).
ccp9(X,Y,Z).
ccp10(X,Y,Z):-ccp11(X,Y,Z).
ccp10(X,Y,Z).
ccp11(X,Y,Z):-ccp12(X,Y,Z).
ccp11(X,Y,Z).
ccp12(X,Y,Z):-ccp13(X,Y,Z).
ccp12(X,Y,Z).
ccp13(X,Y,Z):-ccp14(X,Y,Z).
ccp13(X,Y,Z).
ccp14(X,Y,Z):-ccp15(X,Y,Z).
ccp14(X,Y,Z).
ccp15(X,Y,Z):-ccp16(X,Y,Z).
ccp15(X,Y,Z).
ccp16(X,Y,Z):-ccp17(X,Y,Z).
ccp16(X,Y,Z).
ccp17(X,Y,Z):-ccp18(X,Y,Z).
ccp17(X,Y,Z).
ccp18(X,Y,Z):-ccp19(X,Y,Z).
ccp18(X,Y,Z).
ccp19(X,Y,Z):-ccp20(X,Y,Z).
ccp19(X,Y,Z).

ccp20(X,Y,Z).
ccp20(X,Y,Z).

ccp1:-ccp2.
ccp1.
ccp2:-ccp3.
ccp2.
ccp3:-ccp4.
ccp3.
ccp4:-ccp5.
ccp4.
ccp5:-ccp6.
ccp5.
ccp6:-ccp7.
ccp6.
ccp7:-ccp8.
ccp7.
ccp8:-ccp9.
ccp8.
ccp9:-ccp10.
ccp9.
ccp10:-ccp11.
ccp10.
ccp11:-ccp12.
ccp11.
ccp12:-ccp13.
ccp12.
ccp13:-ccp14.
ccp13.
ccp14:-ccp15.
ccp14.
ccp15:-ccp16.
ccp15.
ccp16:-ccp17.
ccp16.
ccp17:-ccp18.
ccp17.
ccp18:-ccp19.
ccp18.
ccp19:-ccp20.
ccp19.

ccp20.
ccp20.


/*  deep backtracking */
/*  The call to pd creates a choice point, and invokes a      */
/*  call to q. It will fail and there will be a backtracking  */
/*  step  to try the next clause defining pd. pd has 21       */
/*  clauses,thus failure                                      */
/*  occurs 20 times                                           */

pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_).

q(X1,X2,b).


/*   shallow backtracking */
/*   The ps predicate fails 20 times. The shallow backtracking   */
/*   will not restore all current state registers in Prolog      */
/*   systems which perform this optimisation, while others will. */

ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,_,_).

---------------------cut here - end of program listing-------------

rb@cci632.UUCP (07/12/86)

UNSW prolog doesn't have a cputime function.  It would sure
be nice if it did :-).

Would it be possible to get some benchmarks that ran on it?
Would it be possible to get cputime on UNSW?

Is this supposed to be a standard feature?
About the best UNSW seems to be able to do is get the "date"
via a shell command, hardly suitable for benchmarking :-).