jclaude@ecrcvax.UUCP (Jean Claude Syre) (07/02/86)
Reply-To: jclaude@ecrcvax.UUCP (Jean Claude Syre)
Organization: European Computer-Industry Research Centre, Munchen, W. Germany
Keywords: long file

        ***********************************************
        ***  BENCHMARK PROGRAMS FOR PROLOG SYSTEMS  ***
        ***             (FINAL VERSION)             ***
        ***********************************************

                        Part 1 (of 3)

J.C. SYRE
ECRC (European Computer-industry Research Center)
Arabellastr. 17
D-8000 MUNICH 81
WEST GERMANY

mcvax!unido!ecrcvax!jclaude
jclaude%ecrcvax.UUCP@Germany.CSNET

This set of benchmark programs is a collective work done by the Logic
Programming Group and the Computer Architecture Group of ECRC, the
European Computer-Industry Research Centre, in Munich, AS WELL AS BY
OTHER PEOPLE WHO RESPONDED TO OUR FIRST PROPOSAL SOME TIME AGO. Many
thanks to everyone who helped us improve this benchmark.

For convenience, you can send messages to Hans Benker (replace
"jclaude" by "hans" in this net address), or to myself.

The designers of the benchmark programs are: H. Benker, J. Noye,
Micha Meier, S. Schmitz, J.C. Syre, and ****many others**** from ECRC
and from other places in the world who contributed many useful
comments.

The first section deals with simple programs, each of which evaluates
a single feature of Prolog execution. The times we give were obtained
by interpretation using C-Prolog on a VAX11/785 under UNIX BSD4.2, in
a "quiet" environment (which may be obtained on a sunny Sunday with
all other logins prohibited). They are subject to a 10 percent
inaccuracy due to paging and time counting.

The second section presents more complex programs, which may still
run on small Prolog systems. We are open to suggestions to improve
the significance of the benchmark. The programs include many of the
programs used by the University of Berkeley for their evaluation of
the PLM1 Prolog machine.

There should be a third section, which we have not fully built yet,
and we count on you to build it. Those programs should be
representative of real-scale prototypes of large Prolog applications.
CHAT80 is an example of such a program for this section. If you are
reluctant to distribute one of your programs because it has
properties that you would rather not make public, you may consider
modifying or truncating it so that it becomes hardly readable, or
useless for actual use by others.

THIS IS OUR FINAL VERSION. We have taken into account a lot of
remarks made by those of you who responded to the Prolog benchmark
proposal we made two months ago. Not all of your suggestions have
been included (mainly to keep the benchmark within a reasonable
size), but many of them have.

We are expecting a large number of answers from you. We can also send
the benchmark by normal mail if necessary.

IN ORDER TO KEEP THE POSSIBILITY OF COMPARING RESULTS FROM VARIOUS
SOURCES AND SYSTEMS, YOU ARE KINDLY ASKED NOT TO CHANGE THE SOURCE
PROGRAMS.

SEND YOUR RESULTS TO OUR ADDRESS. WE WILL MAKE A COMPILATION OF THEM,
AND SEND THEM TO THE NET, UNLESS YOU DO NOT WISH IT (IN THAT CASE,
PLEASE MENTION IT EXPLICITLY).

Have fun!


1. Simple Benchmark programs.

These simple (or simplistic) programs aim at evaluating a single
feature of the Prolog System. Here a Prolog System is understood as a
pair <Prolog software, Host machine>, where the Prolog software is an
interpreter, a compiler, or a combination of both, and the Host
machine is a conventional machine (with its operating system and
workload), a simulator of a Prolog processor, or a real piece of
Prolog hardware (direct interpreter, or PLM processor, or anything
else).

The "single feature" mentioned above means that the performance
results will show how well the Prolog System can handle a particular
characteristic of the language.

The phenomena we measure are:

   o calls (section 1.1)
        program:  boresea(N)
   o non-determinism (section 1.2)
        programs: choice_point(N), choice_point0ar(N),
                  baktrak1(N), baktrak2(N)
   o handling of environments (section 1.3)
        programs: envir(N), envir0ar(N)
   o indexing (section 1.4)
        program:  index(N)
   o unification (section 1.5)
        programs: construct_list(N), match_list(N),
                  construct_structure(N), match_structure(N),
                  match_nested_structure(N), general_unification(N)
   o dereferencing (section 1.6)
        program:  deref(N)
   o cut (section 1.7)
        program:  cuttest(N)

There are many more which would be interesting to measure, e.g.
efficiency of built-ins, "assert" and "retract", I/O and tail
recursion optimisation. However, for now the above 7 criteria seemed
to be the most interesting, and maybe somebody on the net can design
benchmarks for the remaining features.

Measuring a single feature of a language is difficult. A single
execution of a tiny program testing a particular feature takes too
little time to be measured precisely. To get better precision one has
to execute the test program hundreds of times. There are two ways to
do this: write the test program out as many times as it should be
executed, or put it in a loop. The first solution implies that one
has to write programs with hundreds of lines of code, where each line
does the same job. This is not convenient, so it is desirable to use
loops. In the case of our benchmark programs, however, the time spent
executing the loop is not negligible, due to the very small size of
our test programs. Therefore we used a combination of both methods,
i.e. sequences of repeated code surrounded by a loop. In order to
minimise the effect of the loop, we also run an "empty" loop, without
the benchmark program. We call this the "compensation loop" and
subtract its execution time from the execution time of the loop
including the benchmark program.
This of course increases the relative error on the time measurement,
but it decreases the influence of the unavoidable loop.

The repeated code can be generated by your favorite editor. How much
repeated code you need to get sufficient precision depends of course
on the implementation of your particular Prolog system. However, we
put as much repeated code into each benchmark program as we think is
appropriate for most Prolog implementations, so you should get
sufficient precision without modifying our programs in that respect.

The listings of the programs follow below. For each simple program we
try to give its characteristics and some remarks about what it
measures.

Note that "cputime" in C-Prolog on the VAX gives you the possibility
to measure runtime. This may be different in other Prolog systems.
All the rest of the programs should be portable to any other system
without any problem.


1.1. Program to test calls (boresea).

This is the one you always dreamed of testing! Like all benchmarks,
it uses a loop calling the actual benchmark program. The benchmark
program consists of a sequence of 200 predicates having no arguments,
no choice points, NOTHING. 200 is chosen to have sufficient accuracy
in measuring the execution time. The results show the effect of pure
calls, and the Klips performance can be called the peak performance
of the Prolog system. Note that the peak performance is of very
little significance for classifying the overall performance of a
Prolog system.

---------------- cut here - beginning of program listing ----------

/* This program is called with the query "?-boresea(X)."          */
/* X is the number of loop iterations executed. It should be big  */
/* enough to give significant results.                            */
/* suggested value for X: 100 for interpreted code                */
/*                        1000 for compiled code                  */
/* average values for C-prolog interpreter:                       */
/*    X=1000, Tloop=27.1 T.comp=1.0 Tnet=26.1 Klips=7.7           */

boresea(X) :- T1 is cputime,
              do_max_KLips(X),  /* calls the loop to execute the  */
              T2 is cputime,    /* sequence of 200 predicates     */
              compens_loop(X),  /* compensation loop              */
              T3 is cputime,
              print_times(T1,T2,T3,X,200). /* compute and print results */

compens_loop(0).                /* compensation loop */
compens_loop(X) :- Y is X - 1, compens_loop(Y).

print_times(T1,T2,T3,X,I) :-    /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

do_max_KLips(0).                /* loop calling the actual benchmark */
do_max_KLips(X) :- lips1, Y is X - 1, do_max_KLips(Y).

/* predicates to test call */

lips1 :- lips2.
lips2 :- lips3.
lips3 :- lips4.
lips4 :- lips5.
lips5 :- lips6.
lips6 :- lips7.
lips7 :- lips8.
lips8 :- lips9.
lips9 :- lips10.
lips10 :- lips11.
lips11 :- lips12.
lips12 :- lips13.
lips13 :- lips14.
lips14 :- lips15.
lips15 :- lips16.
lips16 :- lips17.
lips17 :- lips18.
lips18 :- lips19.
lips19 :- lips20.
lips20 :- lips21.
lips21 :- lips22.
lips22 :- lips23.
lips23 :- lips24.
lips24 :- lips25.
lips25 :- lips26.
lips26 :- lips27.
lips27 :- lips28.
lips28 :- lips29.
lips29 :- lips30.
lips30 :- lips31.
lips31 :- lips32.
lips32 :- lips33.
lips33 :- lips34.
lips34 :- lips35.
lips35 :- lips36.
lips36 :- lips37.
lips37 :- lips38.
lips38 :- lips39.
lips39 :- lips40.
lips40 :- lips41.
lips41 :- lips42.
lips42 :- lips43.
lips43 :- lips44.
lips44 :- lips45.
lips45 :- lips46.
lips46 :- lips47.
lips47 :- lips48.
lips48 :- lips49.
lips49 :- lips50.
lips50 :- lips51.
lips51 :- lips52.
lips52 :- lips53.
lips53 :- lips54.
lips54 :- lips55.
lips55 :- lips56.
lips56 :- lips57.
lips57 :- lips58.
lips58 :- lips59.
lips59 :- lips60.
lips60 :- lips61.
lips61 :- lips62.
lips62 :- lips63.
lips63 :- lips64.
lips64 :- lips65.
lips65 :- lips66.
lips66 :- lips67.
lips67 :- lips68.
lips68 :- lips69.
lips69 :- lips70.
lips70 :- lips71.
lips71 :- lips72.
lips72 :- lips73.
lips73 :- lips74.
lips74 :- lips75.
lips75 :- lips76.
lips76 :- lips77.
lips77 :- lips78.
lips78 :- lips79.
lips79 :- lips80.
lips80 :- lips81.
lips81 :- lips82.
lips82 :- lips83.
lips83 :- lips84.
lips84 :- lips85.
lips85 :- lips86.
lips86 :- lips87.
lips87 :- lips88.
lips88 :- lips89.
lips89 :- lips90.
lips90 :- lips91.
lips91 :- lips92.
lips92 :- lips93.
lips93 :- lips94.
lips94 :- lips95.
lips95 :- lips96.
lips96 :- lips97.
lips97 :- lips98.
lips98 :- lips99.
lips99 :- lips100.
lips100 :- lips101.
lips101 :- lips102.
lips102 :- lips103.
lips103 :- lips104.
lips104 :- lips105.
lips105 :- lips106.
lips106 :- lips107.
lips107 :- lips108.
lips108 :- lips109.
lips109 :- lips110.
lips110 :- lips111.
lips111 :- lips112.
lips112 :- lips113.
lips113 :- lips114.
lips114 :- lips115.
lips115 :- lips116.
lips116 :- lips117.
lips117 :- lips118.
lips118 :- lips119.
lips119 :- lips120.
lips120 :- lips121.
lips121 :- lips122.
lips122 :- lips123.
lips123 :- lips124.
lips124 :- lips125.
lips125 :- lips126.
lips126 :- lips127.
lips127 :- lips128.
lips128 :- lips129.
lips129 :- lips130.
lips130 :- lips131.
lips131 :- lips132.
lips132 :- lips133.
lips133 :- lips134.
lips134 :- lips135.
lips135 :- lips136.
lips136 :- lips137.
lips137 :- lips138.
lips138 :- lips139.
lips139 :- lips140.
lips140 :- lips141.
lips141 :- lips142.
lips142 :- lips143.
lips143 :- lips144.
lips144 :- lips145.
lips145 :- lips146.
lips146 :- lips147.
lips147 :- lips148.
lips148 :- lips149.
lips149 :- lips150.
lips150 :- lips151.
lips151 :- lips152.
lips152 :- lips153.
lips153 :- lips154.
lips154 :- lips155.
lips155 :- lips156.
lips156 :- lips157.
lips157 :- lips158.
lips158 :- lips159.
lips159 :- lips160.
lips160 :- lips161.
lips161 :- lips162.
lips162 :- lips163.
lips163 :- lips164.
lips164 :- lips165.
lips165 :- lips166.
lips166 :- lips167.
lips167 :- lips168.
lips168 :- lips169.
lips169 :- lips170.
lips170 :- lips171.
lips171 :- lips172.
lips172 :- lips173.
lips173 :- lips174.
lips174 :- lips175.
lips175 :- lips176.
lips176 :- lips177.
lips177 :- lips178.
lips178 :- lips179.
lips179 :- lips180.
lips180 :- lips181.
lips181 :- lips182.
lips182 :- lips183.
lips183 :- lips184.
lips184 :- lips185.
lips185 :- lips186.
lips186 :- lips187.
lips187 :- lips188.
lips188 :- lips189.
lips189 :- lips190.
lips190 :- lips191.
lips191 :- lips192.
lips192 :- lips193.
lips193 :- lips194.
lips194 :- lips195.
lips195 :- lips196.
lips196 :- lips197.
lips197 :- lips198.
lips198 :- lips199.
lips199 :- lips200.
lips200.

--------------------cut here - end of program listing------------


1.2. Program to test non deterministic behaviour

This program contains a series of 3 different benchmark predicates.

The predicate "choice_point(N)" tests calls invoking the creation of
a choice point, i.e. a branch point to which execution may return in
case of backtracking. It does NOT backtrack. Two versions are
proposed, one with and the other without arguments.

We then present two predicates to evaluate the mechanism of
backtracking during execution. Both predicates create one choice
point and then backtrack 20 times on every loop iteration step.
"baktrak1(N)" exhibits a kind of backtracking called "deep", while
"baktrak2(N)" deals with "shallow" backtracking. Both are worth
trying, whatever your particular Prolog System is.
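
The difference between the two kinds of backtracking can be seen on a
reduced sketch with 3 clauses instead of 21. It is an illustration
only, not part of the benchmark: the names pd3, q3 and ps3 are
hypothetical, and the reading of "deep" and "shallow" in the comments
is the usual one (failure after a call has been made, versus failure
during head unification), which is assumed here rather than stated in
the text above.

```prolog
/* Reduced sketch; pd3, q3 and ps3 are hypothetical names.          */

/* Deep: the head of each pd3 clause matches, a call to q3 is made, */
/* and the failure occurs inside that call (a does not match b).    */
pd3(X1,X2,_) :- q3(X1,X2,a).
pd3(X1,X2,_) :- q3(X1,X2,a).
pd3(_,_,_).

q3(_,_,b).

/* Shallow: when called as ps3(_,a,b), the failure occurs during    */
/* head unification, since X cannot be bound to both a and b. No    */
/* clause body is entered before the next clause is tried.          */
ps3(_,X,X).
ps3(_,X,X).
ps3(_,_,_).
```

Both "?- pd3(_,_,_)." and "?- ps3(_,a,b)." succeed through the last
clause, after two failures each.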
------------cut here - beginning of program listing----------------

/* The predicates are called:                                      */
/* o "choice_point(N)"    - creation of choice points              */
/* o "choice_point0ar(N)" - same, with 0 arg                       */
/* o "baktrak1(N)"        - deep backtracking                      */
/* o "baktrak2(N)"        - shallow backtracking                   */
/* N is the number of loop iterations executed                     */

/* predicate to test creation of choice points without backtracking */
/* suggested value for N: 1000                                     */
/* results for Cprolog: N=1000                                     */
/* Tloop=5.95 Tcompens=0.98 Tnet=4.97 Klips=4.02                   */

choice_point(N) :- T1 is cputime,
                   cre_CP(N),
                   T2 is cputime,
                   compens_loop(N),
                   T3 is cputime,
                   print_times(T1,T2,T3,N,20).

/* predicate choice_point, but with zero argument                  */
/* suggested value for N: 1000                                     */
/* results for Cprolog: N=1000                                     */
/* Tloop=3.55 Tcompens=0.98 Tnet=2.57 Klips=7.7                    */

choice_point0ar(N) :- T1 is cputime,
                      cre_CP0ar(N),
                      T2 is cputime,
                      compens_loop(N),
                      T3 is cputime,
                      print_times(T1,T2,T3,N,20).

/* Predicate to test the (deep) backtracking mechanism.            */
/* suggested value for N: 1000 (interp), 2000 (comp)               */
/* results for Cprolog: N=1000                                     */
/* Tloop=9.63 Tcomp=1 Tnet=8.63 Klips=2.32                         */

baktrak1(N) :- T1 is cputime,
               deep_back(N),
               T2 is cputime,
               compens_loop(N),
               T3 is cputime,
               print_times(T1,T2,T3,N,20).

/* Predicate to test the (shallow) backtracking mechanism          */
/* suggested value for N: 1000 (interp), 2000 (comp)               */
/* results for Cprolog: N=1000                                     */
/* Tloop=3.63 Tcomp=0.95 Tnet=2.68 Klips=7.45                      */

baktrak2(X) :- T1 is cputime,
               shallow_back(X),
               T2 is cputime,
               compens_loop(X),
               T3 is cputime,
               print_times(T1,T2,T3,X,20).

/* compensation loop, used to measure the time spent in the loop   */
compens_loop(0).
compens_loop(X) :- Y is X - 1, compens_loop(Y).

/* loop to test choice point creation */
cre_CP(0).
cre_CP(N) :- M is N-1, ccp1(0,0,0), cre_CP(M).

cre_CP0ar(0).
cre_CP0ar(N) :- M is N-1, ccp1, cre_CP0ar(M).

/* loop to test deep backtracking */
deep_back(0).
deep_back(X) :- pd(_,_,_), Y is X - 1, deep_back(Y).

/* loop to test shallow backtracking */
shallow_back(0).
shallow_back(X) :- ps(_,a,b), Y is X - 1, shallow_back(Y).

print_times(T1,T2,T3,X,I) :-    /* prints the results */
        TT1 is T2 - T1,
        TT2 is T3 - T2,
        TT is TT1 - TT2,
        write('T overall loop:   '),write(TT1), nl,
        write('T compens loop:   '),write(TT2), nl,
        write('T net:            '),write(TT),nl,
        write('KLips:            '),
        Li is I * X,
        Lips is Li / TT,
        KLips is Lips / 1000,
        write(KLips),nl,nl.

/* ccp1 creates 20 choice points                                   */
/* ccp1 is the beginning of a set of predicates                    */
/* composed of 2 clauses each. Every invocation of ccp1 will       */
/* create a sequence of 20 choice points. The bodies of the        */
/* clauses are limited to one goal, thus avoiding the creation of  */
/* an environment when the clause is activated. ccp1, and its      */
/* successors, have three arguments to comply with our average     */
/* static analysis results made on more than 30 real Prolog        */
/* programs.                                                       */
/* ccpXX exists with 3 arguments, and 0 args.                      */

ccp1(X,Y,Z):-ccp2(X,Y,Z).
ccp1(X,Y,Z).
ccp2(X,Y,Z):-ccp3(X,Y,Z).
ccp2(X,Y,Z).
ccp3(X,Y,Z):-ccp4(X,Y,Z).
ccp3(X,Y,Z).
ccp4(X,Y,Z):-ccp5(X,Y,Z).
ccp4(X,Y,Z).
ccp5(X,Y,Z):-ccp6(X,Y,Z).
ccp5(X,Y,Z).
ccp6(X,Y,Z):-ccp7(X,Y,Z).
ccp6(X,Y,Z).
ccp7(X,Y,Z):-ccp8(X,Y,Z).
ccp7(X,Y,Z).
ccp8(X,Y,Z):-ccp9(X,Y,Z).
ccp8(X,Y,Z).
ccp9(X,Y,Z):-ccp10(X,Y,Z).
ccp9(X,Y,Z).
ccp10(X,Y,Z):-ccp11(X,Y,Z).
ccp10(X,Y,Z).
ccp11(X,Y,Z):-ccp12(X,Y,Z).
ccp11(X,Y,Z).
ccp12(X,Y,Z):-ccp13(X,Y,Z).
ccp12(X,Y,Z).
ccp13(X,Y,Z):-ccp14(X,Y,Z).
ccp13(X,Y,Z).
ccp14(X,Y,Z):-ccp15(X,Y,Z).
ccp14(X,Y,Z).
ccp15(X,Y,Z):-ccp16(X,Y,Z).
ccp15(X,Y,Z).
ccp16(X,Y,Z):-ccp17(X,Y,Z).
ccp16(X,Y,Z).
ccp17(X,Y,Z):-ccp18(X,Y,Z).
ccp17(X,Y,Z).
ccp18(X,Y,Z):-ccp19(X,Y,Z).
ccp18(X,Y,Z).
ccp19(X,Y,Z):-ccp20(X,Y,Z).
ccp19(X,Y,Z).
ccp20(X,Y,Z).
ccp20(X,Y,Z).

ccp1:-ccp2.
ccp1.
ccp2:-ccp3.
ccp2.
ccp3:-ccp4.
ccp3.
ccp4:-ccp5.
ccp4.
ccp5:-ccp6.
ccp5.
ccp6:-ccp7.
ccp6.
ccp7:-ccp8.
ccp7.
ccp8:-ccp9.
ccp8.
ccp9:-ccp10.
ccp9.
ccp10:-ccp11.
ccp10.
ccp11:-ccp12.
ccp11.
ccp12:-ccp13.
ccp12.
ccp13:-ccp14.
ccp13.
ccp14:-ccp15.
ccp14.
ccp15:-ccp16.
ccp15.
ccp16:-ccp17.
ccp16.
ccp17:-ccp18.
ccp17.
ccp18:-ccp19.
ccp18.
ccp19:-ccp20.
ccp19.
ccp20.
ccp20.

/* deep backtracking                                               */
/* The call to pd creates a choice point, and invokes a            */
/* call to q. It will fail and there will be a backtracking        */
/* step to try the next clause defining pd. pd has 21              */
/* clauses, thus failure occurs 20 times                           */

pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_) :- q(X1,X2,a).
pd(X1,X2,_).

q(X1,X2,b).

/* shallow backtracking                                            */
/* The ps predicate fails 20 times. The shallow backtracking       */
/* will not restore all current state registers in Prolog          */
/* systems which perform this optimisation, while others will.     */

ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,X,X).
ps(_,_,_).

---------------------cut here - end of program listing-------------
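
On systems without the C-Prolog "cputime" evaluable, only the
"T is cputime" goals in the listings would need to change. The
following is a minimal sketch, assuming the system provides
statistics(runtime, [Ms|_]) returning CPU time in milliseconds; that
is an assumption to check against your system's manual. The helper
names cpu_seconds and klips are hypothetical and not part of the
benchmark. The second helper restates the arithmetic done by
print_times: Tnet = Tloop - Tcompens, and Klips = I * N / Tnet / 1000.

```prolog
/* Hypothetical helpers, not part of the benchmark.                 */

/* cpu_seconds(T): T is the CPU time used so far, in seconds.       */
/* ASSUMPTION: statistics(runtime, [Ms|_]) exists and gives         */
/* milliseconds of CPU time.                                        */
cpu_seconds(T) :-
    statistics(runtime, [Ms|_]),
    T is Ms / 1000.

/* klips(I, N, Tloop, Tcomp, Klips): I inferences per iteration,    */
/* N iterations, Tloop and Tcomp in seconds. Fails when the net     */
/* time is not positive, instead of dividing by zero.               */
klips(I, N, Tloop, Tcomp, Klips) :-
    Tnet is Tloop - Tcomp,
    Tnet > 0,
    Klips is I * N / Tnet / 1000.
```

With the C-Prolog figures quoted for baktrak1 above, the query
"?- klips(20, 1000, 9.63, 1, K)." gives a K of about 2.32, in
agreement with the listing. A goal such as "T1 is cputime" would
then be written "cpu_seconds(T1)".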
rb@cci632.UUCP (07/12/86)
UNSW prolog doesn't have a cputime function. It would sure be nice if it did :-). Would it be possible to get some benchmarks that ran on it? Would it be possible to get cputime on UNSW? Is this supposed to be a standard feature? About the best UNSW seems to be able to do is get the "date" via a shell command, hardly suitable for benchmarking :-).