vanroy@pisces.uucp (06/13/90)
Benchmark timings: Benchmark Prolog C (no opt.) C (opt. 4) tak(24,16,8) 1.2 2.1 1.6 All three timings are in user seconds (with 'time') measured on a 25 MHz MIPS processor. The Prolog version is compiled with the Aquarius Prolog compiler under development at Berkeley. The C versions are compiled with the MIPS C compiler, with no optimization and full optimization (level 4). Disclaimer: this benchmark has a particularly easy translation to C. The result does not necessarily hold for other programs. But don't you believe it anymore that Prolog is slow! Peter Van Roy ------------------------------------------------------------------------------- /* C version of tak benchmark */ #include <stdio.h> int tak(x,y,z) int x, y, z; { int a1, a2, a3; if (x <= y) return z; a1 = tak(x-1,y,z); a2 = tak(y-1,z,x); a3 = tak(z-1,x,y); return tak(a1,a2,a3); } main() { printf("%d\n", tak(24, 16, 8)); } ------------------------------------------------------------------------------- /* Prolog version of tak benchmark */ main :- tak(24,16,8,X), write(X), nl. tak(X,Y,Z,A) :- X =< Y, Z = A. tak(X,Y,Z,A) :- X > Y, X1 is X - 1, tak(X1,Y,Z,A1), Y1 is Y - 1, tak(Y1,Z,X,A2), Z1 is Z - 1, tak(Z1,X,Y,A3), tak(A1,A2,A3,A). -------------------------------------------------------------------------------
christer@cs.umu.se (Christer Ericson) (06/13/90)
In article <36986@ucbvax.BERKELEY.EDU> vanroy@pisces.uucp () writes: > >[Stuff about some particular Prolog implementation being faster that some >other particular C implementation deleted] > >Disclaimer: this benchmark has a particularly easy translation to C. >The result does not necessarily hold for other programs. But don't >you believe it anymore that Prolog is slow! > This is silly! Haven't you learned yet that no language is faster/slower than another language - it's the implementation that's faster/slower than that other implementation. The worst thing I know is people saying things like C is faster than Pascal, because it isn't. Most C compilers generate faster code than most Pascal compilers, true, but that's only because C heavily depends upon (read allows) the programmer to optimize the code while the Pascal compiler has to figure it all out by itself. > Peter Van Roy /Christer | Christer Ericson Internet: christer@cs.umu.se [130.239.1.101] | | Department of Computer Science, University of Umea, S-90187 UMEA, Sweden | | "I bully sheep. I claim God doesn't exist..." |
richard@aiai.ed.ac.uk (Richard Tobin) (06/14/90)
>This is silly! Haven't you learned yet that no language is faster/slower than >another language - it's the implementation that's faster/slower than that >other implementation. I think Peter knows all about that. There is a widespread belief that languages like prolog can never be as fast as languages like C (especially for numerical applications); the figures quoted are a counterexample to that. I would be interested to know if there are any Lisp compilers that can produce comparable or better results. -- Richard -- Richard Tobin, JANET: R.Tobin@uk.ac.ed AI Applications Institute, ARPA: R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk Edinburgh University. UUCP: ...!ukc!ed.ac.uk!R.Tobin
hawley@icot32.icot.or.jp (David John Hawley) (06/14/90)
In article <1990Jun13.100341.7148@cs.umu.se> christer@cs.umu.se (Christer Ericson) writes: >This is silly! Haven't you learned yet that no language is faster/slower than >another language - it's the implementation that's faster/slower than that Well, of course the point is that Prolog is a much higher-level language than C type languages, but we don't want to pay for all that expressiveness all the time. I.e. the speed of our programs should relate to the algorithms, not to the languages in which they are expressed. So Prolog needs GREAT compilers and implementations, and THEN we can write SHORT, SIMPLE, and FAST programs. Well, I tried the given benchmark on my SUN/?: C version - 8.7u SICstus Prolog(compiled) - mem ovflw after > 1 min of real time So how did he do it? Inquiring minds (and inquisitive noses) want to know. --------------------------- David Hawley, ICOT, 4th Lab csnet: hawley%icot.jp@relay.cs.net uucp:{enea,inria,mit-eddie,ukc}!icot!hawley ICOT, 1-4-28 Mita, Minato-ku, Tokyo 108 JAPAN. TEL/FAX {81-3-456-}2514/1618
jwmills@iuvax.cs.indiana.edu (Jonathan Mills) (06/14/90)
From Peter Vanroy: >Benchmark timings: > >Benchmark Prolog C (no opt.) C (opt. 4) > >tak(24,16,8) 1.2 2.1 1.6 >fib(30) 1.5 2.0 1.6 > >All three timings are in user seconds (with 'time') measured on a 25 >MHz MIPS processor. The Prolog version is compiled with the >Aquarius Prolog compiler under development at Berkeley. The C >versions are compiled with the MIPS C compiler, with no optimization >and full optimization (level 4). > >Because older implementations of Prolog have been relatively slow, >it is sometimes believed that this is inherent in the language. The >purpose of my original message was to dispel that notion by >providing measurements of an optimizing Prolog compiler. These are encouraging benchmarks, but should be generalized cautiously. First, to place the performance of the Aquarius Prolog compiler in perspective, the times for the call: tak(18,12,6) should be posted. This would allow the direct comparison of the times with those reported by by Richard P. Gabriel in "Performance and Evaluation of Lisp Systems", MIT Press, 1974. Comparing Prolog and Lisp implementations is not unreasonable because of the similarities between implementation techniques, particularly for simple programs. From Richard O'Keefe: >This particular benchmark result tells us precisely two things: >(1) _one_ Prolog system on that machine has cheaper procedure calls > than _one_ C compiler. Good work! But not very surprising > given that C compiler writers tend to worry more about other > things. > >(2) that particular Prolog system handles integer arithmetic better > than a lot of other Prolog systems. Gabriel reports that the "significant Lisp-level operations" for tak(18,12,6) are composed of: Calls to TAK 63,609 1-'s (decrements) 47,706 Since the Prolog code listed by Vanroy is a direct translation of the TAK benchmark, these numbers and their significance _probably_ apply. The prevalence of function calls and decrements in this benchmark limits its general significance. The fib/1 benchmark is similarly likely to be a limited measure of general performance since it is also composed primarily of function calls (to FIB) and decrements (N - 1 and N - 2). From Richard O'Keefe: >It doesn't tell us how C and Prolog would compare on realistic >applications, and there the pertinent question tends to be how easy >it is to handle a good data structure. I've an example where I >speeded a program up quite substantially by translating from Fortran >to Prolog, the speedup really came from using a much better data >structure, but it was much easier to use that data structure in >Prolog. Gabriel has other benchmarks that are better suited to evaluate the general performance of the Aquarius Compiler. Two that appear well suited are the BOYER and BROWSE benchmarks. BOYER is a tiny theorem prover intended to give plausible estimates of the performance of the Boyer-Moore theorem prover on various Lisp systems. BROWSE is a compendium of operations that an expert system might use. Both of these benchmarks could be translated to advantage in Prolog, since they have a substantial amount of list manipulation and pattern matching. Ross Overbeek has a theorem prover whose performance in a variety of Prologs was compared to C at a benchmarker's workshop held at The Aerospace Corporation in 1988. Carl Kesselman sponsored the workshop, and may be able to provide source for a test. From Richard Tobin: >I would be interested to know if there are any Lisp compilers that >can produce comparable or better results. Gabriel has no results for a 25MHz MIPS machine (wasn't available), but does list some other interesting results for TAK: IMPLEMENTATION CPU RAW TIME (sec) -------------- --------- -------------- Portable Standard Lisp Cray-1 0.044 Symbolics 3600+IFU 0.43 Assembler 68000 0.7 Franz Lisp VAX 780 1.09 Common Lisp VAX 785 1.18 Portable Standard Lisp Sun/2 (68010) 1.44 Common Lisp VAX 780 1.83 C 68000 1.9 Franz Lisp 68000 2.37 INTERLISP Dolphin 11.12 It would also be interesting to compare the performance of commercial Prolog compilers (ALS, BIM, Quintus) on various machines to see how they handle TAK. While older Prolog implementations were slow, newer implementations have moved to native code, and impressive speeds. A recent claim for ALS Prolog (using that hoary chestnut, NREV, sorry :-) gave in excess of one megaLIPS on the 88000, and 100+ KLIPS on a Mac IIfx. Perhaps they would be willing to key in TAK and post the timings. In closing, designing and interpreting benchmarks offer tempting targets for implementors and their critics. Gabriel's book suggests some strategies to follow (and avoid); there are many other articles out there also (Jack Dongarra has published advice for designers of benchmarks for supercomputers, for example). Vanroy's results are encouraging, and (hopefully) will lead to even better comparisons. Let's support his work: I'd like to suggest that anyone with additional "points" on the graph post their results to the net. This would help Prolog implementors today just as Gabriel's work goaded Lisp implementors in the 1980's.
kdq@demott.COM (Kevin D. Quitt) (06/15/90)
In article <47658@iuvax.cs.indiana.edu> jwmills@iuvax.cs.indiana.edu (Jonathan Mills) writes: >but does list some other interesting results for TAK: > >IMPLEMENTATION CPU RAW TIME (sec) >-------------- --------- -------------- >Portable Standard Lisp Cray-1 0.044 >Symbolics 3600+IFU 0.43 >Assembler 68000 0.7 >Franz Lisp VAX 780 1.09 >Common Lisp VAX 785 1.18 >Portable Standard Lisp Sun/2 (68010) 1.44 >Common Lisp VAX 780 1.83 >C 68000 1.9 >Franz Lisp 68000 2.37 >INTERLISP Dolphin 11.12 > Just for fun, on a Motorola Delta 3608 (25MHz 68030), cc: gnucc: 0.6 0.15 cc tak.c 0.3 0.3 cc -O tak.c and on a TwinHead 386/25, Microsoft C 6.0 0.28 /Ox /Gs 0.22 /Ox /Gsr -- _ Kevin D. Quitt demott!kdq kdq@demott.com DeMott Electronics Co. 14707 Keswick St. Van Nuys, CA 91405-1266 VOICE (818) 988-4975 FAX (818) 997-1190 MODEM (818) 997-4496 PEP last 96.37% of all statistics are made up.
dan@sics.se (Dan Sahlin) (06/15/90)
David Hawley got the following result when trying tak(24,16,8,X) on SICStus: > SICstus Prolog(compiled) - mem ovflw after > 1 min of real time SICStus Prolog runs out of memory because a lot of choicepoints are created and a lot of trailing is done. If a cut is put into the first clause of "tak" no choicepoints will be created we will not run out of memory. Presumably Van Roy's compiler has detected that this cut may be safely inserted. Running tak(24,16,8,X) using SICStus Prolog on a SPARCstation 1 having compiled the program into emulated code then takes about 77 seconds. We also compiled the program to native code for SPARC (this version of SICStus has not yet been released) and it took about 25 seconds. As the SPARCstation has about half the number of MIPS as Van Roy's machine, we still have a factor of ten of execution performance improvements to get down to his figure of just 1.2 seconds. >So how did he do it? Inquiring minds (and inquisitive noses) want to know. Yes, please tell us how it was done! Dan Sahlin email: dan@sics.se
mmh@cs.qmw.ac.uk (Matthew Huntbach) (06/16/90)
What is the point of using Prolog if you don't backtrack? All you have is some syntactic sugar for a subset of functional languages (i.e. instead of fun f(pat1) = f(g(h(p))) | f(pat2) = []; you have f(pat1,q) :- h(p,p1),g(p1,p2),f(p2,q). f(pat2,[]) instead of fun f(x) = if a(x) then b(x) else c(x) you have f(x,y) :- a(x),!,b(x,y). f(x,y) :- c(x,y). ). The comparisons that are being made seem to be for non-backtracking problems. If backtracking is important, then comparisons should be made with examples where Prolog backtracks. If it isn't, all we are saying is that Prolog has a nicer sugar coating than its rivals, but this sugar coating doesn't (as you might suppose) hide any nasty-tasting inefficiency. Matthew Huntbach
vanroy@pisces.uucp (06/16/90)
Several people have asked me to post the code and/or explain what kind of optimizations are done. Aw shucks, don't you know that a magician doesn't like to give away his secrets? I will answer the second question first. The tak benchmark is able to take advantage of several optimization techniques: (1) Deterministic predicates. The compiler is able to recognize determinacy in the program. The Prolog code given for tak uses a tremendous amount of memory on systems like Quintus or Sicstus Prolog because they will create a choice point for each call. A cut is needed to make it run for them. (2) Simplifying variable binding. The compiler recognizes that the variables used to bind the output of tak can be implemented in a simpler way than the default method of creating a memory cell pointing to itself, passing it into the routine, and binding it there. Tak does recursive calculations on integers (i.e. objects that fit in a register), so the compiler is able to output the results directly in registers. This is a general technique which is also used to implement integer arithmetic. (3) Last call optimization. The last call to tak is compiled as a jump, i.e. recursion is converted to iteration. This is not done by the C compiler, and measurements with the pixie utility on the MIPS show that this explains a large part of the speed difference. This optimization is essential for Prolog because the only way to denote iteration in Prolog is by using recursion. C has other constructs to denote iteration (i.e. "for" and "while" loops) so it does not need last call optimization as badly as Prolog does. (4) Dataflow analysis. The compiler incorporates a global dataflow analysis module to do type inferencing. For tak, the results of analysis are crucial for optimization (2) above, and they also make it unnecessary to do type checking, dereferencing, and trailing. To do analysis the programmer has to declare only the types of the entry points. For tak this is a single directive: :- entry((main:-true)). No other declarations are used. For those who are interested, the paper "The Benefits of Global Flow Analysis for an Optimizing Prolog Compiler", to appear in the 1990 North American Conference on Logic Programming, describes techniques (2) and (4) in detail and gives more numbers (for the BAM processor). The compiler has only recently been ported to the MIPS and the run-time system is not yet completely functional there. I plan to extend the comparison to other programs (such as Overbeek's theorem prover). The BAM version is completely functional, but the BAM processor does not have a C compiler. The Aquarius compiler does not do memoization. However, there is no reason a smart (*very* smart) compiler couldn't do it for tak. The C compiler could do it as well. This is an orthogonal issue to the performance comparison. By the way, the execution time for tak(18,12,6), given by Gabriel in his study of Lisp performance, is 31 ms for Aquarius Prolog on a 25 MHz MIPS processor. Sincerely, Peter Van Roy ------------------------------------------------------------------------------- In answer to the first question, here is the MIPS assembly code generated by the Aquarius compiler for the tak benchmark: (I've left out the code for the run-time system.) .data .align 3 $32: .word 7 .word atm_string_0+1 atm_string_0: .word 1463 .word 1f+1 1: .word 1495 .word 3 .text .align 2 # procedure(tak/0) .globl tak_0 .ent tak_0 tak_0: $33: .set noreorder sltu $14, $17, $16 bne $14, 0, 1f move $14, $16 move $16, $17 # label(1) 1: addu $16, 8 sw $14, -8($16) sw $31, -4($16) .set reorder li $2, 295 li $3, 199 li $4, 103 jal $34 move $2, $5 jal $35 lw $31, -4($16) lw $16, -8($16) j $36 .end tak_0 # procedure(tak/4) .globl tak_4 .ent tak_4 tak_4: $34: bgt $2, $3, $37 move $5, $4 j $31 # label('$b_tak/4_2'/4) $37: .set noreorder sltu $14, $17, $16 bne $14, 0, 1f move $14, $16 move $16, $17 # label(1) 1: addu $16, 32 sw $14, -8($16) sw $31, -4($16) .set reorder sw $2, -24($16) sw $3, -20($16) sw $4, -28($16) addu $2, $2, -16 jal $34 sw $5, -16($16) lw $14, -20($16) addu $2, $14, -16 lw $3, -28($16) lw $4, -24($16) jal $34 sw $5, -12($16) lw $14, -28($16) addu $2, $14, -16 lw $3, -24($16) lw $4, -20($16) jal $34 lw $2, -16($16) lw $3, -12($16) move $4, $5 lw $31, -4($16) lw $16, -8($16) j $34 .end tak_4 -------------------------------------------------------------------------------
dankg@volcano.Berkeley.EDU (Dan KoGai) (06/16/90)
In article <36986@ucbvax.BERKELEY.EDU> vanroy@pisces.uucp () writes: > [benchmark omitted] > >/* C version of tak benchmark */ > >#include <stdio.h> > >int tak(x,y,z) >int x, y, z; >{ > int a1, a2, a3; > if (x <= y) return z; > a1 = tak(x-1,y,z); > a2 = tak(y-1,z,x); > a3 = tak(z-1,x,y); > return tak(a1,a2,a3); >} But this C code is hardly optimum. 1st of all you don't need a1, a2, a3 to waste space and time. try the following. int tak(int x, int y, int z) { return (x <= y) ? z : tak(tak(--x, y, z),tak(--y, z, x),tak(--z, x, y)) } > [Prolog version omitted] And it's also unfair to compare 2 versions without posting the assembly codes both versions made. Both C and Prolog allow programmers to write various codes doing the same thing. ---------------- ____ __ __ + Dan The Walking C Obfscunator ||__||__| + E-mail: dankg@ocf.berkeley.edu ____| ______ + Voice: +1 415-549-6111 | |__|__| + USnail: 1730 Laloma Berkeley, CA 94709 U.S.A |___ |__|__| + |____|____ + "Unix is not insecure. It's people who are" \_| | + "Unix doesn't have pain. It's people who do"
vanroy@pisces (Peter Van Roy) (06/16/90)
In article <1990Jun15.205242.16314@agate.berkeley.edu> dankg@volcano.Berkeley.EDU (Dan KoGai) writes: > But this C code is hardly optimum. 1st of all you don't need >a1, a2, a3 to waste space and time. try the following. > >int tak(int x, int y, int z) >{ > return (x <= y) ? z : tak(tak(--x, y, z),tak(--y, z, x),tak(--z, x, y)) > >} > >> [Prolog version omitted] > >____ __ __ + Dan The Walking C Obfscunator Did you try to run this program? It behaves very strangely: (after adding a semicolon to the return statement) mammoth:~/Peter <20> cc -g taktak.c mammoth:~/Peter <21> dbx a.out dbx version 2.0 Copyright 1988 MIPS Computer Systems Inc. Type 'help' for help. reading symbolic information ... main: 15 printf("%d\n", tak(24, 16, 8)); (dbx) run ^C <- interrupted after ~10 seconds Interrupt [tak:10 +0x28,0x400220] tak(--z, x, y)); (dbx) print x,y,z -44896 -44897 -44898 <- strange values From page 186 of K&R: (note that predecrement does a side-effect) [Regarding function evaluation] The order of evaluation of arguments is undefined by the language; take note that the various compilers differ. By the way, I did try a different version of tak.c without the temporary variables a1, a2, and a3. It ran *slower* unoptimized and at exactly the same speed optimized. Please think before you post. Sincerely, Peter Van Roy
grover@brahmand.Eng.Sun.COM (Vinod Grover) (06/16/90)
In article <1990Jun15.205242.16314@agate.berkeley.edu> dankg@volcano.Berkeley.EDU (Dan KoGai) writes: >In article <36986@ucbvax.BERKELEY.EDU> vanroy@pisces.uucp () writes: >a1, a2, a3 to waste space and time. try the following. > >int tak(int x, int y, int z) >{ > return (x <= y) ? z : tak(tak(--x, y, z),tak(--y, z, x),tak(--z, x, y)) >} This program has a missing semicolon, and runs into an infinite loop. C does not require the side-effect in --x to happen immediately. You must force the subtraction. Furthermore, eliminating explicit temp variables does not mean that the compiler won't put them back in. However, declaring those explicit temps to be register variables *may* improve the program, but is not guarenteed to do so. Vinod Grover