merlyn@iwarp.intel.com (Randal Schwartz) (06/13/90)
In article <36986@ucbvax.BERKELEY.EDU>, vanroy@pisces writes:
|
| Benchmark timings:
|
| Benchmark          Prolog    C (no opt.)    C (opt. 4)
|
| tak(24,16,8)        1.2         2.1            1.6

Bogus.

Comparing apples and oranges dude.

The prolog version most likely saves results from tak(a,b,c) so that
they can do a quick lookup the next time around.  I jimmied up one
in perl that does the same thing...

$ cat tak
#!/local/usr/bin/perl
sub tak {
    $takcalled++;
    local($x,$y,$z) = @_;
    return $z if $x <= $y;
    if (defined $tak{$x,$y,$z}) {
        $takrecalled++;
        # print "recomputing $x $y $z\n";
        return $tak{$x,$y,$z};
    }
    local($a1,$a2,$a3);
    $a1 = &tak($x-1,$y+0,$z+0);
    $a2 = &tak($y-1,$z+0,$x+0);
    $a3 = &tak($z-1,$x+0,$y+0);
    return $tak{$x,$y,$z} = &tak($a1+0,$a2+0,$a3+0);
}
$start = (times)[0];
$result = &tak(24,16,8);
$finish = (times)[0];
$seconds = $finish - $start;
print "result $result, ",
    "called $takcalled, ",
    "recalled $takrecalled, ",
    "time $seconds secs\n";
$ ./tak
result 9, called 941, recalled 194, time 0.91666666666666662966 secs

Now, tell me if 0.917 seconds of user time isn't better than either C
or prolog!  (This is on a sparc..., so there's some more apples and
oranges for ya :-)

Notice how many *recalled* short-circuits I took.  The C code would
have had to have computed that out fully.  But once prolog "knew" the
fact of tak(11,14,17,96) (or whatever), it'd never compute it again.

Stop comparing apples and oranges, unless you are willing to throw in
a few Perls. :-)

(Something with a little less smoke and mirrors is to compare timings
on a factorial written with recursion.  Trace it through slowly, and
you'll see what I mean.)

Just another Perl hacker,
--
/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\
| on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III      |
| merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn |
\=Cute Quote: "Welcome to Portland, Oregon, home of the California Raisins!"=/
debray@cs.arizona.edu (Saumya K. Debray) (06/13/90)
merlyn@iwarp.intel.com (Randal Schwartz) writes:

> In article <36986@ucbvax.BERKELEY.EDU>, vanroy@pisces writes:
> |
> | Benchmark timings:
> |
> | Benchmark          Prolog    C (no opt.)    C (opt. 4)
> |
> | tak(24,16,8)        1.2         2.1            1.6
>
> Bogus.
>
> Comparing apples and oranges dude.
>
> The prolog version most likely saves results from tak(a,b,c) so that
> they can do a quick lookup the next time around.

Look -- a program either does something, or it doesn't [*].  To say that
it "most likely" does something is basically the same as saying
"I {didn't bother to | couldn't} understand the code, but here's a
wild guess", except that it's not quite as honest.

For the benchmark in question, Van Roy posted both the C and the Prolog
sources, so it's easy to verify whether or not there are any table
lookups going on.  There aren't.

[*] Unless, I suppose, we're talking about randomization or true
nondeterminism.  In this case, we aren't.
--
Saumya Debray          CS Department, University of Arizona, Tucson
     internet: debray@cs.arizona.edu
     uucp:     uunet!arizona!debray
vanroy@pisces (Peter Van Roy) (06/14/90)
In article <1990Jun12.212525.25551@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>
>Bogus.
>
>Comparing apples and oranges dude.
>
>The prolog version most likely saves results from tak(a,b,c) so that
>they can do a quick lookup the next time around.  I jimmied up one
>in perl that does the same thing...
>
>Just another Perl hacker,
>/=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\

Please do not jump to conclusions.  The Prolog implementation does *NOT*
save intermediate results from tak(a,b,c).  It is doing *ALL* recursive
calls.  (I wish the compiler were intelligent enough to do what you say
it does :-).  The speed is due to straightforward compiler optimization.

By the way, the behavior of tak is not an isolated case.  For example
(times in user seconds, as before):

    Benchmark          Prolog    C (no opt.)    C (opt. 4)

    fib(30)             1.5         2.0            1.6

This program is doing all recursive calls too.

Because older implementations of Prolog have been relatively slow, it is
sometimes believed that this is inherent in the language.  The purpose of
my original message was to dispel that notion by providing measurements
of an optimizing Prolog compiler.  Recent work in Prolog implementation
has significantly improved performance.  You might want to look at "Fast
Prolog with an Extended General Purpose Architecture" by Bruce Holmer et
al in the 17th International Symposium on Computer Architecture and "LIPS
on a MIPS: Results from a Prolog Compiler for a RISC" by Andrew Taylor in
the 1990 International Conference on Logic Programming.

Sincerely,
Peter Van Roy

-------------------------------------------------------------------------------
/* C version of fib benchmark */

#include <stdio.h>

int fib(x)
int x;
{
    if (x <= 1) return 1;
    return (fib(x-1)+fib(x-2));
}

main()
{
    printf("%d\n", fib(30));
}
-------------------------------------------------------------------------------
/* Prolog version of fib benchmark */

main :- fib(30,N), write(N), nl.

fib(N,F) :- N =< 1, F = 1.
fib(N,F) :- N > 1,
    N1 is N - 1, fib(N1,F1),
    N2 is N - 2, fib(N2,F2),
    F is F1 + F2.
-------------------------------------------------------------------------------
grover@brahmand.Eng.Sun.COM (Vinod Grover) (06/14/90)
In article <37011@ucbvax.BERKELEY.EDU> vanroy@pisces (Peter Van Roy) writes:
>Please do not jump to conclusions. The Prolog implementation does *NOT* save
>intermediate results from tak(a,b,c). It is doing *ALL* recursive calls.
>(I wish the compiler were intelligent enough to do what you say it does :-).
>The speed is due to straightforward compiler optimization. By the way, the

Could you tell us what optimization is being performed that is lacking in C?

Thanks

Vinod Grover
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/14/90)
In article <1990Jun12.212525.25551@iwarp.intel.com>, merlyn@iwarp.intel.com (Randal Schwartz) writes:
> In article <36986@ucbvax.BERKELEY.EDU>, vanroy@pisces writes:
> | Benchmark          Prolog    C (no opt.)    C (opt. 4)
> | tak(24,16,8)        1.2         2.1            1.6
> Bogus.
> The Prolog version most likely saves results from tak(a,b,c) so that
> they can do a quick lookup the next time around.

Now *that* claim is bogus.  It was specifically stated that this was a
*Prolog* implementation, which means that it *mustn't* cache results.
(If van Roy had meant that something like extension tables was involved,
no doubt he would have said so.)

The explanation is much much simpler.  Everything in Prolog involves
procedure calls, so a good Prolog implementation has had a _lot_ of
thought put into making procedure calls _really_ cheap.  Procedure calls
are important in C, but not important enough for VAX implementations to
bypass the 'CALLS' instruction and use 'JSR'.

This particular benchmark result tells us precisely two things:
(1) _one_ Prolog system on that machine has cheaper procedure calls than
    _one_ C compiler.  Good work!  But not very surprising given that C
    compiler writers tend to worry more about other things.
(2) that particular Prolog system handles integer arithmetic better than
    a lot of other Prolog systems.

It doesn't tell us how C and Prolog would compare on realistic
applications, and there the pertinent question tends to be how easy it
is to handle a good data structure.  I've an example where I speeded a
program up quite substantially by translating from Fortran to Prolog;
the speedup really came from using a much better data structure, but it
was much easier to use that data structure in Prolog.  With all due
respect to Perl (The TECO of the 90s) this is one respect where Perl 1.x
was not as good as it might have been (miles better than awk, though).
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."
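For reference, a sketch of the straightforward C version of TAK follows.
It is written from the standard definition of the function rather than
copied from Van Roy's posted source, which may differ in detail; but it
makes O'Keefe's point visible: apart from one comparison and three
decrements, the body is nothing but recursive procedure calls.

/* Standard TAK in C -- a reconstruction for illustration only; the
 * benchmarked source may differ.  Each non-base invocation makes four
 * recursive calls, so the benchmark is dominated by call overhead. */
#include <stdio.h>

int tak(int x, int y, int z)
{
    if (x <= y)
        return z;
    return tak(tak(x - 1, y, z),
               tak(y - 1, z, x),
               tak(z - 1, x, y));
}

int main(void)
{
    printf("%d\n", tak(24, 16, 8));    /* prints 9 */
    return 0;
}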
richard@aiai.ed.ac.uk (Richard Tobin) (06/14/90)
In article <1990Jun12.212525.25551@iwarp.intel.com> merlyn@iwarp.intel.com (Randal Schwartz) writes:
>Bogus.

>The prolog version most likely saves results from tak(a,b,c) so that
>they can do a quick lookup the next time around.

I assume Peter van Roy would have told us if this were the case.

Peter, could you post the generated code?

-- Richard
--
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin
sehr@sp20.csrd.uiuc.edu (David C. Sehr) (06/14/90)
Pardon my impertinence, Dr. O'Keefe, but how does goal caching make an
implementation no longer ``a Prolog implementation''?  It seems to me
that whatever implementation observes the externally observable language
semantics is ``a Prolog implementation''.  If the approach you seem to
suggest is taken, then any FORTRAN compiler that does loop invariant
floating is not a FORTRAN implementation.  That surely is not the
approach taken by the FORTRAN community.

Since the Tak benchmark is side effect free with only one solution, to
all external appearances the execution is the same with or without goal
caching.  Is there some subtle semantic point I've missed somewhere?  I
know that tracing/debugging would be different, but that doesn't seem to
be at issue here.

Now the fact that there's not enough redundancy in Tak to account for
that much of a speedup is more proof that there's no goal caching going
on in the example at hand.  Of course, the fibonacci example cited in a
subsequent posting is much more suspect for this, although I'll take
Dr. van Roy at his word on both examples.

David
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (06/15/90)
In article <1990Jun14.145424.29053@csrd.uiuc.edu>, sehr@sp20.csrd.uiuc.edu (David C. Sehr) writes:
> Pardon my impertinence, Dr. O'Keefe, but how does goal caching make
> an implementation no longer ``a Prolog implementation''?  It seems to
> me that whatever implementation observes the externally observable
                                               ^^^^^^^^^^^^^^^^^^^^^^
> language semantics is ``a Prolog implementation''.

Precisely.  Goal caching does *NOT* preserve the externally observable
language semantics.  (That's why extension tables in SB Prolog have to
be declared as such.)
--
"A 7th class of programs, correct in every way, is believed to exist by a
few computer scientists.  However, no example could be found to include here."
sehr@sp20.csrd.uiuc.edu (David C. Sehr) (06/15/90)
Well, aside from the miserable English in my first posting, maybe I
didn't make my point clear.  I agree that memo functions don't preserve
the semantics of the language in the general case.  What I still fail to
see is that it doesn't preserve the semantics in this case (i.e. either
tak or fib).

The overwhelming majority of ``optimizing'' transformations don't
preserve language semantics in the most general case (e.g. invariant
hoisting fails with zero-trip loops, constant folding fails with finite
precision arithmetic, do-to-doall parallelization fails in the presence
of pi-blocks, etc.).  A part of all optimization is determining the
legality of the transformation before it is applied.  Is there something
obvious about this specific problem that I've missed?

David
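To make the zero-trip-loop caveat above concrete, here is a hypothetical
C sketch (the function names and values are invented for illustration):
hoisting a loop-invariant expression is only legal when the loop is
known to execute at least once, or when the hoisted evaluation cannot
fault.

/* Why invariant hoisting needs a legality check: with n == 0 the
 * original loop body never runs, so a/b is never evaluated; the
 * "optimized" version evaluates it once regardless and divides by
 * zero when b == 0. */
#include <stdio.h>

void original(int *v, int n, int a, int b)
{
    int i;
    for (i = 0; i < n; i++)
        v[i] = a / b;               /* a/b is loop-invariant */
}

void hoisted(int *v, int n, int a, int b)
{
    int i;
    int t = a / b;                  /* hoisted: evaluated even if n == 0 */
    for (i = 0; i < n; i++)
        v[i] = t;
}

int main(void)
{
    original(NULL, 0, 1, 0);        /* fine: the body never executes */
    /* hoisted(NULL, 0, 1, 0); */   /* would divide by zero */
    printf("zero-trip loop: original is well-defined, hoisted form is not\n");
    return 0;
}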
jeff@aiai.ed.ac.uk (Jeff Dalton) (06/15/90)
What was missing from the original article was an explanation of why the
Prolog result was faster.  Without that, we can't learn all that much
from it.  Two examples may serve to illustrate this point.

I have done Lisp vs C comparisons for TAK in the past.  Some results
showed Franz Lisp being faster than C.  Lisp arithmetic can be as fast
as C, but it would be unusual to find that it was faster.  However, this
was on a VAX, and a little thought suggested that what was really being
shown was that Franz implemented procedure calls more efficiently.
Indeed, "local" functions in Franz are called by JSB (or whatever the
VAX instruction is) while the C call was using CALLS.  Of course, C
could use JSB, so my result did not show that the Franz Lisp language
was faster than C, just that certain implementations on certain machines
could produce certain results.

A more fun example is that Kyoto Common Lisp was faster than C on a
Sun3.  This cries out for explanation, because KCL compiles Lisp by
translating it into C.

So this Prolog vs C result doesn't show that this Prolog is faster than
that C on arithmetic.  It might be procedure calls or something else
that accounts for it.  I at least would be interested in knowing what
and whether C could do it too.  Perhaps Richard O'Keefe is right to
suggest that it's "_really_ cheap" procedure calls.

However, before going on I think we ought to bear in mind that the
original article said:

   Disclaimer: this benchmark has a particularly easy translation to C.
   The result does not necessarily hold for other programs.
   But don't you believe it anymore that Prolog is slow!

Ok, maybe Prolog is still slow at everything else; but it's clear that
no one was really trying to claim otherwise.

Jeff Dalton,                      JANET: J.Dalton@uk.ac.ed
AI Applications Institute,        ARPA:  J.Dalton%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.             UUCP:  ...!ukc!ed.ac.uk!J.Dalton
tim@proton.amd.com (Tim Olson) (06/16/90)
In article <22078@megaron.cs.arizona.edu> debray@cs.arizona.edu (Saumya K. Debray) writes:
| merlyn@iwarp.intel.com (Randal Schwartz) writes:
| > The prolog version most likely saves results from tak(a,b,c) so that
| > they can do a quick lookup the next time around.
|
| Look -- a program either does something, or it doesn't [*].  To say that
| it "most likely" does something is basically the same as saying
| "I {didn't bother to | couldn't} understand the code, but here's a
| wild guess", except that it's not quite as honest.
|
| For the benchmark in question, Van Roy posted both the C and the Prolog
| sources, so it's easy to verify whether or not there are any table
| lookups going on.  There aren't.

No, there is no *explicit* table lookup in the prolog version of the
benchmark, but that doesn't mean that there is no table lookup of cached
results going on in the depths of the Prolog implementation.  It would
be a valid thing to do in a Prolog implementation to speed up
unification.

It is quite easy to follow a C program and see what is going on, since
there is a good correspondence between the language and the hardware.
For Prolog, however, (like Smalltalk, ML, etc.) there is usually quite a
lot hidden in the implementation that prevents a direct comparison.

The argument here is that "unification result caching" changes the
benchmark significantly -- entire call trees are pruned from the
execution.
--
Tim Olson
Advanced Micro Devices
(tim@amd.com)
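To make the disputed scenario concrete, here is a hypothetical C sketch
of what explicit result caching would look like, along the lines of
Schwartz's Perl program; the table layout and names are invented for
illustration, and nothing here is claimed about the internals of any
Prolog system.  It shows the effect Olson describes: a cache hit answers
a call without re-running its subtree, so entire call trees are pruned.

/* Memoized TAK in C -- an illustrative sketch only.  A small
 * open-addressing hash table caches (x,y,z) -> tak(x,y,z); a hit
 * skips all four recursive calls of the cached triple. */
#include <stdio.h>

#define TABLE_SIZE 4096   /* ample for the triples reached by tak(24,16,8) */

struct entry { int x, y, z, val, used; };
static struct entry table[TABLE_SIZE];
static long calls, hits;

/* Return the cached entry for (x,y,z), or an empty slot if not cached.
 * Linear probing; no overflow handling, for brevity. */
static struct entry *probe(int x, int y, int z)
{
    unsigned i = ((unsigned)x * 31u + (unsigned)y) * 31u + (unsigned)z;
    for (i %= TABLE_SIZE;
         table[i].used && (table[i].x != x || table[i].y != y || table[i].z != z);
         i = (i + 1) % TABLE_SIZE)
        ;
    return &table[i];
}

static int tak(int x, int y, int z)
{
    struct entry *e;
    int r;

    calls++;
    if (x <= y)
        return z;

    e = probe(x, y, z);
    if (e->used) {                  /* cache hit: prune the whole subtree */
        hits++;
        return e->val;
    }

    r = tak(tak(x - 1, y, z), tak(y - 1, z, x), tak(z - 1, x, y));

    e = probe(x, y, z);             /* re-probe: recursion may have filled slots */
    e->x = x; e->y = y; e->z = z; e->val = r; e->used = 1;
    return r;
}

int main(void)
{
    int r = tak(24, 16, 8);
    printf("result %d, calls %ld, cache hits %ld\n", r, calls, hits);
    return 0;
}

Running this and comparing the printed call count with the number of
calls made by the plain version sketched earlier makes the point about
pruned call trees quantitative, which is why the objection to comparing
a caching run against a non-caching one matters.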