jaffer@zurich.ai.mit.edu (Aubrey Jaffer) (05/02/91)
At the end of this message is benchmark code in C and in Scheme. My original hope was that computing the ratio of execution times for the two programs would allow me to roughly measure the speed of an implementation independent of the hardware. There are problems with this approach.

The benchmark programs take an argument and compute that many digits of pi. The programs take time proportional to the square of the number of digits. This allows one to compensate for widely differing times of computation.

When I measure the times on my 10MHz 68000 I get:

  (pi 100) takes 33.9 secs (scm2d)
  time pi 200 takes 7.1 secs
  33.9/(7.1/4) ==> 19 times slower than C

But on kleph.ai.mit.edu (which I think has a 68020) I get:

  (pi 100) takes 6.6 secs (scm2d)
  time pi 800 takes 7.6 secs
  6.6/(7.6/64) ==> 56 times slower than C ??!

What is going on here? J. Finger suggests that kleph has an instruction cache. That would mean that the C inner loop stays in the cache while the scm interpreter doesn't.

So what are the prospects for machine-independent performance measures? An algorithm with a more complicated inner loop would keep the C program out of the cache. But such algorithms tend either to be artificially lengthened or to lack the nice scaling properties of the pi programs. Part of the appeal to me of the pi program is that it does an actual difficult computation with the best code I can devise. Previously I used an intentionally less than optimal fibonacci.

An argument can also be made that the cached figures are more realistic for everyday programming problems. This has an interesting consequence: the penalty for using interpreters on newer machines is more severe than on old ones.

So if anyone can suggest solutions to these problems I would be very interested. By the way, on kleph MIT Scheme takes 22 secs for (pi 100) interpreted (185 times slower) and about 8 secs for (pi 100) compiled (about 67 times slower), so I guess my interpreter is not doing too badly.
/* 'Spigot' algorithm originally due to Stanley Rabinowitz */
main(c, v)
int c; char **v;
{
    int n = 200, j, m, b = 2, k, t, r = 1e5;
    long q;
    short *a;
    if (c == 2) n = atoi(v[1]) / 5 + 1;
    k = m = 3.322 * n * 5;
    a = calloc(1 + m, 2);
    while (k) a[--k] = 2;
    for (a[m] = 4; j < n; b = q % r) {
        q = 0;
        for (k = m; k;) {
            q += a[k] * r;
            t = (2 * k + 1);
            a[k] = q % t;
            q = q / t;
            q *= k--;
        }
        printf("%05d%s", b + q / r, ++j % 10 ? " " : "\n");
    }
    puts("");
}

===========================================================

(define r 100000)

(define (pi n)
  (let* ((n (+ (quotient n 5) 1))
         (m (quotient (* n 5 3322) 1000))
         (a (make-vector (+ 1 m) 2)))
    (vector-set! a m 4)
    (do ((j 1 (+ 1 j))
         (q 0 0)
         (b 2 (remainder q r)))
        ((> j n))
      (do ((k m (- k 1)))
          ((zero? k))
        (set! q (+ q (* (vector-ref a k) r)))
        (let ((t (+ 1 (* 2 k))))
          (vector-set! a k (remainder q t))
          (set! q (* k (quotient q t)))))
      (let ((s (number->string (+ b (quotient q r)))))
        (do ((l (string-length s) (+ 1 l)))
            ((>= l 5) (display s))
          (display #\0)))
      (display (if (zero? (modulo j 10)) #\newline #\space)))
    (newline)))
sfk@otter.hpl.hp.com (Steve Knight) (05/02/91)
Whenever the topic of benchmarks comes around, I am always impressed by the subtle depth of the problems. It seems to be an area in which the demand for simple answers is much greater than the supply.

> So what are the prospects for machine independent performance
> measures?

Very poor. Folks are simply too incautious about the interpretation of benchmarks in any case.

(A case in point -- some folks I know were doing a whole series of benchmarks on some software. They wrote up all the tests very nicely. However, I couldn't make head or tail of their figures when I compared them against other results I'd checked out. So I got on the machine and tried the tests. My first test was to check out the timing software they were using. It was wrong by a factor of 2. This is by no means a unique example.)

I guess I can add one point. When doing benchmarks across machines, it is important to establish what the timing and storage measurements are actually measuring. Does the timing include process swapping, page swapping, and other admin overheads? Does it correlate with a stopwatch? Does it include time spent in the kernel? Does it include garbage collection time (and how many collections were there)? Do two C programs of known relative performance give equal performance ratios on a range of machines? If you're on 68xxx machines, do the same test but avoid 68020 instructions, to check whether it's ineffective use of the 68020 instruction set that causes the difficulties. And so on.

Steve
huntley@copper.ucs.indiana.edu (Haydn Huntley) (05/02/91)
In article <JAFFER.91May1163805@kleph.ai.mit.edu> jaffer@zurich.ai.mit.edu (Aubrey Jaffer) writes:

| At the end of this message is benchmark code in C and in Scheme. My
| original hope was that computing the ratios of execution time for the
| 2 programs would allow me to roughly measure speed of an
| implementation independent of the hardware. There are problems with
| this approach.
|
| The benchmark programs take an argument and compute that many digits
| of pi. The programs take time proportional to the square of the
| number of digits. This allows one to compensate for widely differing
| times of computation.
|
| When I measure the times on my 10MHz 68000 I get:
|   (pi 100) takes 33.9 secs (scm2d)
|   time pi 200 takes 7.1 secs
|   33.9/(7.1/4) ==> 19 times slower than C
|
| But on kleph.ai.mit.edu (which I think has a 68020) I get:
|   (pi 100) takes 6.6 secs (scm2d)
|   time pi 800 takes 7.6 secs
|   6.6/(7.6/64) ==> 56 times slower than C ??!
|
| What is going on here? J. Finger suggests that kleph has an
| instruction cache. That would mean that the C inner loop stays in the
| cache while the scm interpreter doesn't.

I also performed some similar benchmarks on a VAX 8650 and a Sun 3, comparing your C version compiled with GCC against your Scheme version compiled with Chez Scheme 3.2, and found that the C versions were 20 to 30 times faster. The times I measured were:

On a VAX 8650 (copper.ucs.indiana.edu):
  (pi 100) takes 1.84 secs (Chez Scheme 3.2)
  (pi 800) takes 106.93 secs (Chez Scheme 3.2)
  time pi 100 takes .06 secs (GCC 1.39)
  time pi 800 takes 3.54 secs (GCC 1.39)
  ratio of Scheme to C is approx. 30

On a Sun 3 with a 16MHz 68020 (cssun2.ucs.indiana.edu):
  (pi 100) takes 4.84 secs (Chez Scheme 3.2)
  (pi 800) takes 246.46 secs (Chez Scheme 3.2)
  time pi 100 takes .2 secs (GCC 1.39)
  time pi 800 takes 10.0 secs (GCC 1.39)
  ratio of Scheme to C is approx. 24

So what is going on here?
Well, simply put, Scheme is not nearly as efficient a language for executing this benchmark as C is. In general, Scheme can be a more convenient language to program in because it does garbage collection for you, while a language like C requires more care when doing dynamic memory allocation and reclamation. On the other hand, C is usually much more efficient, because C compilers can usually store variables in registers, and at worst they can always put them on the stack, while Scheme compilers sometimes must store things out in the heap, and use registers less advantageously than C compilers do. In addition, Scheme has latent typing and generic arithmetic, which require tests to be made at run time, while C has static typing, so it doesn't have to do any tests on the types of variables.

I would guess (and I'm just a lowly CS grad student) that the difference between using a cache and missing its use would typically account for a speed difference of at most a factor of 4, and more probably only a factor of 2. Did you use the same compiler and same options on both your 68000 and 68020? Do they both run at the same clock rate? Your C benchmarks show that kleph is 8 times faster than your 68000!

| So what are the prospects for machine independent performance
| measures? An algorithm with a more complicated inner loop would keep
| the C program out of cache. But such algorithms tend either to be
| artificially lengthened or lack the nice scaling properties of the pi
| programs. Part of the appeal to me of the pi program is that it does
| an actual difficult computation with the best code I can devise.
| Previously I used an intentionally less than optimal fibonacci.
|
| An argument can also be made that the cached figures are more
| realistic for everyday programming problems. This has an interesting
| consequence in that the penalty for using interpreters on newer
| machines is more severe than on old ones.
|
| So if anyone can suggest solutions to these problems I would be very
| interested. By the way, on kleph MIT Scheme takes 22 secs for (pi 100)
| interpreted (185 times slower) and about 8 secs for (pi 100) compiled
| (about 67 times slower), so I guess my interpreter is not doing too
| badly.

I'd say that your interpreter seems to be incredibly fast! Is it really an interpreter, or actually a compiler? Does it support generic arithmetic (fixnums, bignums, and flonums)? Does it do run-time checks to see if the heap is nearly full? Does it perform any type checks at run time?

Please don't laugh at these questions, but I've been working on a Scheme compiler this semester, and I learned how expensive it is to implement generic arithmetic, garbage collection, and type checking! Each one of them costs something like a factor of 2 in performance!

--Haydn

--
;; *****************************************************
;; *  Haydn Huntley    huntley@copper.ucs.indiana.edu  *
;; *****************************************************
boehm@parc.xerox.com (Hans Boehm) (05/03/91)
huntley@copper.ucs.indiana.edu (Haydn Huntley) writes:

>... I learned how expensive it is to
>implement generic arithmetic, garbage collection, and type checking!
>Each one of them costs something like a factor of 2 in performance!
>--Haydn

At least in the case of garbage collection, I think this statement is a bit misleading. Depending on your ratio of accessible to available memory, caching effects, frequency of side effects, etc., allocating objects on the heap instead of stack allocating them can easily cost you a factor of 2 in performance. But if you compare garbage collection to heap allocation with explicit deallocation (as with C malloc/free), it's not clear that garbage collection costs you anything in total execution time.

I've seen cases, running with a trace-and-sweep allocator, where inserting explicit deallocations slowed down the code. It's often cheaper for a garbage collector to restore a large collection of objects to free lists at once than it is to deallocate them one at a time. Looking only at CPU cycles, this effect should usually be more pronounced for copying collectors. (Copying collectors have other disadvantages, but that's another story.)

I suspect the problem with this example is some combination of superfluous allocation and generic arithmetic. The example isn't entirely fair to Scheme either, since there are other ways to solve this problem that actually use Scheme's built-in BigNums, and are probably much faster. The C program is reimplementing BigNums in a pretty clumsy way. (Our desk calculator utility computes 3000 digits of pi in about 10 seconds on a SPARCstation 2, using a fancier algorithm and taking advantage of BigNum arithmetic. It's written in Russell, but spends most of its time in the DEC/INRIA BigNum package, which is written in C.)

Hans
(boehm@xerox.com)
jinx@zurich.ai.mit.edu (Guillermo J. Rozas) (05/03/91)
| So if anyone can suggest solutions to these problems I would be very
| interested. By the way, on kleph MIT Scheme takes 22 secs for (pi 100)
| interpreted (185 times slower) and about 8 secs for (pi 100) compiled
| (about 67 times slower), so I guess my interpreter is not doing too
| badly.

    I'd say that your interpreter seems to be incredibly fast! Is it
    really an interpreter, or actually a compiler? Does it support
    generic arithmetic (fixnums, bignums, and flonums)? Does it do
    run-time checks to see if the heap is nearly full? Does it perform
    any type checks at run time?

The numbers don't tell the whole story. When analyzing benchmarks you have to be very careful. The benchmark does not so much test the speed of the interpreter or compiler used for evaluation as the speed of the numeric and I/O libraries in the system.

I will justify my claims by showing the results of a few tests that I ran using MIT Scheme's compiler. I'm not trying to justify MIT Scheme's apparent slowness, nor make a sales pitch for it. I'm just using it as an exploratory tool because I know it well.

I have three versions of the benchmark:

A: tst-initial, the program sent by Aubrey Jaffer without
   modification. In MIT Scheme the program will do out-of-line calls
   for all numeric operations.

B: tst-generic, the program modified (in an MIT-Scheme-specific way)
   to open-code generic arithmetic.

C: tst-fixnum, the program modified (in an MIT-Scheme-specific way) to
   use fixnum-only arithmetic instead of generic arithmetic.

Note that in MIT Scheme, the out-of-line versions of the generic arithmetic procedures are VERY slow, thus if the code ever calls them, it will not run very fast. This is not a property of the interpreter/compiler, but a property of the way the generic arithmetic routines are built. A short amount of work would fix this, but no one has really worked on improving the speed of the standard library.

Here is a transcript, with some comments.
- The code was run on chamartin.ai.mit.edu, an HP9000s350 with 32
  Mbytes of memory, identical to kleph, where Aubrey ran his tests.
  The HP9000s350 has a 25 MHz 68020.
- SHOW-TIME prints times in milliseconds.

1 ]=> (load "/zu/jinx/tmp/tst-initial")
Loading "/zu/jinx/tmp/tst-initial.com" -- done
;Value: pi

1 ]=> (show-time (lambda () (pi 100)))
00003 14159 26535 89793 23846 26433 83279 50288 41971 69399
37510 58209 74944 59230 78164 06286 20899 86280 34825 34211
70679
process time: 8120; real time: 8386
;No value

1: Pretty much as reported by Aubrey.

1 ]=> (load "/zu/jinx/tmp/tst-generic")
Loading "/zu/jinx/tmp/tst-generic.com" -- done
;Value: pi

1 ]=> (show-time (lambda () (pi 100)))
00003 14159 26535 89793 23846 26433 83279 50288 41971 69399
37510 58209 74944 59230 78164 06286 20899 86280 34825 34211
70679
process time: 2900; real time: 3086
;No value

2: Not much of an improvement.

1 ]=> (fluid-let ((r 10000)) (show-time (lambda () (pi 128))))
00003 01415 09265 03589 07932 03846 02643 03832 07950 02884
01971 06939 09375 01058 02097 04944 05923 00781 06406 02862
00899 08628 00348 02534 02117 00679
process time: 660; real time: 784
;No value

3: Noticeable improvement even though it is run on a larger parameter.
   BTW, the leading 0 in all the strings should be ignored. The
   program is not parameterized correctly according to the value of R.

1 ]=> (load "/zu/jinx/tmp/tst-fixnum")
Loading "/zu/jinx/tmp/tst-fixnum.com" -- done
;Value: pi

1 ]=> (fluid-let ((r 10000)) (show-time (lambda () (pi 128))))
00003 01415 09265 03589 07932 03846 02643 03832 07950 02884
01971 06939 09375 01058 02097 04944 05923 00781 06406 02862
00899 08628 00348 02534 02117 00679
process time: 520; real time: 591
;No value

4: Not much of an improvement. See below.

Explanation:

1: Shows the results as reported by Aubrey. Even when compiled, the
   code runs slowly in MIT Scheme.

2: The results using open-coded generic arithmetic. Not much faster.
3: Same program as above but with a smaller value of R and a larger
   value of N to generate the same digits. It should be slower (the
   program is roughly quadratic in N) but runs much faster. What's the
   deal?

   The answer is that the unmodified program occasionally uses bignums
   in MIT Scheme, because the range of fixnums in MIT Scheme is not
   large enough to accommodate the range of values of Q. By reducing
   R, we avoid bignums, and the program runs much faster. Note that in
   MIT Scheme fixnums only cover the range -2^25 <= x <= (2^25)-1, and
   Q in the loop in Aubrey's program exceeds this range every 8-10
   iterations. I believe that Aubrey's scm uses fixnums in the range
   -2^29 <= x <= (2^29)-1, but I'm not certain.

4: Not much of a difference between generic arithmetic and fixnum
   arithmetic. Most of the time at this point is spent in
   NUMBER->STRING, DISPLAY, etc., and only speeding those up will make
   the code run faster.

Clearly (because of 2 and 3) the time taken by the program is not so much a consequence of the speed of the interpreter/compiler as a consequence of the particular procedures being called and the range of values in use. Thus the benchmark tests the speed of the support library, not the speed of the underlying interpreter or compiler. It is a valid benchmark (most programs are), but we need to draw the correct conclusions from the results.

To test the inherent speed of an interpreter or compiler, I would write a program that does not use any of the constants (i.e. primitives) in the language. For example, it could use Church pairs and Church integers. Thus most (or all) differences in the runtime libraries would be removed. Of course, we could not then compare it to C.

PS: For those who care, the code in B was the benchmark as sent by Aubrey, preceded by

(declare (usual-integrations)
         (integrate-external "/scheme/runtime/fixart"))

(define-integrable (quo x y)
  (if (and (fix:fixnum? x) (fix:fixnum? y))
      (fix:quotient x y)
      (quotient x y)))

(define-integrable (rem x y)
  (if (and (fix:fixnum? x) (fix:fixnum? y))
      (fix:remainder x y)
      (remainder x y)))

(define (mod x y)
  (let ((r (rem x y)))
    (if (or (zero? r)
            (eq? (< x 0) (< y 0)))
        r
        (+ r y))))

And with the renames quotient->quo, remainder->rem, modulo->mod. The reason is that (due to an oversight) none of the procedures above is open-coded by the compiler.
ram+@cs.cmu.edu (Rob MacLachlan) (05/04/91)
In article <1991May2.155838.20830@bronze.ucs.indiana.edu> huntley@copper.ucs.indiana.edu (Haydn Huntley) writes:

>I also performed some similar benchmarks on a VAX 8650 and a Sun 3,
>and compared your C version compiled with GCC, and your Scheme version
>compiled with Chez Scheme 3.2 and found that the C versions were 20 to
>30 times faster.
>
>On a Sun 3 with a 16MHz 68020 (cssun2.ucs.indiana.edu)
>(pi 100) takes 4.84 secs (Chez Scheme 3.2)
>(pi 800) takes 246.46 secs (Chez Scheme 3.2)
>time pi 100 takes .2 secs (GCC 1.39)
>time pi 800 takes 10.0 secs (GCC 1.39)
>ratio of Scheme to C is approx. 24

On a DECStation 3100 running Mach:
  (pi 800) takes 2.7 secs (CMU Common Lisp Apr-28-1991, speed 3, safety 0)
  time pi 800 takes 2.1 secs (MIPS cc -O4)
  ratio of Lisp to C is 1.28

>So what is going on here? Well, simply put, Scheme is not nearly as efficient
>a language for executing this benchmark as C is. [...] Scheme can be a more
>convenient language to program [...] while a language like C requires more
>care. [...] On the other hand, C is usually much more efficient, because C
>compilers can usually store variables in registers, and at the worst, they can
>always put them on the stack, while Scheme compilers sometimes must store
>things out in the heap, and use registers less advantageously than C compilers
>do.

Yes, register allocation and avoidance of number consing are crucial to good numeric performance. But Scheme could do it too.

>In addition, Scheme has latent typing and generic arithmetic, which
>require tests to be made at run-time, while C has static typing, so it
>doesn't have to do any tests on the types of variables.

Supporting dynamic typing and generic arithmetic doesn't mean you have to use it all the time. That's what type inference and declarations are for.

>| So what are the prospects for machine independent performance
>| measures?
>| Part of the appeal to me of the pi program is that it does
>| an actual difficult computation with the best code I can devise.
>| Previously I used an intentionally less than optimal fibonacci.

Both of these benchmarks can be meaningful, but they measure *totally* different things. The recursive fibonacci is basically a function call benchmark (like the traditional TAK), which means that it can probably predict the performance of most Lisp programs within a factor of 3. Given constant compiler technology, integer Specmarks will be a much better predictor.

This pi program is doing extended precision arithmetic. Unless you are implementing extended precision arithmetic in Lisp (unusual but possible), you are better off with another benchmark. And if you do find a compiler that generates reasonable code, you are probably mainly testing the speed of integer divide. I suggest reading comp.benchmarks for a while if you are interested in general benchmarking concepts.

>[...] I've been working on a Scheme compiler this semester, and I learned how
>expensive it is to implement generic arithmetic, garbage collection, and type
>checking! Each one of them costs something like a factor of 2 in performance!

Well, they may, but they don't have to, as the CMU numbers demonstrate. Common Lisp is pretty much a superset of Scheme, modulo call/cc. And I don't see why supporting call/cc should make this program run any slower, since it doesn't use call/cc and doesn't call any functions that might.

I believe that the slowdown in non-CMU Lisps is entirely due to generic and bignum arithmetic. Note that the original C program was written for 32-bit integers, although it presumably could have been written otherwise. Because of this, any Lisp without efficient support for full-word integers will lose horribly, not only calling generic arithmetic routines but also using bignum arithmetic.
You should be able to get reasonable performance in pretty much any Lisp by changing the program to use FIXNUM arithmetic and adding any necessary declarations. CMU CL has an edge here because:
 -- You don't have to change the program.
 -- You don't have to add declarations out the wazoo.
 -- Code runs reasonably fast, even with full type checking.

Here is the CL version:
________________________________________________________________
(in-package "USER")

(defconstant r 100000)

(defun pi (n)
  (declare (type (integer 0 1000) n)
           (inline ceiling))
  (let* ((n (ceiling n 5))
         (m (truncate (* n 5 3322) 1000))
         (a (make-array (+ 1 m)
                        :initial-element 2
                        :element-type '(unsigned-byte 12))))
    (setf (aref a m) 4)
    (do ((j 1 (+ 1 j))
         (q 0 0)
         (b 2 (rem q r)))
        ((> j n))
      (declare (type (unsigned-byte 31) j b q))
      (do ((k m (- k 1)))
          ((zerop k))
        (declare (type (unsigned-byte 12) k))
        (setq q (+ q (* (aref a k) r)))
        (multiple-value-bind (quo rem)
            (truncate q (+ 1 (* 2 k)))
          (declare (type (unsigned-byte 19) quo))
          (setf (aref a k) rem)
          (setq q (* k quo))))
      (format t "~5,'0D" (+ b (truncate q r)))
      (if (zerop (mod j 10))
          (terpri)
          (write-char #\space)))
    (terpri)))
jinx@zurich.ai.mit.edu (Guillermo J. Rozas) (05/04/91)
    Well, they may, but they don't have to, as the CMU numbers
    demonstrate. Common Lisp is pretty much a superset of Scheme,
    modulo call/cc. And I don't see why supporting call/cc should make
    this program run any slower, since it doesn't use call/cc, and
    doesn't call any functions that might.

How do you know that QUOTIENT or FORMAT (for example) may not error, thereby calling the condition system, which may use CALL-WITH-CURRENT-CONTINUATION to construct a condition object to be used if the computation is going to be continued?

I think that it is almost impossible to know. Because of this, essentially any potentially assigned variables must be on the heap so that invoking a continuation multiple times will not un-do side effects.

    You should be able to get reasonable performance in pretty much
    any Lisp by changing the program to use FIXNUM arithmetic and
    adding any necessary declarations. CMU CL has an edge here
    because:
     -- You don't have to change the program.
     -- You don't have to add declarations out the wazoo.
     -- Code runs reasonably fast, even with full type checking.

The program that you sent is pretty different from the original, and has 5 different declarations, some pretty arcane (e.g. (declare (type (unsigned-byte 19) quo))). Thus it seems that you changed the program, and that there are declarations up the wazoo, although perhaps fewer than in other dialects of CL. I like the fact that you don't have to insert THE everywhere, but I still think that this is a far cry from what you seem to imply. Perhaps this is a portable CL program, and not a CMU-CL program. Please explain what you meant.
ram+@cs.cmu.edu (Rob MacLachlan) (05/04/91)
I'd like to clarify my intent here. Of course, part of the reason I posted was that I couldn't miss a chance to brag about my spiffy new compiler. But the main reason was that I didn't want to let go unchallenged the claim that Lisp (or any language with dynamic typing, generic arithmetic and GC) will be 8 times slower than C.

Any powerful language feature sometimes hurts performance, but implementors can ensure that you only get hurt when you play with the sharp edges. This means that there is not a straightforward product combination of inefficiency when you have a language with several potentially inefficient features.

    How do you know that QUOTIENT or FORMAT (for example) may not
    error, [...] I think that it is almost impossible to know. [...]
    any potentially assigned variables must be on the heap

Yeah, I guess it can get weedy if you aren't willing to restrict the extent of continuations created in the debugger. But if you don't have word integers, this program will still perform terribly even if you do change to Scheme's idiomatic iteration.

    You should be able to get reasonable performance in pretty much
    any Lisp by changing the program to use FIXNUM arithmetic and
    adding any necessary declarations. CMU CL has an edge here
    because:
     -- You don't have to change the program.
     -- You don't have to add declarations out the wazoo.
     -- Code runs reasonably fast, even with full type checking.

    [...] it seems that you changed the program, and that there are
    declarations up the wazoo, although perhaps less than in other
    dialects of CL. I like the fact that you don't have to insert THE
    everywhere, but I still think that this is a far cry from what you
    seem to imply.

Well, I guess my wazoo is different than yours. I meant:
 -- I didn't have to understand how the program worked to change it. I
    only changed syntax and added some declarations (which I
    reverse-engineered from the knowledge that long == (signed-byte 32),
    and that the maximum N the C program allowed was 1000.)
    [I also couldn't resist the temptation to clarify the program by
    using FORMAT...]
 -- There are no THE declarations.
 -- Safety does not mean generic arithmetic in CMU CL (though the time
    I reported was unsafe.)

Robert MacLachlan (ram+@cs.cmu.edu)
jaffer@zurich.ai.mit.edu (Aubrey Jaffer) (05/04/91)
huntley@copper.ucs.indiana.edu (Haydn Huntley) asks:

>Did you use the same compiler and same options on both your 68000 and
>68020?

No and yes. The 68000 runs a 1984-vintage Unix and the 68020 runs HP-UX. I used no compiler options on either.

>Do they both run at the same clock rate? Your C benchmarks
>show that kleph is 8 times faster than your 68000!

My 68000 is 10MHz. The 68020 is 25MHz.

>I'd say that your interpreter seems to be incredibly fast! Is it
>really an interpreter or actually a compiler?

Scm has an interpreter written entirely in C (it uses no macros written in Scheme). In addition, it memoizes the locations of all variable references by modifying the code. This is described in code.doc in the distribution.

>Does it support generic arithmetic? (fixnums, bignums, and flonums)

Not yet. I am adding this now. However, 30-bit fixnum arithmetic speed will not be affected.

>Does it do run-time checks to see if the heap is nearly full?

Yes. The times include GC.

>Does it perform any type checks at run-time?

It checks for number of arguments, types of arguments, structure references, numeric overflow, and unboundness of variables.
jeff@aiai.edinburgh.ac.UK (Jeff Dalton) (05/05/91)
> Date: 3 May 91 20:23:21 GMT
> From: "Guillermo J. Rozas" <jinx@zurich.ai.mit.edu>
>
> How do you know that QUOTIENT or FORMAT (for example) may not error,
> thereby calling the condition system which may use
> CALL-WITH-CURRENT-CONTINUATION to construct a condition object to be
> used if the computation is going to be continued?
>
> I think that it is almost impossible to know. Because of this,
> essentially any potentially assigned variables must be on the heap so
> that invoking a continuation multiple times will not un-do side effects.

If so, this looks like a very strong argument against having full continuations.
hieb@heston.cs.indiana.edu (Robert Hieb) (05/06/91)
jeff@aiai.edinburgh.ac.UK (Jeff Dalton) writes:

>> Date: 3 May 91 20:23:21 GMT
>> From: "Guillermo J. Rozas" <jinx@zurich.ai.mit.edu>
>> essentially any potentially assigned variables must be on the heap so
>> that invoking a continuation multiple times will not un-do side effects.

>If so, this looks like a very strong argument against having full
>continuations.

In practice, assigned variables in Scheme are typically "state" variables with indefinite extent anyway, and thus not suitable for stack placement. The assignments in the piece of code that occasioned this side track seem to be an artifact of an overly literal translation from C code.

Replacing the "do" loops with "named let" loops makes it easy to eliminate the assignments to "q" by turning it into a formal parameter of the inner loop. "Named let" often makes it possible to remove assignments and breaks from C-style loops. In general, tail recursion optimization in Scheme makes it natural to create state machines that pass (rather than assign) parameters.
dyb@marilyn.cs.indiana.edu (R. Kent Dybvig) (05/16/91)
wa@raven.cad.mcc.com (Wayne Allen) writes:

>I've heard good things about Chez Scheme, and find it hard to
>take Huntley's data on face value.

Okay. I had decided to stay out of this thread, since I thought that Bill Rozas covered the subject sufficiently.

Huntley's numbers are probably entirely correct for Chez Scheme Version 3.2 given the original program, which uses generic arithmetic. Furthermore, since Version 3.2 uses 19-bit fixnums, the original program will not run correctly with fixnum-specific operators. However, by playing around with the parameters (specifically, the "r" parameter), it is possible to use fixnum-specific operators and get much better performance: on the order of four or five times slower than C.

Using Version 3.9n (a prerelease of Chez Scheme Version 4), which uses 30-bit fixnums, it is still not possible to run "(pi 800)" with the original program without causing fixnum overflow; the CMU Common Lisp numbers must be valid only because of the "unsigned" declaration. With a slight change to the "r" parameter to allow fixnum-specific operators to be used, Version 3.9n is slightly less than 3 times slower than optimized GNU C on a Sun-3. Although a factor of 3 is much more encouraging than a factor of 20, we clearly have more work to do to make this benchmark perform as well in Scheme as it does in C.

Incidentally, there's a bug in the original C code: j is not initialized. I tried the C code first on a Sun 4, and it did not run properly until I initialized j to 0.

Kent

R. Kent Dybvig             | Computer Science Department
dyb@iuvax.cs.indiana.edu   | Indiana University
812/855-6486               | Bloomington, IN 47405