seibert@XN.LL.MIT.EDU (seibert) (10/19/89)
Hello lispm experts. I have a conundrum that the local folks haven't been able to explain to me. I hope you can help. I'm investigating fixnum vs single-float multiplies. It appears to me that a single-float multiply is faster than a fixnum multiply. I've appended to this message the (short) test functions I'm using and a sample output. I understand why using bignums is slower than fixnums, and the progression of CPU timings as the number of bits used in the integer representation increases, but not why the single-float multiply appears to be in the same ball-park as the fixnum multiply. I've tried the tests with and without: declarations, setfs, ash, and multiple repetitions within the timing brackets. These results are from an Explorer I, but I get essentially the same results on my Symbolics machine.

The idea of the code is to take a number "nbrr" and time how much CPU time is needed to square it using a single-float representation and a range of integer representations. Here's the code....

    (defvar bbits 8)
    (defvar right -7)
    (defvar nbrr (/ 3.0))
    (defvar str t)

    (defmacro i* (i j)
      `(the fixnum (ash (* (the fixnum ,i) (the fixnum ,j))
                        (the fixnum right))))

    (defun foo ()
      (let ((inbr (floor (* nbrr (ash 1 (1- bbits)))))
            (result nil))
        (timeit () (setf result (i* inbr inbr)))
        (format str " ~3d bits result: ~10,7f ~12A; ~12A inbr ~d"
                bbits
                (* result (/ 1.0 (ash 1 (1- bbits))))
                (type-of result) (type-of inbr) inbr)))

    (defun foobar ()
      (do ((bbits 4 (+ bbits 4)))
          ((= bbits 60))
        (setf right (- 1 bbits))
        (foo))
      (format str "~%floating point... ")
      (let ((nbrw nbrr)
            (result nil))
        (timeit () (setf result (* nbrw nbrw)))
        (format str " result: ~10,7f ~12A; ~12A nbr ~20,17f"
                result (type-of result) (type-of nbrw) nbrw)))

When I invoke foobar, here's the output....
    CPU:  18.0 us    4 bits result:  0.0000000 FIXNUM      ; FIXNUM       inbr 2
    CPU:  19.0 us    8 bits result:  0.1015625 FIXNUM      ; FIXNUM       inbr 42
    CPU:  18.0 us   12 bits result:  0.1108398 FIXNUM      ; FIXNUM       inbr 682
    CPU:  56.0 us   16 bits result:  0.1110840 FIXNUM      ; FIXNUM       inbr 10922
    CPU:  60.0 us   20 bits result:  0.1111088 FIXNUM      ; FIXNUM       inbr 174762
    CPU:  62.0 us   24 bits result:  0.1111110 FIXNUM      ; FIXNUM       inbr 2796202
    CPU:  67.0 us   28 bits result:  0.1111111 FIXNUM      ; BIGNUM       inbr 44739244
    CPU:  54.0 us   32 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 715827904
    CPU:  91.0 us   36 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 11453246464
    CPU:  90.0 us   40 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 183251943424
    CPU:  90.0 us   44 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 2932031094784
    CPU:  89.0 us   48 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 46912497516544
    CPU:  95.0 us   52 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 750599960264704
    CPU:  95.0 us   56 bits result:  0.1111111 BIGNUM      ; BIGNUM       inbr 12009599364235264

    floating point...
    CPU:  25.0 us result:  0.1111111 SINGLE-FLOAT; SINGLE-FLOAT nbr 0.33333334000000000

Thanks for your help.
_Ms
--
Michael Seibert
seibert@xn.ll.mit.edu    ll-xn!seibert
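(For readers without an Explorer: TIMEIT above is machine-specific. A rough portable equivalent can be sketched with GET-INTERNAL-RUN-TIME; the macro name TIME-BODY and the repetition count below are my own, not part of the original code, and wall-clock resolution will be far coarser than the Explorer's microsecond meter.)

    ;; A hedged, portable sketch of the measurement loop.
    (defmacro time-body ((&key (reps 100000)) &body body)
      "Run BODY REPS times; report mean microseconds per iteration."
      `(let ((start (get-internal-run-time)))
         (dotimes (i ,reps) ,@body)
         (let ((elapsed (- (get-internal-run-time) start)))
           (format t "~&CPU: ~,1f us"
                   (/ (* elapsed 1000000.0)
                      (* ,reps internal-time-units-per-second))))))

    ;; Example: compare the scaled fixnum square and the float square.
    ;; (time-body () (the fixnum (ash (* 682 682) -11)))
    ;; (time-body () (* 0.33333334 0.33333334))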
Rice@SUMEX-AIM.STANFORD.EDU (James Rice) (10/20/89)
I would expect that short float and fixnum multiplication times should be very similar. This is because to do the short float multiply you simply multiply the two mantissae and ADD the exponents. The addition of the exponents can be done in parallel with the multiply of the mantissae and should be faster. How many bits are there in the mantissa of an Explorer short float? If there are fewer than in a fixnum then I would expect it to be a little faster (modulo any mysterious IEEE behaviour checking).

Floating point addition is the thing that really screws you, since you have to slide the two mantissae until the exponents match before you can perform the addition. All of those arithmetic shifts can be slow.

Rice.

(Mind you, they may simply have screwed up the ucode.)
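Rice's decomposition can be made concrete in Common Lisp with INTEGER-DECODE-FLOAT and SCALE-FLOAT: a multiply works on mantissas and exponents independently, while an add must first shift one mantissa to align the exponents. This is only an illustrative sketch (the function names are mine, and real hardware also rounds and renormalises), not how any particular microcode does it:

    ;; Multiply via decoded parts: mantissas multiply, exponents add.
    (defun float-multiply-parts (x y)
      (multiple-value-bind (mx ex sx) (integer-decode-float x)
        (multiple-value-bind (my ey sy) (integer-decode-float y)
          (* sx sy                              ; signs combine
             (scale-float (float (* mx my) x)   ; mantissas multiply...
                          (+ ex ey))))))        ; ...exponents just add

    ;; Add via decoded parts: mantissas must be aligned (shifted) first.
    (defun float-add-parts (x y)
      (multiple-value-bind (mx ex sx) (integer-decode-float x)
        (multiple-value-bind (my ey sy) (integer-decode-float y)
          (let ((e (min ex ey)))                ; align to smaller exponent
            (scale-float (float (+ (* sx (ash mx (- ex e)))
                                   (* sy (ash my (- ey e))))
                                x)
                         e)))))

The ASH calls in FLOAT-ADD-PARTS are exactly the "slide the two mantissae" step Rice describes; the multiply needs no such alignment.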
pf@islington-terrace.csc.ti.com (Paul Fuqua) (10/20/89)
    Date: Thursday, October 19, 1989 1:54pm (CDT)
    From: seibert at xn.ll.mit.edu (seibert)

    (defmacro i* (i j)
      `(the fixnum (ash (* (the fixnum ,i) (the fixnum ,j))
                        (the fixnum right))))

    (defun foobar ()
      ...
      (timeit () (setf result (* nbrw nbrw)))

Note that I* does ASH as well as *, while the floating-point timing only does *. On my Explorer 2, that makes the difference between 1.41 usec (* only, fixnum) and 3.14 usec (* with ASH, fixnum).

If I change the I* to * (so I just do multiplication), I get times of about 1.6 usec for fixnum-fixnum, 5.5 to 6.5 usec for fixnum-bignum, and 7.5 usec and up for bignum-bignum. Single-float multiplication is about 8.6 usec.

(I should probably point out that these aren't official timings, just me hacking on my Explorer, which is old enough to have a white exterior and contains a processor board of uncertain revision.)

    Date: Thursday, October 19, 1989 2:35pm (CDT)
    From: James Rice <Rice at sumex-aim.stanford.edu>

    I would expect that short float and fixnum multiplication times
    should be very similar.

I wouldn't, but that's because I know the microcode. I think small-floats are still decomposed into the same internal representation as single-floats -- microcode space was (and is) a bit tight on Explorer 1s.

    The addition of the exponents can be done in parallel with the
    multiply of the mantissae and should be faster.

If you had a parallel functional unit to do it with. Multiplication on the Explorer is done with multiply-step microinstructions (Booth's algorithm) -- 1 bit at a time for the Explorer 1, 2 for the Explorer 2.

    Floating point addition is the thing that really screws you, since
    you have to slide the two mantissae until the exponents match before
    you can perform the addition. All of those arithmetic shifts can be slow.

Especially on a machine whose hardware supports LDB/DPB more than shifts.
I think the Explorer 2 has some normalisation hardware support, but I was getting out of the microcode business by the time Release 3 and the Explorer 2 came out.

Paul Fuqua
pf@csc.ti.com    {smu,texsun,cs.utexas.edu,rice}!ti-csl!pf
Texas Instruments Computer Science Center
PO Box 655474 MS 238, Dallas, Texas 75265
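Paul's observation -- that I* times an ASH as well as a multiply -- suggests the shift-free comparison he describes. The macro name F* below is hypothetical, just the original I* with the scaling shift removed:

    ;; Shift-free variant of I*, isolating the raw fixnum multiply.
    (defmacro f* (i j)
      `(the fixnum (* (the fixnum ,i) (the fixnum ,j))))

    ;; Timing F* against the original I* (with TIMEIT or equivalent)
    ;; separates the cost of the multiply itself from the cost of the
    ;; arithmetic shift that rescales the fixed-point result.

Under that split, Paul's Explorer 2 numbers (1.41 usec for * alone vs 3.14 usec with the ASH) make the original puzzle largely disappear: the "slow fixnum multiply" was mostly the shift.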