[comp.sys.ti.explorer] fixnum vs single-float multiplies

seibert@XN.LL.MIT.EDU (seibert) (10/19/89)

Hello lispm experts.  I have a conundrum that the local folks haven't been
able to explain to me.  I hope you can help.  I'm investigating fixnum vs 
single-float multiplies.  It appears to me that a single-float multiply is 
faster than a fixnum multiply.  I've appended to this message the (short) test
functions I'm using and a sample output.  I understand why using bignums is
slower than fixnums, and the progression of CPU timings as the number of bits
used in the integer representation increases, but not why the single-float
multiply appears to be in the same ball-park as the fixnum multiply.

I've tried the tests with and without:  declarations, setfs, ash, and multiple
repetitions within the timing brackets.  These results are from an Explorer I,
but I get essentially the same results on my Symbolics machine.  

The idea of the code is to take a "nbrr" and time how much CPU time is needed
to square it using a single-float representation and a range of integer rep'ns.
Here's the code....

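(defvar bbits 8)          ; width in bits of the fixed-point representation
(defvar right -7)         ; shift count that rescales a product: (- 1 bbits)
(defvar nbrr (/ 3.0))     ; the number to square, 1/3 as a single-float
(defvar str t)            ; output stream for FORMAT (T = standard output)
;; Fixed-point multiply: integer product, then shift right to rescale.
(defmacro i* (i j) `(the fixnum
                         (ash (* (the fixnum ,i)
                                 (the fixnum ,j))
                              (the fixnum right))))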

(defun foo ()
  (let ((inbr (floor (* nbrr (ash 1 (1- bbits)))))
	(result nil))
    (timeit ()
      (setf result (i* inbr inbr)))
    (format str "   ~3d bits  result: ~10,7f ~12A; ~12A inbr ~d"
	    bbits (* result (/ 1.0 (ash 1 (1- bbits)))) 
            (type-of result) (type-of inbr) inbr)))
  
(defun foobar ()
  (do ((bbits 4 (+ bbits  4)))
      ((= bbits 60))
    (setf right (- 1 bbits))
    (foo))
  (format str "~%floating point... ")
  (let ((nbrw nbrr)
	(result nil))
    (timeit ()
      (setf result (* nbrw nbrw)))
    (format str "             result: ~10,7f ~12A; ~12A nbr ~20,17f" 
            result (type-of result) (type-of nbrw) nbrw)))

When I invoke foobar, here's the output....

CPU: 18.0 us     4 bits  result:  0.0000000 FIXNUM      ; FIXNUM       inbr 2
CPU: 19.0 us     8 bits  result:  0.1015625 FIXNUM      ; FIXNUM       inbr 42
CPU: 18.0 us    12 bits  result:  0.1108398 FIXNUM      ; FIXNUM       inbr 682
CPU: 56.0 us    16 bits  result:  0.1110840 FIXNUM      ; FIXNUM       inbr 10922
CPU: 60.0 us    20 bits  result:  0.1111088 FIXNUM      ; FIXNUM       inbr 174762
CPU: 62.0 us    24 bits  result:  0.1111110 FIXNUM      ; FIXNUM       inbr 2796202
CPU: 67.0 us    28 bits  result:  0.1111111 FIXNUM      ; BIGNUM       inbr 44739244
CPU: 54.0 us    32 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 715827904
CPU: 91.0 us    36 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 11453246464
CPU: 90.0 us    40 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 183251943424
CPU: 90.0 us    44 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 2932031094784
CPU: 89.0 us    48 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 46912497516544
CPU: 95.0 us    52 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 750599960264704
CPU: 95.0 us    56 bits  result:  0.1111111 BIGNUM      ; BIGNUM       inbr 12009599364235264
floating point... 
CPU: 25.0 us             result:  0.1111111 SINGLE-FLOAT; SINGLE-FLOAT nbr  0.33333334000000000
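
As a cross-check on the table above, the fixed-point arithmetic in FOO can be reproduced by hand for a single row.  The sketch below is not part of the original post; the name fixed-point-square is made up for illustration, and it assumes only standard Common Lisp.

```lisp
(defun fixed-point-square (x bbits)
  "Square X in a BBITS-wide fixed-point format, the way FOO and I* do.
Returns three values: the scaled integer input, the rescaled integer
product, and that product converted back to a float."
  (let* ((scale (ash 1 (1- bbits)))          ; 2^(bbits-1)
         (inbr (floor (* x scale)))          ; scaled integer input
         (product (ash (* inbr inbr)         ; integer multiply ...
                       (- 1 bbits))))        ; ... then shift right to rescale
    (values inbr product (/ product (float scale)))))
```

Evaluating (fixed-point-square (/ 3.0) 8) returns 42, 13 and 0.1015625, which agrees with the 8-bit row: 42 * 42 = 1764, shifted right by 7 bits gives 13, and 13/128 = 0.1015625.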

Thanks for your help.  _Ms
-- 
Michael Seibert         seibert@xn.ll.mit.edu        ll-xn!seibert

Rice@SUMEX-AIM.STANFORD.EDU (James Rice) (10/20/89)

_I_ would expect that short float and fixnum
multiplication times should be very similar.  This is
because to do the short float multiply you simply multiply
the two mantissae and ADD the exponents.  The addition of
the exponents can be done in parallel with the multiply of
the mantissae and should be faster.  How many bits are
there in the mantissa of an Explorer short float?  If
there are fewer than in a fixnum then I would expect it to
be a little faster (modulo any mysterious IEEE behaviour
checking).  Floating point addition is the thing that
really screws you, since you have to slide the two
mantissae until the exponents match before you can perform
the addition.  All of those arithmetic shifts can be slow.
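
The "multiply the mantissae, add the exponents" description above can be sketched in portable Common Lisp with INTEGER-DECODE-FLOAT and SCALE-FLOAT.  This is only an illustration of the arithmetic, not of any Explorer microcode: the function name decomposed-multiply is hypothetical, a float radix of 2 is assumed, and rounding corner cases, overflow and denormals are ignored.

```lisp
(defun decomposed-multiply (x y)
  "Multiply floats X and Y by multiplying integer mantissas and adding
exponents, then reassembling a float of the same format as X."
  (multiple-value-bind (mx ex sx) (integer-decode-float x)
    (multiple-value-bind (my ey sy) (integer-decode-float y)
      (* sx sy                                ; combine the signs
         (scale-float (float (* mx my) x)     ; integer multiply of mantissas
                      (+ ex ey))))))          ; integer add of exponents
```

For exactly representable inputs, (decomposed-multiply 1.5 2.5) returns 3.75, the same as (* 1.5 2.5).  Nothing in the multiply requires aligning the two exponents, which is the step that makes floating-point addition more awkward.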

Rice.

(mind you they may simply have screwed up the ucode).

pf@islington-terrace.csc.ti.com (Paul Fuqua) (10/20/89)

    Date: Thursday, October 19, 1989  1:54pm (CDT)
    From: seibert at xn.ll.mit.edu  (seibert)

    (defmacro i* (i j) `(the fixnum
    			 (ash (* (the fixnum ,i)
    				 (the fixnum ,j))
    			      (the fixnum right))))
      
    (defun foobar ()
        ...

        (timeit ()
          (setf result (* nbrw nbrw)))

Note that I* does ASH as well as *, while the floating-point timing only
does *.  On my Explorer 2, that makes the difference between 1.41 usec (*
only, fixnum) and 3.14 usec (* with ASH, fixnum).

If I change the I* to * (so I just do multiplication), I get times of
about 1.6 usec for fixnum-fixnum, 5.5 to 6.5 usec for fixnum-bignum, and
7.5 usec and up for bignum-bignum.  Single-float multiplication is about
8.6 usec.  (I should probably point out that these aren't official
timings, just me hacking on my Explorer, which is old enough to have a
white exterior and contains a processor board of uncertain revision.)
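
For readers without the Explorer's TIMEIT, a like-for-like comparison can be approximated with standard Common Lisp timing functions.  The macro below is a rough sketch, not the TIMEIT used in this thread: the name average-cpu-microseconds is hypothetical, the figure it returns includes loop overhead, and an optimizing compiler may fold or discard a body with constant operands.

```lisp
(defmacro average-cpu-microseconds ((&key (iterations 100000)) &body body)
  "Run BODY ITERATIONS times and return the average CPU time per
iteration in microseconds, loop overhead included."
  (let ((count (gensym "COUNT"))
        (start (gensym "START"))
        (index (gensym "INDEX")))
    `(let ((,count ,iterations)
           (,start (get-internal-run-time)))
       (dotimes (,index ,count)
         ,@body)
       ;; Convert elapsed internal time units to microseconds per iteration.
       (/ (* 1d6 (- (get-internal-run-time) ,start))
          (* ,count internal-time-units-per-second)))))
```

To compare multiplies alone, the same form would be timed for both operand types, for example (average-cpu-microseconds () (setf result (* inbr inbr))) against (average-cpu-microseconds () (setf result (* nbrw nbrw))), with no ASH in either body.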

    Date: Thursday, October 19, 1989  2:35pm (CDT)
    From: James Rice <Rice at sumex-aim.stanford.edu>
    
    _I_ would expect that short float and fixnum
    multiplication times should be very similar.

I wouldn't, but that's because I know the microcode.  I think
small-floats are still decomposed into the same internal representation
as single-floats -- microcode space was (and is) a bit tight on Explorer
1s.

                                              The addition of
    the exponents can be done in parallel with the multiply of
    the mantissae and should be faster.

If you had a parallel functional unit to do it with.  Multiplication on
the Explorer is done with multiply-step microinstructions (Booth's
algorithm) -- 1 bit at a time for the Explorer 1, 2 for the Explorer 2.
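
To show why a multiply-step design costs time in proportion to operand width, here is a software model of radix-2 Booth multiplication, examining one multiplier bit per step.  It is a sketch of the arithmetic only and makes no claim about the actual Explorer microcode or data path; the function name booth-multiply and the default width of 32 are assumptions made for the example.

```lisp
(defun booth-multiply (a b &optional (width 32))
  "Multiply integer A by B, where B fits in WIDTH-bit two's complement,
examining one bit of B per step as in radix-2 Booth recoding."
  (let ((product 0)
        (previous-bit 0))                     ; implicit bit to the right of bit 0
    (dotimes (i width product)
      (let ((current-bit (if (logbitp i b) 1 0)))
        (cond ((and (= current-bit 1) (= previous-bit 0))
               (decf product (ash a i)))      ; a run of 1s begins: subtract A*2^i
              ((and (= current-bit 0) (= previous-bit 1))
               (incf product (ash a i))))     ; a run of 1s ends: add A*2^i
        (setf previous-bit current-bit)))))
```

(booth-multiply 42 42) returns 1764 and (booth-multiply 5 -6) returns -30.  The loop always runs WIDTH times, so a step that consumes two multiplier bits instead of one would need half as many iterations.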

		Floating point addition is the thing that
    really screws you, since you have to slide the two
    mantissae until the exponents match before you can perform
    the addition.  All of those arithmetic shifts can be slow.

Especially on a machine whose hardware supports LDB/DPB more than
shifts.  I think the Explorer 2 has some normalisation hardware support,
but I was getting out of the microcode business by the time Release 3
and the Explorer 2 came out.
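
For readers unfamiliar with LDB and DPB, the sketch below contrasts field extraction with the equivalent shift-and-mask.  It assumes the IEEE 754 single-precision layout (8-bit exponent field starting at bit 23) purely as an example; the thread does not say how the Explorer lays out floats internally, and the function names are made up.

```lisp
(defun exponent-field (bits)
  "Extract the 8-bit exponent field with LDB (load byte)."
  (ldb (byte 8 23) bits))

(defun exponent-field-by-shifting (bits)
  "The same extraction written as a shift followed by a mask."
  (logand (ash bits -23) #xFF))

(defun with-exponent-field (bits new-exponent)
  "Return BITS with its exponent field replaced, using DPB (deposit byte)."
  (dpb new-exponent (byte 8 23) bits))
```

With #x3F800000, the IEEE bit pattern of 1.0, both extraction functions return 127, and (with-exponent-field #x3F800000 128) returns #x40000000, the pattern of 2.0.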

Paul Fuqua                     pf@csc.ti.com
                               {smu,texsun,cs.utexas.edu,rice}!ti-csl!pf
Texas Instruments Computer Science Center
PO Box 655474 MS 238, Dallas, Texas 75265