[comp.arch] SPARC multiply-step instruction

cet1@cl.cam.ac.uk (C.E. Thompson) (06/13/90)

In article <39319@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>Specifically, most RISC designers, after studying many programs, decided
>that integer multiply (and especially divide) were used less frequently
>than many other operations, and there is substantial data that backs this
>up from many vendors.  RISC designers, depending on the benchmarks used,
>and amount of silicon available, allocated various amounts of silicon to
>support these operations, from zero up. The SPARC designers included
>a Multiply-Step, and no Divide-Step (i.e., divides are done by fair-sized
>hunk of code); HP PA included M-S and D-S; MIPS & 88K included both
>integer mult & divide in hardware, etc.  However, for example, a typical
>integer divide on a MIPS takes about 35 cycles.... and probably about
>the same on a typical CISC.
>
I have never been able to understand why the SPARC multiply-step instruction
was included at all. It only delivers a puny one bit per cycle, which makes
it quite hard to find cases where it is the right way of doing a multiply
(or at any rate, cases when it is *faster* than alternative code). 
Compilers generating code for multiplication by small constants will do  
better using addition/shift chains; and when you need a general 32x32 bit
multiply you can do better with difference-of-squares and lookup tables
than with a chain of 32 multiply-step instructions. I suppose that the
multiply-step is conceptually simpler than these techniques, but the
absence of a divide-step instruction makes it difficult to believe that
this is the rationale.
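For comparison, a minimal C sketch of the two general-purpose approaches. All names (mul_by_steps, mul8, mul_qsq) are hypothetical. The first loop only models the one-bit-per-step cost of a chain of multiply-steps; it does not reproduce the exact semantics of SPARC's MULScc (Y register, condition codes). The second uses quarter squares, ab = ((a+b)^2 - (a-b)^2)/4, with a small lookup table. The byte-sized split is an assumption chosen to keep the table at 511 entries, and nothing here claims which version is faster on any particular machine.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add, one multiplier bit per iteration: 32 iterations for a
   32-bit multiplier. Models only the cost of a multiply-step chain, not
   the exact SPARC MULScc semantics (Y register, condition codes). */
static uint64_t mul_by_steps(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    uint64_t addend = a;
    for (int i = 0; i < 32; i++) {
        if (b & 1u)
            acc += addend;   /* conditional add for this bit */
        addend <<= 1;        /* next bit has twice the weight */
        b >>= 1;
    }
    return acc;
}

/* Quarter-squares table: qsq[n] = floor(n*n / 4) for n = 0..510. */
static uint32_t qsq[511];
static int qsq_ready = 0;

static void init_qsq(void)
{
    for (uint32_t n = 0; n < 511; n++)
        qsq[n] = (n * n) / 4;
    qsq_ready = 1;
}

/* 8x8 -> 16-bit product: ab = ((a+b)^2 - (a-b)^2) / 4. The floors in
   the table cancel because a+b and a-b always have the same parity. */
static uint32_t mul8(uint32_t a, uint32_t b)
{
    assert(a < 256 && b < 256);
    if (!qsq_ready)
        init_qsq();
    uint32_t d = a > b ? a - b : b - a;
    return qsq[a + b] - qsq[d];
}

/* 32x32 -> 64-bit product from 16 byte-by-byte partial products. */
static uint64_t mul_qsq(uint32_t a, uint32_t b)
{
    uint64_t acc = 0;
    for (int i = 0; i < 4; i++) {
        for (int j = 0; j < 4; j++) {
            uint32_t ai = (a >> (8 * i)) & 0xFFu;
            uint32_t bj = (b >> (8 * j)) & 0xFFu;
            acc += (uint64_t)mul8(ai, bj) << (8 * (i + j));
        }
    }
    return acc;
}
```

Both return the full 64-bit product; for example, mul_by_steps(12345, 6789) and mul_qsq(12345, 6789) both give 83810205. A realistic table-driven version would probably use larger tables and fewer partial products; the byte split here only keeps the example small.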

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

pcg@cs.aber.ac.uk (Piercarlo Grandi) (06/28/90)

In article <1990Jun13.150100.28445@cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E.
Thompson) writes:

   In article <39319@mips.mips.COM> mash@mips.COM (John Mashey) writes:
   >
   >Specifically, most RISC designers, after studying many programs, decided
   >that integer multiply (and especially divide) were used less frequently
   >than many other operations, and there is substantial data that backs this
   >up from many vendors.

So far, fully agreed. You can handle a lot of multiplies and divides statically, at compile time.
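Much of what can be handled statically is multiplication or division by a compile-time constant. A minimal C sketch of the usual tricks, with hypothetical function names: shift/add chains for x*10 and x*7, and unsigned division by 10 done as a multiply by the precomputed reciprocal 0xCCCCCCCD = ceil(2^35 / 10) followed by a shift.

```c
#include <assert.h>
#include <stdint.h>

/* x * 10 as a shift/add chain: 10x = 8x + 2x. */
static uint32_t mul10(uint32_t x) { return (x << 3) + (x << 1); }

/* x * 7 as a shift/subtract chain: 7x = 8x - x. */
static uint32_t mul7(uint32_t x) { return (x << 3) - x; }

/* Unsigned x / 10 without a divide: multiply by the precomputed
   reciprocal 0xCCCCCCCD = ceil(2^35 / 10), keep the top bits. */
static uint32_t div10(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0xCCCCCCCDu) >> 35);
}

/* Spot-check div10 against real division for x in [0, limit). */
static void check_div10(uint32_t limit)
{
    for (uint32_t x = 0; x < limit; x++)
        assert(div10(x) == x / 10);
}
```

On hardware with no fast multiplier, the reciprocal multiply in div10 would itself have to be expanded into shifts and adds, so that half of the trick pays off mainly where multiply is cheap.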

   >RISC designers, depending on the benchmarks used,

Ahhhh. Here we start seeing something disagreeable. My architectural
principle is not just to optimize average speed, but also its variance,
i.e. to ensure that there are no pathological cases. Otherwise we get
typical "Bill Joy" style things, designs that are quite good for 80% of
the cases and miserable for the remaining 20% (famous cases: large
filesystem block sizes in BSD, 8 full contexts cache in SUNs).

   >and amount of silicon available, allocated various amounts of silicon to
   >support these operations, from zero up.

But, but. Should we leave out architectural features just because our
first *implementation* has a very low transistor budget? Or shouldn't we
make sure that when and if the transistor budget increases we can take
advantage of it?

More to the point: is having, e.g., a large register file, which
(arguably) increases average performance for average programs, more or
less desirable than having hardware support for multiply and divide,
which avoids pathologically bad performance for some relatively rare but
"important" application areas? This is a very interesting question, and
one that deserves some thought.

I think that designing an *architecture* should not be overly constrained
by short term considerations. Consider Multics: the first machine did
not have hardware ring support, but it was designed in nonetheless,
because they knew it was coming.

An architecture is something that *always* is implemented partly in
software and partly in hardware; technology dictates where the
implementation will put the boundary. It seems to me very unwise to
design an architecture that presupposes a rigid boundary.

   >The SPARC designers included
   >a Multiply-Step, and no Divide-Step (i.e., divides are done by fair-sized
   >hunk of code);

   I have never been able to understand why the SPARC multiply-step instruction
   was included at all.

Fig-leaf? :-).

More seriously: my tired eyes may have betrayed me, but I seem to have
seen full multiplication and division instructions in some recent SPARC
opcode writeup (as(1) manual for 4.1 SunOS). Probably SUN have decided
to reserve these, and we will soon see a SPARC with a few more
transistors.

Too bad they did not do it before. A lot of fixup engineering has been
proposed in this newsgroup for working around implementations that
may or may not have hardware instructions for this (shared libs seem to
have been the winner). Maybe a look at extracodes and the PDP-10 and
many other systems that had the same problem could have been
beneficial...
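One way to picture that kind of workaround is a single multiply entry point bound, once, to either a hardware or a software routine. A minimal C sketch with hypothetical names (umul, bind_multiply, mul_soft, mul_hw): the "hardware" path is simply C's * operator standing in for a multiply instruction, and the capability flag is passed in rather than probed. This uses a function pointer bound at startup, which is only one way to get the effect; a shared-library scheme would instead pick the implementation at link or load time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint32_t (*mul_fn)(uint32_t, uint32_t);

/* Software fallback: shift-and-add, one multiplier bit per iteration. */
static uint32_t mul_soft(uint32_t a, uint32_t b)
{
    uint32_t acc = 0;
    while (b != 0) {
        if (b & 1u)
            acc += a;
        a <<= 1;
        b >>= 1;
    }
    return acc;   /* low 32 bits of the product */
}

/* Stand-in for a hardware multiply instruction: C's own '*'. */
static uint32_t mul_hw(uint32_t a, uint32_t b) { return a * b; }

/* The single entry point callers use; bound once, at startup. */
static mul_fn mul_impl = mul_soft;

/* Hypothetical: a real system might probe the CPU or let the loader pick
   a different shared-library build; here the caller supplies the flag. */
static void bind_multiply(bool cpu_has_hw_mul)
{
    mul_impl = cpu_has_hw_mul ? mul_hw : mul_soft;
}

static uint32_t umul(uint32_t a, uint32_t b)
{
    uint32_t r = mul_impl(a, b);
    assert(r == mul_soft(a, b));  /* debug cross-check; off with NDEBUG */
    return r;
}
```

Callers see the same operation whichever side of the hardware/software boundary implements it.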


Note: I like SPARC. Not especially because of its technical merits, but
because as far as I understand it, it is the only recent chip
architecture that is (widely) second sourced; even better, it is second
sourced not just in fabrication, but in implementation as well. Does
MIPS license the architecture and implementation to their various
sources, or do they just use them as fabrication facilities?
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk