cet1@cl.cam.ac.uk (C.E. Thompson) (06/13/90)
In article <39319@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>Specifically, most RISC designers, after studying many programs, decided
>that integer multiply (and especially divide) were used less frequently
>than many other operations, and there is substantial data that backs this
>up from many vendors. RISC designers, depending on the benchmarks used,
>and amount of silicon available, allocated various amounts of silicon to
>support these operations, from zero up. The SPARC designers included
>a Multiply-Step, and no Divide-Step (i.e., divides are done by fair-sized
>hunk of code); HP PA included M-S and D-S; MIPS & 88K included both
>integer mult & divide in hardware, etc. However, for example, a typical
>integer divide on a MIPS takes about 35 cycles.... and probably about
>the same on a typical CISC.
>

I have never been able to understand why the SPARC multiply-step
instruction was included at all. It delivers a puny one bit per cycle,
which makes it quite hard to find cases where it is the right way of
doing a multiply (or at any rate, cases where it is *faster* than
alternative code). Compilers generating code for multiplication by small
constants will do better using addition/shift chains; and when you need
a general 32x32-bit multiply you can do better with difference-of-squares
and lookup tables than with a chain of 32 multiply-step instructions.

I suppose that the multiply-step is conceptually simpler than these
techniques, but the absence of a divide-step instruction makes it
difficult to believe that this is the rationale.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk
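For readers who want to see the three approaches named above side by side, here is a rough sketch in Python. It is an illustration under stated assumptions, not code from the post: the function names are made up, `mul_bit_serial` only mimics the one-bit-per-step idea (it does not model the real SPARC MULScc semantics), and the byte-wise quarter-square table is just one plausible reading of "difference-of-squares and lookup tables". It says nothing about cycle counts on any real machine.

```python
MASK32 = 0xFFFFFFFF  # keep results to 32 bits, as a 32-bit register would

def mul_by_10(x):
    """Multiply by the constant 10 with a shift/add chain: ((x << 2) + x) << 1."""
    return (((x << 2) + x) << 1) & MASK32

def mul_bit_serial(a, b):
    """Low 32 bits of a*b, consuming one multiplier bit per iteration.

    Hypothetical stand-in for a chain of 32 multiply-step operations.
    """
    acc = 0
    for _ in range(32):
        if b & 1:                  # add the multiplicand if this bit is set
            acc = (acc + a) & MASK32
        a = (a << 1) & MASK32      # next bit has twice the weight
        b >>= 1
    return acc

# Quarter-square table: QSQ[n] = floor(n*n / 4) for n = 0 .. 510.
QSQ = [(n * n) >> 2 for n in range(511)]

def mul8(a, b):
    """8x8 -> 16-bit product from two table lookups.

    Uses a*b = floor((a+b)^2 / 4) - floor((a-b)^2 / 4); the floors cancel
    because a+b and a-b always have the same parity.
    """
    return QSQ[a + b] - QSQ[abs(a - b)]

def mul32_table(a, b):
    """Low 32 bits of a*b assembled from byte-sized table products."""
    result = 0
    for i in range(4):
        ai = (a >> (8 * i)) & 0xFF
        # Partial products shifted by 32 bits or more cannot affect the
        # low word, so only the 10 pairs with i + j <= 3 are needed.
        for j in range(4 - i):
            bj = (b >> (8 * j)) & 0xFF
            result += mul8(ai, bj) << (8 * (i + j))
    return result & MASK32
```

The table version needs 20 lookups plus shifts and adds for a full low-word product; whether that beats 32 dependent single-bit steps depends on the load latency and cache behaviour of the particular implementation.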
pcg@cs.aber.ac.uk (Piercarlo Grandi) (06/28/90)
In article <1990Jun13.150100.28445@cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:

   In article <39319@mips.mips.COM> mash@mips.COM (John Mashey) writes:
   >
   >Specifically, most RISC designers, after studying many programs, decided
   >that integer multiply (and especially divide) were used less frequently
   >than many other operations, and there is substantial data that backs this
   >up from many vendors.

So far, fully agreed. You can staticize a lot of multiplies and divides.

   >RISC designers, depending on the benchmarks used,

Ahhhh. Here we start seeing something disagreeable. My architectural
principle is not just to optimize average speed, but also its variance,
i.e. to ensure that there are no pathological cases. Otherwise we get
typical "Bill Joy" style things, designs that are quite good for 80% of
the cases and miserable for the remaining 20% (famous cases: large
filesystem block sizes in BSD, 8 full contexts cache in SUNs).

   >and amount of silicon available, allocated various amounts of silicon to
   >support these operations, from zero up.

But, but. Should we leave out architectural features just because our
first *implementation* has a very low transistor budget? Or shouldn't we
make sure that when and if the transistor budget increases we can take
advantage of it?

More to the point: is having e.g. a large register file, which
(arguably) increases average performance for average programs, more or
less desirable than having hardware support for multiply and divide,
which avoids pathologically bad performance for some relatively rare but
"important" application areas? This is a very interesting question, and
one that deserves some thought.

I think that designing an *architecture* should not be overly
constrained by short-term considerations. Consider Multics: the first
machine did not have hardware ring support, but it was designed in
nonetheless, because they knew it was coming.
An architecture is something that is *always* implemented partly in
software and partly in hardware; technology dictates where the
implementation will put the boundary. It looks to me very unwise to
design an architecture that presupposes a rigid boundary.

   >The SPARC designers included
   >a Multiply-Step, and no Divide-Step (i.e., divides are done by fair-sized
   >hunk of code);

   I have never been able to understand why the SPARC multiply-step
   instruction was included at all.

Fig-leaf? :-).

More seriously: my tired eyes may have betrayed me, but I seem to have
seen full multiplication and division instructions in some recent SPARC
opcode writeup (as(1) manual for 4.1 SunOS). Probably SUN have decided
to reserve these, and we will soon see a SPARC with a few more
transistors. Too bad they did not do it before.

A lot of fixup engineering has been proposed in this newsgroup for
working around implementations that may or may not have hardware
instructions for this (shared libs seem to have been the winner). Maybe
a look at extracodes and the PDP-10 and many other systems that had the
same problem could have been beneficial...

Note: I like SPARC. Not especially because of its technical merits, but
because, as far as I understand it, it is the only recent chip
architecture that is (widely) second sourced; even better, it is second
sourced not just in fabrication, but in implementation as well. Does
MIPS license the architecture and implementation to their various
sources, or do they just use them as fabrication facilities?
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
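As a toy illustration of the movable hardware/software boundary argued for above, the sketch below models the extracode idea in Python: the architecture defines `mul` and `div`, and an implementation that lacks them traps to a software routine. Everything here is hypothetical (the `ToyCPU` class, the opcode names, the trap counter); it is not how SunOS, the PDP-10, or any real system does it, only a minimal model of the principle.

```python
MASK32 = 0xFFFFFFFF

def software_mul(a, b):
    """Shift-and-add multiply, low 32 bits (the software side of the boundary)."""
    acc = 0
    while b:
        if b & 1:
            acc = (acc + a) & MASK32
        a = (a << 1) & MASK32
        b >>= 1
    return acc

def software_div(n, d):
    """Unsigned 32-bit restoring division, one quotient bit per iteration."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q = r = 0
    for i in range(31, -1, -1):
        r = (r << 1) | ((n >> i) & 1)   # bring down the next dividend bit
        if r >= d:                       # subtract succeeds: quotient bit is 1
            r -= d
            q |= 1 << i
    return q

class ToyCPU:
    """Hypothetical machine: same architecture, varying hardware support."""

    def __init__(self, hardware_ops):
        self.hardware_ops = dict(hardware_ops)   # opcodes done "in silicon"
        self.extracodes = {"mul": software_mul, "div": software_div}
        self.traps = 0                           # how often we fell back

    def execute(self, op, a, b):
        handler = self.hardware_ops.get(op)
        if handler is None:                      # unimplemented: trap to software
            self.traps += 1
            handler = self.extracodes[op]
        return handler(a, b)

# Two implementations of the same architecture: programs see identical results.
cheap = ToyCPU({})
rich = ToyCPU({"mul": lambda a, b: (a * b) & MASK32,
               "div": lambda a, b: a // b})
```

A program issuing `execute("mul", 6, 7)` gets the same answer on `cheap` and `rich`; only `cheap.traps` reveals where the boundary sits in that implementation.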