bs@linus.UUCP (Robert D. Silverman) (06/21/89)
The MIPS processor has a multiply instruction which will produce a double length product from two 32 bit words, storing the result in the special LO and HI registers. However, there is no instruction that will divide that double length product by a 32 bit word, giving a 32 bit quotient and/or remainder. One can, of couse, program a nitty gritty double precision divide routine. However, I don't know the MIPS assembler well enough to be confident that I would write an efficient routine. Does anyone have a routine that will take a double length product out of the LO/HI register pair and divide it by a 32 bit (unsigned) integer, yielding a quotient and/or remainder? This, in my opinion is one of the major faults of RISC processors. They do not provide basic arithmetic instructions. Note that the above operation is trivial on many microprocessors. I have routines which do it well for the 68020/68030, 80286/80386, NSC 320000, VAX, etc. Bob Silverman
henry@utzoo.uucp (Henry Spencer) (06/25/89)
In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >This, in my opinion is one of the major faults of RISC processors. They >do not provide basic arithmetic instructions. When the list of "basic" arithmetic instructions is pages long, one starts to wonder how many of them are really "basic". The instruction you ask for -- divide double length by single length yielding single-length result -- is not exactly frequently needed. Just how much silicon is it worth to make it run faster than an implementation as a subroutine? -- NASA is to spaceflight as the | Henry Spencer at U of Toronto Zoology US government is to freedom. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mash@mips.COM (John Mashey) (06/25/89)
In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >>This, in my opinion is one of the major faults of RISC processors. They >>do not provide basic arithmetic instructions. >When the list of "basic" arithmetic instructions is pages long, one starts >to wonder how many of them are really "basic". The instruction you ask >for -- divide double length by single length yielding single-length result -- >is not exactly frequently needed. Just how much silicon is it worth to make >it run faster than an implementation as a subroutine? Actually, for many languages, on 32-bit machines, use of 32-bit divisor and 64-bit dividend is often worse than useless, especially if it's the only one you have. For example, consider S/360 & followons: 1) The divisor is 64-bits, and must occupy an even-odd register pair. 2) The divide leaves remainder in the even and quotient in the odd. 1) If the dividend (which naturally starts as 32-bits in most cases) is already in an even register (R) is a value not needed any more has an odd-register partner that isn't need any more then you need one instruction, a shift-right-double algebraic (SRDA) to prepare the divisor, i.e., to get: SRDA R,32 DR R,divisor LR result,R (reminder, or R+1 for quotient) 2) If the dividend is in an odd register has a value that should be retained because it's useful later, i.e., J = I/7... use of I has an odd-register partner whose value should be retained then you probably end up generating usage of a scratch register pair: LR RTEMP,R SRDA RTEMP,32 DR RTEMP,divisor LR result,RTEMP (or RTEMP+1, dep. on d 3) Needless to say, the double-register stuff is highly disliked by optimizing compiler writers, as it disrupts the regularity of some of the algorithms; as a result, many compilers end up having to use version 2), at least some of the time. (maybe people current on S/360 architectures might say how often? I'm too rusty to know.) 4) The MIPS sequence is basically what's done by 2) above: div R,divisor ..... other instructions, if compiler can find some and zero-divide check if needed mfhi result (or mflo, to get quotient or remainder) 5) Now, none of this is a terrible deal, as divides aren't that frequent, which is why some RISC architectures omit them entirely. [It depends on which benchmarks you chose whether you think this is good or not.] But note that something that hardware designers might (or might not) think was helping the compiler folks was really getting in their road.... 6) When we were doing this, we couldn't think of any languages whose natural implementation on a 32-bit machine would expect generation of a 64 by 32 bit divide as the expected operation. Does anybody know of any such, or is thsi strictly something one would use in handcoded routines? -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
slackey@bbn.com (Stan Lackey) (06/26/89)
In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >>This, in my opinion is one of the major faults of RISC processors. They >>do not provide basic arithmetic instructions. > >When the list of "basic" arithmetic instructions is pages long, one starts >to wonder how many of them are really "basic". The instruction you ask >for -- divide double length by single length yielding single-length result -- >is not exactly frequently needed. Just how much silicon is it worth to make >it run faster than an implementation as a subroutine? The RISC suppliers would like us (users) to believe that RISC is ALWAYS faster. I can't help thinking back to when Patterson's first RISC papers came out. He managed to tune the code on his RISC until all of the benchmarks ran faster than the VAX 11/780, except for one, which happened to exercise one of the microcoded character string instructions a lot. The moral of the story: If your application needs mul and div performance, get a micro with hardware support for them. If that means using a CISC, so be it. BTW: divide-double-by-single-yielding-single is the one the 68000 included. Is there a chicken-and-egg situation here? -Stan
bs@linus.UUCP (Robert D. Silverman) (06/27/89)
In article <41957@bbn.COM> slackey@BBN.COM (Stan Lackey) writes: :In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: :>In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: :>>This, in my opinion is one of the major faults of RISC processors. They :>>do not provide basic arithmetic instructions. :> :>When the list of "basic" arithmetic instructions is pages long, one starts :>to wonder how many of them are really "basic". The instruction you ask :>for -- divide double length by single length yielding single-length result -- :>is not exactly frequently needed. Just how much silicon is it worth to make :>it run faster than an implementation as a subroutine? : :The RISC suppliers would like us (users) to believe that RISC is :ALWAYS faster. I can't help thinking back to when Patterson's first :RISC papers came out. He managed to tune the code on his RISC until :all of the benchmarks ran faster than the VAX 11/780, except for one, :which happened to exercise one of the microcoded character string :instructions a lot. : :The moral of the story: If your application needs mul and div :performance, get a micro with hardware support for them. If that :means using a CISC, so be it. : :BTW: divide-double-by-single-yielding-single is the one the 68000 :included. Is there a chicken-and-egg situation here? No, to the last question. Such an instruction is required to compute AB/C or AB MOD C exactly, when AB can overflow and C > A,B. The problem with CISC is that no one is likely to come out with a 40MIP CISC in the near future [e.g. i860]. I simply contend that operations like multiplication and division should be fundamental to ANY computer. I also include add, subtract and add/subtract with carry. How else would you compute AB/C or AB % C exactly? Note that I've also seen some pretty arcane instructions on so called RISC processors. The i860 even has a custom purpose piece of real estate on the chip to support special graphics operations. However, it can't multiply or divide!!!! This is RISC? Bob Silverman :-Stan
firth@sei.cmu.edu (Robert Firth) (06/27/89)
In article <22218@winchester.mips.COM> mash@mips.COM (John Mashey) writes: >6) When we were doing this, we couldn't think of any languages whose >natural implementation on a 32-bit machine would expect generation >of a 64 by 32 bit divide as the expected operation. Does anybody >know of any such, or is thsi strictly something one would use in >handcoded routines? Nope, I don't know of any high-level language either that requires 64/32 => 32,32 division. For expressions such as (a*b)/c, all 32 bit, some languages permit the compiler to perform the rule-of-three operation, ie hold the intermediate product in 64 bits. Ada is one such. But no language requires it. And, of course, the operation is pretty useless for true 64-bit arithmetic since there you want a 64-bit quotient. Performing full 64/32 => 64,32 on an MC68020, for instance, is pretty tedious even with the extended division operations, since one has to correct signs and remainders manually.
slackey@bbn.com (Stan Lackey) (06/29/89)
In article <57725@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >In article <41957@bbn.COM> slackey@BBN.COM (Stan Lackey) writes: >:The RISC suppliers would like us (users) to believe that RISC is >:ALWAYS faster. >:The moral of the story: If your application needs mul and div >:performance, get a micro with hardware support for them. If that >:means using a CISC, so be it. >The problem with CISC is that no one is likely to come out with a 40MIP >CISC in the near future [e.g. i860]. I believe that now that the transistors/chip is high enough, you WILL see CISC's that perform equal to or greater than RISC's of the same technology. Wait and see. I'll wait and see if there is really ever an i860 that delivers a consistent 40MIPS across a broad range of applications. >I also include add, subtract and add/subtract with carry. How else >would you compute AB/C or AB % C exactly? This is my point - it's OK to have slower cycles, if each cycle does more work. If it takes a RISC 100 20ns cycles to do the operation you need, and the CISC 10 50ns cycles, the CISC is faster. Microcoded transcendentals are an example. -Stan