bs@linus.UUCP (Robert D. Silverman) (06/21/89)
The MIPS processor has a multiply instruction which will produce a double length product from two 32 bit words, storing the result in the special LO and HI registers. However, there is no instruction that will divide that double length product by a 32 bit word, giving a 32 bit quotient and/or remainder. One can, of couse, program a nitty gritty double precision divide routine. However, I don't know the MIPS assembler well enough to be confident that I would write an efficient routine. Does anyone have a routine that will take a double length product out of the LO/HI register pair and divide it by a 32 bit (unsigned) integer, yielding a quotient and/or remainder? This, in my opinion is one of the major faults of RISC processors. They do not provide basic arithmetic instructions. Note that the above operation is trivial on many microprocessors. I have routines which do it well for the 68020/68030, 80286/80386, NSC 320000, VAX, etc. Bob Silverman
henry@utzoo.uucp (Henry Spencer) (06/25/89)
In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >This, in my opinion is one of the major faults of RISC processors. They >do not provide basic arithmetic instructions. When the list of "basic" arithmetic instructions is pages long, one starts to wonder how many of them are really "basic". The instruction you ask for -- divide double length by single length yielding single-length result -- is not exactly frequently needed. Just how much silicon is it worth to make it run faster than an implementation as a subroutine? -- NASA is to spaceflight as the | Henry Spencer at U of Toronto Zoology US government is to freedom. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
mash@mips.COM (John Mashey) (06/25/89)
In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >>This, in my opinion is one of the major faults of RISC processors. They >>do not provide basic arithmetic instructions. >When the list of "basic" arithmetic instructions is pages long, one starts >to wonder how many of them are really "basic". The instruction you ask >for -- divide double length by single length yielding single-length result -- >is not exactly frequently needed. Just how much silicon is it worth to make >it run faster than an implementation as a subroutine? Actually, for many languages, on 32-bit machines, use of 32-bit divisor and 64-bit dividend is often worse than useless, especially if it's the only one you have. For example, consider S/360 & followons: 1) The divisor is 64-bits, and must occupy an even-odd register pair. 2) The divide leaves remainder in the even and quotient in the odd. 1) If the dividend (which naturally starts as 32-bits in most cases) is already in an even register (R) is a value not needed any more has an odd-register partner that isn't need any more then you need one instruction, a shift-right-double algebraic (SRDA) to prepare the divisor, i.e., to get: SRDA R,32 DR R,divisor LR result,R (reminder, or R+1 for quotient) 2) If the dividend is in an odd register has a value that should be retained because it's useful later, i.e., J = I/7... use of I has an odd-register partner whose value should be retained then you probably end up generating usage of a scratch register pair: LR RTEMP,R SRDA RTEMP,32 DR RTEMP,divisor LR result,RTEMP (or RTEMP+1, dep. on d 3) Needless to say, the double-register stuff is highly disliked by optimizing compiler writers, as it disrupts the regularity of some of the algorithms; as a result, many compilers end up having to use version 2), at least some of the time. (maybe people current on S/360 architectures might say how often? I'm too rusty to know.) 4) The MIPS sequence is basically what's done by 2) above: div R,divisor ..... other instructions, if compiler can find some and zero-divide check if needed mfhi result (or mflo, to get quotient or remainder) 5) Now, none of this is a terrible deal, as divides aren't that frequent, which is why some RISC architectures omit them entirely. [It depends on which benchmarks you chose whether you think this is good or not.] But note that something that hardware designers might (or might not) think was helping the compiler folks was really getting in their road.... 6) When we were doing this, we couldn't think of any languages whose natural implementation on a 32-bit machine would expect generation of a 64 by 32 bit divide as the expected operation. Does anybody know of any such, or is thsi strictly something one would use in handcoded routines? -- -john mashey DISCLAIMER: <generic disclaimer, I speak for me only, etc> UUCP: {ames,decwrl,prls,pyramid}!mips!mash OR mash@mips.com DDD: 408-991-0253 or 408-720-1700, x253 USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
slackey@bbn.com (Stan Lackey) (06/26/89)
In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >>This, in my opinion is one of the major faults of RISC processors. They >>do not provide basic arithmetic instructions. > >When the list of "basic" arithmetic instructions is pages long, one starts >to wonder how many of them are really "basic". The instruction you ask >for -- divide double length by single length yielding single-length result -- >is not exactly frequently needed. Just how much silicon is it worth to make >it run faster than an implementation as a subroutine? The RISC suppliers would like us (users) to believe that RISC is ALWAYS faster. I can't help thinking back to when Patterson's first RISC papers came out. He managed to tune the code on his RISC until all of the benchmarks ran faster than the VAX 11/780, except for one, which happened to exercise one of the microcoded character string instructions a lot. The moral of the story: If your application needs mul and div performance, get a micro with hardware support for them. If that means using a CISC, so be it. BTW: divide-double-by-single-yielding-single is the one the 68000 included. Is there a chicken-and-egg situation here? -Stan
culberts@hpccc.HP.COM (Bruce Culbertson) (06/27/89)
mash@mips.COM (John Mashey) writes: > 6) When we were doing this, we couldn't think of any languages whose > natural implementation on a 32-bit machine would expect generation > of a 64 by 32 bit divide as the expected operation. Does anybody > know of any such, or is thsi strictly something one would use in > handcoded routines? A frequently used application that occurs to me is conversion of floating point numbers to text. A mantissa of more than 32 bits must be divided by 10 repeatedly to convert it from base 2 to base 10. Bruce Culbertson culberts@hplabs.hp.com
bs@linus.UUCP (Robert D. Silverman) (06/27/89)
In article <41957@bbn.COM> slackey@BBN.COM (Stan Lackey) writes: :In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: :>In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: :>>This, in my opinion is one of the major faults of RISC processors. They :>>do not provide basic arithmetic instructions. :> :>When the list of "basic" arithmetic instructions is pages long, one starts :>to wonder how many of them are really "basic". The instruction you ask :>for -- divide double length by single length yielding single-length result -- :>is not exactly frequently needed. Just how much silicon is it worth to make :>it run faster than an implementation as a subroutine? : :The RISC suppliers would like us (users) to believe that RISC is :ALWAYS faster. I can't help thinking back to when Patterson's first :RISC papers came out. He managed to tune the code on his RISC until :all of the benchmarks ran faster than the VAX 11/780, except for one, :which happened to exercise one of the microcoded character string :instructions a lot. : :The moral of the story: If your application needs mul and div :performance, get a micro with hardware support for them. If that :means using a CISC, so be it. : :BTW: divide-double-by-single-yielding-single is the one the 68000 :included. Is there a chicken-and-egg situation here? No, to the last question. Such an instruction is required to compute AB/C or AB MOD C exactly, when AB can overflow and C > A,B. The problem with CISC is that no one is likely to come out with a 40MIP CISC in the near future [e.g. i860]. I simply contend that operations like multiplication and division should be fundamental to ANY computer. I also include add, subtract and add/subtract with carry. How else would you compute AB/C or AB % C exactly? Note that I've also seen some pretty arcane instructions on so called RISC processors. The i860 even has a custom purpose piece of real estate on the chip to support special graphics operations. However, it can't multiply or divide!!!! This is RISC? Bob Silverman :-Stan
firth@sei.cmu.edu (Robert Firth) (06/27/89)
In article <22218@winchester.mips.COM> mash@mips.COM (John Mashey) writes: >6) When we were doing this, we couldn't think of any languages whose >natural implementation on a 32-bit machine would expect generation >of a 64 by 32 bit divide as the expected operation. Does anybody >know of any such, or is thsi strictly something one would use in >handcoded routines? Nope, I don't know of any high-level language either that requires 64/32 => 32,32 division. For expressions such as (a*b)/c, all 32 bit, some languages permit the compiler to perform the rule-of-three operation, ie hold the intermediate product in 64 bits. Ada is one such. But no language requires it. And, of course, the operation is pretty useless for true 64-bit arithmetic since there you want a 64-bit quotient. Performing full 64/32 => 64,32 on an MC68020, for instance, is pretty tedious even with the extended division operations, since one has to correct signs and remainders manually.
scarter@gryphon.COM (Scott Carter) (06/28/89)
In article <22218@winchester.mips.COM> mash@mips.COM (John Mashey) writes: >In article <1989Jun24.230056.27774@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes: >>In article <57125@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >>>This, in my opinion is one of the major faults of RISC processors. They >>When the list of "basic" arithmetic instructions is pages long, one starts >>to wonder how many of them are really "basic". The instruction you ask >>for -- divide double length by single length yielding single-length result -- >>is not exactly frequently needed. Just how much silicon is it worth to make >>it run faster than an implementation as a subroutine? Given the SRT divide algorithm we used (in the MD484 GaAs micro) the extra silicon is rather trivial. I don't have exact figures, but I would guess <100-200 devices, probably much less. Guy who designed that part of the CPU is not available at the moment or I would give more precise numbers. Performance difference, IF you need the operation frequently (see below) is enormous, i.e 35 cycles vs hundreds. >6) When we were doing this, we couldn't think of any languages whose >natural implementation on a 32-bit machine would expect generation >of a 64 by 32 bit divide as the expected operation. Does anybody >know of any such, or is thsi strictly something one would use in >handcoded routines? 64 divide by 32 giving 32 bit result occurs rather frequently when you are using fixed point numbers whose "engineering unit" representation is not a binary radix. This can be expressed directly in Ada by type foo is delta 0.001 range 0.0 .. 100.0; for foo'SMALL use 0.001; (There was a way to do this in JOVIAL too but I forget what it was) This sort of thing shows up in e.g. missile guidance (not a very high- volume application :) ) a great deal. Some signal processing functions use it too (don't ask me - I don't have the compartment to know). Also, if your integer multiply is significantly faster than your floating multiply (integer multiply giving 64-bit product, that is), there can be some nice performance advantage to using (binary radix!!) fixed point rather than floating point. When this is the case we see enough fixed- point divides to make the extra silicon worth it. Scott Carter McDonnell Douglas Electronics Systems Co. (714)-896-3097 using CTS until our Usenet connection arrives
slackey@bbn.com (Stan Lackey) (06/29/89)
In article <57725@linus.UUCP> bs@gauss.UUCP (Robert D. Silverman) writes: >In article <41957@bbn.COM> slackey@BBN.COM (Stan Lackey) writes: >:The RISC suppliers would like us (users) to believe that RISC is >:ALWAYS faster. >:The moral of the story: If your application needs mul and div >:performance, get a micro with hardware support for them. If that >:means using a CISC, so be it. >The problem with CISC is that no one is likely to come out with a 40MIP >CISC in the near future [e.g. i860]. I believe that now that the transistors/chip is high enough, you WILL see CISC's that perform equal to or greater than RISC's of the same technology. Wait and see. I'll wait and see if there is really ever an i860 that delivers a consistent 40MIPS across a broad range of applications. >I also include add, subtract and add/subtract with carry. How else >would you compute AB/C or AB % C exactly? This is my point - it's OK to have slower cycles, if each cycle does more work. If it takes a RISC 100 20ns cycles to do the operation you need, and the CISC 10 50ns cycles, the CISC is faster. Microcoded transcendentals are an example. -Stan
bb@tetons.UUCP (Bob Blau) (06/30/89)
In article <22218@winchester.mips.COM>, mash@mips.COM (John Mashey) writes: >In article <1989Jun24.230056.27774@utzoo.uucp> Henry Spencer writes: >>When the list of "basic" arithmetic instructions is pages long, one starts >>to wonder how many of them are really "basic". The instruction you ask >>for -- divide double length by single length yielding single-length result - >>is not exactly frequently needed. Just how much silicon is it worth to make >>it run faster than an implementation as a subroutine? > > Actually, for many languages, on 32-bit machines, use of 32-bit > divisor and 64-bit dividend is often worse than useless, especially > if it's the only one you have. For example, consider S/360 & followons: > 1) The divisor is 64-bits, and must occupy an even-odd register > pair. > 2) The divide leaves remainder in the even and quotient in the odd. > ... > you probably end up generating usage of a scratch register pair: > LR RTEMP,R > SRDA RTEMP,32 > DR RTEMP,divisor > LR result,RTEMP > ... > 5) Now, none of this is a terrible deal, as divides aren't that frequent, I dug up some ancient (10 year old) instruction mix statistics from my bookshelf which back up the observation that integer divides aren't too common in the 370 environment. I didn't look at actual compiler output to see the relationship between DR and SRDA, but it seems that the FORTRAN compiler used the D instruction (with the dividend in memory) more often than DR. This makes sense, since a memory access out of cache can be considerably faster than a double word arithmetic shift right (SRDA). Note that even two Load Registers (LR's) and a single word arithmetic shift right (SRA) can be faster than an SRDA depending on the implementation. These numbers are the mean instruction frequency out of traces of over 15,000,000 instructions. DR SRDA D Commercial COBOL Mix .00003 .00007 .00000 Scientific FORTRAN Mix .00016 .00115 .00071 -Bob (Std. disclaimers apply) -- Bob Blau Amdahl Corporation 143 N. 2 E., Rexburg, Idaho 83440 UUCP:{ames,decwrl,uunet}!amdahl!tetons!bb (208) 356-8915 INTERNET: bb@tetons.idaho.amdahl.com
bb@tetons.UUCP (Bob Blau) (07/01/89)
In article <812@tetons.UUCP>, I write: > ... it seems that the FORTRAN compiler > used the D instruction (with the dividend in memory) more often than DR. > This makes sense, since a memory access out of cache can be considerably > faster than a double word arithmetic shift right (SRDA). ... ack - This is irrelevant since both the D and DR insructions have the dividend in an even odd register pair, so you would need the SRDA in either case - sorry for the misinformation. More to the point perhaps is that the SRDA time is small compared to the divide time, so the overhead is not too significant, especially given the rarity of the operation. -Bob -- Bob Blau Amdahl Corporation 143 N. 2 E., Rexburg, Idaho 83440 UUCP:{ames,decwrl,uunet}!amdahl!tetons!bb (208) 356-8915 INTERNET: bb@tetons.idaho.amdahl.com