amos@taux01.nsc.com (Amos Shapir) (09/06/90)
[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
>I have to agree that 128-bit floating point isn't really such a hot idea.
>When you get right down to it, floating point is a hack. It's a very useful
>hack; i won't argue with that. We admit 64-bit floating point doesn't work,
>so what do we do? We provide more of the same.

It works for most uses, and "more of the same" really is an improvement
if it's done right. The point is, there isn't yet any better idea
that works as well.

>There are a lot of machines out there that can do IEEE 64-bit floating point,
>with all its precise rules and cases, but can't multiply two 32-bit integers
>in a reasonable way. What are we to make of this? It's just dumb.

If they have a good FPU, using it for integer multiplication *is* a
"reasonable way". Besides, a bad implementation doesn't prove anything
about the basic idea of FP.

>Our current programming languages have a strong influence. C has `float' and
>`double' types, and most machines have single-precision and double-precision
>floating point numbers. Coincidence? I think not.

Certainly not, but you got it backwards - C has it because PDP-11 has it
because FORTRAN has it because the early IBM machines had it...

>I don't care if you have 65536-bit floating point, it's still floating point.
>That means underflow, overflow, round-off error, loss of precision, need i
>continue?
 ...
>Another interesting area is hardware support for arbitrary-precision real
>numbers.

But even with arbitrary precision, at some point you have to start
disposing of insignificant bits - and suffer the same side-effects.
--
	Amos Shapir		amos@taux01.nsc.com, amos@nsc.nsc.com
National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52 522255  TWX: 33691, fax: +972-52-558322 GEO: 34 48 E / 32 10 N
bs@linus.mitre.org (Robert D. Silverman) (09/06/90)
In article <4513@taux01.nsc.com> amos@taux01.nsc.com (Amos Shapir) writes:
:[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
:>I have to agree that 128-bit floating point isn't really such a hot idea.
:>When you get right down to it, floating point is a hack. It's a very useful
:>hack; i won't argue with that. We admit 64-bit floating point doesn't work,
:>so what do we do? We provide more of the same.
:
:It works for most uses, and "more of the same" really is an improvement
:if it's done right. The point is, there isn't yet any better idea
:that works as well.
:
:>There are a lot of machines out there that can do IEEE 64-bit floating point,
:>with all its precise rules and cases, but can't multiply two 32-bit integers
:>in a reasonable way. What are we to make of this? It's just dumb.
:
:If they have a good FPU, using it for integer multiplication *is* a "reasonable
:way". Besides, a bad implementation doesn't prove anything about the basic
:idea of FP.
Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
together using floating point? The answer is: one can't without losing bits.
(or by doing several multiplies on the low/high halves and combining things)
Furthermore, if I should need to multiply integers in this way, they must
first be converted to floating point, then the product must be converted back.
Can you say expensive???
The four basic operations of arithmetic are +, -, x, /. Any computer that
can't perform them on its atomic data units [whatever the word size is]
is a joke.
--
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"
bruceh@sgi.com (Bruce R. Holloway) (09/07/90)
In article <119244@linus.mitre.org> bs@gauss.UUCP (Robert D. Silverman) writes:
>In article <4513@taux01.nsc.com> amos@taux01.nsc.com (Amos Shapir) writes:
>:[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
>:
>:>There are a lot of machines out there that can do IEEE 64-bit floating point,
>:>with all its precise rules and cases, but can't multiply two 32-bit integers
>:>in a reasonable way. What are we to make of this? It's just dumb.
>
>The four basic operations of arithmetic are +, -, x, /. Any computer that
>can't perform them on its atomic data units [whatever the word size is]
>is a joke.

Where did you learn this? By looking at a calculator? What about sqrt()?
In my line of work we use it a lot & know lots of ways to approximate it
depending on what's available & how fast it needs to be. Still, there is
a pencil & paper method that's exact & fairly easy to do in hardware.
Perhaps I should say, any computer that can't do sqrt() is a joke!

The basic operations are + and *. That's why there's an algebraic
construct called a "field". The other two operations are just derivative
ones that involve inverses.

Actually, if you did statistics on numerical programs (maybe you have),
you would find that divide occurs much less frequently than the others.
So those guys that sell multiplier-accumulator chips are right, because
even if it takes 10x longer to divide, it probably occurs 10x less
frequently anyway.

Obviously, any Turing machine can do all five of these operations. So the
issue isn't can the machine do it, or does it have a machine language
instruction to do it, or does that instruction happen in a single cycle.
I think the real issue is that the programming languages don't do the job,
because you can't multiply two ints & get a double precision result
without writing a subroutine.
The compiler is free to do +, *, and sqrt with a single instruction or with a subroutine, but the guy who wants a double precision product or to detect overflow after addition has to write a subroutine to do it.
rwallace@vax1.tcd.ie (09/08/90)
> :If they have a good FPU, using it for integer multiplication *is* a "reasonable
> :way". Besides, a bad implementation doesn't prove anything about the basic
> :idea of FP.
>
> Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
> only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
> together using floating point? The answer is: one can't without losing bits.
> (or by doing several multiplies on the low/high halves and combining things)
> Furthermore, if I should need to multiply integers in this way, they must
> first be converted to floating point, then the product must be converted back.
> Can you say expensive???
>
> The four basic operations of arithmetic are +, -, x, /. Any computer that
> can't perform them on its atomic data units [whatever the word size is]
> is a joke.

First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
bit answer but in practice 99% of the time you're going to be working in 32
bits all the way and you don't want 64 bit answers (which is why C has
int x int -> int not int x int -> long).

Second, there is a huge amount of processing done which depends on integer +
and - and fp +, -, * and / being fast, so there is hardware support for these.
There is practically no processing done which depends on integer * and / being
fast (accessing an array of structures doesn't count because a smart compiler
can use shifts and adds), and don't bother giving anecdotal cases because it's
still less than 1% of the total. Therefore chip space was not wasted on making
these fast.
--
"To summarize the summary of the summary: people are a problem"
Russell Wallace, Trinity College, Dublin
rwallace@vax1.tcd.ie
knighten@Encore.com (Robert L. Knighten) (09/08/90)
In article <1990Sep6.224018.17701@odin.corp.sgi.com> bruceh@sgi.com (Bruce R. Holloway) writes:

   . . .

   Actually, if you did statistics on numerical programs (maybe you have),
   you would find that divide occurs much less frequently than the others.
   So those guys that sell multiplier-accumulator chips are right, because
   even if it takes 10x longer to divide, it probably occurs 10x less
   frequently anyway.

   . . .

This is one of those "which comes first" issues I've discussed with quite
a few people. When working on numerically intensive code that needs to be
fast I have always avoided divide when possible because it is so slow (and
getting slower relative to the other instructions!) This means that such
things as rational function approximations are avoided.

But why is divide so slow? The hardware architects with whom I have
discussed this have said, "Sure, we could do a fast divide, but it would
take a lot of gates and no one uses it." And no one uses it because it is
slow and . . .

Has anyone done careful studies of the actual value of a fast divide
(integer or floating point)?
--
Bob Knighten
Encore Computer Corp., 257 Cedar Hill Street, Marlborough, MA 01752
(508) 460-0500 ext. 2626
Internet: knighten@encore.com
uucp: {bu-cs,decvax,gould}!encore!knighten
dave@tygra.UUCP (David Conrad) (09/09/90)
In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
}First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
}bit answer but in practice 99% of the time you're going to be working in 32
}bits all the way and you don't want 64 bit answers (which is why C has
}int x int -> int not int x int -> long).
}--
}Russell Wallace, Trinity College, Dublin

*My* C doesn't have an int * int = int. I know because the CPU doesn't
support it. What it does is int * int = long, and then it casts the
long back into an int if it'll fit.
--
David "Salamander" Conrad  | In Cows We Trust | This space
dave@tygra.ddmi.com        | E Pluribus Moo   | for rent
--
= CAT-TALK Conferencing Network, Prototype Computer Conferencing System =
- 1-800-825-3069, 300/1200/2400/9600 baud, 8/N/1. New users use 'new'   -
= as a login id.  <<Redistribution to GEnie PROHIBITED!!!>>>            =
E-MAIL Address: dave@ThunderCat.COM
peter@ficc.ferranti.com (Peter da Silva) (09/09/90)
In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
> First, why do you need 32 x 32 -> 64?

It sure makes arbitrary precision a whole lot easier.

> why C has int x int -> int not int x int -> long.

Which is irritating to the folks who need that operation, because it
forces them to do a full 32x32 multiply instead of a 16x16->32. Which is
why Forth has 16x16->32 built in.
--
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com
amull@Morgan.COM (Andrew P. Mullhaupt) (09/09/90)
> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

I won't give examples but you're wrong about the address arithmetic. Some
machines (like i486, RS/6000) have integer multiplies and some (SPARC) do
not. Now the compilers of the world can get rid of integer multiplies in
address arithmetic _if_ they can figure out the sizes of the arrays. This
isn't trivial when the array may be a formal parameter (i.e. the size may
not be fixed). A great deal of code uses this kind of function argument.
(Nearly all of numerical linear algebra, a great deal of optimization...)
I have been pretty annoyed at how slow this stuff gets on the SPARC and
pretty happy with how often my i486 home computer (running UNIX & DOS)
beats the Sparcstations on identical code.

However, fast is relative in this matter. The RS/6000 has integer multiply
but has an extremely fast set of floating point. This is really good
unless the array indexing gets in the way of the FP instructions, which
turns out to be a bit tricky. The thing probably needs to have an even
faster integer multiply (and this probably requires some sort of
superscalar capability) in order to get the best performance.

The lack of an integer multiply on the SPARCs (I think some of them are
supposed to have it but the ones I use don't...) is pretty bad. But it is
compensated by the register windows which allow it to call functions
extremely quickly. I think that the real estate wasn't so much thought to
be wasted on integer multiply as better used for register windows. A
machine for scientific computing should probably not go so far as to
eliminate the integer multiply.

As for integer divide, less of a case can be made, and I can do without
it. Although integer quotient and remainder are probably computed
simultaneously, some languages (like C) force one to get those results
separately. If this wasn't the case, probably integer divides would take
up 25% less of our time.

Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
multiply take? You could use this for a lot of address arithmetic and
build the full 32x32 multiply out of a few calls to this. In the same
vein, an 8x32 might even be a profitable use of chip. How does this VLSI
trade-off work out?

Later,
Andrew Mullhaupt
cik@l.cc.purdue.edu (Herman Rubin) (09/10/90)
In article <6837.26e7ee92@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes:
> > :If they have a good FPU, using it for integer multiplication *is* a "reasonable
> > :way". Besides, a bad implementation doesn't prove anything about the basic
> > :idea of FP.
> >
> > Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
> > only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
> > together using floating point? The answer is: one can't without losing bits.
> > (or by doing several multiplies on the low/high halves and combining things)
> > Furthermore, if I should need to multiply integers in this way, they must
> > first be converted to floating point, then the product must be converted back.
> > Can you say expensive???
> >
> > The four basic operations of arithmetic are +, -, x, /. Any computer that
> > can't perform them on its atomic data units [whatever the word size is]
> > is a joke.
>
> First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
> bit answer but in practice 99% of the time you're going to be working in 32
> bits all the way and you don't want 64 bit answers (which is why C has
> int x int -> int not int x int -> long).
>
> Second, there is a huge amount of processing done which depends on integer +
> and - and fp +, -, * and / being fast, so there is hardware support for these.
> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

All early machines had only fixed point. Floating point hardware, no
matter how it is done, is kludged fixed point. There are pre- and
post-shifts, the necessary fixed point arithmetic is done, and the
(computed in parallel) exponent is adjusted. From this standpoint
floating point is not basic. I do not suggest that it be eliminated; it
is too useful. We could make better kludges now, and unless it is in
hardware, the current practice of packing the exponent and mantissa in
one unit would be undesirable.

But this does not address the multiprecision question. Much work does not
get done if the tools are too clumsy. In doing multiple precision work,
the number must be broken up into units on which the hardware can
operate. If we only have 32x32 -> 32, we are effectively limited to
16-bit units. A count of the number of instructions required to do a
32x32 -> 64 shows how bad it is. Also, an algorithm designer may very
well use an obviously clumsier algorithm which is not subject to these
problems caused by the inadequacy of hardware. Look at the devices used
to get multiples of pi and ln 2.

Now suppose we have to do 160-bit arithmetic to do a floating-point
operation to sufficient accuracy (3 times the current double precision).
The intelligent thing to do, hardware allowing the operations, would be
5 32-bit pieces. If we are to struggle with the floating-point units, and
the insistence on normalization would make it a struggle, we would have
to use at most 26-bit units, and need 7 of them. Using 16-bit units would
take ten. Not having the units operated on be an easily used size makes
addressing, etc., much more difficult. So to do multiple precision work
reasonably easily, we must use 16-bit units. This is not what we got from
the computers of old--there the unit was 35 bits plus sign, or more, and
double-length products were available, as well as double/single ->
quotient and remainder. No, we are essentially in the early PC period as
far as arithmetic goes.

As to why these, and other things, are not in C, I presume that the
founders of C did not think of them. In general, when a limited group
designs something without getting ideas from outside, this happens.
Algol was supposed to be a universal algorithmic language, and it does
not have even the hardware operations of the machines of that time in it.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik(UUCP)
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/10/90)
In article <389@tygra.UUCP> dave@tygra.UUCP (David Conrad) writes:
\\\
>*My* C doesn't have an int * int = int. I know because the CPU doesn't
>support it. What it does is int * int = long, and then it casts the
>long back into an int if it'll fit.
\\\

This is a rather strange idea; the last time I heard of a crock whereby
the type of an expression can't be determined until runtime (which is
what you imply, see below) was raising an integer to an integral power in
Algol -- if the exponent is <0 you get a real else you get an integer. As
I recall Burroughs kludged this by representing integers and reals in a
common format (not _too_ unusual, see the '87 chips) but they had the
binary point ON THE RIGHT.

The problem with int x int -> long for (any?) of the common arithmetic
ops is expressions like the following (this is typical of C, but other
exprs can be fudged up in most of the modern langs):

	int x,y,z;
	...
	x = (x==y ? x*x : y);

(We can also produce all sorts of similar arguments based on the
associativity and commutativity of *).

Now, since we have an x*x inside the conditional, and for the sake of
argument let's say the optimizer doesn't play with this code too much (or
else I'll think up a real _big_ example where it won't), then how do we
deal with the type indeterminacy as you (and previous posts) have
described it? Do we hang a bit of code near the x*x and _decide_ that we
have a long and cast it to an int, or do we simply _always_ cast x*x to
an int? If the latter then we _really_ have implemented int x int -> int.

There are _some_ situations where this int x int -> long may be useful,
e.g.

	long x;
	int y,z;
	x = y*z;

We really get the entire product! Perhaps your compiler kicks in in only
this (or similar) case? I _still_ don't like it; it isn't portable
(there's a C standard for one thing).

-Kym Horsell
cik@l.cc.purdue.edu (Herman Rubin) (09/10/90)
In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > There is practically no processing done which depends on integer * and / being
> > fast (accessing an array of structures doesn't count because a smart compiler
> > can use shifts and adds), and don't bother giving anecdotal cases because it's
> > still less than 1% of the total. Therefore chip space was not wasted on making
> > these fast.

			.......................

> As for integer divide, less of a case can be made, and I can do without
> it. Although integer quotient and remainder are probably computed
> simultaneously, some languages (like C) force one to get those results
> separately. If this wasn't the case, probably integer divides would
> take up 25% less of our time.

Maybe you do not see a need for it, but lots of people do; even floating
divide with integer quotient and floating remainder. Very few languages
have any provisions for simultaneous quotient and remainder. This is an
example of an extremely foolish and unnecessary oversight; the great bulk
of computers since computers were invented had the simultaneous integer
production.

> Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> multiply take? You could use this for a lot of address arithmetic
> and build the full 32x32 multiply out of a few calls to this. In the
> same vein, an 8x32 might even be a profitable use of chip. How does
> this VLSI trade-off work out?

The floating point unit already has at least most of the hardware
required. If the 53x53 floating point unit were slightly modified to give
64 bits of output, almost no additional chip area would be required.
Floating point units still do their arithmetic as integer arithmetic,
with front and back ends.
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik(UUCP)
davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/10/90)
In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> As for integer divide, less of a case can be made, and I can do without
> it. Although integer quotient and remainder are probably computed
> simultaneously, some languages (like C) force one to get those results
> separately. If this wasn't the case, probably integer divides would
> take up 25% less of our time.

Actually ANSI C does provide a function which does just what you said,
and a type (div_t) which contains the two values. And you can define it
builtin if your implementation desires.
--
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
VMS is a text-only adventure game. If you win you can use unix.
crowl@cs.rochester.edu (Lawrence Crowl) (09/10/90)
In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>Some machines (like i486, RS/6000) have integer multiplies and some (SPARC)
>do not. Now the compilers of the world can get rid of integer multiplies in
>address arithmetic _if_ they can figure out the sizes of the arrays.

This is incorrect. Compilers can get rid of integer multiplies (actually
implement them as shifts and adds) when the sizes of the array ELEMENTS
are known.

>This isn't trivial when the array may be a formal parameter (i.e. the size
>may not be fixed). A great deal of code uses this kind of function argument.
>(Nearly all of numerical linear algebra, a great deal of optimization...)

Nearly all code has fixed-size array elements. I've seen some memory
allocation routines that took element size as a parameter, but have never
seen any processing routines that did.
--
Lawrence Crowl		716-275-9499	University of Rochester
crowl@cs.rochester.edu			Computer Science Department
...!{ames,rutgers}!rochester!crowl	Rochester, New York, 14627
preston@titan.rice.edu (Preston Briggs) (09/11/90)
In article <1990Sep10.162839.19226@cs.rochester.edu> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>>Some machines (like i486, RS/6000) have integer multiplies and some (SPARC)
>>do not. Now the compilers of the world can get rid of integer multiplies in
>>address arithmetic _if_ they can figure out the sizes of the arrays.
>
>This is incorrect. Compilers can get rid of integer multiplies (actually
>implement them as shifts and adds) when the sizes of the array ELEMENTS are
>known.
>
>>This isn't trivial when the array may be a formal parameter (i.e. the size
>>may not be fixed). A great deal of code uses this kind of function argument.
>>(Nearly all of numerical linear algebra, a great deal of optimization...)

It doesn't matter too much. Strength reduction will eliminate multiplies
of loop induction variables by loop invariants (which include formal
parameters and such). Generally, this is superior to the shift-and-add
trick anyway. SR won't help so much outside of loops.
--
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu
bs@linus.mitre.org (Robert D. Silverman) (09/11/90)
In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
:> :If they have a good FPU, using it for integer multiplication *is* a "reasonable
:> :way". Besides, a bad implementation doesn't prove anything about the basic
:> :idea of FP.
:>
:> Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
:> only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
:> together using floating point? The answer is: one can't without losing bits.
stuff deleted.
:> The four basic operations of arithmetic are +, -, x, /. Any computer that
:> can't perform them on its atomic data units [whatever the word size is]
:> is a joke.
:
:First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
:bit answer but in practice 99% of the time you're going to be working in 32
Everyone keeps quoting this '99% of the time' stuff to me. I should
like to point out that there are applications for which having
32 x 32 -> 64 and 64 / 32 -> 32 quotient & remainder are absolutely
essential. Basically, ANYTHING involving multiprecision arithmetic
requires these. (on a 32 bit machine that is)
The fact that many people have never encountered this type of application
doesn't mean that it doesn't exist. For those of us doing such work,
not having these instructions in hardware is an absolute KILLER.
The SPARC architecture did not use to have integer multiply and divide.
They finally decided to add it. WHY? It is rumored that a certain
gov't. agency complained so much about its lack that SUN had to put it
in. Ask yourself what gov't. agency might require lots of multiprecise
arithmetic. I do not know this for a fact, it is merely a rumor I heard.
It is probably correct for me to assert that I personally have
burned more CPU cycles doing 32 x 32 -> 64 multiplies and 64/32 divides
than anyone else in history. Last year, I used over 400,000 CPU *hours*
[Yes, that is correct] executing code that required multi-precise
arithmetic in radix 2^30.
--
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/11/90)
In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:
\\\
>Everyone keeps quoting this '99% of the time' stuff to me. I should
>like to point out that there are applications for which having
>32 x 32 -> 64 and 64 / 32 -> 32 quotient & remainder are absolutely
>essential. Basically, ANYTHING involving multiprecision arithmetic
 ^^^^^^^^^
>requires these. (on a 32 bit machine that is)
 ^^^^^^^^
\\\

Don't you mean `useful' or `efficient' here?

The RISC folks probably have the right attitude here -- only that stuff
that happens a lot, or is _absolutely_, logically _required_ need be
implemented in h/w (either that, or can be `tacked on' to what you were
going to do anyway as make no matter to the cost) -- design time means
$$$$ after all, right?

int x int -> long isn't _required_ for MP calcs, it's just _convenient_.
For example, the `bc' U*X calculator uses bytes to store pairs of decimal
digits and then uses good old int x int -> int to perform a
paper-and-pencil multiply. Another technique is using `carryless' modular
arithmetic with appropriate bases (e.g. 2^k+1).

So maybe some people burn a _lot_ of cycles doing this sort of stuff (I
do myself as it happens), but do the economics of design time, marketing
& area indicate that h/w must support it? Maybe that Si could be better
spent doing things that actually happen a _lot_?

-Kym Horsell

P.S. I'm still interested in how much of the Sun Sparc's `bc' is written
in assembler; it's _awful_ fast. So far no takers to my original post...
urjlew@uncecs.edu (Rostyk Lewyckyj) (09/11/90)
In article <119244@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
> The four basic operations of arithmetic are +, -, x, /. Any computer that
> can't perform them on its atomic data units [whatever the word size is]
> is a joke.
>
> --
> Bob Silverman
> #include <std.disclaimer>
> Mitre Corporation, Bedford, MA 01730
> "You can lead a horse's ass to knowledge, but you can't make him think"

If the basic unit for representing a number is a word, then to handle the
product of two numbers you may well need a double word. But then you must
provide operations to handle double words. So then the double word
becomes a basic unit, and so on ad infinitum.

If in addition to +, -, and x, you want to handle /, then you most
certainly need either something like floating point, or some variant of
rational fraction representation with arbitrary precision for the
numerator and denominator. Now if you want to handle irrationals, then
even more you need something like floating point. Choose your own
precision.

Finally, scaled integer is really no different from floating point,
except that floating point is at least standardized (IEEE), and assisted
by hardware. Imagine having to work with users' programs where everybody
is doing scaled integer arithmetic in his own way.
-----------------------------------------------
Reply-To:  Rostyslaw Jarema Lewyckyj
           urjlew@ecsvax.UUCP , urjlew@unc.bitnet
           or urjlew@uncvm1.acs.unc.edu  (ARPA,SURA,NSF etc. internet)
           tel. (919)-962-6501
ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/11/90)
(someone wrote)
> > As for integer divide, less of a case can be made, and I can do without
> > it. Although integer quotient and remainder are probably computed
> > simultaneously, some languages (like C) force one to get those results
> > separately.

This is not true of ANSI C. There are div() and ldiv() functions (the
names may be wrong) which return quotient and remainder together.
--
Heuer's Law: Any feature is a bug unless it can be turned off.
cik@l.cc.purdue.edu (Herman Rubin) (09/11/90)
In article <3977@bingvaxu.cc.binghamton.edu>, vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
> In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:

			.........................

> Don't you mean `useful' or `efficient' here?
>
> The RISC folks probably have the right attitude here -- only that stuff
> that happens a lot, or is _absolutely_, logically _required_ need be
> implemented in h/w (either that, or can be `tacked on' to what you
> were going to do anyway as make no matter to the cost) -- design
> time means $$$$ after all, right?

The design time to make a decent integer multiplier, if you are going to
have a floating point multiplier, is essentially ZERO. How do you think a
floating point unit works, anyhow? Versatility costs little, and even the
whole computing unit is relatively cheap.

The only computing power that is absolutely required is an adder capable
of handling addresses (and even this can be dispensed with) and a bit
test, and bit store. All the adders and multipliers do is to execute
these operations in such a way that the computer is not issuing hundreds
of instructions for a multiply. I suggest that Mr. Horsell try to program
the operations and see how many instructions are needed to do a 32x32
unsigned to 64 multiplication if only 32-bit arithmetic is present.

> int x int -> long isn't _required_ for MP calcs, it's just _convenient_.
> For example, the `bc' U*X calculator uses bytes to store pairs
> of decimal digits and then uses good old int x int -> int to perform
> a paper-and-pencil multiply.

'bc' is adequate for a more precise pocket calculator. Besides, the
paper-and-pencil multiply used digit x digit -> double digit. Anyone who
isn't brainwashed into decimal arithmetic (and binary was not used much
back when I got my PhD) would design everything in binary, anyhow. I can
discard these prejudices and work with octal and hex; for some of what I
do, I must.

> Another technique is using `carryless' modular arithmetic with
> appropriate bases (e.g. 2^k+1).

Useful for integer arithmetic when it is known that remainders are zero
only. And the results cannot be read, or scaled.

> So maybe some people burn a _lot_ of cycles doing this sort of
> stuff (I do myself as it happens), do the economics of design time,
> marketing & area indicate that h/w must support it? Maybe that
> Si could be better spent doing things that actually happen a _lot_?

What happens a lot depends on what hardware is available. Those who
understand the computer automatically take these things into account. I
have computationally efficient means of generating non-uniform random
numbers from random bits. But a primitive operation such as checking
whether a bit stream has a bit left, reading the bit and moving the
pointer so it will not again be used, and branching on it, takes a LONG
time on most machines. The checking process cannot be completely avoided,
as the number of bits used to get a given amount of output is itself
random. There are ways in which these things can sometimes be reasonably
done, but not by using what is normally taught to CS students.

> P.S. I'm still interested in how much of the Sun Sparc's `bc' is
> written in assembler; it's _awful_ fast. So far no
> takers to my original post...

Is it that fast, considering that the machine is well over a million
times as fast as a typical human?
--
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)   {purdue,pur-ee}!l.cc!cik(UUCP)
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/11/90)
In article <2538@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
\\\
>The design time to make a decent integer multiplier, if you are going to
>have a floating point multiplier, is essentially ZERO. How do you think
>a floating point unit works, anyhow? Versatility costs little, and even
>the whole computing unit is relatively cheap.
\\\
_Essentially_ zero <> zero.  I design the stuff and _my_ time doesn't
come cheap.  I guess you have a point `tho -- versatility _does_ have a
payoff, but it is just one of the factors that go into the equation.
\\\
>instructions for a multiply. I suggest that Mr. Horsell try to program
>the operations and see how many instructions are needed to do a 32x32
>unsigned to 64 multiplication if only 32 bit arithmetic is present.
\\\
My assembler days are 10-20 years in the past (and good riddance)!
Counting instructions was never my idea of fun anyway.  In any case, as
my supervisors at the time told me, programming time (particularly code
optimization) cost more than the computer time saved.
\\\
>'bc' is adequate for a more precise pocket calculator. Besides, the
>paper-and-pencil multiply used digit x digit -> double digit.
>Anyone who isn't brainwashed into decimal arithmetic (and binary
>was not used much back when I got my PhD) would design everything
>in binary, anyhow. I can discard these prejudices and work with
>octal and hex; for some of what I do, I must.
\\\
The point I was trying to make with `bc' was that you didn't _need_
int x int -> long; perhaps an overkill?  I wasn't actually _proposing_
that h/w should do decimal arithmetic (but it makes you wonder... when
the actual i/o of numeric data compares with that for _doing_ the
calculations you might save time).

>> Another technique is using `carryless' modular arithmetic with
>> appropriate bases (e.g. 2^k+1).

>Useful for integer arithmetic when it is known that remainders are
>zero only. And the results cannot be read, or scaled.

That's what my DEKv1 says as well.
But it _is_ possible to convert them to mixed-radix representation and
then do the division, ya know.  Another possibility is to do a restoring
division using the modular form.  Since the dynamic occurrence of
divisions is not _that_ high, you gain.
\\\
>> P.S. I'm still interested in how much of the Sun Sparc's `bc' is
>>      written in assembler; it's _awful_ fast.  So far no
>>      takers to my original post...
>
>Is it that fast, considering that the machine is well over a million
>times as fast as a typical human?
\\\
The original Q I posted regarded the comparison of the Sparc's `bc' v/v
some other extended-precision routines I had to hand, some off the net,
some written elsewhere, and some from my own hand.  `bc' beat them all.
Why?  Written in assembler maybe?

-Kym Horsell
spot@TR4.GP.CS.CMU.EDU (Scott Draves) (09/11/90)
Herman Rubin sez:
>The design time to make a decent integer multiplier, if you are going to
>have a floating point multiplier, is essentially ZERO. How do you think

This integer multiply would have to be an fp instruction, because the
chip probably has separate integer and fp register files, and/or it is
superscalar.  11 bits plus extra control stuff doesn't sound like ZERO
work to me.

Anyway, exactly how much slower is it to hand-code arbitrary precision
arithmetic?  10 times?  100 times?  Do I really care?  Do the chip
makers care?  What fraction of people use fp compared with arb. prec.
arith?  How often would your bit stream instruction(s) be used?  What
are the chances that it would fit into a RISC pipeline?

You're right when you complain that chips these days aren't good for
what you do.  But I don't care, and neither do many other people.

Flame off.

Consume		Scott Draves
Be Silent	spot@cs.cmu.edu
Die
bs@linus.mitre.org (Robert D. Silverman) (09/11/90)
In article <3977@bingvaxu.cc.binghamton.edu> vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
:In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:
:\\\
:>Everyone keeps quoting this '99% of the time' stuff to me. I should
:>like to point out that there are applications for which having
:>32 x 32 --> 64 and 64 / 32 = 32 quotient & remainder are absolutely
:>essential. Basically, ANYTHING involving multiprecision arithmetic
:           ^^^^^^^^^^^
:>requires these. (on a 32 bit machine that is)
: ^^^^^^^^
:\\\
:
:Don't you mean `useful' or `efficient' here?
:

I wish people would refrain from shooting their mouths off about
multi-precision arithmetic until they've actually done a significant
amount of it.  Typically, having only 32 x 32 --> 32 slows down
multi-precision arithmetic by about a factor of 10.  Not having it can
render a machine all but useless for performing certain tasks.

:The RISC folks probably have the right attitude here -- only that stuff
:that happens a lot, or is _absolutely_, logically _required_ need be
:implemented in h/w (either that, or can be `tacked on' to what you

Happens a lot by whose definition?  Yours?  What the hell do you know
about how much computation in areas requiring multi-precision actually
takes place?  The people who do that kind of stuff DON'T talk about it.

:were going to do anyway as make no matter to the cost)-- design

Take a look at the i860 sometime.  People have been parading it as a
stellar example of a RISC processor.  However, a good hunk of real
estate on the chip is taken up by a special-purpose processor whose
purpose is to aid graphics programming.  Yet the damn thing can't
multiply and divide.

Repeat after me: Addition, subtraction, multiplication, division.
These are the four fundamental operations of arithmetic.
Too often within this newsgroup I have observed that many people believe
that the entire world of computing is either matrix computations [or
related work] or UNIX systems programming [or related work].  It is a
very parochial attitude.

:time means $$$$ after all, right?

Exactly.  What good is a machine that takes 10 times as long as it
should to compute something, simply because the designer failed to add a
few gates to provide integer arithmetic?  Jobs requiring integer
multiply/divide may only represent a small fraction of the total number
of jobs, but they typically have CPU requirements that far exceed those
of the other 99%.  As I pointed out, I burned over several hundred
thousand CPU hours last year.  I would wager that this is more than ALL
the rest of the readers of this newsgroup used put together.

:
:int x int -> long isn't _required_ for MP calcs, it's just _convenient_.

A factor of 10 in performance isn't mere 'convenience'.  It can easily
change a computation from one that is feasible to do to one that is not.

:For example, the `bc' U*X calculator uses bytes to store pairs
:of decimal digits and then uses good old int x int -> int to perform
:a paper-and-pencil multiply.

'bc' is just slightly slower than molasses.  It is a joke.

:
:Another technique is using `carryless' modular arithmetic with
:appropriate bases (e.g. 2^k+1).

Performing divisions in Residue Number representation is nearly
impossible.  About the only way is to convert to normal representation,
divide, then convert back.

:
:So maybe some people burn a _lot_ of cycles doing this sort of
:stuff (I do myself as it happens), do the economics of design time,
:marketing & area indicate that h/w must support it?  Maybe that

The time to design a multiplier or divider is nil.  Whether the market
requires it depends on who you are selling to.  If you intend to sell to
the gov't, for example, you damn well better supply the hardware they
need.

:P.S. I'm still interested in how much of the Sun Sparc's `bc' is
:     written in assembler; it's _awful_ fast.  So far no
:     takers to my original post...

This proves how little you know about 'bc'.  I have code for the SPARC
that beats it by almost two orders of magnitude when performing
multi-precise multiplication/division.  It's _awful_ slow.

--
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"
przemek@liszt.helios.nd.edu (Przemek Klosowski) (09/11/90)
In article <1990Sep10.162839.19226@cs.rochester.edu> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>>do not. Now the compilers of the world can get rid of integer multiplies in
>>address arithmetic _if_ they can figure out the sizes of the arrays.
>
>This is incorrect. Compilers can get rid of integer multiplies (actually
>implement them as shifts and adds) when the sizes of the array ELEMENTS are
>known.
>  Lawrence Crowl		716-275-9499	University of Rochester

I think that Lawrence misunderstood Andrew.  You DO need to know the
size of the array if you want to do multidimensional arrays (actually
you need to know all but the last dimension of a matrix).  For instance,
assuming A is a 10x10x10x10x10 matrix, stored 'column-first' with
indices running from 0, the address of A[1,2,3,4,5] is

	A + 1 + 2*10 + 3*100 + 4*1000 + 5*10000
--
			przemek klosowski (przemek@ndcva.cc.nd.edu)
			Physics Dept
			University of Notre Dame IN 46556
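The column-first address computation przemek describes generalizes to any rank: each subscript is scaled by the product of all earlier dimensions, one integer multiply per dimension. A small sketch (names are illustrative):

```c
/* Element offset of idx[0..4] within a column-first array whose
 * dimensions are dim[0..4], indices running from 0.  Note that the
 * last dimension is never multiplied in -- only the strides built
 * from the earlier dimensions matter. */
long offset5(const long dim[5], const long idx[5])
{
    long off = 0, stride = 1;
    for (int k = 0; k < 5; k++) {
        off += idx[k] * stride;   /* one integer multiply per subscript */
        stride *= dim[k];         /* running product of prior dimensions */
    }
    return off;
}
```

For the 10x10x10x10x10 example, offset5 of {1,2,3,4,5} gives 1 + 2*10 + 3*100 + 4*1000 + 5*10000 = 54321, matching the expression above.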
amull@Morgan.COM (Andrew P. Mullhaupt) (09/11/90)
In article <2534@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > As for integer divide, less of a case can be made, and I can do without
> > it.
>
> Maybe you do not see a need for it, but lots of people do; even floating
> divide with integer quotient and floating remainder.

This one you get with most IEEE coprocessors via one or two
instructions.

> > Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> > multiply take? You could use this for a lot of address arithmetic
> > and build the full 32x32 multiply out of a few calls to this. In the
> > same vein, an 8x32 might even be a profitable use of chip. How does
> > this VLSI trade-off work out?
>
> The floating point unit already has at least most of the hardware required.
> If the 53x53 floating point unit were slightly modified to give 64 bits of
> output, almost no additional chip area would be required. Floating point
> units still do their arithmetic as integer arithmetic, with front and back
> ends.

On machines with an integrated FPU (like the i486 and i860) this makes
some sense.  In fact _no_ modification is necessary to "widen" the FPU,
since it already computes 64-bit mantissae in most cases.  Some
languages have provided support for this -- notably Turbo Pascal on the
PC-compatibles, with its extended and (just for you, Herman) comp types.

On machines with off-chip FPUs (i.e. the coprocessors of the world) this
idea isn't necessarily a winner, since there can be a significant delay
moving the data between chips.  No, the problem here is for chips like
the SPARC, which would have to put the instruction on a chip where _no_
real estate is devoted to the FPU.

Now the need for integer multiplies is real -- although strength
reduction is very useful, it is nearly impossible if the access to the
array is not direct.
In particular, pivoting (often necessary in numerical linear algebra)
disorganizes the sequential nature of array accesses and prevents
strength reduction.  This type of access usually occurs in a subroutine
where the array is a formal parameter, and so the indexing cannot be
simplified at compile time in most languages.  (FORTRAN 90 is somewhat
of an exception, since the shape of an array parameter is always known
at run time and the compiler could take advantage of this.  In C this
situation is pretty grim if integer multiplications are slow...)

N.B.: To all those who have "never seen code where the array shape is
not known at compile time" I would ask that they stop sending me mail
and take a look at almost any library of subroutines for numerical
computation.  This kind of code is very common.

Later,
Andrew Mullhaupt
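The pivoted-access pattern Mullhaupt describes can be sketched as follows; the leading dimension and permutation are run-time parameters, so the `row * lda` multiply survives into the inner loop (the routine is a made-up illustration, not from any numerical library):

```c
/* Visit one element per column of an n x n submatrix whose rows are
 * selected by a pivot permutation.  `lda' (the leading dimension of
 * the actual argument) is a formal parameter, unknown at compile
 * time, and `perm' destroys the sequential access order -- so the
 * compiler cannot strength-reduce the multiply into an addition. */
double pivoted_sum(const double *a, int lda, const int *perm, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[perm[i] * lda + i];   /* genuine run-time integer multiply */
    return s;
}
```

Contrast the unpivoted loop `a[i * lda + i]`, where the product increases by a constant `lda + 1` each iteration and a compiler can replace the multiply with an add.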
vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/12/90)
In article <119733@linus.mitre.org> bs@gauss.UUCP (Robert D. Silverman) writes:
\\\
>This proves how little you know about 'bc'. I have code for the SPARC
>that beats it by almost two orders of magnitude when performing
>multi-precise multiplication/division. It's _awful_ slow.
\\\
An interesting comment.  Although I only do simple-minded stuff, that
would mean your routines can calc the first 1000 factorials in 20 ms --
pretty good!  I'd submit those routines to the net!

-Kym Horsell
bs@linus.mitre.org (Robert D. Silverman) (09/12/90)
In article <1990Sep10.215549.26260@uncecs.edu> urjlew@uncecs.edu (Rostyk Lewyckyj) writes:
:In article <119244@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
:> The four basic operations of arithmetic are +, -, x, /. Any computer that
:> can't perform them on its atomic data units [whatever the word size is]
:> is a joke.
:
:If the basic unit for representing a number is a word, then to handle
:the product of two numbers you may well need a double word. But then
:you must provide operations to handle double words. So then the double
:word becomes a basic unit, and so on ad infinitum. If in addition to

What is needed is simply a single, temporary double word.  Typically,
after forming a double-word product it is then reduced to a single word,
e.g. compute AB/C, AB mod C, etc.

In any event, your response addresses LANGUAGE issues, not hardware.
I'm quite happy to work in assembler to handle double-length registers
as long as the HARDWARE provides the capability.

Note that for language purposes one could provide an in-line routine to
multiply two words and return the upper/lower halves.  Similarly, a
division routine could take the two halves and divide by a third number.

--
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"
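Silverman's point that the double word is only a temporary can be sketched in two lines; here `uint64_t` stands in for the hardware's double-word register pair (the function name is illustrative):

```c
#include <stdint.h>

/* AB mod C on single words: the double-length product exists only as
 * a temporary and is immediately reduced back to a single word, so
 * no "ad infinitum" tower of ever-wider types is needed. */
uint32_t mulmod(uint32_t a, uint32_t b, uint32_t c)
{
    uint64_t ab = (uint64_t)a * b;   /* double-word temporary */
    return (uint32_t)(ab % c);       /* reduced to a single word */
}
```

AB/C is identical with `/` in place of `%`; neither result nor operands ever exceed one word.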
tbray@watsol.waterloo.edu (Tim Bray) (09/13/90)
Some time ago, rwallace@vax1.tcd.ie wrote:
>There is practically no processing done which depends on integer * and / being
>fast (accessing an array of structures doesn't count...

First off, it is VERY DANGEROUS to make statements like that without a
LARGE suite of benchmarks and end-user surveys to back them up.  Go ask
the guys who designed the SPARC.

Second, I think this statement is wrong.  Associative lookup, i.e.
symbol table management, is fundamental to many modern applications --
compilers, interpreters, very high level languages, computer algebra
systems, PostScript.  Symbol tables are usually done with hashing.
Hashing functions very often involve multiplication (but note the neat
algorithm for string hashing in the recent CACM).

Just another data point,
Tim Bray, Open Text Systems, Waterloo, Ont.
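A typical multiplicative string hash of the kind Bray is talking about, one integer multiply per character. (The multiplier 31 is a common illustrative choice; this is not the specific CACM algorithm he alludes to.)

```c
#include <stddef.h>

/* Symbol-table hash: accumulate h = h*31 + c over the string, then
 * fold into a bucket index.  The per-character multiply is why slow
 * integer multiplication hurts compilers and interpreters. */
unsigned long hash_mul(const char *s, unsigned long nbuckets)
{
    unsigned long h = 0;
    for (; *s != '\0'; s++)
        h = h * 31 + (unsigned char)*s;
    return h % nbuckets;
}
```

A smart compiler can turn `h*31` into `(h<<5) - h`, but hashes with less convenient multipliers, and the final modulus, still lean on integer multiply/divide hardware.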
rpeglar@csinc.UUCP (Rob Peglar) (09/13/90)
In article <119612@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
 (bunch of stuff deleted)
>
> The SPARC architecture did not use to have integer multiply and divide.
> They finally decided to add it. WHY? It is rumored that a certain
> gov't. agency complained so much about its lack that SUN had to put it
> in. Ask yourself what gov't. agency might require lots of multiprecise
> arithmetic. I do not know this for a fact, it is merely a rumor I heard.
>

Isn't it funny how quite a bit of the uP world is rediscovering,
reinventing, rehashing, reliving, and redying all sorts of episodes that
the MF and super world went through, lo these many years ago?  "Certain
government agencies... forcing architectural changes..."  We've seen it
all before.  No one should be surprised.

FP vs. arbitrary integer?  A *marketing* issue, not architectural.  If
you want to play the game, you have to know the rules.  In certain
worlds, FP rules.  In others, integer rules.  Pun intended.  Sure, there
are many clever ways to implement both.  Ya gotta sell it, though.

Note, no value judgements here -- enjoyable discussions.  It's just when
people apply architectural prejudice to one or the other that the
discussions become absurd.  Re the coupla-months-ago thread about
fork() vs. vfork().

Processors are fine.  Let's put our brains to work on memory and I/O,
the real hard issues.

Rob
--
Rob Peglar		Comtrol Corp.
A Control Systems Company	2675 Patton Rd., St. Paul MN 55113
(800) 926-6876		...uunet!csinc!rpeglar
pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/14/90)
On 12 Sep 90 15:15:46 GMT, bs@linus.mitre.org (Robert D. Silverman) said:

bs> In any event, your response addresses LANGUAGE issues, not hardware.
bs> I'm quite happy to work in assembler to handle double length
bs> registers as long as the HARDWARE provides the capability.

Agreed -- on the other hand you should provide both

	single * single => double
	double / single => single

and

	single * single => single
	single / single => single

for compiler writers, because handling register pairs in code generators
is notoriously a pain, and many languages only need (more or less
regrettably) the all-single-length version (with or without an overflow
indication).

bs> What is needed is simply a single, temporary double word. Typically
bs> after forming a double word product it is then reduced to a single
bs> word. e.g. compute AB/C, AB mod C, etc.

bs> Note that for language purposes one could provide an in-line routine
bs> to multiply two words and return the upper/lower halves. Similarly
bs> a division routine could take the two halves and divide by a third
bs> number.

You may find agreement in an old and very nice paper (Software Practice
and Experience, early seventies) by Martin Richards (he of BCPL and
Tripos fame) about having just a *single* procedure call, muldiv, that
returns the result of doing _both_ things, i.e.

	(single * single => double) / single => single

He claims that in *many* cases this is all that is needed to obviate the
lack of double-length integers, or even floating point, as it is the
crucial thing for efficient fixed-point arithmetic, and it need never
expose double-length integers to the language.  Such a procedure is
easily codable as an inline in GNU C/Objective-C/C++, or with SUN's .il
files, and can solve an awful lot of problems.
As you say above, one could also have mulrem, as in

	(single * single => double) % single => single

I may be dreaming, but I think that both actually already exist in
Forth, and for the same reason as in BCPL: because they are word-
oriented languages, they only have a single length of integer, and thus
the issue of double length (especially if the basic word is 16 bits) is
pressing.  It may also help to have double shifts, but I am not so sure.

--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk
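The muldiv primitive Grandi describes, and its mulrem companion, are a couple of lines each in C; `uint64_t` stands in for the hardware's double word, and the fixed-point usage is a made-up illustration of Richards' point about fixed-point arithmetic:

```c
#include <stdint.h>

/* muldiv: (a * b) / c with the product held in a double-length
 * temporary, so double-length integers never escape the language. */
uint32_t muldiv(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)(((uint64_t)a * b) / c);
}

/* mulrem: the companion remainder form, (a * b) % c. */
uint32_t mulrem(uint32_t a, uint32_t b, uint32_t c)
{
    return (uint32_t)(((uint64_t)a * b) % c);
}

#define ONE (1u << 16)   /* 16.16 fixed-point scale */

/* Fixed-point multiply is exactly one muldiv: x * y / ONE. */
uint32_t fx_mul(uint32_t x, uint32_t y)
{
    return muldiv(x, y, ONE);
}
```

This is essentially Forth's `*/` and `*/MOD`, which keep a double-length intermediate for the same reason.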
yodaiken@chelm.cs.umass.edu (victor yodaiken) (09/26/90)
In article <1990Sep11.035826.22880@cs.cmu.edu> spot@TR4.GP.CS.CMU.EDU (Scott Draves) writes:
...
>Anyway, exactly how much slower is it to hand-code arbitrary precision
>arithmetic? 10 times? 100 times? Do I really care? Do the chip makers
>care? What fraction of people use fp compared with arb. prec. arith?
>How often would your bit stream instruction(s) be used? What are the
>chances that it would fit into a RISC pipeline?
>
>You're right when you complain that chips these days aren't good for
>what you do. But I don't care, and neither do many other people.
>Flame off.

The architectural requirements for arithmetic are interesting.
Estimates of the current market are far less so.