[comp.arch] F.P. vs. arbitrary-precision

amos@taux01.nsc.com (Amos Shapir) (09/06/90)

[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
>I have to agree that 128-bit floating point isn't really such a hot idea.
>When you get right down to it, floating point is a hack.  It's a very useful
>hack; i won't argue with that.  We admit 64-bit floating point doesn't work,
>so what do we do?  We provide more of the same.

It works for most uses, and "more of the same" really is an improvement
if it's done right.  The point is, there isn't yet any better idea
that works as well.

>There are a lot of machines out there that can do IEEE 64-bit floating point,
>with all its precise rules and cases, but can't multiply two 32-bit integers
>in a reasonable way.  What are we to make of this?  It's just dumb.

If they have a good FPU, using it for integer multiplication *is* a "reasonable
way".  Besides, a bad implementation doesn't prove anything about the basic
idea of FP.
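
[For illustration, a minimal C sketch of routing an integer multiply
through the FPU; the function name is hypothetical, and the result is
exact only while the true product fits in a double's 53-bit mantissa,
which is exactly the limit argued over below.]

    /* Sketch: integer multiply done on the FPU.  Exact only while the
       mathematical product fits in 53 bits of mantissa. */
    long imul_fpu(long a, long b)
    {
        return (long)((double)a * (double)b);
    }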

>Our current programming languages have a strong influence.  C has `float' and
>`double' types, and most machines have single-precision and double-precision
>floating point numbers.  Coincidence?  I think not.

Certainly not, but you got it backwards - C has it because PDP-11 has it
because FORTRAN has it because the early IBM machines had it...


>I don't care if you have 65536-bit floating point, it's still floating point.
>That means underflow, overflow, round-off error, loss of precision, need i
>continue?
...
>Another interesting area is hardware support for arbitrary-precision real
>numbers.

But even with arbitrary precision, at some point you have to start disposing
of insignificant bits - and suffer the same side-effects.

-- 
	Amos Shapir		amos@taux01.nsc.com, amos@nsc.nsc.com
National Semiconductor (Israel) P.O.B. 3007, Herzlia 46104, Israel
Tel. +972 52 522255  TWX: 33691, fax: +972-52-558322 GEO: 34 48 E / 32 10 N

bs@linus.mitre.org (Robert D. Silverman) (09/06/90)

In article <4513@taux01.nsc.com> amos@taux01.nsc.com (Amos Shapir) writes:
:[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
:>I have to agree that 128-bit floating point isn't really such a hot idea.
:>When you get right down to it, floating point is a hack.  It's a very useful
:>hack; i won't argue with that.  We admit 64-bit floating point doesn't work,
:>so what do we do?  We provide more of the same.
:
:It works for most uses, and "more of the same" really is an improvement
:if it's done right.  The point is, there isn't yet any better idea
:that works as well.
:
:>There are a lot of machines out there that can do IEEE 64-bit floating point,
:>with all its precise rules and cases, but can't multiply two 32-bit integers
:>in a reasonable way.  What are we to make of this?  It's just dumb.
:
:If they have a good FPU, using it for integer multiplication *is* a "reasonable
:way".  Besides, a bad implementation doesn't prove anything about the basic
:idea of FP.
 
Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
only 53 bits of mantissa. So just how does one multiply two 32-bit integers
together using floating point? The answer is: one can't without losing bits
(or by doing several multiplies on the low/high halves and combining things).
Furthermore, if I should need to multiply integers in this way, they must
first be converted to floating point, then the product must be converted back.
Can you say expensive???
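
[For illustration, a C sketch of the low/high-half scheme mentioned in
the parenthesis above, assuming 32-bit unsigned longs; the function
name and layout are hypothetical.]

    /* Sketch: 32x32 -> 64 built from 16x16 -> 32 pieces. */
    void mul32x32(unsigned long a, unsigned long b,
                  unsigned long *hi, unsigned long *lo)
    {
        unsigned long a0 = a & 0xFFFF, a1 = a >> 16;
        unsigned long b0 = b & 0xFFFF, b1 = b >> 16;
        unsigned long p0 = a0 * b0;                /* low partial product  */
        unsigned long p3 = a1 * b1;                /* high partial product */
        unsigned long p2 = a1 * b0;
        unsigned long mid = a0 * b1 + (p0 >> 16);  /* cannot overflow      */

        mid += p2;                                 /* this one may carry   */
        if (mid < p2)
            p3 += 0x10000;                         /* propagate the carry  */
        *hi = p3 + (mid >> 16);
        *lo = (mid << 16) | (p0 & 0xFFFF);
    }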

The four basic operations of arithmetic are +, -, x, /. Any computer that
can't perform them on its atomic data units [whatever the word size is]
is a joke.

-- 
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

bruceh@sgi.com (Bruce R. Holloway) (09/07/90)

In article <119244@linus.mitre.org> bs@gauss.UUCP (Robert D. Silverman) writes:
>In article <4513@taux01.nsc.com> amos@taux01.nsc.com (Amos Shapir) writes:
>:[Quoted from the referenced article by jgk@osc.COM (Joe Keane)]
>:
>:>There are a lot of machines out there that can do IEEE 64-bit floating point,
>:>with all its precise rules and cases, but can't multiply two 32-bit integers
>:>in a reasonable way.  What are we to make of this?  It's just dumb.
> 
>The four basic operations of arithmetic are +, -, x, /. Any computer that
>can't perform them on its atomic data units [whatever the word size is]
>is a joke.

Where did you learn this?  By looking at a calculator?  What about sqrt()?
In my line of work we use it a lot & know lots of ways to approximate it
depending on what's available & how fast it needs to be.  Still, there is
a pencil & paper method that's exact & fairly easy to do in hardware.
Perhaps I should say, any computer that can't do sqrt() is a joke!

The basic operations are + and *.  That's why there's an algebraic
construct called a "field".  The other two operations are just derivative
ones that involve inverses.  Actually, if you did statistics on numerical
programs (maybe you have), you would find that divide occurs much less
frequently than the others.  So those guys that sell multiplier-accumulator
chips are right, because even if it takes 10x longer to divide, it probably
occurs 10x less frequently anyway.

Obviously, any Turing machine can do all five of these operations.  So the
issue isn't can the machine do it, or does it have a machine language
instruction to do it, or does that instruction happen in a single cycle.
I think the real issue is that the programming languages don't do the job,
because you can't multiply two ints & get a double precision result without
writing a subroutine.  The compiler is free to do +, *, and sqrt with a
single instruction or with a subroutine, but the guy who wants a double
precision product or to detect overflow after addition has to write
a subroutine to do it.
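
[For illustration, the sort of subroutine being complained about,
sketched in C; assumes two's-complement ints, and the name is
hypothetical.]

    #include <limits.h>

    /* Sketch: would a + b overflow a signed int?  The hardware sets a
       flag for free; in C you test by hand before adding. */
    int add_overflows(int a, int b)
    {
        return (b > 0) ? (a > INT_MAX - b) : (a < INT_MIN - b);
    }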

rwallace@vax1.tcd.ie (09/08/90)

> :If they have a good FPU, using it for integer multiplication *is* a "reasonable
> :way".  Besides, a bad implementation doesn't prove anything about the basic
> :idea of FP.
>  
> Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
> only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
> together using floating point? The answer is: one can't without losing bits.
> (or by doing several multiplies on the low/high halves and combining things)
> Futhermore, if I should need to multiply integers in this way, they must
> first be converted to floating point, then the product must be converted back.
> Can you say expensive???
> 
> The four basic operations of arithmetic are +, -, x, /. Any computer that
> can't perform them on its atomic data units [whatever the word size is]
> is a joke.

First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
bit answer but in practice 99% of the time you're going to be working in 32
bits all the way and you don't want 64 bit answers (which is why C has
int x int -> int not int x int -> long).

Second, there is a huge amount of processing done which depends on integer +
and - and fp +, -, * and / being fast, so there is hardware support for these.
There is practically no processing done which depends on integer * and / being
fast (accessing an array of structures doesn't count because a smart compiler
can use shifts and adds), and don't bother giving anecdotal cases because it's
still less than 1% of the total. Therefore chip space was not wasted on making
these fast.
-- 
"To summarize the summary of the summary: people are a problem"
Russell Wallace, Trinity College, Dublin
rwallace@vax1.tcd.ie

knighten@Encore.com (Robert L. Knighten) (09/08/90)

In article <1990Sep6.224018.17701@odin.corp.sgi.com> bruceh@sgi.com
(Bruce R. Holloway) writes:

   . . . Actually, if you did statistics on numerical
   programs (maybe you have), you would find that divide occurs much less
   frequently than the others.  So those guys that sell multiplier-accumulator
   chips are right, because even if it takes 10x longer to divide, it probably
   occurs 10x less frequently anyway.
   . . .

This is one of those "which comes first" issues I've discussed with quite a
few people.  When working on numerically intensive code that needs to be fast
I have always avoided divide when possible because it is so slow (and getting
slower relative to the other instructions!)  This means that such things as
rational function approximations are avoided.  But why is divide so slow?  The
hardware architects with whom I have discussed this have said, "Sure, we could
do a fast divide, but it would take a lot of gates and no one uses it."  And
no one uses it because it is slow and . . .
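
[For illustration, the usual dodge in C terms: hoist one divide out of
the loop and multiply by the reciprocal instead.  The function is
hypothetical, and the results can differ from true division in the
last bit or so.]

    /* Sketch: one slow divide outside the loop, multiplies inside. */
    void scale_all(double *x, int n, double d)
    {
        double r = 1.0 / d;          /* the only divide */
        int i;

        for (i = 0; i < n; i++)
            x[i] *= r;               /* multiply replaces divide */
    }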

Has anyone done careful studies of the actual value of a fast divide (integer
or floating point)?

--
Bob Knighten
Encore Computer Corp., 257 Cedar Hill Street, Marlborough, MA 01752
Internet:  knighten@encore.com             (508) 460-0500 ext. 2626
uucp:  {bu-cs,decvax,gould}!encore!knighten

dave@tygra.UUCP (David Conrad) (09/09/90)

In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
}First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
}bit answer but in practice 99% of the time you're going to be working in 32
}bits all the way and you don't want 64 bit answers (which is why C has
}int x int -> int not int x int -> long).
}-- 
}Russell Wallace, Trinity College, Dublin

*My* C doesn't have int * int = int.  I know because the CPU doesn't
support it.  What it does is int * int = long, and then it casts the
long back into an int if it'll fit.
--
David "Salamander" Conrad     |     In Cows We Trust     |     This space
dave@tygra.ddmi.com           |     E  Pluribus  Moo     |      for rent
-- 
=  CAT-TALK Conferencing Network, Prototype Computer Conferencing System  =
-  1-800-825-3069, 300/1200/2400/9600 baud, 8/N/1. New users use 'new'    - 
=  as a login id.    <<Redistribution to GEnie PROHIBITED!!!>>>           =
E-MAIL Address: dave@ThunderCat.COM

peter@ficc.ferranti.com (Peter da Silva) (09/09/90)

In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
> First, why do you need 32 x 32 -> 64?

It sure makes arbitrary precision a whole lot easier.

> why C has int x int -> int not int x int -> long.

Which is irritating to the folks who need that operation, because it forces
them to do a full 32x32 multiply instead of a 16x16->32.

Which is why Forth has 16x16->32 built in.
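
[For illustration, what the operation looks like when forced through
C, assuming 16-bit shorts and 32-bit longs; the casts make the
compiler widen both operands and issue a full multiply.]

    /* Sketch: 16x16 -> 32 written in C; the generated code is
       typically a full 32x32 multiply. */
    long mul16x16(short a, short b)
    {
        return (long)a * (long)b;
    }
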
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

amull@Morgan.COM (Andrew P. Mullhaupt) (09/09/90)

> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

I won't give examples but you're wrong about the address arithmetic.
Some machines (like i486, RS/6000) have integer multiplies and some
(SPARC) do not. Now the compilers of the world can get rid of integer
multiplies in address arithmetic _if_ they can figure out the sizes
of the arrays. This isn't trivial when the array may be a formal 
parameter (i.e. the size may not be fixed). A great deal of code uses
this kind of function argument. (Nearly all of numerical linear algebra,
a great deal of optimization...) I have been pretty annoyed at how slow
this stuff gets on the SPARC and pretty happy with how often my i486
home computer (running UNIX & DOS) beats the Sparcstations on identical
code. However, fast is relative in this matter. The RS/6000 has integer
multiply but has an extremely fast set of floating point. This is really
good unless the array indexing gets in the way of the FP instructions,
which turns out to be a bit tricky. The thing probably needs to have
an even faster integer multiply (and this probably requires some sort of
superscalar capability) in order to get the best performance. 

The lack of an integer multiply on the SPARCs (I think some of them are
supposed to have it but the ones I use don't...) is pretty bad. But it is
compensated by the register windows which allow it to call functions
extremely quickly. I think that the real estate wasn't so much thought
to be wasted on integer multiply as better used for register windows.
A machine for scientific computing should probably not go so far as
to eliminate the integer multiply.

As for integer divide, less of a case can be made, and I can do without
it. Although integer quotient and remainder are probably computed
simultaneously, some languages (like C) force one to get those results
separately. If this wasn't the case, probably integer divides would
take up 25% less of our time.

Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
multiply take? You could use this for a lot of address arithmetic
and build the full 32x32 multiply out of a few calls to this. In the
same vein, an 8x32 might even be a profitable use of chip. How does
this VLSI trade-off work out? 


Later,
Andrew Mullhaupt

cik@l.cc.purdue.edu (Herman Rubin) (09/10/90)

In article <6837.26e7ee92@vax1.tcd.ie>, rwallace@vax1.tcd.ie writes:
> > :If they have a good FPU, using it for integer multiplication *is* a "reasonable
> > :way".  Besides, a bad implementation doesn't prove anything about the basic
> > :idea of FP.
> >  
> > Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
> > only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
> > together using floating point? The answer is: one can't without losing bits.
> > (or by doing several multiplies on the low/high halves and combining things)
> > Futhermore, if I should need to multiply integers in this way, they must
> > first be converted to floating point, then the product must be converted back.
> > Can you say expensive???
> > 
> > The four basic operations of arithmetic are +, -, x, /. Any computer that
> > can't perform them on its atomic data units [whatever the word size is]
> > is a joke.
> 
> First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
> bit answer but in practice 99% of the time you're going to be working in 32
> bits all the way and you don't want 64 bit answers (which is why C has
> int x int -> int not int x int -> long).
> 
> Second, there is a huge amount of processing done which depends on integer +
> and - and fp +, -, * and / being fast, so there is hardware support for these.
> There is practically no processing done which depends on integer * and / being
> fast (accessing an array of structures doesn't count because a smart compiler
> can use shifts and adds), and don't bother giving anecdotal cases because it's
> still less than 1% of the total. Therefore chip space was not wasted on making
> these fast.

All early machines had only fixed point.  Floating point hardware, no matter
how it is done, is kludged fixed point.  There are pre- and post- shifts,
the necessary fixed point arithmetic is done, and the (computed in parallel)
exponent is adjusted.  From this standpoint floating point is not basic.  I
do not suggest that it be eliminated; it is too useful.  We could make better
kludges now, and unless it is in hardware, the current practice of packing
the exponent and mantissa in one unit would be undesirable.

But this does not address the multiprecision question.  Much work does not
get done if the tools are too clumsy.  In doing multiple precision work, the
number must be broken up into units on which the hardware can operate.  If
we only have 32x32 -> 32, we are effectively limited to 16-bit units.  A count
of the number of instructions required to do a 32x32 -> 64 shows how bad it
is.  Also, an algorithm designer may very well use an obviously clumsier
algorithm which is not subject to these problems caused by the inadequacy
of hardware.  Look at the devices used to get multiples of pi and ln 2.

Now suppose we have to do 160-bit arithmetic to do a floating-point operation
to sufficient accuracy (3 times the current double precision).  The intelligent
thing to do, hardware allowing the operations, would be 5 32-bit pieces.  If
we are to struggle with the floating-point units, and the insistence on 
normalization would make it a struggle, we would have to use at most 26-bit
units, and need 7 of them.  Using 16-bit units would take ten.  When the units
operated on are not an easily used size, addressing, etc., becomes much
more difficult.  So to do multiple precision work reasonably easily, we must use
16 bit units.  This is not what we got from the computers of old--there the
unit was 35 bits plus sign, or more, and double length products were available,
as well as double/single -> quotient and remainder.  No, we are essentially in
the early PC period as far as arithmetic goes.

As to why these, and other things, are not in C, I presume that the founders of
C did not think of them.  In general, when a limited group designs something
without getting ideas from outside, this happens.  Algol was supposed to be
a universal algorithmic language, and it does not have even the hardware
operations of the machines of that time in it.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/10/90)

In article <389@tygra.UUCP> dave@tygra.UUCP (David Conrad) writes:
\\\
>*My* C doesn't have and int * int = int.  I know because the CPU doesn't
>support it.  What it does is int * int = long, and then it casts the
>long back into an int if it'll fit.
\\\

This is a rather strange idea; the last time I heard of a crock
whereby the type of an expression can't be determined until runtime
(which is what you imply, see below) was raising an integer to
integral power in Algol -- if the exponent is <0 you get a real
else you get an integer. As I recall Burroughs kludged this by
representing integers and reals in a common format (not _too_
unusual, see the '87 chips) but they had the binary point
ON THE RIGHT.

The problem with int x int -> long for (any?) of the common
arithmetic ops is expressions like the following (this
is typical of C, but other exprs can be fudged up in most of
the modern langs):

	int x,y,z;
	...
	x = (x==y ? x*x : y);

(We can also produce all sorts of similar arguments based on the
associativity and commutativity of *).
Now, since we have an x*x inside the conditional, and for the
sake of argument let's say the optimizer doesn't play with this
code too much (or else I'll think up a really _big_ example where
it won't) then how do we deal with the type indeterminacy
as you (and previous posts) have described it? Do we hang a bit
of code near the x*x and _decide_ that we have a long and cast
it to an int or do we simply _always_ cast x*x to an int?
If the latter then we _really_ have implemented int x int->int.

There are _some_ situations where this int x int -> long may be
useful, e.g.
	long x;
	int y,z;

	x = y*z;

We really get the entire product! Perhaps your compiler kicks in
only in this (or similar) case? I _still_ don't like it; it
isn't portable (there's a C standard for one thing). 

-Kym Horsell
===

cik@l.cc.purdue.edu (Herman Rubin) (09/10/90)

In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > There is practically no processing done which depends on integer * and / being
> > fast (accessing an array of structures doesn't count because a smart compiler
> > can use shifts and adds), and don't bother giving anecdotal cases because it's
> > still less than 1% of the total. Therefore chip space was not wasted on making
> > these fast.

			.......................

> As for integer divide, less of a case can be made, and I can do without
> it. Although integer quotient and remainder are probably computed
> simultaneously, some languages (like C) force one to get those results
> separately. If this wasn't the case, probably integer divides would
> take up 25% less of our time.

Maybe you do not see a need for it, but lots of people do; even floating
divide with integer quotient and floating remainder.  Very few languages
have any provisions for simultaneous quotient and remainder.  This is an
example of an extremely foolish and unnecessary oversight; the great bulk
of computers, ever since computers were invented, have produced the integer
quotient and remainder simultaneously.

> Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> multiply take? You could use this for a lot of address arithmetic
> and build the full 32x32 multiply out of a few calls to this. In the
> same vein, an 8x32 might even be a profitable use of chip. How does
> this VLSI trade-off work out? 

The floating point unit already has at least most of the hardware required.
If the 53x53 floating point unit were slightly modified to give 64 bits of
output, almost no additional chip area would be required.  Floating point
units still do their arithmetic as integer arithmetic, with front and back
ends.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (09/10/90)

In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:

> As for integer divide, less of a case can be made, and I can do without
> it. Although integer quotient and remainder are probably computed
> simultaneously, some languages (like C) force one to get those results
> separately. If this wasn't the case, probably integer divides would
> take up 25% less of our time.

  Actually ANSI C does provide a function which does just what you said,
and a type (div_t) which contains the two values. And you can define it
builtin if your implementation desires.
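
[For illustration, a minimal usage sketch of that ANSI C facility.]

    #include <stdio.h>
    #include <stdlib.h>

    /* Sketch: quotient and remainder from one call. */
    int main(void)
    {
        div_t qr = div(17, 5);
        printf("quot = %d, rem = %d\n", qr.quot, qr.rem);  /* 3 and 2 */
        return 0;
    }
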
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
    VMS is a text-only adventure game. If you win you can use unix.

crowl@cs.rochester.edu (Lawrence Crowl) (09/10/90)

In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>Some machines (like i486, RS/6000) have integer multiplies and some (SPARC)
>do not. Now the compilers of the world can get rid of integer multiplies in
>address arithmetic _if_ they can figure out the sizes of the arrays.

This is incorrect.  Compilers can get rid of integer multiplies (actually
implement them as shifts and adds) when the sizes of the array ELEMENTS are
known.
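
[For illustration, what "shifts and adds" means when the element size
is a compile-time constant -- say a 12-byte struct; the function is
hypothetical.]

    /* Sketch: i * 12 without a multiply instruction. */
    unsigned long byte_offset(unsigned long i)
    {
        return (i << 3) + (i << 2);    /* i*8 + i*4 == i*12 */
    }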

>This isn't trivial when the array may be a formal parameter (i.e. the size
>may not be fixed). A great deal of code uses this kind of function argument.
>(Nearly all of numerical linear algebra, a great deal of optimization...)

Nearly all code has fixed-size array elements.  I've seen some memory
allocation routine that took element size as a parameter, but have never seen
any processing routines that did.

-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
	  ...!{ames,rutgers}!rochester!crowl	Rochester, New York,  14627

preston@titan.rice.edu (Preston Briggs) (09/11/90)

In article <1990Sep10.162839.19226@cs.rochester.edu> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>>Some machines (like i486, RS/6000) have integer multiplies and some (SPARC)
>>do not. Now the compilers of the world can get rid of integer multiplies in
>>address arithmetic _if_ they can figure out the sizes of the arrays.
>
>This is incorrect.  Compilers can get rid of integer multiplies (actually
>implement them as shifts and adds) when the sizes of the array ELEMENTS are
>known.
>
>>This isn't trivial when the array may be a formal parameter (i.e. the size
>>may not be fixed). A great deal of code uses this kind of function argument.
>>(Nearly all of numerical linear algebra, a great deal of optimization...)

It doesn't matter too much.  Strength reduction will eliminate
multiplies of loop induction variables by loop invariants (which
include formal parameters and such).  Generally, this is superior 
to the shift-and-add trick anyway.
SR won't help so much outside of loops.
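
[For illustration, strength reduction in C terms: the multiply by a
loop-invariant stride (here a formal parameter) becomes a running
pointer add.  Hypothetical code, roughly what the compiler produces.]

    /* Before:  for (i = 0; i < n; i++) a[i*stride] *= 2.0;
       After strength reduction, roughly: */
    void stride_scale(double *a, int n, int stride)
    {
        double *p = a;
        int i;

        for (i = 0; i < n; i++) {
            *p *= 2.0;
            p += stride;             /* an add replaces the multiply */
        }
    }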

-- 
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu

bs@linus.mitre.org (Robert D. Silverman) (09/11/90)

In article <6837.26e7ee92@vax1.tcd.ie> rwallace@vax1.tcd.ie writes:
:> :If they have a good FPU, using it for integer multiplication *is* a "reasonable
:> :way".  Besides, a bad implementation doesn't prove anything about the basic
:> :idea of FP.
:>  
:> Having a good FPU just isn't good enough. Even with IEEE 64-bit, there are
:> only 53 bits of a mantissa. So just how does one multiply two 32 bit integers
:> together using floating point? The answer is: one can't without losing bits.
 
stuff deleted.

:> The four basic operations of arithmetic are +, -, x, /. Any computer that
:> can't perform them on its atomic data units [whatever the word size is]
:> is a joke.
:
:First, why do you need 32 x 32 -> 64? OK in principle 32 x 32 can give a 64
:bit answer but in practice 99% of the time you're going to be working in 32
 
Everyone keeps quoting this '99% of the time' stuff to me. I should
like to point out that there are applications for which having
32 x 32 --> 64   and 64 / 32 = 32 quotient & remainder  are absolutely
essential. Basically, ANYTHING involving multiprecision arithmetic
requires these. (on a 32 bit machine that is)

The fact that many people have never encountered this type of application
doesn't mean that it doesn't exist. For those of us doing such work,
not having these instructions in hardware is an absolute KILLER. 
 
The SPARC architecture did not use to have integer multiply and divide.
They finally decided to add it. WHY? It is rumored that a certain
gov't. agency complained so much about its lack that SUN had to put it
in. Ask yourself what gov't. agency might require lots of multiprecise
arithmetic. I do not know this for a fact, it is merely a rumor I heard.

It is probably correct for me to assert that I personally have
burned more CPU cycles doing 32 x 32 --> 64 multiplies and 64/32 divides
than anyone else in history. Last year, I used over 400,000 CPU *hours*
[Yes, that is correct] executing code that required multi-precise
arithmetic in radix 2^30.
-- 
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/11/90)

In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:
\\\
>Everyone keeps quoting this '99% of the time' stuff to me. I should
>like to point out that there are applications for which having
>32 x 32 -- > 64   and 64 / 32 = 32 quotient & remainder  are absolutely
>essential. Basically, ANYTHING involving multiprecision arithmetic
^^^^^^^^^^^
>requires these. (on a 32 bit machine that is)
 ^^^^^^^^
\\\

Don't you mean `useful' or `efficient' here? 

The RISC folks probably have the right attitude here -- only that stuff
that happens a lot, or is _absolutely_, logically _required_ need be
implemented in h/w (either that, or can be `tacked on' to what you
were going to do anyway at essentially no extra cost) -- design
time means $$$$ after all, right?

int x int -> long isn't _required_ for MP calcs, it's just _convenient_.
For example, the `bc' U*X calculator uses bytes to store pairs
of decimal digits and then uses good old int x int -> int to perform
a paper-and-pencil multiply.
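
[For illustration, a C sketch of that scheme: base-100 digits kept in
bytes, plain int x int -> int products, carries handled by hand.
Names and the exact layout are hypothetical, not bc's actual code.]

    /* Sketch: schoolbook multiply, one base-100 digit per byte.
       r must have room for na + nb digits. */
    void mp_mul(const unsigned char *a, int na,
                const unsigned char *b, int nb,
                unsigned char *r)
    {
        int i, j;

        for (i = 0; i < na + nb; i++)
            r[i] = 0;
        for (i = 0; i < na; i++) {
            int carry = 0;
            for (j = 0; j < nb; j++) {
                int t = r[i+j] + a[i] * b[j] + carry;  /* fits an int */
                r[i+j] = t % 100;
                carry  = t / 100;
            }
            r[i+nb] = carry;
        }
    }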

Another technique is using `carryless' modular arithmetic with
appropriate bases (e.g. 2^k+1).

So maybe some people burn a _lot_ of cycles doing this sort of
stuff (I do myself as it happens), do the economics of design time,
marketting & area indicate that h/w must support it? Maybe that 
Si could be better spent doing things that actually happen a _lot_?

-Kym Horsell

P.S. I'm still interested in how much of the Sun Sparc's `bc' is
	written in assembler; it's _awful_ fast. So far no
	takers to my original post...

urjlew@uncecs.edu (Rostyk Lewyckyj) (09/11/90)

In article <119244@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
> The four basic operations of arithmetic are +, -, x, /. Any computer that
> can't perform them on its atomic data units [whatever the word size is]
> is a joke.
> 
> -- 
> Bob Silverman
> #include <std.disclaimer>
> Mitre Corporation, Bedford, MA 01730
> "You can lead a horse's ass to knowledge, but you can't make him think"

If the basic unit for representing a number is a word, then to handle
the product of two numbers you may well need a double word. But then
you must provide operations to handle double words. So then the double
word becomes a basic unit, and so on ad infinitum. If in addition to
+, -, and x, you want to handle /, then you most certainly need either
something like floating point, or some variant of rational fraction
representation with arbitrary precision for the numerator and denominator.
Now if you want to handle irrationals, then even more you need something
like floating point. Choose your own precision.
Finally, scaled integer is really no different from floating point,
except that floating point is at least standardized (IEEE), and
assisted by hardware. Imagine having to work with user's programs
where everybody is doing scaled integer arithmetic in his own way.

-----------------------------------------------
  Reply-To:  Rostyslaw Jarema Lewyckyj
             urjlew@ecsvax.UUCP ,  urjlew@unc.bitnet
       or    urjlew@uncvm1.acs.unc.edu    (ARPA,SURA,NSF etc. internet)
       tel.  (919)-962-6501

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (09/11/90)

(someone wrote)
> > As for integer divide, less of a case can be made, and I can do without
> > it. Although integer quotient and remainder are probably computed
> > simultaneously, some languages (like C) force one to get those results
> > separately.

This is not true of ANSI C.  There are div() and ldiv() functions
(the names may be wrong) which return quotient and remainder together.
-- 
Heuer's Law:  Any feature is a bug unless it can be turned off.

cik@l.cc.purdue.edu (Herman Rubin) (09/11/90)

In article <3977@bingvaxu.cc.binghamton.edu>, vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
> In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:
			.........................

> Don't you mean `useful' of `efficient' here? 
> 
> The RISC folks probably have the right attitude here -- only that stuff
> that happens a lot, or is _absolutely_, logically _required_ need be
> implemented in h/w (either that, or can be `tacked on' to what you
> were going to do anyway as make no matter to the cost)-- design
> time means $$$$ after all, right?

The design time to make a decent integer multiplier, if you are going to
have a floating point multiplier, is essentially ZERO.  How do you think
a floating point unit works, anyhow?  Versatility costs little, and even
the whole computing unit is relatively cheap.

The only computing power that is absolutely required is an adder capable
of handling addresses (and even this can be dispensed with) and a bit test,
and bit store.  All the adders and multipliers do is to execute these
operations in such a way that the computer is not issuing hundreds  of
instructions for a multiply.  I suggest that Mr. Horsell try to program
the operations and see how many instructions are needed to do a 32x32
unsigned to 64 multiplication if only 32 bit arithmetic is present.

> int x int -> long isn't _required_ for MP calcs, it's just _convenient_.
> For example, the `bc' U*X calculator uses bytes to store pairs
> of decimal digits and then uses good old int x int -> int to perform
> a paper-and-pencil multiply.

'bc' is adequate for a more precise pocket calculator.  Besides, the
paper-and-pencil multiply used digit x digit -> double digit.
Anyone who isn't brainwashed into decimal arithmetic (and binary
was not used much back when I got my PhD) would design everything
in binary, anyhow.  I can discard these prejudices and work with
octal and hex; for some of what I do, I must.

> Another technique is using `carryless' modular arithmetic with
> appropriate bases (e.g. 2^k+1).

Useful for integer arithmetic when it is known that remainders are
zero only.  And the results cannot be read, or scaled.

> So maybe some people burn a _lot_ of cycles doing this sort of
> stuff (I do myself as it happens), do the economics of design time,
> marketting & area indicate that h/w must support it? Maybe that 
> Si could be better spent doing things that actually happen a _lot_?

What happens a lot depends on what hardware is available.  Those
who understand the computer automatically take these things into
account.

I have computationally efficient means of generating non-uniform
random numbers from random bits.  But a primitive operation such
as checking whether a bit stream has a bit left, reading the bit
and moving the pointer so it will not again be used, and branching
on it, takes a LONG time on most machines.  The checking process
cannot be completely avoided, as the number of bits used to get
a given amount of output is itself random.  There are ways in
which these things can sometimes be  reasonably done, but not by using
what is normally taught to CS students.  
 
> P.S. I'm still interested in how much of the Sun Sparc's `bc' is
> 	written in assembler; it's _awful_ fast. So far no
> 	takers to my original post...

Is it that fast, considering that the machine is well over a million
times as fast as a typical human?
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet)	{purdue,pur-ee}!l.cc!cik(UUCP)

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/11/90)

In article <2538@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
\\\
>The design time to make a decent integer multiplier, if you are going to
>have a floating point multiplier, is essentially ZERO.  How do you think
>a floating point unit works, anyhow?  Versatility costs little, and even
>the whole computing unit is relatively cheap.
\\\

_Essentially_ zero <> zero. I design the stuff and _my_ time
doesn't come cheap. I guess you have a point `tho -- versatility
_does_ have a payoff, but it is just one of the factors that
go into the equation.

\\\
>instructions for a multiply.  I suggest that Mr. Horsell try to program
>the operations and see how many instructions are needed to do a 32x32
>unsigned to 64 multiplication if only 32 bit arithmetic is present.
\\\

My assembler days are 10-20 years in the past (and good riddance)!
Counting instructions was never my idea of fun anyway. In any case, as 
my supervisors at the time told me, programming time (particularly 
code optimization) cost more than the computer time saved.

\\\
>'bc' is adequate for a more precise pocket calculator.  Besides, the
>paper-and-pencil multiply used digit x digit -> double digit.
>Anyone who isn't brainwashed into decimal arithmetic (and binary
>was not used much back when I got my PhD) would design everything
>in binary, anyhow.  I can discard these prejudices and work with
>octal and hex; for some of what I do, I must.
\\\

The point I was trying to make with `bc' was that you didn't _need_
int x int -> long; perhaps an overkill? I wasn't actually
_proposing_ that h/w should do decimal arithmetic (but it
makes you wonder...when the actual i/o of numeric data compares
with that for _doing_ the calculations you might save time).

>> Another technique is using `carryless' modular arithmetic with
>> appropriate bases (e.g. 2^k+1).

>Useful for integer arithmetic when it is known that remainders are
>zero only.  And the results cannot be read, or scaled.

That's what my DEKv1 says as well. But it _is_ possible to convert them to 
mixed-radix representation and then do the division ya know. 
Another possibility is to do a restoring division using the modular form.
Since the dynamic occurrence of divisions is not _that_ high you gain. 

\\\
>> P.S. I'm still interested in how much of the Sun Sparc's `bc' is
>> 	written in assembler; it's _awful_ fast. So far no
>> 	takers to my original post...
>
>Is it that fast, considering that the machine is well over a million
>times as fast as a typical human?
\\\

The original Q I posted regarded the comparison of the Sparc's `bc'
v/v some other extended precision routines I had to hand,
some off the net, some written elsewhere, and some from my own
hand. `bc' beat them all. Why? Written in assembler maybe?

-Kym Horsell

spot@TR4.GP.CS.CMU.EDU (Scott Draves) (09/11/90)

Herman Rubin sez:
>The design time to make a decent integer multiplier, if you are going to
>have a floating point multiplier, is essentially ZERO.  How do you think

This integer multiply would have to be a fp instruction, because the
chip probably has separate integer and fp register files, and/or it
is super-scalar.  11 bits plus extra control stuff doesn't sound like
ZERO work to me.

Anyway, exactly how much slower is it to hand-code arbitrary precision
arithmetic? 10 times?  100 times?  Do I really care? Do the chip makers
care? What fraction of people use fp compared with arb. prec. arith?
How often would your bit stream instruction(s) be used?  What are the
chances that it would fit into a RISC pipeline?

You're right when you complain that chips these days aren't good for
what you do.  But I don't care, and neither do many other people.
Flame off.


			Consume
Scott Draves		Be Silent
spot@cs.cmu.edu		Die

bs@linus.mitre.org (Robert D. Silverman) (09/11/90)

In article <3977@bingvaxu.cc.binghamton.edu> vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) writes:
:In article <119612@linus.mitre.org> bs@faron.UUCP (Robert D. Silverman) writes:
:\\\
:>Everyone keeps quoting this '99% of the time' stuff to me. I should
:>like to point out that there are applications for which having
:>32 x 32 -- > 64   and 64 / 32 = 32 quotient & remainder  are absolutely
:>essential. Basically, ANYTHING involving multiprecision arithmetic
:^^^^^^^^^^^
:>requires these. (on a 32 bit machine that is)
: ^^^^^^^^
:\\\
:
:Don't you mean `useful' of `efficient' here? 
:
 
I wish people would refrain from shooting their mouths off about
multi-precision arithmetic until they've actually done a significant
amount of it. Typically, only having 32 x 32 --> 32 slows down
multi-precision arithmetic by about a factor of 10. Not having it
can render a machine all but useless for performing certain tasks.

:The RISC folks probably have the right attitude here -- only that stuff
:that happens a lot, or is _absolutely_, logically _required_ need be
:implemented in h/w (either that, or can be `tacked on' to what you
 
Happens a lot by whose definition? Yours? What the hell do you know
about how much computation in areas requiring multi-precision actually
takes place? The people who do that kind of stuff DON'T talk about it.

:were going to do anyway as make no matter to the cost)-- design
 
Take a look at the i860 sometime. People have been parading it as a
stellar example of a RISC processor. However, a good hunk of real
estate on the chip is taken up by a special-purpose processor whose
purpose is to aid graphics programming. Yet the damn thing can't
multiply and divide.

Repeat after me:

Addition, subtraction, multiplication, division.

These are the four fundamental operations of arithmetic.
 

Too often within this newsgroup I have observed that many people believe
that the entire world of computing is either matrix computations [or 
related work] or UNIX systems programming [or related work]. It is a
very parochial attitude.

:time means $$$$ after all, right?
 
Exactly. What good is a machine that takes 10 times as long as it should
to compute something, simply because the designer failed to add a few gates
to provide integer arithmetic?

Jobs requiring integer multiply/divide may only represent a small fraction
of the total number of jobs, but they typically have CPU requirements that
far exceed those of the other 99%. As I pointed out, I burned over
several hundred thousand CPU hours last year. I would wager that this is
more than ALL the rest of the readers of this newsgroup used, put together.

:
:int x int -> long isn't _required_ for MP calcs, it's just _convenient_.
 
A factor of 10 in performance isn't mere 'convenience'. It can easily
change a computation from one that is feasible to do to one that is not.

:For example, the `bc' U*X calculator uses bytes to store pairs
:of decimal digits and then uses good old int x int -> int to perform
:a paper-and-pencil multiply.
 
'bc' is just slightly slower than molasses. It is a joke.

:
:Another technique is using `carryless' modular arithmetic with
:appropriate bases (e.g. 2^k+1).
 
Performing divisions in Residue Number representation is nearly
impossible. About the only way is to convert to normal representation
then convert back.

:
:So maybe some people burn a _lot_ of cycles doing this sort of
:stuff (I do myself as it happens), do the economics of design time,
:marketting & area indicate that h/w must support it? Maybe that 
 
The time to design a multiplier or divider is nil. Whether the market
requires it depends on who you are selling to. If you intend to sell
to the gov't for example, you damn well better supply the hardware they
need.

:P.S. I'm still interested in how much of the Sun Sparc's `bc' is
:	written in assembler; it's _awful_ fast. So far no
 
This proves how little you know about 'bc'. I have code for the SPARC
that beats it by almost two orders of magnitude when performing 
multi-precise multiplication/division. It's _awful_ slow.

-- 
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

przemek@liszt.helios.nd.edu (Przemek Klosowski) (09/11/90)

In article <1990Sep10.162839.19226@cs.rochester.edu> crowl@cs.rochester.edu (Lawrence Crowl) writes:
>In article <1660@s6.Morgan.COM> amull@Morgan.COM (Andrew P. Mullhaupt) writes:
>>do not. Now the compilers of the world can get rid of integer multiplies in
>>address arithmetic _if_ they can figure out the sizes of the arrays.
>
>This is incorrect.  Compilers can get rid of integer multiplies (actually
>implement them as shifts and adds) when the sizes of the array ELEMENTS are
>known.
>  Lawrence Crowl		716-275-9499	University of Rochester

I think that Lawrence misunderstood Andrew. You DO need to know the size
of the array if you want to do multidimensional arrays (actually you need to
know all but the last dimension of a matrix).  For instance, assuming
A is a 10x10x10x10x10 matrix, stored 'column-first' with indices running
from 0, the address of A[1,2,3,4,5] is A + 1 + 2*10 + 3*100 + 4*1000 + 5*10000.
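
[For illustration, the same thing in C with the extents as formal
parameters, which is exactly where the compiler cannot turn the
multiplies into shifts; the function is hypothetical.]

    /* Sketch: column-first addressing of a 3-d array whose extents
       d1 and d2 only arrive at run time. */
    double get3(double *a, int d1, int d2, int i, int j, int k)
    {
        return a[i + d1*(j + d2*k)];   /* two genuine integer multiplies */
    }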


--
			przemek klosowski (przemek@ndcva.cc.nd.edu)
			Physics Dept
			University of Notre Dame IN 46556

amull@Morgan.COM (Andrew P. Mullhaupt) (09/11/90)

In article <2534@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:
> In article <1660@s6.Morgan.COM>, amull@Morgan.COM (Andrew P. Mullhaupt) writes:
> > As for integer divide, less of a case can be made, and I can do without
> > it.
> 
> Maybe you do not see a need for it, but lots of people do; even floating
> divide with integer quotient and floating remainder.  

This one you get with most IEEE coprocessors via one or two instructions.
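
[For illustration, the same pairing written with standard C library
calls; the coprocessor instruction does it in one go.  Hypothetical
sketch only.]

    #include <math.h>

    /* Sketch: integral quotient plus floating remainder
       (the remainder takes the sign of x, as fmod does). */
    double fdivmod(double x, double y, double *quot)
    {
        double r = fmod(x, y);
        *quot = (x - r) / y;         /* integral-valued quotient */
        return r;
    }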

> 
> > Since this is comp.arch, I'll ask. How much chip would a 16x32 integer
> > multiply take? You could use this for a lot of address arithmetic
> > and build the full 32x32 multiply out of a few calls to this. In the
> > same vein, an 8x32 might even be a profitable use of chip. How does
> > this VLSI trade-off work out? 
> 
> The floating point unit already has at least most of the hardware required.
> If the 53x53 floating point unit were slightly modified to give 64 bits of
> output, almost no additional chip area would be required.  Floating point
> units still do their arithmetic as integer arithmetic, with front and back
> ends.

On machines with an integrated FPU (like the i486 and i860) this makes
some sense. In fact _no_ modification is necessary to "widen" the FPU
since it already computes 64 bit mantissae in most cases. Now some
languages have provided support for this - notably Turbo Pascal on the
PC-compatibles with its extended and (just for you, Herman) comp
types. On machines with off-chip FPUs, (i.e. the coprocessors of the
world) this idea isn't necessarily a winner since there can be a 
significant delay moving the data between chips. No, the problem here
is for chips like the SPARC which would have to put the instruction
on a chip where _no_ real estate is devoted to the FPU.


Now the need for integer multiplies is real - although strength
reduction is very useful, it is nearly impossible if the access
to the array is not direct. In particular, pivoting (often
necessary in numerical linear algebra) disorganizes the sequential
nature of array accesses and prevents strength reduction. This
type of access usually occurs in a subroutine where the array is
a formal parameter, and so the indexing cannot be simplified at
compile time in most languages. (FORTRAN 90 is somewhat of an
exception since the shape of an array parameter is always known
at run time and the compiler could take advantage of this. In C
this situation is pretty grim if integer multiplications are
slow...) N.B.: To all those who have "never seen code where the array
shape is not known at compile time" I would ask that they stop
sending me mail and take a look at almost any library of subroutines
for numerical computation. This kind of code is very common.

Later,
Andrew Mullhaupt

vu0310@bingvaxu.cc.binghamton.edu (R. Kym Horsell) (09/12/90)

In article <119733@linus.mitre.org> bs@gauss.UUCP (Robert D. Silverman) writes:
\\\
>This proves how little you know about 'bc'. I have code for the SPARC
>that beats it by almost two orders of magnitude when performing 
>multi-precise multiplication/division. It's _awful_ slow.
\\\

An interesting comment. Although I only do simple-minded stuff, that
would mean your routines can calc the first 1000 factorials in 20 ms --
pretty good! I'd submit those routines to the net!

-Kym Horsell

bs@linus.mitre.org (Robert D. Silverman) (09/12/90)

In article <1990Sep10.215549.26260@uncecs.edu> urjlew@uncecs.edu (Rostyk Lewyckyj) writes:
:In article <119244@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
:> The four basic operations of arithmetic are +, -, x, /. Any computer that
:> can't perform them on its atomic data units [whatever the word size is]
:> is a joke.
:
:
:If the basic unit for representing a number is a word, then to handle
:the product of two numbers you may well need a double word. But then
:you must provide operations to handle double words. So then the double
:word becomes a basic unit, and so on ad infinitum. If in addition to

What is needed is simply a single, temporary double word. Typically
after forming a double word product it is then reduced to a single
word. e.g. compute AB/C,  AB mod C, etc. In any event, your response
addresses LANGUAGE issues, not hardware. I'm quite happy to work in 
assembler to handle double length registers as long as the HARDWARE
provides the capability. 

Note that for language purposes one could provide an in-line routine
to multiply two words and return the upper/lower halves. Similarly
a division routine could take the two halves and divide by a third
number.

-- 
Bob Silverman
#include <std.disclaimer>
Mitre Corporation, Bedford, MA 01730
"You can lead a horse's ass to knowledge, but you can't make him think"

tbray@watsol.waterloo.edu (Tim Bray) (09/13/90)

Some time ago, rwallace@vax1.tcd.ie wrote:
>There is practically no processing done which depends on integer * and / being
>fast (accessing an array of structures doesn't count...

First off, it is VERY DANGEROUS to make statements like that without a LARGE
suite of benchmarks and end-user surveys to back them up.  Go ask the guys
who designed the SPARC.

Second, I think this statement is wrong.  Associative lookup, i.e. symbol
table management, is fundamental to many modern applications - compilers,
interpreters, very high level languages, computer algebra systems,
PostScript.  Symbol tables are usually done with hashing.  Hashing functions
very often involve multiplication (but note the neat algorithm for string
hashing in the recent CACM).
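
[For illustration, the sort of hash function meant here; the
multiplier 31 and the names are illustrative, not from the post or
the CACM article.]

    /* Sketch: multiplicative string hash -- one integer multiply per
       character of the symbol. */
    unsigned hash(const char *s, unsigned nbuckets)
    {
        unsigned h = 0;

        while (*s)
            h = 31 * h + (unsigned char)*s++;
        return h % nbuckets;
    }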

Just another data point,
Tim Bray, Open Text Systems, Waterloo, Ont.

rpeglar@csinc.UUCP (Rob Peglar) (09/13/90)

In article <119612@linus.mitre.org>, bs@linus.mitre.org (Robert D. Silverman) writes:
(bunch of stuff deleted)
>  
> The SPARC architecture did not use to have integer multiply and divide.
> They finally decided to add it. WHY? It is rumored that a certain
> gov't. agency complained so much about its lack that SUN had to put it
> in. Ask yourself what gov't. agency might require lots of multiprecise
> arithmetic. I do not know this for a fact, it is merely a rumor I heard.
> 

Isn't it funny how quite a bit of the uP world is rediscovering, reinventing,
rehashing, reliving, and redying all sorts of episodes that the MF and super
world went through, lo these many years ago?

"certain government agencies....forcing architectural changes..."

we've seen it all before.  no one should be surprised.

FP vs. arbitrary integer?  a *marketing* issue, not architectural.  If you
want to play the game, you have to know the rules.  In certain worlds, FP
rules.  In others, integer rules.  Pun intended.  Sure, there are many
clever ways to implement both.  Ya gotta sell it, though.  Note, no value
judgements here - enjoyable discussions.  It's just when people apply 
architectural prejudice on one or the other that the discussions become
absurd.  Re the coupla-months-ago thread about fork() vs. vfork().

Processors are fine.  Let's put our brains to work on memory and I/O,
the real hard issues.

Rob

uunet!csinc!rpeglar
-- 
Rob Peglar	Comtrol Corp.	2675 Patton Rd., St. Paul MN 55113
		A Control Systems Company	(800) 926-6876

...uunet!csinc!rpeglar

pcg@cs.aber.ac.uk (Piercarlo Grandi) (09/14/90)

On 12 Sep 90 15:15:46 GMT, bs@linus.mitre.org (Robert D. Silverman) said:

bs> In any event, your response adresses LANGUAGE issues, not hardware.
bs> I'm quite happy to work in assembler to handle double length
bs> registers as long as the HARDWARE provides the capability.

Agreed - on the other hand you should provide both

	single * single => double
	double / single => single

and

	single * single => single
	single / single => single
	
	
for compiler writers, because handling register pairs in code generators
is notoriously a pain, and many languages only need (more or less
regrettably) the all-single-length version (with or without an overflow
indication).

bs> What is needed is simply a single, temporary double word. Typically
bs> after forming a double word product it is then reduced to a single
bs> word. e.g. compute AB/C, AB mod C, etc.

bs> Note that for language purposes one could provide an in-line routine
bs> to multiply two words and return the upper/lower halves. Similarly
bs> a division routine could take the two halves and divide by a third
bs> number.

You may find agreement in an old and very nice paper (Software Practice
and Experience, early seventies) by Martin Richards (he of BCPL and
Tripos fame) about having just a *single* procedure call he calls
muldiv, that returns the result of doing _both_ things, i.e.

	(single * single => double) / single => single

He claims that in *many* cases this is all that is needed to obviate the
lack of double length integers, or even floating point, as it is the
crucial thing for efficient fixed point arithmetic, and it also need
never expose double length integers to the language.
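
[For illustration, the shape of that muldiv, sketched for a machine
whose "single" is a 16-bit short and whose long is 32 bits, so the
double-length intermediate never escapes the routine.  Hypothetical
code, not from Richards' paper.]

    /* Sketch: (single * single => double) / single => single. */
    short muldiv(short a, short b, short c)
    {
        long t = (long)a * (long)b;  /* double-length intermediate */
        return (short)(t / c);       /* back to single length */
    }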

Such a procedure is easily codable as an inline in GNU C/Objective-C/C++,
or with SUN's .il files, and can solve an awful lot of problems.

As you say above, one could also have mulrem, as in

	(single * single => double) % single => single

I may be dreaming, but I think that both actually already exist in
Forth, and for the same reason as in BCPL: because they are word
oriented languages, they only have a single length of integer, and thus
the issue of double length (especially if the basic word is 16 bits) is
pressing.

It may also help to have double shifts, but I am not so sure.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

yodaiken@chelm.cs.umass.edu (victor yodaiken) (09/26/90)

In article <1990Sep11.035826.22880@cs.cmu.edu> spot@TR4.GP.CS.CMU.EDU (Scott Draves) writes:
...
>
>Anyway, exactly how much slower is it to hand-code arbitrary precision
>arithmetic? 10 times?  100 times?  Do I really care? Do the chip makers
>care? What fraction of people use fp compared with arb. prec. arith?
>How often would your bit stream instruction(s) be used?  What are the
>chances that it would fit into a RISC pipeline?
>
>You're right when you complain that chips these days aren't good for
>what you do.  But I don't care, and neither do many other people.
>Flame off.

The architectural
requirements for arithmetic are interesting. Estimates of the current
market are far less so.