[comp.arch] Multiplying two shorts...

cik@l.cc.purdue.edu (Herman Rubin) (08/11/88)

In article <8101@alice.UUCP>, ark@alice.UUCP writes:
> In article <948@srs.UUCP>, dan@srs.UUCP writes:
> 
> > Sun's compilers seem to let you multiply two short integers and get all 
> > 32 product bits, if you code like this:
> >     register short x, y;
> >     register long z;
> 
> >     z = x * y;
> 
> > If the compiler is nice, it emits the correct (16x16=32) multiply instruction.
> > If it isn't nice, it emits the slower (32x32=64) instruction and throws
> > away the top 32 bits.
> 
> > Do most modern compilers perform this optimization?
> > It would be nice to be able to depend on it.
> 
> The effect of the statement above is undefined.
> 
> To get the right answer, you should write
> 
> 	z = (long) x * (long) y;
> 
> Actually, it is enough to cast only one of x and y, so you
> can say
> 
> 	z = (long) x * y;
> 
> but that may be a little confusing.  If your compiler is clever,
> it will realize that it can use the multiply instruction that
> takes short operands into a long result.  If it's not clever,
> you'll still get the right answer.
> 
> Wouldn't you rather risk having your program get the right answer
> slowly rather than risk getting the wrong answer outright?

It is a rare compiler which can recognize that it can ignore a type
cast and use a different instruction to get the result.  It is hard
enough to get a compiler to do the obvious.  I find the attitude of
"ark" execrable; this is what has produced our current bad software
and also bad hardware.  Unfortunately, hardware manufacturers do not
realize that people like dan and me know how to use machine instructions
which the language gurus have not seen fit to put into the language,
and recognize that hardware constructs beyond the scope of the
language can be useful.

We need the type of operator overloading which dan is using.  If the
hardware supports 16 x 16 -> 32 multiplication, it should be done with
that instruction.  On the VAX or PYRAMID or CRAY or CYBER, it can only
be done the way "ark" suggests.  I suggest that the compiler implementer
be required to enable dan to put in his improvement even if the implementer
had not thought of it.
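
Failing that, the widening multiply can at least be isolated in one
routine, so that a machine-specific version can be substituted where
the compiler does not oblige.  A minimal sketch (the function name is
illustrative, not dan's code):

        /* Portable fallback: a clever compiler may map the cast form to
           a 16 x 16 -> 32 multiply; elsewhere this body can be replaced
           by an assembler or intrinsic version without touching callers. */
        long mulsw(short x, short y)
        {
            return (long) x * (long) y;
        }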

Another example of this neglect is the problem of the power operator.  This
has been discussed in this group, and I will not elaborate.  However, the
minimalists (do it using the current C tools) do not even meet "ark"'s
requirement of getting the correct answer slowly.
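
For concreteness, the "current C tools" answer for an integer exponent
is repeated squaring, roughly as sketched below (my sketch, not code
from the earlier discussion); its overflow behaviour is precisely the
unspecified part at issue:

        /* Integer power by repeated squaring.  Overflow, if any, is
           whatever the underlying multiply happens to give. */
        long ipow(long base, unsigned n)
        {
            long result = 1;
            while (n > 0) {
                if (n & 1)
                    result *= base;
                base *= base;
                n >>= 1;
            }
            return result;
        }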

How about the use of long long?  I needed to compute a function which
required unpacking a double into an exponent field and a mantissa (long long),
doing integer and boolean arithmetic starting with the mantissa, etc.,
and putting things together to get a double result.  Needless to say, I
had to do this in assembler.  BTW, my biggest difficulty was the fact that
hex was not the default.
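
For what it is worth, on a machine with a 64-bit integer type the
unpacking step itself can be written in C.  A minimal sketch, assuming
IEEE 754 doubles and a <stdint.h>-style 64-bit type (the function name
and constants are illustrative, not the assembler routine described
above):

        #include <stdint.h>
        #include <string.h>

        /* Unpack an IEEE 754 double into sign, biased exponent, and
           52-bit fraction field by reinterpreting its representation. */
        void unpack_double(double d, int *sign, int *exp, uint64_t *mant)
        {
            uint64_t bits;
            memcpy(&bits, &d, sizeof bits);
            *sign = (int) (bits >> 63);
            *exp  = (int) ((bits >> 52) & 0x7FF);   /* 11-bit biased exponent */
            *mant = bits & 0xFFFFFFFFFFFFFULL;      /* 52-bit fraction */
        }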

How about multiplying two longs to get a long long, both signed and unsigned?
Few machines have the unsigned version, cheap in hardware but expensive in
software.  And how about division?  One much-needed instruction, cheap in
hardware and expensive in software, is dividing a float (of any length) by
a float and getting an integer quotient and a floating remainder.  And there
was the long discussion in this group about how integer/integer should be
treated when the operands are not positive.
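
Expressed in portable C, these operations look roughly like the sketch
below (assuming 32-bit longs, a compiler with a 64-bit long long, and
<math.h>; the function names are only illustrative, and whether any of
it maps to single instructions is exactly the point at issue):

        #include <math.h>

        /* 32 x 32 -> 64 multiply via a widening cast. */
        long long wide_mul(long a, long b)
        {
            return (long long) a * b;
        }

        /* Integer quotient and floating remainder of x/y, done in
           software with a divide, a floor, a multiply and a subtract. */
        long quo_rem(double x, double y, double *rem)
        {
            long q = (long) floor(x / y);
            *rem = x - (double) q * y;   /* remainder matching the floored quotient */
            return q;
        }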

We need software which allows those who understand their machine to exploit
the capabilities.  We need hardware which provides the capabilities.  We do
not need to be told that "this is not portable; you must not do it."
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette, IN 47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

radford@calgary.UUCP (Radford Neal) (08/14/88)

In article <948@srs.UUCP>, dan@srs.UUCP writes:

> Sun's compilers seem to let you multiply two short integers and get all 
> 32 product bits, if you code like this:
>     register short x, y;
>     register long z;
>
>     z = x * y;

In article <8101@alice.UUCP>, ark@alice.UUCP writes:

> The effect of the statement above is undefined.
> 
> To get the right answer, you should write
> 
> 	z = (long) x * (long) y;
> ...
> Wouldn't you rather risk having your program get the right answer
> slowly rather than risk getting the wrong answer outright?

In article <864@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes:

> It is a rare compiler which can recognize that it can ignore type
> casting and use a different instruction to get a result.  It is hard
> enough to get a compiler to do the obvious.  I find the attitude of
> "ark" execrable...

Very puzzling. Why is informing people that their programs may break
when moved to another compiler such a crime? If you want to argue that
the C language _ought_ to be defined so that the product of two shorts
is computed to long precision, that's another issue, though some might
object to the code this would imply for:

       int a, b, c, d;

       a = (a*b*c*d) % 10;

By your logic, the intermediate result (a*b*c*d) should be computed to
128-bit precision, assuming int's are 32 bits.

> We need software which allows those who understand their machine to exploit
> the capabilities.

We already have such software. It's called an assembler.

> We need hardware which provides the capabilities...

If you're willing to pay enough, I'm sure someone will oblige.  Actually,
your best bet in this case would seem to be RISC machines with software
multiplication and division; then you can do whatever you want.

> We do not need to be told that "this is not portable; you must not do it."

Any professional programmer ought to realize when a programming technique
is not portable. Whether they choose to do it anyway is another matter.

In fact, "ark" was arguing that including casts got you both: a portable
program, and a fast one if the compiler is smart.  Contorting your program
to get fast code out of dumb compilers is a technique of limited usefulness.

     Radford Neal

throopw@xyzzy.UUCP (Wayne A. Throop) (08/16/88)

> From: cik@l.cc.purdue.edu (Herman Rubin)
>> ark@alice.UUCP writes:
>> Wouldn't you rather risk having your program get the right answer
>> slowly rather than risk getting the wrong answer outright?

> [...] I find the attitude of
> "ark" execrable; this is what has produced our current bad software
> and also bad hardware.

"Thud."  Sorry, my jaw just hit the floor.  It is hard to conceive of
anybody finding ark's "attitude" anything perjorative, let long
"excrable".  Look, folks, let me clue you in.  I don't care how fast
your CPU, how clever your legions of coders and microcoders, nor do I
care how many femtoseconds can be shaved by using the software that
results upon such a CPU.  If the silly thing gets the WRONG ANSWER, it
just doesn't matter how fast it is delivered.  I simply don't need a
hyper-fast pile of junk driven by oh-so-clever bit-twiddlers telling me
that pi is 3.  (Or, to use real examples, drawing multiple copies of my
cursor, or aborting a login session because one program is ill, or
crashing a system because one program uses "too much" memory, or any
other of the "wrong answers" I am given every day in the name of
"speed").

Hmpf.

--
Nothing is impossible.  Some things are just less likely than others.
                        --- Jonathan Winters in "The Twilight Zone"
-- 
Wayne Throop      <the-known-world>!mcnc!rti!xyzzy!throopw

Paul_L_Schauble@cup.portal.com (08/17/88)

> From: cik@l.cc.purdue.edu (Herman Rubin)
>> ark@alice.UUCP writes:
>> Wouldn't you rather risk having your program get the right answer
>> slowly rather than risk getting the wrong answer outright?

> [...] I find the attitude of
> "ark" execrable; this is what has produced our current bad software
> and also bad hardware.

Gee. If you don't mind getting the wrong answer, I can make ANY program
run a lot faster.

        RESULT = 0.
        RETURN
        END
 
  --PLS

jlg@beta.lanl.gov (Jim Giles) (08/20/88)

In article <1810@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes:
>        int a, b, c, d;
> 
>        a = (a*b*c*d) % 10;
> 
> By your logic, the intermediate result (a*b*c*d) should be computed to
> 128-bit precision, assuming int's are 32 bits.

That's not necessarily true.  Your particular example does not require
any precision above 32 bits.  The exact answer for the given statement
is the same as for:

	int a, b, c, d;

	a = ((a%10) * (b%10) * (c%10) * (d%10)) % 10;

or (with 64-bit intermediates):

	a = (((a*b) % 10) * ((c*d) % 10)) % 10;

Of course, neither of these really works in C because parentheses
don't force evaluation order.  But the compiler is free to do the
computation either way (or in any of several other ways).  The same
is true if some of the multiplies are replaced by adds or subtracts.
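
As an aside, with a compiler that supports a 64-bit long long, the
second form can be forced explicitly with casts, in the spirit of ark's
earlier advice (a sketch, not code from any of the posts above):

	/* Explicit 64-bit intermediates; each cast widens one product.
	   The final value is reduced modulo 10, so it fits back in an int. */
	a = (int) ((((long long) a * b % 10) * ((long long) c * d % 10)) % 10);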

I tend to agree with those who claim a 'correct' answer is better
than a fast answer.  The problem is that C doesn't specify what the
'correct' answer is.  If C explicitly said that all integer arithmetic
should be carried out modulo 2^(wordsize), then (a*b*c*d)%10 should be
done the obvious fast way - people who naively assume that the program
should give the mathematically 'correct' answer would deserve what they
got.  On the other hand, if C explicitly said that all integer arithmetic
was mathematically exact (as long as the answer didn't overflow), then
one of the more careful methods of evaluating the result should be used.

This whole problem arises because neither K&R nor the ANSI standard
says how the arithmetic should be done.

J. Giles