[comp.lang.c] 16x16 = 32

dca@kesmai.COM (David C. Albrecht) (10/14/87)

Okay,
It seems quite apparent that I wasn't clear enough in my original posting
because most of the responses I have gotten via mail or on the net don't
really address the point I trying to get commentary on.

Okay, first off I am quite well aquainted with the C typing rules.
Second, my point was one of OPTIMIZATION not typing.  There is
no question that the fully unoptimized expression (where ints are 16 bits
and longs are 32 bits).

   ((long) (x * y)) where x and y are ints says:

   get int x
   get int y
   multiply using operation appropriate for 16 bit ints
   convert result to long (32 bits).

   This is obvious.

In no way was I trying to say that the code produced by Mark Williams
for this operation is incorrect as it most certainly is.  I was addressing
one of the other ways to skin a cat.

In virtually every machine I know of the multiplies produce a
double width result (PDP-11, VAX, 68000, 8?86, ad nauseum).  Extending
the result of a multiply to long is a NEEDLESS operation.  Eliminating
NEEDLESS operations is what optimization is all about.  The whole point
of my posting was not that the code they generated was incorrect, it most
certainly was correct code.  The point was that they claimed that
eliminating the extend to long was INCORRECT and would cause their compiler
to fail acceptance testing.  That is what I think is BOGUS.  What possible
point could there be to testing for garbage out from a multiply when the
numbers coming in cause a result too large to fit in the same type.
I can't think of any instance when the full 32bit accuracy wouldn't
be preferable.

The only response to this that I thought even came close to addressing the
point was that someone made the comment that some compilers produce code
that traps on results longer than 16 bits from a 16x16 multiply.  If
Mark Williams did this in their code then I would agree that is more
correct result and it would be preferable to just taking the full 32
bit accuracy.
But they don't and it ain't.

You have to realise that my application is a flight simulator and it
uses a large number of multiplies.  It is hardly a negligable affect
as it gives roughly a 30% speedup on the Macintosh to use MULS instead
of the software simulated 32 bit multiply.  Getting the compiler to
generate the MULS is preferable because it makes better use of registers
and doesn't cause the call overhead.  Barring that, there is of course
always assembly code.  Most likely I will use a macro to do the multiplies
that will generate either a procedure call or my ((long) (x*y)) construct.
If the compiler has an intelligent optimizer I will use it, if it doesn't
I can always fall back to the assembler call.

I think it highly unlikely that ((long) x) * y would give me a better
result as then the compiler has to realise that an expression now cast
as long was previously short so that a 16x16 multiply is
perfectly viable.  I would speculate that this is easily as complicated
if not more than realising that the result of a MULS operation doesn't
require an extend to long.  Granted, it expresses the result wanted
better but in reality I think it highly doubtful you could find a compiler
for the Amiga, ST, or Mac that would generate anything but 32x32 operations
from the expression.

Finally.  Again, I'm not debating that the code MW generates is correct. I'm
debating their assertion that code that doesn't throw away the high
bits in the result of a 16x16 multiply when extending to 32 bits
is incorrect and would fail acceptance testing.  I was trying to
see if anyone knew of an acceptance test or a VALID reason why
not sign extending over the high bits is an INVALID result.

David Albrecht

firth@sei.cmu.edu (Robert Firth) (10/16/87)

In article <143@kesmai.COM> dca@kesmai.COM (David C. Albrecht) writes:

>In virtually every machine I know of the multiplies produce a
>double width result (PDP-11, VAX, 68000, 8?86, ad nauseum)

(1) On the PDP-11, the MUL instruction produces a single-length
    result if the destination is an odd-numbered register, and
    a double-length result if it's an even-numbered register

(2) On the Vax-11, the MULx instructions all produce single-length
    results.  A double-length result requires either first widening
    byte or word, or using EMUL for longword.

(3) On the MC68020, the result of MULS is either single-length or
    double-length, depending on the Sz field of the instruction;
    the lower members of the range do not have a hardware 32-bit
    multiply.

And with that piece of gratuitous misinformation out of the way, to the
point.  The task of a compiler is not to provide a more readable form
of the machine's instruction set.  It is to implement the language as
defined in the standard.  If the standard says that int*int => int, then
that's what you implement, and anything else is plain WRONG.  If the
standard says you are allowed to keep intermediate results to higher
range or precision, then you may do so; but you would be well advised
to understand the consequences - in particular, that seemingly harmless
changes to a working program, like taking an expression out of line, or
doing common subexpression elimination by hand, may break a working
program.

When in doubt, implement the canonical semantics.  At the very least, it
is a courtesy that will be appreciated by the user who has taken the
trouble to read the reference manual.

crowl@cs.rochester.edu (Lawrence Crowl) (10/16/87)

In article <143@kesmai.COM> dca@kesmai.COM (David C. Albrecht) writes:
>The point was that [Mark Williams] claimed that eliminating the extend to long
>was INCORRECT and would cause their compiler to fail acceptance testing.  That
>is what I think is BOGUS.  ... I was trying to see if anyone knew of an
>acceptance test or a VALID reason why not sign extending over the high bits is
>an INVALID result.

Many random number generators and hash functions rely on the result of an out
of range multiplication returning a result within the range, modulo (sort of)
the word size.  They rely on 7 + 3 == -6 for 4-bit two's complement words.

C does not provide integers, it provides words.  Words happen to have some
operations that are very similar to the operations we use for integers.
-- 
  Lawrence Crowl		716-275-9499	University of Rochester
		      crowl@cs.rochester.edu	Computer Science Department
...!{allegra,decvax,rutgers}!rochester!crowl	Rochester, New York,  14627

rwhite@nusdhub.UUCP (Robert C. White Jr.) (10/16/87)

In article <143@kesmai.COM>, dca@kesmai.COM (David C. Albrecht) writes:
> Finally.  Again, I'm not debating that the code MW generates is correct. I'm
> debating their assertion that code that doesn't throw away the high
> bits in the result of a 16x16 multiply when extending to 32 bits
> is incorrect and would fail acceptance testing.  I was trying to
> see if anyone knew of an acceptance test or a VALID reason why
> not sign extending over the high bits is an INVALID result.

This is fairly easy:

	1)  First, we must assume the compiler is designed to work
"general case" solutions, which most are.
	2)  Take the largest possible positive interger, or something
close to it.
	3)  Multiply this interger by something like 4 or 12.  Two
(2) is questionable, as the result would probably still fit in the
interger space.

	If you do not do sign extend durring multiply, i.e. land
the 16x16 value in a 32, the sign and value of the result is
anybodys guess.  Without the extra instructions, the sign is
whatever happened to land in the high order bit. [see ROL]
	The definition of "*" in C where:
int	x, y, z;
...
	z = x * y;

to the best of my knowlege is 16x16 placed in (sign+least-sig-15-bits)
with overflow checking if enabled.  If you used a 16 bit sig primitive
you would either need a chip that supports sign preservation in
all cases, or live with absolutly-no-idea-of-what-really-happend-on-
overflow situation, with no way to recover.
	If you use the "tightened definition" you would need a
totally different aproach dependant on each app subsection.  i.e.
doing a 64bit statistical myltiply in 8 bit segments to preserve
accuracy, while useing a tight 16bit for all small values of x, or
two+ 8bits for larger...etc.

SUMMARY:  You are a victem of best-aproach-for-general-case combined
	with a this-is-the-chip-you-get-so-tough-cookies arcitecture.
	If you *KNOW* the values are going to *ALWAYS* fit in
	15bits+sign then turn off overflow checking, or do it in
	some custom assembly.

Rob.

guy%gorodish@Sun.COM (Guy Harris) (10/18/87)

> ...There is no question that the fully unoptimized expression (where ints
> are 16 bits and longs are 32 bits).
> 
>    ((long) (x * y)) where x and y are ints says:
> 
>    get int x
>    get int y
>    multiply using operation appropriate for 16 bit ints
>    convert result to long (32 bits).
> ...
>
> ...I was addressing one of the other ways to skin a cat.

Given the semantics of C, the correct way to do this on a 68K C implementation,
with 16-bit "int"s, would be to do a MULS and then to sign-extend the low-order
16 bits of the result into a 32-bit quantity.  If, however, the result were
immediately converted to an "int" (e.g., if you had

	int x, y, z;

	z = (long) (x * y);

) it would be permissible merely to throw the low-order 16 bits into "z" and
throw away the high-order 16 bits.

(This means that a "typical" 68000 16-bit-"int" implementation of C will, for
the code fragment:

	int x = 30000;
	int y = 30000;

	printf("%ld\n", (long) (x * y));

print "-5888".  30,000 times 30,000 is 900,000,000; the hex value of this is
0x35a4e900.  Truncated to 16 bits, this is 0xe900, so the result of (x * y)
would, in this implementation, be 0xe900, which is -5888 in two's complement.
A C implementation with 16-bit "int"s could, of course give a different result,
or trap on overflow, because 900,000,000 won't fit into an "int", but "int" is
the type of the result of "(x * y)"; as K&R say, "the handling of overflow and
divide check in expression evaluation is machine-dependent".)

Now, if MWC says that they must sign-extend the result of the multiplication
*even if the result is immediately converted to "int"*, but they define the
result of a multiplication of an "int" by an "int" as the result you get by
multiplying the integer values of the two "int"s (i.e., doing real live
arithmetic, not machine arithmetic) and then taking the result modulo 2^16
(i.e., chopping off all but the low 16 bits), then they're full of prunes.
Given that the sign-extension will only tack 16 bits of 0 or 1 to the
front of the low-order 16 bits, and that the conversion of this back to "int"
will only chop those bits off again (K&R, "6.1 Characters and Integers": "When
a longer integer is converted to a shorter or to a 'char', it is truncated on
the left; excess bits are simply discarded."), doing the sign extension is
indeed needless *if* the "long" value is not to be used except to convert it
back to "int".

However, you are presumably not doing that (why convert it to "long" only to
immediately convert it back to "int"?); you say you want a 16x16->32 multiply,
which is not what "int * int" is in a C implementation with 16-bit "int"s.

> I think it highly unlikely that ((long) x) * y would give me a better
> result as then the compiler has to realise that an expression now cast
> as long was previously short so that a 16x16 multiply is
> perfectly viable.  I would speculate that this is easily as complicated
> if not more than realising that the result of a MULS operation doesn't
> require an extend to long.  Granted, it expresses the result wanted
> better but in reality I think it highly doubtful you could find a compiler
> for the Amiga, ST, or Mac that would generate anything but 32x32 operations
> from the expression.

Naah.  Our 68K C compiler, when asked to do:

	short x, y;
	long z;

	z = (long)x * (long)y;

will quite cheerfully do a MULS instruction to multiply "s1" and "s2", and
stuff the 32-bit result into "z".  It will do this even if you *don't* specify
optimization!  Since our compiler has 32-bit "int"s, the example above most
closely matches what a 16-bit compiler would do with "x" and "y" being "int"s.

It may be complicated (I seem to remember hearing that getting this right was a
bit tricky), but it's certainly not impossible.  If you can't find an Amiga,
ST, or Mac compiler that would turn:

	int x, y;
	long z;

	z = (long)x * (long)y;

into a MULS followed by a MOVE.L of the 32-bit result of the MULS into "z", it
means the state of the art for Amiga, ST, and Mac compilers isn't very advanced
at all.

> Finally.  Again, I'm not debating that the code MW generates is correct. I'm
> debating their assertion that code that doesn't throw away the high
> bits in the result of a 16x16 multiply when extending to 32 bits
> is incorrect and would fail acceptance testing.

Well, you're wrong.  In a C implementation with 16-bit "int"s, any
multiplication of two "int"s must, if it yields a result at all (which it may
not, if overflows are checked for), MUST yield a 16-bit result.  Sorry, but
that's the way it is.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

gwyn@brl-smoke.ARPA (Doug Gwyn ) (10/19/87)

In article <31167@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>In a C implementation with 16-bit "int"s, any
>multiplication of two "int"s must, if it yields a result at all (which it may
>not, if overflows are checked for), MUST yield a 16-bit result.

Although this is true for most operations on (unsigned), operations
on (int) that overflow are in the implementation-dependent category
and may even keep additional high-order bits, as I understand it.

jimp@cognos.uucp (Jim Patterson) (10/19/87)

In article <143@kesmai.COM> dca@kesmai.COM (David C. Albrecht) writes:
>Okay, first off I am quite well acquainted with the C typing rules.
>Second, my point was one of OPTIMIZATION not typing.  There is
>no question that the fully unoptimized expression (where ints are 16 bits
>and longs are 32 bits).
>
>   ((long) (x * y)) where x and y are ints says:
>
>   get int x
>   get int y
>   multiply using operation appropriate for 16 bit ints
>   convert result to long (32 bits).

>  The point was that they claimed that
>eliminating the extend to long was INCORRECT and would cause their compiler
>to fail acceptance testing.  That is what I think is BOGUS.  What possible
>point could there be to testing for garbage out from a multiply when the
>numbers coming in cause a result too large to fit in the same type.
>I can't think of any instance when the full 32bit accuracy wouldn't
>be preferable.

I think your logic is correct, and consistent with the ANSI C draft. A
compiler implementer could treat the 16x16 multiply operation as a
special case, and skip the sign-extend.  This is so because if the
multiply result fits in 16 bits, then the sign extend is unnecessary
(being done by the multiply instruction).  If it doesn't fit within 16
bits, then this constitutes an exception and the behavior is UNDEFINED
according to the ANSI draft (section 3.3 Expressions). The definition
of undefined behavior (section 1.6 definitions) states that
"Permissible undefined behavior ranges from ignoring the situation
... , to behaving ... in a documented manner characteristic of the
environment ... to terminating ..." (my paraphrasing). By this
definition, it is perfectly valid to implement the 16-bit multiply
optimization and document that 16-bit multiply's, extended to long,
will properly represent the 32-bit result. This fits with the note
"behaving in a documented manner".

Neither of these points is to say that compilers must make such
optimizations (I think you agree based on your earlier comments).  It
is only to say that a compiler implementor is allowed to make these
optimizations. For this reason, relying on a compiler to deal with the
16-bit multiply in this fashion is non-portable, as you have found
out. Given that compiler writers first need to be concerned with
implementing the semantics correctly before they can spend much
time on optimizations in the code generator, it isn't surprising that
many 16 bit compilers don't do this optimization. 
-- 
Jim Patterson                              Cognos Incorporated
UUCP:decvax!utzoo!dciem!nrcaer!cognos!jimp P.O. BOX 9707    
PHONE:(613)738-1440                        3755 Riverside Drive
                                           Ottawa, Ont  K1G 3Z4

guy%gorodish@Sun.COM (Guy Harris) (10/23/87)

> Although this is true for most operations on (unsigned), operations
> on (int) that overflow are in the implementation-dependent category
> and may even keep additional high-order bits, as I understand it.

But if the result of such an operation is an "int", where *could* it keep the
high-order bits?
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com