dan@srs.UUCP (Dan Kegel) (08/11/88)
Sun's compilers seem to let you multiply two short integers and get all 32 product bits, if you code like this: register short x, y; register long z; z = x * y; If the compiler is nice, it emits the correct (16x16=32) multiply instruction. If it isn't nice, it emits the slower (32x32=64) instruction and throws away the top 32 bits. Do most modern compilers perform this optimization? It would be nice to be able to depend on it. -- Dan Kegel "We had to get it passed before the columnists attacked!" srs!dan@cs.rochester.edu rochester!srs!dan dan%srs.uucp@harvard.harvard.edu
ark@alice.UUCP (08/11/88)
In article <948@srs.UUCP>, dan@srs.UUCP writes: > Sun's compilers seem to let you multiply two short integers and get all > 32 product bits, if you code like this: > register short x, y; > register long z; > z = x * y; > If the compiler is nice, it emits the correct (16x16=32) multiply instruction. > If it isn't nice, it emits the slower (32x32=64) instruction and throws > away the top 32 bits. > Do most modern compilers perform this optimization? > It would be nice to be able to depend on it. The effect of the statment above is undefined. To get the right answer, you should write z = (long) x * (long) y; Actually, it is enough to cast only one of x and y, so you can say z = (long) x * y; but that may be a little confusing. If your compiler is clever, it will realize that it can use the multiply instruction that takes short operands into a long result. If it's not clever, you'll still get the right answer. Wouldn't you rather risk having your program get the right answer slowly rather than risk getting the wrong answer outright?
cik@l.cc.purdue.edu (Herman Rubin) (08/11/88)
In article <8101@alice.UUCP>, ark@alice.UUCP writes: > In article <948@srs.UUCP>, dan@srs.UUCP writes: > > > Sun's compilers seem to let you multiply two short integers and get all > > 32 product bits, if you code like this: > > register short x, y; > > register long z; > > > z = x * y; > > > If the compiler is nice, it emits the correct (16x16=32) multiply instruction. > > If it isn't nice, it emits the slower (32x32=64) instruction and throws > > away the top 32 bits. > > > Do most modern compilers perform this optimization? > > It would be nice to be able to depend on it. > > The effect of the statment above is undefined. > > To get the right answer, you should write > > z = (long) x * (long) y; > > Actually, it is enough to cast only one of x and y, so you > can say > > z = (long) x * y; > > but that may be a little confusing. If your compiler is clever, > it will realize that it can use the multiply instruction that > takes short operands into a long result. If it's not clever, > you'll still get the right answer. > > Wouldn't you rather risk having your program get the right answer > slowly rather than risk getting the wrong answer outright? It is a rare compiler which can recognize that it can ignore type casting and use a different instruction to get a result. It is hard enough to get a compiler to do the obvious. I find the attitude of "ark" execrable; this is what has produced our current bad software and also bad hardware. Unfortunately, hardware manufacturers do not realize that people like dan and me know how to use machine instructions which the language gurus have not seen fit to put into the language, and to recognize that hardware constructs beyond the scope of the language can be useful. We need the type of operator overloading which dan is using. If the hardware supports 16 x 16 -> 32 multiplication, it should be done with that instruction. On the VAX or PYRAMID or CRAY or CYBER, it can only be done the way "ark" suggests. I suggest that the compiler implementer be required to enable dan to put in his improvement even if the implementer had not thought of it. Another example of the neglect is the problem of the power operator. This has been discussed in this group, and I will not elaborate. However, the minimalists (do it using the current C tools) do not even meet "ark"s requirement of getting the correct answer slowly. How about the use of long long? I needed to compute a function which required unpacking a double into an exponent field and a mantissa (long long), doing integer and boolean arithmetic starting with the mantissa, etc., and putting things together to get a double result. Needless to say, I had to do this in assembler. BTW, my biggest difficulty was the fact that hex was not the default. How about multiplying two longs to get a long long, both signed and unsigned? Few machines have the unsigned version, cheap in hardware but expensive in software. And how about division? One much needed instruction, cheap in hardware and expensive in software, is dividing a float (of any length) by a float and getting an integer quotient and a floating remainder. And there was the long discussion in this group about how integer/integer should be treated if they are not positive. We need software which allows those who understand their machine to exploit the capabilities. We need hardware which provides the capabilities. We do not need to be told that "this is not portable; you must not do it." -- Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907 Phone: (317)494-6054 hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)
braner@batcomputer.tn.cornell.edu (braner) (08/12/88)
[] I ran into a related bug in Laser C on the Atari ST. The (legal) code was: struct foo *p; /* 32-bit pointers */ int n; /* 16-bit ints <<<<<<<<<<<<<<<<<<<< */ /* (In my view the right thing on a 68000) */ ... p += n; /* <<<< here was the bug */ The compiler, turns out, did the last instruction as follows: load n load sizeof(struct foo) multiply using the 68000 mulu instruction (16*16=32 bits) truncate to an int (16 bits) <<<<<<<<<< !!!!!! extend to a long (32 bits) (actually the last two were done together as ext.l d#) long-add to p Of course, for (n > 64K/sizeof(struct foo)) the address thus calculated was wrong. The ironic part of it was that the compiler had the 32 bit result right there, and had to stomp on it with the extra "ext.l"... When coding the related C expression &p[n] the same compiler did it correctly -- but slowly, as: load n extend to long load sizeof(struct foo) as a LONG constant call the long-multiply library function (32*32=32 bits) long-add result to p this time not using the perfectly adequate "mulu" machine instruction, which could be recognized as adequate since 'n' was a 16-bit int and sizeof(struct foo), a constant known at compile time, was 6 bytes (< 64K). I was forced to replace "p+=n;" with "p=&p[n];" and take the performance loss. Happy debugging! - Moshe Braner (Try debugging on a parallel machine :-)
bill@proxftl.UUCP (T. William Wells) (08/12/88)
In article <8101@alice.UUCP> ark@alice.UUCP writes: : In article <948@srs.UUCP>, dan@srs.UUCP writes: : : > Sun's compilers seem to let you multiply two short integers and get all : > 32 product bits, if you code like this: : > register short x, y; : > register long z; : : > z = x * y; : : > If the compiler is nice, it emits the correct (16x16=32) multiply instruction. : > If it isn't nice, it emits the slower (32x32=64) instruction and throws : > away the top 32 bits. : : > Do most modern compilers perform this optimization? : > It would be nice to be able to depend on it. : : The effect of the statment above is undefined. That is not true. The bit of code above is equivalent to: register short x, y; register long z; int tmp1; int tmp2; int tmp3; tmp1 = (int)x; tmp2 = (int)y; tmp3 = tmp1 * tmp2; z = (long)tmp3; What this means is that if the product of the values of the two shorts will fit in an int, the code will work as expected. If not, then the result IS undefined. To make this statement work do: : z = (long) x * y; as suggested. And, as long as the product of the two shorts will fit in a long, this will work. --- Bill novavax!proxftl!bill
radford@calgary.UUCP (Radford Neal) (08/14/88)
In article <948@srs.UUCP>, dan@srs.UUCP writes: > Sun's compilers seem to let you multiply two short integers and get all > 32 product bits, if you code like this: > register short x, y; > register long z; > > z = x * y; In article <8101@alice.UUCP>, ark@alice.UUCP writes: > The effect of the statment above is undefined. > > To get the right answer, you should write > > z = (long) x * (long) y; > ... > Wouldn't you rather risk having your program get the right answer > slowly rather than risk getting the wrong answer outright? In article <864@l.cc.purdue.edu>, cik@l.cc.purdue.edu (Herman Rubin) writes: > It is a rare compiler which can recognize that it can ignore type > casting and use a different instruction to get a result. It is hard > enough to get a compiler to do the obvious. I find the attitude of > "ark" execrable... Very puzzling. Why is informing people that their programs may break when moved to another compiler such a crime? If you want to argue that the C language _ought_ to be defined so that the product of two shorts is computed to long precision, that's another issue, though some might object to the code this would imply for: int a, b, c, d; a = (a*b*c*d) % 10; By your logic, the intermediate result (a*b*c*d) should be computed to 128-bit precision, assuming int's are 32 bits. > We need software which allows those who understand their machine to exploit > the capabilities. We already have such software. It's called an assembler. > We need hardware which provides the capabilities... If you're willing to pay enough, I'm sure someone will oblige. Actually, your best bet in this case would seem to be RISC machines with software multiplication and division, then you can do whatever you want. > We do not need to be told that "this is not portable; you must not do it." Any professional programmer ought to realize when a programming technique is not portable. Whether they choose to do it anyway is another matter. In fact, "ark" was arguing that including casts got you both - a portable program, and a fast one if the compiler is smart. Contorting your program to get fast code out of dumb compilers is a technique of limited usefullness. Radford Neal
throopw@xyzzy.UUCP (Wayne A. Throop) (08/16/88)
> From: cik@l.cc.purdue.edu (Herman Rubin) >> ark@alice.UUCP writes: >> Wouldn't you rather risk having your program get the right answer >> slowly rather than risk getting the wrong answer outright? > [...] I find the attitude of > "ark" execrable; this is what has produced our current bad software > and also bad hardware. "Thud." Sorry, my jaw just hit the floor. It is hard to conceive of anybody finding ark's "attitude" anything perjorative, let long "excrable". Look, folks, let me clue you in. I don't care how fast your CPU, how clever your legions of coders and microcoders, nor do I care how many femtoseconds can be shaved by using the software that results upon such a CPU. If the silly thing gets the WRONG ANSWER, it just doesn't matter how fast it is delivered. I simply don't need a hyper-fast pile of junk driven by oh-so-clever bit-twiddlers telling me that pi is 3. (Or, to use real examples, drawing multiple copies of my cursor, or aborting a login session because one program is ill, or crashing a system because one program uses "too much" memory, or any other of the "wrong answers" I am given every day in the name of "speed"). Hmpf. -- Nothing is impossible. Some things are just less likely than othes. --- Jonathan Winters in "The Twilight Zone" -- Wayne Throop <the-known-world>!mcnc!rti!xyzzy!throopw
Paul_L_Schauble@cup.portal.com (08/17/88)
> From: cik@l.cc.purdue.edu (Herman Rubin) >> ark@alice.UUCP writes: >> Wouldn't you rather risk having your program get the right answer >> slowly rather than risk getting the wrong answer outright? > [...] I find the attitude of > "ark" execrable; this is what has produced our current bad software > and also bad hardware. Gee. If you don't mind getting the wrong answer, I can make ANY program run a lot faster. RESULT = 0. RETURN END --PLS
jfh@rpp386.UUCP (The Beach Bum) (08/17/88)
In article <576@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes: >What this means is that if the product of the values of the two >shorts will fit in an int, the code will work as expected. If >not, then the result IS undefined. since this discussion orginally involved 16x16=32 multiply instructions, i thought it might be nice to point out that any two 16 bit numbers can be multiplied, and the result will always fit in 32 bits, without overflow. unsigned: FFFF x FFFF = FFFE0001 signed: 7FFF x 7FFF = 3FFF0001 -- John F. Haugh II +--------- Cute Chocolate Quote --------- HASA, "S" Division | "USENET should not be confused with UUCP: killer!rpp386!jfh | something that matters, like CHOCOLATE" DOMAIN: jfh@rpp386.uucp | -- apologizes to Dennis O'Connor
jlg@beta.lanl.gov (Jim Giles) (08/20/88)
In article <1810@vaxb.calgary.UUCP>, radford@calgary.UUCP (Radford Neal) writes: > int a, b, c, d; > > a = (a*b*c*d) % 10; > > By your logic, the intermediate result (a*b*c*d) should be computed to > 128-bit precision, assuming int's are 32 bits. That's not necessarily true. Your particular example does not require any precision above 32 bits. The exact answer for the given statement is the same as for: int a, b, c, d; a=((a%10)*(b%10)*(c%10)*(d%10))%10 or (with 64-bit intermediates): a=(((a*b)%10) * ((c*d)%10)) %10 Of course, neither of these really works in C because parenthesis don't force execution order. But the compiler is free to do the computation in either way (or any of several other ways). The same is true if some of the multiplies are replaced by adds or subtracts. I tend to agree with those who claim a 'correct' answer is better than a fast answer. The problem is that C doesn't specify what the 'correct' answer is. If C explicitly said that all integer arithmetic should be carried out modulo 2^(wordsize), then (a*b*c*d)%10 should be done the obvious fast way - people who naively assume that the program should give the mathematically 'correct' answer would deserve what they got. On the other hand, if C explicitly said that all integer arithemtic was mathematically exact (as long as the answer didn't overflow), then one of the more careful methods of evaluating the result should be used. This whole problem arises because C (neither K&R nor the ANSI standard) doesn't say how the arithmetic should be done. J. Giles