laura@hoptoad.uucp (Laura Creighton) (02/04/86)
In article <11689@ucbvax.BERKELEY.EDU> weemba@brahms.UUCP (Matthew P. Wiener) writes:
>.... There are several identities
>running around that are incompatible.
>
>	(1)	a == (a/b) * b + a%b
>	(2)	(-a)/b == -(a/b)
>	(3)	(a+b)/b == a/b + 1
>	(4)	(a+b)%b == a%b
>
>Notice that (3) and (4) are compatible with what the number theorists want,
>but (2) isn't.  Sure the naive user is fooled by (2) under the version we
>want, but then he's fooled by (3) and (4) in the usual version.  (1) holds
>when the / and % are both what the number theorist wants or when neither is
>what the number theorist wants.

While it is true that number theorists want (3) and (4), it is not the naive
user who will be fooled by (2).  It is the naive *mathematician*, which is
just about everybody.  To non-mathematicians, (2) is a law, with about the
same force as the law of gravity, and not something that you can redefine.
Sure, they are wrong about the force of this law -- but blame the way
mathematics is taught in high schools and grade schools.  In the meantime,
only the mathematicians, as a class, will have the perspective to see this.
Everybody else (again, as a class) will look at (-a)/b != -(a/b) and say
***BUG!!!***

So, as a practical matter, the mathematicians will have to come up with a
work-around, since only they are going to be able to understand what they
want.  In letting non-mathematicians design the languages they use,
mathematicians may have seriously goofed, because the non-mathematicians may
not understand what it is that the mathematicians want -- if they did, they
would be mathematicians.  Of course, I am not sure that there is consensus
among mathematicians as to what they want.  If there is, maybe they should
write their own language.

[I'm glad I wrote that last line.  It tells me where to post this fool
thing, which has been in the back of my mind, bothering me, the whole time
I was writing this.  Before that, I was strongly tempted to post to
net.philosophy...]
--
Laura Creighton
ihnp4!hoptoad!laura
hoptoad!laura@lll-crg.arpa
lambert@boring.uucp (Lambert Meertens) (02/06/86)
In article <484@hoptoad.uucp> laura@hoptoad.UUCP (Laura Creighton) writes:
> In article <11689@ucbvax.BERKELEY.EDU> weemba@brahms.UUCP (Matthew P. Wiener)
> writes:
>> .... There are several identities
>> running around that are incompatible.
>>
>>	(1)	a == (a/b) * b + a%b
>>	(2)	(-a)/b == -(a/b)
>>	(3)	(a+b)/b == a/b + 1
>>	(4)	(a+b)%b == a%b
>>
>> Notice that (3) and (4) are compatible with what the number theorists want,
>> but (2) isn't.  Sure the naive user is fooled by (2) under the version we
>> want, but then he's fooled by (3) and (4) in the usual version.  [...]
> [...] To non-mathematicians, 2 is a law, with about the same force as
> the law of gravity, and not something that you can redefine.  [...]
> [They] will look at (-a)/b != -(a/b) and say ***BUG!!!***

I should think that to most people remembering something about what they
were taught at school, (3) is a forceful law as well.  Together, (2) and
(3) imply (1/2)*2 = 1 (which I dimly remember as also having been
taught :-).

ALGOL 68 has (2) and (4), and so lacks (1) and (3).  A note I wrote about
this (A note on integral division, ALGOL Bull. 39 (1976), 30-32) came to
the conclusion that dropping (2) in favour of (1) and (3) would have been
preferable.  Next to arguments about what naive users would expect (where
you obviously can't have it all), an important question is what is the most
*useful* (convenient) set of laws in programming.  Sometimes (2) is more
useful, but my experience from over 25 years of programming is that (3) is
the more convenient law in the majority of cases, even for non-mathematical
problems.

For a language for personal computing that I co-designed, honouring naive
expectations was one of the design principles.  The problem has been
resolved there by the drastic measure of not providing a primitive
operation for integral division.  The mod function obeys (4).
--
Lambert Meertens
...!{seismo,okstate,garfield,decvax,philabs}!lambert@mcvax.UUCP
CWI (Centre for Mathematics and Computer Science), Amsterdam
jst@abic.UUCP (Shack Toms) (02/15/86)
[Kenneth Almquist @ Bell Labs, Holmdel, NJ writes (single >)]
> > Perhaps K&R thought that the performance penalty of implementing a
> > consistent modulus (or divide) was not justified, since negative
> > integers are rarely encountered in "C" [this comment cannot be traced
> > to K&R.]  However, this performance penalty can be avoided simply by
> > declaring unsigned integers as "unsigned int".
>
> On the VAX, the unsigned remainder function is implemented as a
> subroutine call.  The standard divide instruction cannot be used
> because the divide instruction does not work on unsigned integers.

Thanks for getting me to check this out!  On my VAX using VAXC V2.1, the
unsigned division is done with very ugly inline code...  That's OK, but it
is *incorrect* ugly inline code.  It seems that it does not correctly
perform x%y where x = 0x7fffffff, y = 0xfffffff2.  It gets the answer 1,
rather than 0x7fffffff.  However, the following inline fragment should
work OK:

	clrl	r1
	movl	x,r0		; load x into r0,r1 as quadword
	movl	y,r2		; y in r2, set condition codes
	bgeq	1$		; y < 2**31, ediv is ok
	cmpl	r2,r0		; y >= 2**31, we know x < 2*y
	bgtru	2$		; y > x  ==>  x%y == x
	subl2	r2,r0		; y <= x < 2*y  ==>  x%y == x-y
	brb	2$		;
1$:	ediv	r2,r0,r1,r0	; y in range for ediv, x%y put in r0
2$:				; r0 now contains x%y

For signed integers x%y, the following seems about optimal [this is from
the VAXC code (almost)]:

	emul	#0,#0,x,r0	; place x in r0,r1 as signed quadword
	ediv	y,r0,r1,r0	; r0 now contains x%y

Comparing the two fragments, the code for the unsigned divide will probably
run about as fast as the code for the signed divide, so long as the value
of y is less than 2**31.  This is because the VAX lacks not only an
unsigned divide, but also a signed longword-to-quadword conversion (except
for the trick with emul).  And the signed modulus instruction (ediv)
requires a quadword dividend.

The clear winner in all of this is probably x%y, where x is unsigned and y
is a short (either signed or unsigned).
The code is then:

	clrl	r1
	movl	x,r0		; Unsigned x in r0,r1
	movzwl	y,r2		; Unsigned short y in r2 (cvtwl for short y)
	ediv	r2,r0,r1,r0	; r0 contains x%y

But enough of implementation details...

> In Ada, division truncates towards zero, and there are separate
> remainder and modulo operators.  The % operator in C is a remainder
> operator; we could have a new operator %% for modulo.  (On the other
> hand, maybe not.  We don't want C to get as large as Ada!)

I certainly agree with that last remark.  You say, however, that the %
operator is a remainder operator.  Sure, the definition is that a/b can
round up or down [except when both a and b are positive], but a/b*b+a%b
must equal a.  The problem is that this selection of properties is shared
by at least 8 possible functions from Z x Z --> Z.

Indeed, there may be many more than 8 functions.  As I read my K&R, there
is nothing wrong with a compiler evaluating division of constants (constant
folding) in a different manner from the run-time evaluation of the same
values.  Furthermore, the preprocessor may evaluate division in
compile-time expressions (say, in #if statements) using a different
algorithm.  (These possibilities are probably greater with
cross-compilers.)  Furthermore, if the denominator is a constant power of
two, then the compiler might generate shifts and masks, and produce a
result different from that of a division of the same values had they both
been stored in variables.  Similarly, for certain values of the numerator,
the division can be optimized into a comparison, with a result generated
according to its own rule.

The requirement seems to be only that, in each case, the a/b*b+a%b == a
rule must hold.  That is, whenever an optimization is made for an
expression involving "/", the corresponding optimization must also be made
for "%".  Giving this operator a simple name ("remainder operator") belies
its fundamental ambiguity.
A very simple solution, which is upward compatible with the current
language definition, is to define division to always round in a precise
way, and then to keep % as the remainder function.  That way, the language
need not be cluttered with a %% operator.  The problem is to choose the
definition which has the most widespread application.

Now, to get back to the original subject [which way to round]: in my
opinion, the most useful of these eight choices for division is made by
adding the constraint that the sign of any non-zero remainder should be
equal to the sign of the divisor.  My experiences have led me to believe
that this is the most convenient choice.  These experiences were gained
largely using languages which do integer division equivalently to:

	a/b = trunc(float(a)/float(b))

(i.e. the "wrong" way).  *Whenever* I have sought the result of a%b with
b>0, I have wanted the smallest non-negative remainder, regardless of the
sign of a.  The symmetric choice is then to have the sign of a%b determined
by the sign of b.

This choice may lead [as would any choice] to [marginally] increased
overhead on machines which have asymmetric support for signed vs. unsigned
integers, but it is no secret that C runs best on machines with symmetric
architectures.
weemba@brahms.BERKELEY.EDU (Matthew P. Wiener) (02/19/86)
In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
>> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
>> >However: One might use a%b<0 iff a<0 in an algorithm which printed
>> >the value of an integer in a given radix.  The least significant
>> >digit of a in radix b would then be |a%b|.  :-)
>>
>> So would |a|%b, and it works under either convention.  :-)
>
>Except that |a| is not available for the full range of a.  In
>particular, on a 16 bit computer |-32768| is not expressible.

I don't know about you, but I'm too paranoid about the least negative
number to begin with.  In fact, I try not to get close, if possible.

Frankly, if it's a question of a language getting integer division done
correctly on -32767 to 32767 or getting it done incorrectly on -32768
to 32767, I think there is no debate about which is preferable.  Or, to
express my point in less prejudicial terms: in arguing A vs B in a
language, the question of what happens with the least negative number is
almost always irrelevant.

>The real point [of the :-)] etc.

Agreed etc.

Personal to Shack Toms:  I got mail asking me to keep this issue off of
net.arch.  As you reposted to net.arch, I wonder if you did?  As it is,
I'm sending this up to net.lang, since it is not really a C-specific
question.

ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720
cramer@kontron.UUCP (Clayton Cramer) (03/06/86)
> In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> >> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> >> >However: One might use a%b<0 iff a<0 in an algorithm which printed
> >> >the value of an integer in a given radix.  The least significant
> >> >digit of a in radix b would then be |a%b|.  :-)
> >>
> >> So would |a|%b, and it works under either convention.  :-)
> >
> >Except that |a| is not available for the full range of a.  In
> >particular, on a 16 bit computer |-32768| is not expressible.
>
> I don't know about you, but I'm too paranoid about the least negative
> number to begin with.  In fact, I try not to get close, if possible.
>
> Frankly, if it's a question of a language getting integer division done
> correctly on -32767 to 32767 or getting it done incorrectly on -32768
> to 32767, I think there is no debate about which is preferable.  Or, to
> express my point in less prejudicial terms: in arguing A vs B in a
> language, the question of what happens with the least negative number
> is almost always irrelevant.
>
> ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720

In my experience, people start caring about least negative numbers when
they want to use these weird values in the form:

	#define CantHappenFlag -32768

so that they can test for absurd values.  Instead, these people should be
using enumerated types.  (I will admit to doing things like this before
someone gave me a C book with enumerated types described in it.)
jst@abic.UUCP (Shack Toms) (03/08/86)
> > In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> > >> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> > >> >However: One might use a%b<0 iff a<0 in an algorithm which printed
> > >> >the value of an integer in a given radix.  The least significant
> > >> >digit of a in radix b would then be |a%b|.  :-)
> > >>
> > >> So would |a|%b, and it works under either convention.  :-)
> > >
> > >Except that |a| is not available for the full range of a.  In
> > >particular, on a 16 bit computer |-32768| is not expressible.
> >
> > I don't know about you, but I'm too paranoid about the least negative
> > number to begin with.  In fact, I try not to get close, if possible.
> >
> > Frankly, if it's a question of a language getting integer division done
> > correctly on -32767 to 32767 or getting it done incorrectly on -32768
> > to 32767, I think there is no debate about which is preferable.  Or, to
> > express my point in less prejudicial terms: in arguing A vs B in a
> > language, the question of what happens with the least negative number
> > is almost always irrelevant.
> >
> > ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720
>
> In my experience, people start caring about least negative numbers when
> they want to use these weird values in the form:
>
>	#define CantHappenFlag -32768
>
> so that they can test for absurd values.  Instead, these people should be
> using enumerated types.  (I will admit to doing things like this before
> someone gave me a C book with enumerated types described in it.)

In my experience, people care about least negative numbers any time they
define a function which requires an integer argument that can meaningfully
come from the full range.  This could be anything from a random number
generator (which requires a seed) to an integer modulus function :-).  I
don't believe I have ever used them as a CantHappenFlag; it is generally
too easy to come across them as the result of ordinary computation.

Shack Toms