[net.lang] Integer division

laura@hoptoad.uucp (Laura Creighton) (02/04/86)

In article <11689@ucbvax.BERKELEY.EDU> weemba@brahms.UUCP (Matthew P. Wiener) writes:
>....  There are several identities
>running around that are incompatible.
>
> (1)	      a == (a/b) * b + a%b
> (2)	 (-a)/b == -(a/b)
> (3)	(a+b)/b == a/b + 1
> (4)	(a+b)%b == a%b
>
>Notice that (3) and (4) are compatible with what the number theorists want,
>but (2) isn't.  Sure the naive user is fooled by (2) under the version we
>want, but then he's fooled by (3) and (4) in the usual version.  (1) holds
>when the / and % are both what the number theorist wants or when neither are
>what the number theorist wants.

While it is true that number theorists want 3 and 4, it is not the naive user
who will be fooled by 2.  It is the naive *mathematician*, which is just about
everybody.  To non-mathematicians, 2 is a law, with about the same force as
the law of gravity, and not something that you can redefine.

Sure they are wrong about the force of this law -- but blame the way mathematics
is taught in hich schools and grade schools.  In the meantime, only the 
mathematicians, as a class, will have the perspective to see this. Everybody
else (again as a class) will look at (-a)/b != -(a/b) and say ***BUG!!!***
So, as a practical matter, the mathematicians will have to come up with a
work-around, since only they are going to be able to understand what they want.
In letting non-mathematicians design the languages they use, mathematicians
may have seriously goofed, because the non-mathematicians may not understand
what it is that the mathematicians want -- if they did they would be 
mathematicians.  Of course, I am not sure that there is consensus among
mathematicians as to what they want.  If there is, maybe they should write
their own language.

[I'm glad I wrote that last line.  It tells me where to post this fool thing,
which has been in the back of my mind, bothering me, the whole time I was
writing this.  Before that, I was strongly tempted to post to net.philosophy...]
-- 
Laura Creighton		
ihnp4!hoptoad!laura 
hoptoad!laura@lll-crg.arpa

lambert@boring.uucp (Lambert Meertens) (02/06/86)

In article <484@hoptoad.uucp> laura@hoptoad.UUCP (Laura Creighton) writes:
> In article <11689@ucbvax.BERKELEY.EDU> weemba@brahms.UUCP (Matthew P. Wiener)
> writes:
>> ....  There are several identities
>> running around that are incompatible.
>> 
>>  (1)	      a == (a/b) * b + a%b
>>  (2)	 (-a)/b == -(a/b)
>>  (3)	(a+b)/b == a/b + 1
>>  (4)	(a+b)%b == a%b
>> 
>> Notice that (3) and (4) are compatible with what the number theorists want,
>> but (2) isn't.  Sure the naive user is fooled by (2) under the version we
>> want, but then he's fooled by (3) and (4) in the usual version.  [...]

> [...]  To non-mathematicians, 2 is a law, with about the same force as
> the law of gravity, and not something that you can redefine.  [...]
> [They] will look at (-a)/b != -(a/b) and say ***BUG!!!***

I should think that to most people remembering something about what they
were taught at school (3) is a forceful law as well.  Together, (2) and (3)
imply (1/2)*2 = 1 (which I dimly remember as also having been taught:-).

ALGOL 68 has (2) and (4), and so lacks (1) and (3).  A note I wrote about
this (A note on integral division, ALGOL Bull. 39 (1976), 30-32) came to
the conclusion that dropping (2) in favour of (1) and (3) would have been
preferable.

Next to arguments about what naive users would expect (where you obviously
can't have it all), an important question is what is the most *useful*
(convenient) set of laws in programming.  Sometimes (2) is more useful, but
my experience from over 25 years of programming is that (3) is the more
convenient law in the majority of cases, even for non-mathematical problems.

For a language for personal computing that I co-designed, honouring naive
expectations was one of the design principles.  The problem has been
resolved there by the drastic measure of not providing a primitive
operation for integral division.  The mod function obeys (4).

-- 

     Lambert Meertens
     ...!{seismo,okstate,garfield,decvax,philabs}!lambert@mcvax.UUCP
     CWI (Centre for Mathematics and Computer Science), Amsterdam

jst@abic.UUCP (Shack Toms) (02/15/86)

[Kenneth Almquist @ Bell Labs, Holmdel, NJ writes (single >)]
> > Perhaps K&R thought that the performance penalty of implementing a
> > consistent modulus (or divide) was not justified, since negative
> > integers are rarely encountered in "C" [this comment cannot be traced
> > to K&R.]  However, this performance penalty can be avoided simply by
> > declaring unsigned integers as "unsigned int".
> 
> On the VAX, the unsigned remainder function is implemented as a
> subroutine call.  The standard divide instruction cannot be used
> because the divide instruction does not work on unsigned integers.

Thanks for getting me to check this out!  On my VAX using VAXC V2.1, the
unsigned division is done with very ugly inline code...  Thats ok,
but it is incorrect ugly inline code.  Seems that it does not
correctly perform x%y where x = 0x7fffffff, y = 0xfffffff2.  It gets
the answer 1, rather than 0x7fffffff.

However, the following inline fragment should work ok.

	clrl	r1
	movl	x,r0		; load x into r0,r1 as quadword
	movl	y,r2		; y in r2, set condition codes
	bgeq	1$		; y < 2**31, ediv is ok
	cmpl	r2,r0		; y >= 2**31, we know x < 2*y
	bgtru	2$		; y > x ==> x%y == x
	subl2	r2,r0		; y <= x < 2*y ==> x%y == x-y
	brb	2$		;
1$:	ediv	r2,r0,r1,r0	; y in range for ediv, x%y put in r0
2$:				; r0 now contains x%y

For signed integers x%y,  the following seems about optimal, [this
is from the VAXC code (almost)]

	emul	#0,#0,x,r0	; place x in r0,r1 as signed quadword
	ediv	y,r0,r1,r0	; r0 now contains x%y

Comparing the two fragments, the code for the unsigned divide will
probably run about as fast as the code for the signed divide, so long as
the value of y is less than 2**31.  This is because the VAX lacks not
only an unsigned divide, but also a signed longword to quadword
conversion (except for the trick with emul).  And the signed modulus
instruction (ediv) requires a quadword dividend.

The clear winner in all of this is probably x%y, where x is unsigned
and y is a short (either signed or unsigned).  The code is then:

	clrl	r1
	movl	x,r0		; Unsigned x in r0,r1
	movzwl	y,r2		; Unsigned short y in r2 (cvtwl for short y)
	ediv	r2,r0,r1,r0	; r0 contains x%y

But enough of implemention details...

> In Ada, division truncates towards zero, and there are separate
> remainder and modulo operators.  The % operator in C is a remainder
> operator; we could have a new operator %% for modulo.  (On the other
> hand, maybe not.  We don't want C to get as large as Ada!)

I certainly agree with that last remark.  You say, however, that
the % operator is a remainder operator.  Sure, the definition is
that a/b can round up or down [except when both a and b are positive],
but a/b*b+a%b must equal a.  The problem is that this selection of
properties is shared by at least 8 possible funcions from Z X Z --> Z.
Indeed, there may be many more than 8 functions.  As I read my
K&R there is nothing wrong with a compiler evaluating division
of constants (constant folding) in a different manner
from the run-time evaluation of the same values.  Furthermore,
the preprocessor may evaluate division in compile time expressions (say in
#if statements) using a different algorithm. (These possibilities
are probably greater with cross-compilers).  Furthermore,
if the denominator is a constant power of two, then the compiler
might generate shifts and masks, and produce a result different
from that of a division of the same values had they both been stored in
variables.  Similarly, for certain values of the numerator, the
division can be optimized into a comparison with a result generated
according to its own rule.  The requirement seems to be only that,
in each case, the a/b*b+a%b == a rule must hold.  That is, whenever
an optimization can be made for an expression involving "/", that
the corresponding optimization also must be made for "%".  Giving this operator
a simple name ("remainder operator") belies its fundamental ambiguity.

A very simple solution, which is upward compatible with the current
language definition, is to define division to always round in a
precise way, and then to keep % as the remainder function.  That way,
the language need not be cluttered with a %% operator.  The problem
is to choose that definition which has the most widespread application.

Now to get back to the original subject [which way to round],
in my opinion the most useful of these eight choices for division
is made by adding the constraint that the sign of any non-zero remainder
should be equal to the sign of the divisor.  My experiences have
led me to believe that this is the most convenient choice.  These
experiences were gained largely using languages which do integer division
equivalently to: a/b = trunc(float(a)/float(b)) (i.e. the "wrong" way).

*Whenever* I have sought the result of a%b with b>0, I have wanted the
smallest non-negative remainder, regardless of the sign of a.  The
symmetric choice is to then have the sign of a%b determined by the sign
of b.  This choice may lead [as would any choice] to [marginally] increased
overhead on machines which have asymmetric support for signed vs. unsigned
integers, but it is no secret that C runs best on machines with symmetric
architectures.

weemba@brahms.BERKELEY.EDU (Matthew P. Wiener) (02/19/86)

In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
>> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
>> >However:  One might use a%b<0 iff a<0 in an algorithm which printed
>> >the value of an integer in a given radix.  The least significant
>> >digit of a in radix b would then be |a%b|.  :-)
>> 
>> So would |a|%b, and it works under either convention.  :-)
>
>Except that |a| is not available for the full range of a.  In
>particular, on a 16 bit computer |-32768| is not expressible.

I don't know about you, but I'm too paranoid about the least negative
number to begin with.  In fact, I try not to get close, if possible.

Frankly, if it's a question of a language getting integer division done
correctly on -32767 to 32767 or getting it done incorrectly on -32768
to 32767, I think there is no debate about which is preferable.  Or,
to express my point in less prejudicial terms: in arguing A vs B in a
language, the question of what happens with the least negative number 
is almost always irrelevant.

>The real point [of the :-)] etc.
Agreed etc.

Personal to Shack Toms- I got mail asking me to keep this issue off
of net.arch.  As you reposted to net.arch, I wonder if you did?  As
it is, I'm sending this up to net.lang, since it is not really a C
specific question.

ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720

cramer@kontron.UUCP (Clayton Cramer) (03/06/86)

> In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> >> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> >> >However:  One might use a%b<0 iff a<0 in an algorithm which printed
> >> >the value of an integer in a given radix.  The least significant
> >> >digit of a in radix b would then be |a%b|.  :-)
> >> 
> >> So would |a|%b, and it works under either convention.  :-)
> >
> >Except that |a| is not available for the full range of a.  In
> >particular, on a 16 bit computer |-32768| is not expressible.
> 
> I don't know about you, but I'm too paranoid about the least negative
> number to begin with.  In fact, I try not to get close, if possible.
> 
> Frankly, if it's a question of a language getting integer division done
> correctly on -32767 to 32767 or getting it done incorrectly on -32768
> to 32767, I think there is no debate about which is preferable.  Or,
> to express my point in less prejudicial terms: in arguing A vs B in a
> language, the question of what happens with the least negative number 
> is almost always irrelevant.
> 
> ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720

In my experience, people start caring about least negative numbers when
they want to use these weird values in the form:

#define		CantHappenFlag -32768

so that they can test for absurd values.  Instead, these people should be
using enumerated types.  (I will admit to doing things like this before
some gave me a C book with enumerated types described in it.)

jst@abic.UUCP (Shack Toms) (03/08/86)

> > In article <736@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> > >> In article <731@abic.UUCP> jst@abic.UUCP (Shack Toms) writes:
> > >> >However:  One might use a%b<0 iff a<0 in an algorithm which printed
> > >> >the value of an integer in a given radix.  The least significant
> > >> >digit of a in radix b would then be |a%b|.  :-)
> > >> 
> > >> So would |a|%b, and it works under either convention.  :-)
> > >
> > >Except that |a| is not available for the full range of a.  In
> > >particular, on a 16 bit computer |-32768| is not expressible.
> > 
> > I don't know about you, but I'm too paranoid about the least negative
> > number to begin with.  In fact, I try not to get close, if possible.
> > 
> > Frankly, if it's a question of a language getting integer division done
> > correctly on -32767 to 32767 or getting it done incorrectly on -32768
> > to 32767, I think there is no debate about which is preferable.  Or,
> > to express my point in less prejudicial terms: in arguing A vs B in a
> > language, the question of what happens with the least negative number 
> > is almost always irrelevant.
> > 
> > ucbvax!brahms!weemba	Matthew P Wiener/UCB Math Dept/Berkeley CA 94720
> 
> In my experience, people start caring about least negative numbers when
> they want to use these weird values in the form:
> 
> #define		CantHappenFlag -32768
> 
> so that they can test for absurd values.  Instead, these people should be
> using enumerated types.  (I will admit to doing things like this before
> some gave me a C book with enumerated types described in it.)

In my experience, people care about least negative numbers any time they
define a function which requires an integer argument which can meaningfully
be from the full range.  This could be anything from a random number generator
(which requires a seed) to an integer modulus function :-).  I don't
believe I have ever used them as a CantHappenFlag,  it is generally
too easy to come across them as the result of ordinary computation.

Shack Toms