[comp.lang.c] int divided by unsigned.

jlg@lanl.gov (Jim Giles) (06/16/89)

#include <stdio.h>
main(){
   int a;
   unsigned b;

   a= -5;
   b=1000;
   printf("%d\n",a/=b);
}

I tried the above program on my Sun workstation and it printed 4294967.
This is apparently the "correct" answer according to the proposed C
standard.  On the Cray under UNICOS, the same program prints 0.  The
"correct" answer for the Cray would have been 18446744073709551.  This
is an example of a case where deviating from the C definition produces
desirable results.  I hope Cray doesn't "fix" their C compiler.

(Note: this behaviour occurs because C requires arguments to be "promoted"
to unsigned if either is already unsigned.  The preferable rule would be
that if one argument is an int and the other is an unsigned, _both_ should
be promoted to long before the operator is applied.  Unsigned is _not_ a
promotion from int - it is a break-even semantics change.)
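The conversion rule Jim describes can be sketched in a few lines of C (the function name is illustrative, not from the thread; this assumes 32-bit int and unsigned, as on the Sun):

```c
/* When one operand of / is int and the other unsigned, the usual
 * arithmetic conversions turn the int operand into unsigned before
 * the division happens.  With 32-bit unsigned, -5 wraps around to
 * 4294967291, and 4294967291 / 1000 is 4294967 -- the Sun's answer. */
unsigned mixed_divide(int a, unsigned b)
{
    return a / b;   /* a is implicitly converted to unsigned here */
}
```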

karl@haddock.ima.isc.com (Karl Heuer) (06/17/89)

In article <13940@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>I hope Cray doesn't "fix" their C compiler.

Instead of hoping, why don't you add explicit casts to your code, so that it
will do what you want whether or not they fix the bug?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

scjones@sdrc.UUCP (Larry Jones) (06/17/89)

In article <13940@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> 
> [ example of -5 / 1000U -- Cray gets zero instead of the large
>   number required by ANSI ]
> 
> This
> is an example of a case where deviating from the C definition produces
> desirable results.  I hope Cray doesn't "fix" their C compiler.
> 
> (Note: this behaviour occurs because C requires arguments to be "promoted"
> to unsigned if either is already unsigned.  The preferable rule would be
> that if one argument is an int and the other is an unsigned, _both_ should
> be promoted to long before the operator is applied.  Unsigned is _not_ a
> promotion from int - it is a break-even semantics change.)

Well, there're two good reasons why you DON'T want to promote to
long:  long might well be the same size as int and, if it's not,
it may well take a lot longer to compute the answer.  In the
first case, you get an answer which is no more useful than the
unsigned version; in the second case you violate the Spirit of C
by doing non-obvious things behind the programmer's back.  If you
want long division, put in a cast!  (Then when it doesn't work
because long and int are the same size, people can blame you
instead of the ANSI committee :-).
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@SDRC.UU.NET
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
"You can't get a body like mine in a bottle --
unless you push REAL HARD." - Judy Tenuta / Dr. Pepper

jlg@lanl.gov (Jim Giles) (06/19/89)

From article <748@sdrc.UUCP>, by scjones@sdrc.UUCP (Larry Jones):
> [...]
> Well, there're two good reasons why you DON'T want to promote to
> long:  long might well be the same size as int and, if it's not,
> it may well take a lot longer to compute the answer.  In the
> first case, you get an answer which is no more useful than the
> unsigned version, [...

This is not true.  If long is the same as int, then I would at least
get signed arithmetic performed.  So, in my original example, -5/1000
would equal 0.  In fact, 'promoting' unsigned to int would be better
than the other way around.  Since most arithmetic is not carried out
on "large" numbers, promoting to int would produce expected results
more often than the other way around.

> ...]              in the second case you violate the Spirit of C
> by doing non-obvious things behind the programmer's back.

But you are _already_ doing that by casting everything to unsigned!
The point of my submission was that -5/1000 == bignumber _IS_ a non-
obvious thing.  The point is that the default 'promotion' order
has been poorly defined.

john@frog.UUCP (John Woods) (06/24/89)

In article <13940@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
O> #include "stdio.h"
u> main(){
r>    int a = -5;			/* edited for rebroadcast */
 >    unsigned b = 1000;
C>    printf("%d\n",a/=b);
 > }
"> I tried the above program on my sun workstation and it printed 4294967.
e> This is aparently the "correct" answer according to the proposed C
x> standard.  On the Cray under UNICOS, the same program prints 0.  The
p> "correct" answer for the Cray would have been 18446744073709551.  This
e> is an example of a case where deviating from the C definition produces
r> desireable results.  I hope Cray doesn't "fix" their C compiler.
t> 
">

Try

	main() {
		int a = -5;
		unsigned b = 1000;

		printf("%d\n", a /= (int)b);
	}

A correct program giving desirable results.  Fancy that.

-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu
    People...How you gonna FIGURE 'em?
    Don't bother, S.L.--Just stand back and enjoy the EVOLUTIONARY PROCESS...

jlg@lanl.gov (Jim Giles) (06/27/89)

From article <1549@frog.UUCP>, by john@frog.UUCP (John Woods):
> 	main() {
> 		int a = -5;
> 		unsigned b = 1000;
> 		printf("%d\n", a /= (int)b);
> 	}
> A correct program giving desirable results.  Fancy that.

And all it requires is some non-intuitive (and undesirable) clutter
in the expression.  It would _obviously_ be better if the semantics of
the given expression were the _default_ and the present default were
the one which required the extra syntax.  That is, all the following
would be _equal_:

      a /= b, (int)a /= b, a /= (int)b, (int)a /= (int)b

and, to get the current interpretation, you should have to do: 

      (unsigned)a /= b

But, that would require C to do something in a reasonable way - so I
guess we can forget that.

chris@mimsy.UUCP (Chris Torek) (06/27/89)

In article <13958@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>... It would _obviously_ be better if [signed] semantics of [int divided
>by unsigned] expression[s] were the _default_ and the present default were
>the one which required the extra syntax.

And it would `obviously' be better if the sky were green and grass were
blue.  Good grief, what makes your `obvious' any more obvious than mine?
(Now, I do happen to think that the pANS' `value-preserving' sign semantics
are inferior to PCC's `unsigned-preserving' semantics, but I have a very
specific reason for thinking this.  I do not have any particular reason
to believe that the result of combining signed and unsigned should be
one or the other.  I am happy with things as is, and would probably be
just as happy if they had always been the other way.)

Instead of simply asserting `it is obvious that ...', you might explain
why you feel that way, for those of us to whom it is not obvious.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/27/89)

In article <13958@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>... to get the current interpretation, you should have to do: 
>      (unsigned)a /= b
>But, that would require C to do something in a reasonable way - so I
>guess we can forget that.

"Reason" is in the mind of the beholder.  To require that assignment
be allowed to an rvalue strikes me as exceedingly UNreasonable.
C has a definite set of rules governing what happens when types
"collide"; they seem pretty reasonable to me.

If you want a different language, feel free to design one.

jlg@lanl.gov (Jim Giles) (06/28/89)

From article <18296@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
> Instead of simply asserting `it is obvious that ...', you might explain
> why you feel that way, for those of us to whom it is not obvious.

In every other programming language I am familiar with, integer division
is guaranteed to produce a result which is smaller than (or equal to)
the numerator in absolute value.  That is, you can count on the following
relation:
            | a/b | <= | a |

Other languages which have unsigned either don't allow mixed mode at all
(like Modula I, II, etc) or they treat unsigned as inferior to signed
for automatic conversion.  This latter decision conforms to intuition
(i.e. the relation above) more often than the procedure C uses.
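The invariant Jim is claiming, and the way C's mixed-mode rule breaks it, can be put directly into code (names illustrative; the unsigned case assumes 32-bit int and unsigned):

```c
#include <stdlib.h>     /* abs */

/* With both operands signed, integer division cannot grow in
 * magnitude: |a/b| <= |a| for any nonzero b, regardless of whether
 * the implementation truncates toward zero or toward -infinity. */
int magnitude_shrinks(int a, int b)
{
    return abs(a / b) <= abs(a);
}

/* But once an unsigned operand forces the int operand to unsigned,
 * the quotient can be enormously larger than the dividend. */
unsigned mixed(int a, unsigned b)
{
    return a / b;       /* a converted to unsigned first */
}
```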

But, as I say, C rarely obeys the principle of least astonishment.

jlg@lanl.gov (Jim Giles) (06/28/89)

From article <10457@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn):
> If you want a different language, feel free to design one.

I am.  In fact I have recently read more than two dozen books on the
subject of programming language design.  The principle of least
astonishment is considered, by most designers, to be something worth
obeying.  C rarely does.  In fact, there have been many studies of
various programming language features to determine the effect they
have on programmer productivity.  There are perhaps a dozen features
which are consistently found to be damaging to productivity.  Of course,
C has all of these features.

chris@mimsy.UUCP (Chris Torek) (06/28/89)

In article <13959@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>In every other programming language I am familiar with, integer division
>is guaranteed to produce a result which is smaller than (or equal to)
>the numerator in absolute value.  That is, you can count on the following
>relation:
>            | a/b | <= | a |

Well, you can count on it in C, too, because in `unsigned a; int b; a/b'
you have unsigned division, rather than integer division; I find this
no more odd than the fact that `a/b' is sometimes integer division and
sometimes floating point division.  But maybe I am just used to it.
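Chris's point that `a/b` already names several different operations depending on operand types can be made concrete (function names illustrative; the unsigned case assumes 32-bit unsigned):

```c
/* The single expression a/b denotes three distinct operations in C,
 * selected entirely by the operand types:
 *   - truncating signed integer division,
 *   - modular unsigned division,
 *   - floating-point division. */
int      sdiv(int a, int b)             { return a / b; }
unsigned udiv(unsigned a, unsigned b)   { return a / b; }
double   fdiv(double a, double b)       { return a / b; }
```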

>Other languages which have unsigned either don't allow mixed mode at all
>(like Modula I, II, etc)

This I dislike (purely as a matter of taste), although as long as there
is a mechanism for explicit conversions, the language has not lost any
abilities.

>or they treat unsigned as inferior to inferior to signed for automatic
>conversion.  This later decision conforms to intuition (ie. the relation
>above) more often than the procedure C uses.

That depends on where you get your intuition.  I never would have
expected people to put up with traffic jams either---it seems
intuitively obvious that people would agree to flexible work hours
instead, at least in jobs that permit it (most `white collar' work).
That is, arrive any time between 0600 and 1000---sort of an extended
version of `flex time'.  But maybe that has something to do with
me getting up at 1800 one day, 2250 the next, and around 0200
the day after. . . .

Anyway, I happen to like `sticky unsigned' operation, but as I say,
perhaps I am just used to it.  It seems to me that a C-like language
with `sticky signed' operation would work as well, although the
results of such mixed mode operations would astonish me for a while.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

spl@mcnc.org (Steve Lamont) (06/28/89)

In article <13960@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
#From article <10457@smoke.BRL.MIL>, by gwyn@smoke.BRL.MIL (Doug Gwyn):
#> If you want a different language, feel free to design one.
#
#I am.  ...

Oh no, not another one... :-)

#                     ...  In fact, there have been many studies of
#various programming language features to determine the effect they
#have on programmer productivity.  There are perhaps a dozen features
#which are consistently found to be damaging to productivity.  Of course,
#C has all of these features.

Well, what are they?  Don't keep us in suspense.  Expiring minds wanna know...

You mean I would be able to produce more than 1000 lines of code in a single
weekend?  Yow!
-- 
							spl
Steve Lamont, sciViGuy			EMail:	spl@ncsc.org
North Carolina Supercomputing Center	Phone: (919) 248-1120
Box 12732/RTP, NC 27709

nevin1@cbnewsc.ATT.COM (nevin.j.liber) (07/06/89)

In article <13960@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

|The principle of least
|astonishment is considered, by most designers, to be something worth
|obeying.  C rarely does.  In fact, there have been many studies of
|various programming language features to determine the effect they
|have on programmer productivity.  There are perhaps a dozen features
|which are consistently found to be damaging to productivity.  Of course,
|C has all of these features.

Could you please elaborate (in comp.lang.misc, of course)?  I'm sure
that many of us would be interested in seeing that list.

Thanks (in advance),
-- 
NEVIN ":-)" LIBER  AT&T Bell Laboratories  nevin1@ihlpb.ATT.COM  (312) 979-4751

mouse@mcgill-vision.UUCP (der Mouse) (07/08/89)

In article <13959@lanl.gov>, jlg@lanl.gov (Jim Giles) writes:
> From article <18296@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
>> Instead of simply asserting `it is obvious that ...', you might
>> explain why [...], for those of us to whom it is not obvious.

> In every other programming language I am familiar with, [...] you can
> count on the following relation:

>             | a/b | <= | a |

> Other languages which have unsigned either don't allow mixed mode at
> all (like Modula I, II, etc) or they treat unsigned as inferior to
> signed for automatic conversion.  This latter decision conforms to
> intuition (i.e. the relation above) more often than the procedure C
> uses.

Really?  As in 65500U / 6 giving -6 (sixteen bit ints)?[%]  Take your
pick: you get either that, or -100 / 10U giving 6543.  These aren't
integers our code is working with; they're just approximations.  You're
bound to be able to find discrepancies; you just have your choice of
where you want them to show up.

[%] Yes, I know this obeys the inequality you gave above.  But somehow
    `least astonishment' seems to want something more like 10916. :-)
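der Mouse's sixteen-bit examples can be recreated with fixed-width types (a modern sketch -- the 1989 machines in question simply had 16-bit int; the out-of-range signed conversion is implementation-defined, though it wraps on common two's-complement implementations):

```c
#include <stdint.h>

/* Under unsigned-preserving rules, the int operand wraps to a large
 * unsigned value: -100 becomes 65436, and 65436 / 10 is 6543. */
int unsigned_preserving(int16_t a, uint16_t b)
{
    uint16_t ua = (uint16_t)a;  /* -100 wraps to 65436 */
    return ua / b;
}

/* Under signed-preserving rules, the unsigned operand wraps to a
 * negative value: 65500 becomes -36, and -36 / 6 is -6 -- which
 * obeys |a/b| <= |a|, yet astonishes anyway. */
int signed_preserving(uint16_t a, int16_t b)
{
    int16_t sa = (int16_t)a;    /* 65500 wraps to -36 (impl.-defined) */
    return sa / b;
}
```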

> But, as I say, C rarely obeys the principle of least astonishment.

Depends on how your astonishment sensor has been trained.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu

jlg@lanl.gov (Jim Giles) (07/11/89)

From article <1578@mcgill-vision.UUCP>, by mouse@mcgill-vision.UUCP (der Mouse):
> [...]
> Really?  As in 65500U / 6 giving -6 (sixteen bit ints)?[%]  Take your
> pick: you get either that, or -100 / 10U giving 6543.  [...]

If you've really been following this discussion, you will remember that
my _favorite_ fix for the mixed mode problem is to promote _both_ operands
to long.  This would give 65500/6 == 10916 _AND_ it would give -5/1000u == 0.
Of course, as someone also pointed out, C foolishly doesn't require 'short',
'int', and 'long' to be different data types.  Oh well.
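Jim's preferred rule can be spelled out with explicit casts (function names illustrative; this gives the results he wants as long as long can hold the values involved -- when long is no wider than unsigned, large unsigned values would still wrap, which was Larry Jones's first objection):

```c
/* Promote BOTH operands to long before dividing, instead of
 * converting the int operand to unsigned. */
long promote_both(int a, unsigned b)
{
    return (long)a / (long)b;       /* signed division throughout */
}

long promote_both_u(unsigned a, int b)
{
    return (long)a / (long)b;
}
```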

flaps@jarvis.csri.toronto.edu (Alan J Rosenthal) (07/12/89)

jlg@lanl.gov (Jim Giles) writes:
>C foolishly doesn't require 'short', 'int', and 'long' to be different data
>types.  Oh well.

Sure it does.  It just doesn't require them to be different sizes.  Requiring
them to be different sizes would be foolish.

ajr

jlg@lanl.gov (Jim Giles) (07/12/89)

From article <1989Jul11.215930.9042@jarvis.csri.toronto.edu>, by flaps@jarvis.csri.toronto.edu (Alan J Rosenthal):
> jlg@lanl.gov (Jim Giles) writes:
>>C foolishly doesn't require 'short', 'int', and 'long' to be different data
>>types.  Oh well.
> 
> Sure it does.  It just doesn't require them to be different sizes.  Requiring
> them to be different sizes would be foolish.

Requiring them to be different sizes would make sense.  Allowing them to
be the same size is done for backward compatibility (_both_ meanings of
the word backward intended).  This failing wouldn't have existed if the
language had made proper requirements on the data types from the start.

The only case I can think of where it would be useful to have two distinct
data types allowed to be identically implemented would be 'char' vs. 'ascii'.
Here, 'char' could be the machine specific character set and 'ascii' would
be the ASCII character set.  The two would be identical on machines in
which the usual character set _is_ ASCII.  Other than that, if a language
has two distinct data types, they should have different properties.  Even
better, distinct data types should differ from each other in predictable
ways.  What's wrong with requiring that short be twice as precise as
char?  Or that int be at least twice as precise as short?  Or that long
be at least twice as precise as int?  Etc.
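For contrast, what C actually guarantees about these types is only minimum ranges and an ordering, not any doubling scheme -- equal sizes are explicitly allowed (a sketch; strictly the standard constrains ranges via <limits.h> rather than sizeof, but the sizeof ordering holds on real implementations):

```c
/* C guarantees sizeof(char) == 1 and that each type can represent at
 * least the range of the one before it; nothing stops short, int,
 * and long from all being the same size. */
int sizes_are_ordered(void)
{
    return sizeof(char) == 1
        && sizeof(char)  <= sizeof(short)
        && sizeof(short) <= sizeof(int)
        && sizeof(int)   <= sizeof(long);
}
```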

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/12/89)

In article <13981@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>C foolishly doesn't require 'short', 'int', and 'long' to be
>different data types.

These certainly are distinct data types in C.

If you don't like C, that's your business, but please quit grumbling
to the C newsgroup, especially when you don't know what you're talking
about.  Thanks..

Horne-Scott@cs.yale.edu (Scott Horne) (07/12/89)

In article <13983@lanl.gov>, jlg@lanl (Jim Giles) writes:
> From article <1989Jul11.215930.9042@jarvis.csri.toronto.edu>, by flaps@jarvis.csri.toronto.edu (Alan J Rosenthal):
> > jlg@lanl.gov (Jim Giles) writes:
> >>C foolishly doesn't require 'short', 'int', and 'long' to be different data
> >>types.  Oh well.
> > 
> > Sure it does.  It just doesn't require them to be different sizes.  Requiring
> > them to be different sizes would be foolish.
> 
> Requiring them to be different sizes would make sense.  Allowing them to
> be the same size is done for backward compatibility (_both_ meanings of
> the word backward intended).  This failing wouldn't have existed if the
> language had made proper requirements on the data types from the start.

There's nothing ``backward'' about it in either sense.

> Even
> better, distinct data types should differ from each other in predictable
> ways.  What's wrong with requiring that short be twice as precise as
> char?  Or that int be at least twice as precise as short?  Or that long
> be at least twice as precise as int?  Etc.

What's wrong with it?  Architecture is what's wrong with it.  Suppose we were
designing a C compiler for the IBM PC.  Which sizes shall we use?  Well,
according to you, we should make `char', `short', `int', and `long' different
sizes, preferably with each twice the size of the previous (with `long' perhaps
more than twice the size of `int').  Now, it makes sense to let `char' be one
byte, as you'll probably agree.  (Consider the purpose of `char'.)  If we are
to make `short' bigger than `char', and if we accept your rather arbitrary
choice of twice the size, then we shall give it a size of two bytes.  So far,
so good.  You probably don't want to make `int' a three-byte type; thus, make
it four bytes.  (This is consistent with your twice-the-size argument, too.)
What to do with `long'?  Well, you want it to be at least twice the size of
an `int'.  But that's eight bytes--and the machine instructions can't handle
eight-byte integers conveniently!  Heavens above!  I guess it's reasonable
then to do it the way most PC C compilers do:  1-byte `char's, 2-byte `short's
and `int's, and 4-byte `long's.  How coincidental that the language doesn't
make such demands.

Other problems arise, such as alignment.  Anyway, your demands force all
implementations to use 1-byte `char's, 2-byte `short's, and 4-byte `int's.
Doesn't this seem daft?

					--Scott

Scott Horne                              Hacker-in-Chief, Yale CS Dept Facility
horne@cs.Yale.edu                         ...!{harvard,cmcl2,decvax}!yale!horne
Home: 203 789-0877     SnailMail:  Box 7196 Yale Station, New Haven, CT   06520
Work: 203 432-6428              Summer residence:  175 Dwight St, New Haven, CT
Dare I speak for the amorphous gallimaufry of intellectual thought called Yale?

henry@utzoo.uucp (Henry Spencer) (07/12/89)

In article <13983@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>...distinct data types should differ from each other in predictable
>ways...

They do.  Barring truly peculiar machines, long is longer than short, and
int is whichever of the two is more efficient.  (The latter is *important*,
because for many housekeeping variables you don't care much about the
range but you want speed.)
-- 
$10 million equals 18 PM       |     Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

jlg@lanl.gov (Jim Giles) (07/13/89)

From article <66203@yale-celray.yale.UUCP>, by Horne-Scott@cs.yale.edu (Scott Horne):
> [...]    You probably don't want to make `int' a three-byte type; thus, make
> it four bytes.  (This is consistent with your twice-the-size argument, too.)
> What to do with `long'?  Well, you want it to be at least twice the size of
> an `int'.  But that's eight bytes--and the machine instructions can't handle
> eight-byte integers conveniently!  Heavens above! [...]

Oh, gee ... The language design might not be _convenient_ for some
machines.  That means my Smalltalk environment on the PC (with arbitrary
precision integer arithmetic) has done something _inconvenient_.  I have
other _compiled_ languages which have 64 bit integers, why can't C?

> Other problems arise, such as alignment.  Anyway, your demands force all
> implementations to use 1-byte `char's, 2-byte `short's, and 4-byte `int's.
> Doesn't this seem daft?

No, "daft" isn't the word I'd choose.  I might lean toward "portable", or
"well defined", but certainly not "daft".

peter@ficc.uu.net (Peter da Silva) (07/14/89)

Is it just me, or does anyone else think Jim Giles and Herman Rubin need
their own news-group? comp.lang.wishlists?
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.
Business: peter@ficc.uu.net, +1 713 274 5180. | Th-th-th-that's all folks...
Personal: peter@sugar.hackercorp.com.   `-_-' |  -- Mel Blanc
Quote: Have you hugged your wolf today?  'U`  |     May 30 1908 - Jul 10 1989