[comp.lang.c] Of Standards and Inventions: A Cautionary Tale

chris@mimsy.UUCP (Chris Torek) (04/06/88)

[Typography convention: /word/ represents /italics/; |word|
represents typewriter-text.]

By now most of you know my sentiments towards `noalias'.  Here,
however, is a sequence showing how even the most innocent-seeming
inventions can interact to produce surprising results.

First, a note about unsignedness:  In the C language, the unsigned
attribute on a type can be viewed as `sticky': operations on unsigned
numbers always yield an unsigned result.  (The only exception is the
ternary e1?e2:e3, whose result is independent of the type of e1.)
The condition can, of course, be cleared by a cast to a signed
type.
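
A minimal sketch of the `sticky' behaviour described above, assuming
an ordinary hosted C compiler.  The function names are hypothetical
and chosen only for illustration; the point is that mixing |int| and
|unsigned int| yields |unsigned int|, that a cast clears it, and that
the type of e1 in e1?e2:e3 does not leak into the result.

```c
#include <assert.h>
#include <limits.h>

/* -1 is converted to unsigned int before the comparison, so it
 * becomes UINT_MAX and the test is false. */
int less_without_cast(void)
{
    int s = -1;
    unsigned int u = 1u;
    return s < u;
}

/* Casting the unsigned operand back to int restores the signed
 * comparison. */
int less_with_cast(void)
{
    int s = -1;
    unsigned int u = 1u;
    return s < (int)u;
}

/* int + unsigned int has type unsigned int, so -1 + 0u wraps. */
unsigned int sticky_sum(void)
{
    int s = -1;
    unsigned int u = 0u;
    return s + u;
}

/* e1 is unsigned, but e2 and e3 are int, so the result is int. */
int ternary_result_is_signed(void)
{
    unsigned int u = 1u;
    return (u ? -1 : -2) < 0;
}
```

Most compilers will warn about the mixed-sign comparison in
|less_without_cast|, which is exactly the conversion being shown.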

Second, we have a long-standing clause in the draft standard on /integer
constants/, one that determines the type of a constant from its value
and that value's representation on your machine.  In itself this is
nothing new: even K&R say that whether |34567| is an |int| or a |long|
will depend on the number of bits in your |int|.  The dpANS further
says that a constant may become an |unsigned long|.  In particular, on
machines with 32 bit |long|s, values in 2147483648..4294967295 are
|unsigned long|.  This is certainly reasonable, or at least seems so.

Next we have the introduction of explicitly-unsigned constants.  |12U|
is to be equivalent to |(unsigned)12|; |99LU| or |99UL| is equivalent
to |(unsigned long)99|.  This is quite a notational convenience, just
as is the existing L suffix, and adding it to compilers is simple:  It
took perhaps a dozen lines to add it to the 4.3BSD Vax and Tahoe
compilers.  Again, reasonable, if something of a frill.
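
As a rough illustration of what the suffixes select, here is a sketch
that uses |_Generic|, which is a C11 feature and so postdates this
draft by many years.  |TYPE_NAME| and |check_suffix_types| are
hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <string.h>

/* C11 only: map the type of an expression to a printable name. */
#define TYPE_NAME(x) _Generic((x),          \
        int:           "int",               \
        unsigned int:  "unsigned int",      \
        long:          "long",              \
        unsigned long: "unsigned long",     \
        default:       "other")

/* Check that each suffix selects the type the text describes. */
void check_suffix_types(void)
{
    assert(strcmp(TYPE_NAME(12),   "int") == 0);
    assert(strcmp(TYPE_NAME(12U),  "unsigned int") == 0);
    assert(strcmp(TYPE_NAME(99L),  "long") == 0);
    assert(strcmp(TYPE_NAME(99UL), "unsigned long") == 0);
    assert(strcmp(TYPE_NAME(99LU), "unsigned long") == 0);
}
```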

But now that we have this U suffix, and various files that use it, I
find that the preprocessor must do something with it.  And indeed, the
draft tells us that the preprocessor now has the notion of unsigned
arithmetic.  Rather than do everything in |long|s, ignoring any U
suffixes, it must obey the compiler's rules for combining |long| and
|unsigned long|.  Is this such a burden?  Perhaps; perhaps not: a close
approximation in the Reiser preprocessor---making unsigned
`sticky'---took only a few changes (the approximation fails only for
e1?e2:e3 as noted above).  But having unsigned arithmetic available in
the preprocessor is clearly semantically desirable: it would be nice
to be able to tell whether the maximum unsigned short is greater than
65535U:

	#include <limits.h>

	/*
	 * Define a type to hold values in 0..65536.  We will
	 * have a large array of these numbers, so use as little
	 * space as possible.
	 */
	#if USHRT_MAX > 65535U
	typedef unsigned short bigunum;
	#else
	typedef unsigned long bigunum;	/* dpANS says u_long must suffice */
	#endif

Each of these inventions (for inventions they are, at least as they
have been phrased) seems perfectly reasonable.  At least, each one
seems so to me.  But lo! what has happened when we combine them all?
The answer to that lies in the following question:

	On a machine with 32 bit |long|s and two's complement
	arithmetic, what is the type of -2147483648 in the preprocessor?

Since the preprocessor is required to follow the same rules as the
compiler, and is possessed of the notion of unsigned, we find that it is
first to compute 2147483648 and then to negate it, and when it does the
former it finds that the type is |unsigned long|.  The negation changes
nothing: /neither the type nor the value/.  As noted earlier, the only
way to remove the unsigned attribute is to use a cast.  But since the
preprocessor explicitly disallows casts, there is no way to get
-2147483648!  In particular, this means that

	#include <limits.h>
	#if LONG_MIN > 0

is guaranteed to be /true/ on any two's complement machine!
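
The arithmetic behind this can be checked at run time.  The sketch
below avoids writing the constant directly, because later revisions of
C give 2147483648 a different type; instead it builds the value
2^(N-1) for an N-bit |unsigned long| from |ULONG_MAX|.  The function
names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* 2^(N-1) for an N-bit unsigned long.  With 32-bit longs this is
 * 2147483648, the value the text is concerned with. */
unsigned long top_bit(void)
{
    return ULONG_MAX / 2 + 1;
}

/* Unary minus on an unsigned long is computed modulo 2^N, so
 * negating 2^(N-1) gives 2^(N-1) again. */
unsigned long negated_top_bit(void)
{
    unsigned long v = top_bit();
    return -v;
}

/* The comparison that |#if LONG_MIN > 0| would perform if LONG_MIN
 * were spelled as a negated unsigned constant. */
int negated_top_bit_is_positive(void)
{
    return negated_top_bit() > 0;
}
```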

The moral, if you will, of this story is that even obvious and
well-behaved inventions may not always work together.  If something as
simple as putting unsigned arithmetic in the preprocessor has such a
surprising result, what can we expect of inventions like |noalias|?
Perhaps this will show why I am uneasy about /every/ invention in
this draft standard, even such obvious improvements as prototypes.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (04/06/88)

By the way, I should probably have mentioned that there is
an easy way to fix the `LONG_MIN > 0' without any real changes
to the draft, and that is to use

	#define LONG_MIN (-0x7fffffff-1)

rather than

	#define LONG_MIN -0x80000000

The example <limits.h> (which also sets minimal maxima) is for a
one's complement machine and does not suffer from this `feature'.
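
A short sketch of the same idiom written in terms of |LONG_MAX|, so
that it does not depend on |long| being 32 bits.  It does assume a
two's complement machine, and |MY_LONG_MIN| and the other new names
are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Every operand here is a signed long, so no unsigned type ever
 * enters the expression. */
#define MY_LONG_MIN (-LONG_MAX - 1)

/* The preprocessor now sees a negative value. */
#if MY_LONG_MIN < 0
#define MY_LONG_MIN_IS_NEGATIVE 1
#else
#define MY_LONG_MIN_IS_NEGATIVE 0
#endif

long my_long_min(void)
{
    return MY_LONG_MIN;
}

void check_my_long_min(void)
{
    assert(MY_LONG_MIN_IS_NEGATIVE == 1);
    assert(my_long_min() < 0);
}
```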
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

jss@hector.UUCP (Jerry Schwarz) (04/07/88)

In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
>	On a machine with 32 bit |long|s and two's complement
>	arithmetic, what is the type of -2147483648 in the preprocessor?
>
>Since the preprocessor is required to follow the same rules as the
>compiler,  ...


Actually not.  Specifically, section 3.8.1, where it is discussing the
evaluation of the expression in a #if:

        "Then the usual arithmetic conversions apply during the
        evaluation of the expression, which takes place using
        arithmetic that has at least the ranges specified in section
        2.2.4.2."

My reading of this is that the ranges of the numbers used in evaluating
these expressions do not have to be the same as those used by the
target code.  

nevin1@ihlpf.ATT.COM (00704a-Liber) (04/08/88)

In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
|[...]  But lo! what has happened when we combine them all?
|The answer to that lies in the following question:
|
|	On a machine with 32 bit |long|s and two's complement
|	arithmetic, what is the type of -2147483648 in the preprocessor?
|
|Since the preprocessor is required to follow the same rules as the
|compiler, and is possessed of the notion of unsigned, we find that it is
|first to compute 2147483648 and then to negate it, and when it does the
|former it finds that the type is |unsigned long|.  The negation changes
|nothing: /neither the type nor the value/.  As noted earlier, the only
|way to remove the unsigned attribute is to use a cast.  But since the
|preprocessor explicitly disallows casts, there is no way to get
|-2147483648!

What about doing something like

	(-2147483647 - 1)??

(Yes, I will admit it looks kludgy and I don't particularly like it, but
it should work.)

|The moral, if you will, of this story is that even obvious and
|well-behaved inventions may not always work together.

And even the obvious deficiencies due to new inventions have workarounds!
:-) :-)  Seriously though, I do agree that any changes made have to be
thought out very, very carefully.
-- 
 _ __			NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )				"The secret compartment of my ring I fill
 /  / _ , __o  ____		 with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah

bright@Data-IO.COM (Walter Bright) (04/09/88)

In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
<	On a machine with 32 bit |long|s and two's complement
<	arithmetic, what is the type of -2147483648 in the preprocessor?
<As noted earlier, the only
<way to remove the unsigned attribute is to use a cast.  But since the
<preprocessor explicitly disallows casts, there is no way to get
<-2147483648!

I don't understand why ANSI C doesn't allow casts and sizeofs in
preprocessor expressions. The only restriction that is reasonable is
disallowing typedef'd types in the cast or the sizeof, because then the
preprocessor has to have information from the compiler's symbol table.
Also, preprocessor expressions are computed as longs by default instead
of ints.

In fact, preprocessor expressions should follow the
SAME rules as C expressions.

In my compiler, they follow the same rules because it's the same code!

dg@lakart.UUCP (David Goodenough) (04/09/88)

From article <10949@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
> First, a note about unsignedness:  In the C language, the unsigned
> attribute on a type can be viewed as `sticky': operations on unsigned
> numbers always yield an unsigned result.  (The only exception is the
> ternary e1?e2:e3, whose result is independent of the type of e1.)
> The condition can, of course, be cleared by a cast to a signed
> type.

I throw this into the wind as food for thought. If nobody likes the
idea, that is all fine and dandy, but I find it a sometimes useful system.

I am in the process of implementing a language in the likes of BCPL and B,
i.e. variables are typeless. HOWEVER, what I did was to type the appropriate
operators: so

		-2 / 2	== -1 (signed)

		-2 ./ 2 == 32767 (unsigned) (16 bit implementation)

The ./ is the unsigned divide, similarly .> is unsigned greater etc. etc.

Thoughts anyone?
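
For comparison, the two results above can be reproduced in C by
choosing the type instead of the operator.  This sketch uses the
fixed-width types from <stdint.h> (a C99 header) to stand in for the
16 bit implementation; the function names are hypothetical, and the
conversion of -2 to |uint16_t| gives 65534.

```c
#include <assert.h>
#include <stdint.h>

/* Counterpart of `/': ordinary signed division. */
int16_t signed_div(int16_t a, int16_t b)
{
    return (int16_t)(a / b);
}

/* Counterpart of `./': reinterpret both operands as 16-bit unsigned
 * values, then divide. */
uint16_t unsigned_div(int16_t a, int16_t b)
{
    return (uint16_t)((uint16_t)a / (uint16_t)b);
}

/* Counterpart of `.>': unsigned greater-than. */
int unsigned_gt(int16_t a, int16_t b)
{
    return (uint16_t)a > (uint16_t)b;
}
```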
--
	dg@lakart.UUCP - David Goodenough		+---+
							| +-+-+
	....... !harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  	  +---+

gwyn@brl-smoke.ARPA (Doug Gwyn) (04/10/88)

In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
>I don't understand why ANSI C doesn't allow casts and sizeofs in
>preprocessor expressions. ...
>In my compiler, they follow the same rules because it's the same code!

We didn't want to mandate that the preprocessor be integrated into the
language parser proper.  I agree that the language would be nicer if
it WERE so integrated, but for historical reasons it wasn't.

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/11/88)

In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
| [...]
| We didn't want to mandate that the preprocessor be integrated into the
| language parser proper.  I agree that the language would be nicer if
| it WERE so integrated, but for historical reasons it wasn't.

  This would be true for sizeof a user type or variable, but certainly
not for a predefined type, such as sizeof int. With programs traveling
between 32 bit machines and 16 bit machines (286, 11s) I want to say:
	#if	sizeof int < 32
	#define INT	long
	#else
	#define INT	int
	#endif
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

bright@Data-IO.COM (Walter Bright) (04/12/88)

In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
<In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
<<I don't understand why ANSI C doesn't allow casts and sizeofs in
<<preprocessor expressions. ...
<<In my compiler, they follow the same rules because it's the same code!
<We didn't want to mandate that the preprocessor be integrated into the
<language parser proper.  I agree that the language would be nicer if
<it WERE so integrated, but for historical reasons it wasn't.

What I proposed didn't mandate this. What I proposed was to COPY (or use
#ifdef's) the code for the expression parser into the preprocessor code.
This would then ensure that the behavior was the same. The only thing
that the preprocessor's parser couldn't handle would be symbol table
lookups of things that aren't macros. It could and should be able to
handle things like:

#if sizeof(long) != sizeof(int)

and:

#if (unsigned long) 1.2345e+6 > WHATEVER

variations of which I've wanted to do many times.

joss@ur-tut (Josh Sirota) (04/12/88)

In article <10353@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>With programs traveling between 32 bit machines and 16 bit machines
>(286, 11s) I want to say:
>	#if	sizeof int < 32
>	#define INT	long
>	#else
>	#define INT	int
>	#endif

Christ!  Use long all the time ... a simple
	#define INT	long
would suffice, if you insist on having this INT thing at all.

Why would you want to do this?  If you want 4 byte values, specify long
on ANY machine.

					Josh
-- 
Josh Sirota
INTERNET: joss@tut.cc.rochester.edu          BITNET: joss_ss@uordbv.bitnet
          ur-tut!joss@cs.rochester.edu       UUCP: ...!rochester!ur-tut!joss

ok@quintus.UUCP (Richard A. O'Keefe) (04/12/88)

In article <10353@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
> With programs traveling
> between 32 bit machines and 16 bit machines (286, 11s) I want to say:
> 	#if	sizeof int < 32
> 	#define INT	long
> 	#else
> 	#define INT	int
> 	#endif

I haven't got a copy of the latest dpANS (~ $70 in California).  For the
specific case of discerning the word-length of a machine known to be 2s-
complement, would
	#if (1<<1) < 0
	#define	int_size_in_bits	2
	...
	#elif (1<<15) < 0
	#define int_size_in_bits	16
	...
	#elif (1<<63) < 0
	#define int_size_in_bits	64
	#endif

	#if	int_size_in_bits < 32
	#define	INT long
	#else
	#define	INT int
	#endif
do the job, or may the preprocessor and compiler interpret int constants
differently?  Better yet, the dpANS provides a file <limits.h> which has
things like the size of various things in bits already defined in it.
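
A minimal sketch of the <limits.h> approach.  Note that |sizeof|
yields a size in bytes rather than bits, so a range test on |INT_MAX|
is the closer match to what the quoted fragment wants.  The name |INT|
comes from the quoted post; using a typedef rather than a macro, and
the helper |int_has_32_bits|, are choices made only for this example.

```c
#include <assert.h>
#include <limits.h>

/* Pick int when it can hold at least 32-bit values, else long.
 * INT_MAX is an ordinary integer constant expression, so the
 * preprocessor can compare it. */
#if INT_MAX >= 2147483647
typedef int INT;
#else
typedef long INT;
#endif

/* Returns 1 when INT can hold the largest 32-bit signed value. */
int int_has_32_bits(void)
{
    INT x = 2147483647L;
    return x == 2147483647L;
}
```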

gwyn@brl-smoke.ARPA (Doug Gwyn) (04/12/88)

In article <1526@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
-In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
-<In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
-<<I don't understand why ANSI C doesn't allow casts and sizeofs in
-<<preprocessor expressions. ...
-<<In my compiler, they follow the same rules because it's the same code!
-<We didn't want to mandate that the preprocessor be integrated into the
-<language parser proper.  I agree that the language would be nicer if
-<it WERE so integrated, but for historical reasons it wasn't.
-What I proposed didn't mandate this. What I proposed was to COPY (or use
-#ifdef's) the code for the expression parser into the preprocessor code.

Ah, one could indeed duplicate the portion of the compiler that deals
with types in the preprocessor, which would be required in order to
handle sizeof() properly there, but that's what we didn't want to
require as I recall.  At that point one might as well integrate the
preprocessor into the lexer instead of duplicating all that code.

-#if sizeof(long) != sizeof(int)

Yes, I've wanted to be able to do this too.

bright@Data-IO.COM (Walter Bright) (04/13/88)

In article <1758@ur-tut.UUCP> joss@tut.cc.rochester.edu (Josh Sirota) writes:
>In article <10353@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>>With programs traveling between 32 bit machines and 16 bit machines
>>(286, 11s) I want to say:
>>	#if	sizeof int < 32
>>	#define INT	long
>>	#else
>>	#define INT	int
>>	#endif
>Why would you want to do this?  If you want 4 byte values, specify long
>on ANY machine.

Here is a portion of a package to handle bit vectors in C. It demonstrates
a reasonable use for allowing casts and sizeofs in preprocessor expressions.
----------------------------------------------------------------
/* Use base type that is probably the most efficient for this machine	*/
#define vec_t unsigned		/* preprocessor can't see typedefs	*/

/* This code depends on 8 bit bytes. Put check in for this.	*/
/* I don't care about 1's complement machines.			*/
#if (unsigned char) -1 != 255
#error	"bytes are not 8 bits"
#endif

#define BITSPERBYTE 8

#define VECMASK	(sizeof(vec_t)*BITSPERBYTE - 1)	/* mask for bit position */

/* Determine VECSHIFT, the number of bits set in VECMASK.	*/
/* If anyone knows a nifty way to convert from VECMASK to	*/
/* VECSHIFT, in such a way that VECSHIFT is a constant that	*/
/* can be folded with others, please send me mail!		*/
#if sizeof(vec_t) * BITSPERBYTE == 16
#define VECSHIFT 4
#elif sizeof(vec_t) * BITSPERBYTE == 32
#define VECSHIFT 5
#else
#error	"need to fix this"
#endif

void vec_setbit(b,v)	/* Set bit b in vector v	*/
unsigned b;
vec_t *v;
{
  *(v + (b >> VECSHIFT)) |= (vec_t)1 << (b & VECMASK);
}
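
One possible way to get a constant VECSHIFT without |sizeof| in the
preprocessor is to test |UINT_MAX| from <limits.h> instead.  This is a
sketch under the assumption that |unsigned| is either 16 or 32 bits
wide with no padding bits; |vec_testbit| is a hypothetical helper
added only so the result can be checked, and the functions are written
with prototypes rather than in the style of the post.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned vec_t;

/* UINT_MAX is visible to the preprocessor, unlike sizeof(vec_t). */
#if UINT_MAX == 0xFFFFU
#define VECSHIFT 4
#elif UINT_MAX == 0xFFFFFFFFUL
#define VECSHIFT 5
#else
#error "add a case for this width of unsigned"
#endif

/* Derive the mask from the shift, so both stay constants. */
#define VECMASK ((1u << VECSHIFT) - 1)

/* Set bit b in vector v. */
void vec_setbit(unsigned b, vec_t *v)
{
    v[b >> VECSHIFT] |= (vec_t)1 << (b & VECMASK);
}

/* Return 1 if bit b in vector v is set, 0 otherwise. */
int vec_testbit(unsigned b, const vec_t *v)
{
    return (int)((v[b >> VECSHIFT] >> (b & VECMASK)) & 1u);
}
```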

ltf@killer.UUCP (Lance Franklin) (04/13/88)

In article <40@lakart.UUCP> dg@lakart.UUCP writes:
>I throw this into the wind as food for thought. If nobody likes the
>idea, that is all fine and dandy, but I find it a sometimes useful system.
>
>I am in the process of implementing a language in the likes of BCPL and B,
>i.e. variables are typeless. HOWEVER, what I did was to type the appropriate
>operators: so
>
>		-2 / 2	== -1 (signed)
>
>		-2 ./ 2 == 32767 (unsigned) (16 bit implementation)
>
>The ./ is the unsigned divide, similarly .> is unsigned greater etc. etc.
>
>Thoughts anyone?

Well, this sounds familiar...some extended versions of BCPL used this very
same scheme for handling floating point numbers (which were, of course, the
same size (32 bits) as everything else).  Floating point constants were
of the following form:
        i.jEk
        i.j
        iEk
The arithmetic and relational operators for floating point quantities were:
        #* #/ #+ #- #= #^= #<= #>= #< #>
with the same precedence as the corresponding integer operations.
 
They also had two monadic functions FIX(x) and FLOAT(x) for conversion between
integers and floating point numbers.
 
Ah, memories.  One of these days I gotta bring up BCPL on my Amiga...I still
have a copy of Martin Richards's transport tape (MR1084) around here somewhere.
A quick conversion from EBCDIC to ASCII and I'm ready to go!   :-)
 
 


-- 
+------------------+ +------------------------------------------------------+
| Lance T Franklin | |  Now accepting suggestions for clever, humourous or  |
|    ltf@killer    | |  incredibly insightful .signature quote.  Send Now!  |
+------------------+ +------------------------------------------------------+

henry@utzoo.uucp (Henry Spencer) (04/13/88)

> 	#elif (1<<63) < 0
> 	#define int_size_in_bits	64
> 	...
> do the job, or may the preprocessor and compiler interpret int constants
> differently?

As I recall (my copy of the draft isn't handy), there is some room for
different interpretation of compile-time operations.  Furthermore, even
ignoring that, there is another problem:  the result of shifting beyond
the available number of bits is implementation-defined (or possibly
even undefined).
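
A sketch of one way to find the width without ever shifting past the
available bits: start from |UINT_MAX| and shift right until nothing is
left.  This runs at execution time rather than in the preprocessor,
and |unsigned_width| is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>

/* Count the value bits of unsigned int.  Each shift is by one
 * position, so the shift count is always in range. */
int unsigned_width(void)
{
    unsigned v = UINT_MAX;
    int bits = 0;

    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}
```

On an implementation without padding bits the result should equal
|sizeof(unsigned) * CHAR_BIT|.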
-- 
"Noalias must go.  This is           |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR              | {allegra,ihnp4,decvax,utai}!utzoo!henry