chris@mimsy.UUCP (Chris Torek) (04/06/88)
[Typography convention: /word/ represents /italics/; |word| represents typewriter-text.]

By now most of you know my sentiments towards `noalias'.  Here, however,
is a sequence showing how even the most innocent-seeming inventions can
interact to produce surprising results.

First, a note about unsignedness: In the C language, the unsigned
attribute on a type can be viewed as `sticky': operations on unsigned
numbers always yield an unsigned result.  (The only exception is the
ternary e1?e2:e3, whose result is independent of the type of e1.)
The condition can, of course, be cleared by a cast to a signed type.

Second, we have a long-standing clause in the draft standard on
/integer constants/, one that determines the type of a constant from
its value and that value's representation on your machine.  In itself
this is nothing new: even K&R say that whether |34567| is an |int| or a
|long| will depend on the number of bits in your |int|.  The dpANS
further says that a constant may become an |unsigned long|.  In
particular, on machines with 32 bit |long|s, values in
2147483648..4294967295 are |unsigned long|.  This is certainly
reasonable, or at least seems so.

Next we have the introduction of explicitly-unsigned constants.  |12U|
is to be equivalent to |(unsigned)12|; |99LU| or |99UL| is equivalent
to |(unsigned long)99|.  This is quite a notational convenience, just
as is the existing L suffix, and adding it to compilers is simple: it
took perhaps a dozen lines to add it to the 4.3BSD Vax and Tahoe
compilers.  Again, reasonable, if something of a frill.

But now that we have this U suffix, and various files that use it, I
find that the preprocessor must do something with it.  And indeed, the
draft tells us that the preprocessor now has the notion of unsigned
arithmetic.  Rather than do everything in |long|s, ignoring any U
suffixes, it must obey the compiler's rules for combining |long| and
|unsigned long|.  Is this such a burden?
Perhaps; perhaps not: a close approximation in the Reiser
preprocessor---making unsigned `sticky'---took only a few changes (the
approximation fails only for e1?e2:e3 as noted above).  But having
unsigned arithmetic available in the preprocessor is clearly
semantically desirable: it would be nice to be able to tell whether the
maximum unsigned short is greater than 65535U:

	#include <limits.h>

	/*
	 * Define a type to hold values in 0..65536.  We will
	 * have a large array of these numbers, so use as little
	 * space as possible.
	 */
	#if USHRT_MAX > 65535U
	typedef unsigned short bigunum;
	#else
	typedef unsigned long bigunum;	/* dpANS says u_long must suffice */
	#endif

Each of these inventions (for inventions they are, at least as they
have been phrased) seems perfectly reasonable.  At least, each one
seems so to me.  But lo! what has happened when we combine them all?
The answer to that lies in the following question:

	On a machine with 32 bit |long|s and two's complement
	arithmetic, what is the type of -2147483648 in the preprocessor?

Since the preprocessor is required to follow the same rules as the
compiler, and is possessed of the notion of unsigned, we find that it
is first to compute 2147483648 and then to negate it, and when it does
the former it finds that the type is |unsigned long|.  The negation
changes nothing: /neither the type nor the value/.  As noted earlier,
the only way to remove the unsigned attribute is to use a cast.  But
since the preprocessor explicitly disallows casts, there is no way to
get -2147483648!  In particular, this means that

	#include <limits.h>
	#if LONG_MIN > 0

is guaranteed to be /true/ on any two's complement machine!

The moral, if you will, of this story is that even obvious and
well-behaved inventions may not always work together.  If something as
simple as putting unsigned arithmetic in the preprocessor has such a
surprising result, what can we expect of inventions like |noalias|?
Perhaps this will show why I am uneasy about /every/ invention in this
draft standard, even such obvious improvements as prototypes.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
chris@mimsy.UUCP (Chris Torek) (04/06/88)
By the way, I should probably have mentioned that there is an easy way
to fix the `LONG_MIN > 0' problem without any real changes to the
draft, and that is to use

	#define LONG_MIN (-0x7fffffff-1)

rather than

	#define LONG_MIN -0x80000000

The example <limits.h> (which also sets minimal maxima) is for a one's
complement machine and does not suffer from this `feature'.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris
jss@hector.UUCP (Jerry Schwarz) (04/07/88)
In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
>	On a machine with 32 bit |long|s and two's complement
>	arithmetic, what is the type of -2147483648 in the preprocessor?
>
>Since the preprocessor is required to follow the same rules as the
>compiler, ...

Actually not.  Specifically, section 3.8.1, where it is discussing the
evaluation of the expression in a #if:

	"Then the usual arithmetic conversions apply during the
	evaluation of the expression, which takes place using
	arithmetic that has at least the ranges specified in
	section 2.2.4.2."

My reading of this is that the ranges of the numbers used in evaluating
these expressions do not have to be the same as those used by the
target code.
nevin1@ihlpf.ATT.COM (00704a-Liber) (04/08/88)
In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
|[...] But lo! what has happened when we combine them all?
|The answer to that lies in the following question:
|
|	On a machine with 32 bit |long|s and two's complement
|	arithmetic, what is the type of -2147483648 in the preprocessor?
|
|Since the preprocessor is required to follow the same rules as the
|compiler, and is possessed of the notion of unsigned, we find that it is
|first to compute 2147483648 and then to negate it, and when it does the
|former it finds that the type is |unsigned long|.  The negation changes
|nothing: /neither the type nor the value/.  As noted earlier, the only
|way to remove the unsigned attribute is to use a cast.  But since the
|preprocessor explicitly disallows casts, there is no way to get
|-2147483648!

What about doing something like (-2147483647 - 1)?  (Yes, I will admit
it looks kludgy and I don't particularly like it, but it should work.)

|The moral, if you will, of this story is that even obvious and
|well-behaved inventions may not always work together.

And even the obvious deficiencies due to new inventions have
workarounds! :-) :-)

Seriously though, I do agree that any changes made have to be thought
out very, very carefully.
-- 
 _   __		NEVIN J. LIBER	..!ihnp4!ihlpf!nevin1	(312) 510-6194
' )  )		"The secret compartment of my ring I fill
 /  / _ , __o	____	with an Underdog super-energy pill."
/  (_</_\/ <__/ / <_	These are solely MY opinions, not AT&T's, blah blah blah
bright@Data-IO.COM (Walter Bright) (04/09/88)
In article <10949@mimsy.UUCP> chris@mimsy.umd.edu (Chris Torek) writes:
< On a machine with 32 bit |long|s and two's complement
< arithmetic, what is the type of -2147483648 in the preprocessor?
<As noted earlier, the only
<way to remove the unsigned attribute is to use a cast. But since the
<preprocessor explicitly disallows casts, there is no way to get
<-2147483648!
I don't understand why ANSI C doesn't allow casts and sizeofs in
preprocessor expressions. The
only restriction that is reasonable is disallowing typedef'd types
in the cast or the sizeof, because then the preprocessor has to have
information from the compiler's symbol table. Also, preprocessor
expressions are computed as longs by default instead of ints.
In fact, preprocessor expressions should follow the
SAME rules as C expressions.
In my compiler, they follow the same rules because it's the same code!
dg@lakart.UUCP (David Goodenough) (04/09/88)
From article <10949@mimsy.UUCP>, by chris@mimsy.UUCP (Chris Torek):
> First, a note about unsignedness: In the C language, the unsigned
> attribute on a type can be viewed as `sticky': operations on unsigned
> numbers always yield an unsigned result.  (The only exception is the
> ternary e1?e2:e3, whose result is independent of the type of e1.)
> The condition can, of course, be cleared by a cast to a signed
> type.

I throw this into the wind as food for thought.  If nobody likes the
idea, that is all fine and dandy, but I find it a sometimes useful
system.

I am in the process of implementing a language in the likes of BCPL and
B, i.e. variables are typeless.  HOWEVER, what I did was to type the
appropriate operators: so

	-2 / 2 == -1		(signed)

	-2 ./ 2 == 32767	(unsigned) (16 bit implementation)

The ./ is the unsigned divide, similarly .> is unsigned greater, etc.
Thoughts, anyone?
-- 
dg@lakart.UUCP - David Goodenough		+---+
						| +-+-+
.......	!harvard!adelie!cfisun!lakart!dg	+-+-+ |
						  +---+
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/10/88)
In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
>I don't understand why ANSI C doesn't allow casts and sizeofs in
>preprocessor expressions. ...
>In my compiler, they follow the same rules because it's the same code!

We didn't want to mandate that the preprocessor be integrated into the
language parser proper.  I agree that the language would be nicer if it
WERE so integrated, but for historical reasons it wasn't.
davidsen@steinmetz.ge.com (William E. Davidsen Jr) (04/11/88)
In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
| [...]
| We didn't want to mandate that the preprocessor be integrated into the
| language parser proper.  I agree that the language would be nicer if
| it WERE so integrated, but for historical reasons it wasn't.

This would be true for sizeof a user type or variable, but certainly
not for a predefined type, such as sizeof int.  With programs traveling
between 32 bit machines and 16 bit machines (286, 11s) I want to say:

	#if sizeof int < 32
	#define INT long
	#else
	#define INT int
	#endif
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
bright@Data-IO.COM (Walter Bright) (04/12/88)
In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
<In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
<<I don't understand why ANSI C doesn't allow casts and sizeofs in
<<preprocessor expressions. ...
<<In my compiler, they follow the same rules because it's the same code!
<We didn't want to mandate that the preprocessor be integrated into the
<language parser proper.  I agree that the language would be nicer if
<it WERE so integrated, but for historical reasons it wasn't.

What I proposed didn't mandate this.  What I proposed was to COPY (or
use #ifdef's) the code for the expression parser into the preprocessor
code.  This would then ensure that the behavior was the same.  The only
thing that the preprocessor's parser couldn't handle would be symbol
table lookups of things that aren't macros.  It could and should be
able to handle things like:

	#if sizeof(long) != sizeof(int)

and:

	#if (unsigned long) 1.2345e+6 > WHATEVER

variations of which I've wanted to do many times.
joss@ur-tut (Josh Sirota) (04/12/88)
In article <10353@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>With programs traveling between 32 bit machines and 16 bit machines
>(286, 11s) I want to say:
>	#if sizeof int < 32
>	#define INT long
>	#else
>	#define INT int
>	#endif

Christ!  Use long all the time ... a simple

	#define INT long

would suffice, if you insist on having this INT thing at all.  Why
would you want to do this?  If you want 4 byte values, specify long on
ANY machine.

					Josh
-- 
Josh Sirota		INTERNET: joss@tut.cc.rochester.edu
			BITNET:	joss_ss@uordbv.bitnet
				ur-tut!joss@cs.rochester.edu
			UUCP: ...!rochester!ur-tut!joss
ok@quintus.UUCP (Richard A. O'Keefe) (04/12/88)
In article <10353@steinmetz.ge.com>, davidsen@steinmetz.ge.com (William E. Davidsen Jr) writes:
> With programs traveling
> between 32 bit machines and 16 bit machines (286, 11s) I want to say:
>	#if sizeof int < 32
>	#define INT long
>	#else
>	#define INT int
>	#endif

I haven't got a copy of the latest dpANS (~ $70 in California).  For
the specific case of discerning the word-length of a machine known to
be 2s-complement, would

	#if (1<<1) < 0
	#define int_size_in_bits 2
	...
	#elif (1<<15) < 0
	#define int_size_in_bits 16
	...
	#elif (1<<63) < 0
	#define int_size_in_bits 64
	#endif

	#if int_size_in_bits < 32
	#define INT long
	#else
	#define INT int
	#endif

do the job, or may the preprocessor and compiler interpret int
constants differently?  Better yet, the dpANS provides a file
<limits.h> which already defines things like the sizes of various
types in bits.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (04/12/88)
In article <1526@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
-In article <7637@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
-<In article <1525@dataio.Data-IO.COM> bright@dataio.UUCP (Walter Bright) writes:
-<<I don't understand why ANSI C doesn't allow casts and sizeofs in
-<<preprocessor expressions. ...
-<<In my compiler, they follow the same rules because it's the same code!
-<We didn't want to mandate that the preprocessor be integrated into the
-<language parser proper.  I agree that the language would be nicer if
-<it WERE so integrated, but for historical reasons it wasn't.
-What I proposed didn't mandate this.  What I proposed was to COPY (or use
-#ifdef's) the code for the expression parser into the preprocessor code.

Ah, one could indeed duplicate the portion of the compiler that deals
with types in the preprocessor, which would be required in order to
handle sizeof() properly there, but that's what we didn't want to
require as I recall.  At that point one might as well integrate the
preprocessor into the lexer instead of duplicating all that code.

-#if sizeof(long) != sizeof(int)

Yes, I've wanted to be able to do this too.
bright@Data-IO.COM (Walter Bright) (04/13/88)
In article <1758@ur-tut.UUCP> joss@tut.cc.rochester.edu (Josh Sirota) writes:
>In article <10353@steinmetz.ge.com> davidsen@crdos1.UUCP (bill davidsen) writes:
>>With programs traveling between 32 bit machines and 16 bit machines
>>(286, 11s) I want to say:
>>	#if sizeof int < 32
>>	#define INT long
>>	#else
>>	#define INT int
>>	#endif
>Why would you want to do this?  If you want 4 byte values, specify long
>on ANY machine.

Here is a portion of a package to handle bit vectors in C.  It
demonstrates a reasonable use for allowing casts and sizeofs in
preprocessor expressions.
----------------------------------------------------------------
/* Use base type that is probably the most efficient for this machine */
#define vec_t unsigned		/* preprocessor can't see typedefs */

/* This code depends on 8 bit bytes.  Put check in for this.	*/
/* I don't care about 1's complement machines.			*/
#if (unsigned char) -1 != 255
#error "bytes are not 8 bits"
#endif
#define BITSPERBYTE 8

#define VECMASK (sizeof(vec_t)*BITSPERBYTE - 1)	/* mask for bit position */

/* Determine VECSHIFT, the number of bits set in VECMASK.	*/
/* If anyone knows a nifty way to convert from VECMASK to	*/
/* VECSHIFT, in such a way that VECSHIFT is a constant that	*/
/* can be folded with others, please send me mail!		*/
#if sizeof(vec_t) * BITSPERBYTE == 16
#define VECSHIFT 4
#elif sizeof(vec_t) * BITSPERBYTE == 32
#define VECSHIFT 5
#else
#error "need to fix this"
#endif

void vec_setbit(b,v)	/* Set bit b in vector v */
unsigned b;
vec_t *v;
{
	*(v + (b >> VECSHIFT)) |= 1 << (b & VECMASK);
}
ltf@killer.UUCP (Lance Franklin) (04/13/88)
In article <40@lakart.UUCP> dg@lakart.UUCP writes:
>I throw this into the wind as food for thought.  If nobody likes the
>idea, that is all fine and dandy, but I find it a sometimes useful system.
>
>I am in the process of implementing a language in the likes of BCPL and B,
>i.e. variables are typeless.  HOWEVER, what I did was to type the appropriate
>operators: so
>
>	-2 / 2 == -1		(signed)
>
>	-2 ./ 2 == 32767	(unsigned) (16 bit implementation)
>
>The ./ is the unsigned divide, similarly .> is unsigned greater etc. etc.
>
>Thoughts anyone?

Well, this sounds familiar... some extended versions of BCPL used this
very same scheme for handling floating point numbers (which were, of
course, the same size (32 bits) as everything else).  Floating point
constants were of the following form:

	i.jEk	i.j	iEk

The arithmetic and relational operators for floating point quantities
were:

	#*  #/  #+  #-  #=  #^=  #<=  #>=  #<  #>

with the same precedence as the corresponding integer operations.  They
also had two monadic functions, FIX(x) and FLOAT(x), for conversion
between integers and floating point numbers.

Ah, memories.  One of these days I gotta bring up BCPL on my Amiga...
I still have a copy of Martin Richards's transport tape (MR1084) around
here somewhere.  A quick conversion from EBCDIC to ASCII and I'm ready
to go! :-)
-- 
+------------------+ +------------------------------------------------------+
| Lance T Franklin | | Now accepting suggestions for clever, humourous or   |
| ltf@killer       | | incredibly insightful .signature quote.  Send Now!   |
+------------------+ +------------------------------------------------------+
henry@utzoo.uucp (Henry Spencer) (04/13/88)
> #elif (1<<63) < 0
> #define int_size_in_bits 64
> ...
> do the job, or may the preprocessor and compiler interpret int constants
> differently?

As I recall (my copy of the draft isn't handy), there is some room for
different interpretation of compile-time operations.  Furthermore, even
ignoring that, there is another problem: the result of shifting beyond
the available number of bits is implementation-defined (or possibly
even undefined).
-- 
"Noalias must go.  This is |  Henry Spencer @ U of Toronto Zoology
non-negotiable."  --DMR    | {allegra,ihnp4,decvax,utai}!utzoo!henry