notes@ucbcad.UUCP (10/26/83)
#N:ucbesvax:4800029:000:459 ucbesvax!turner Oct 26 05:51:00 1983

I have a question about enum types.  What size are they?  Ritchie says
that his compiler treats them as ints.  But what about pcc?  Are they
sizeof int or sizeof char *?  The latter would be preferable to me,
since I am using enum's to hide pointers from the users of a package;
for this to work across all machines, enum types must be (at least) as
large as the largest possible pointer size.

	Thanks in advance,
	Michael Turner (ucbvax!ucbesvax.turner)
mjs@rabbit.UUCP (10/27/83)
The type of an enum is int, though some compilers may shorten that to
short or char if the enumerated values fit.
--
	Marty Shannon
	UUCP:	{alice,rabbit,research}!mjs
	Phone:	201-582-3199
mrm@datagen.UUCP (06/12/84)
The X3J11 draft that is being worked on this week (6/11 -- 6/15) will
state that:

	sizeof expression does not cause any side-effects to occur.

Michael Meissner
Data General Corporation
...{ allegra, ihnp4, rocky2, decvax!ittvax }!datagen!mrm
cottrell@nbs-vms.ARPA (01/19/85)
/*
K&R page 192 first paragraph:

	"The compilers currently allow a pointer to be assigned to an
	integer, an integer to a pointer, and a pointer to a pointer of
	another type.  THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO
	CONVERSION.  This usage is nonportable, and may produce pointers
	which cause addressing exceptions when used.  However, it is
	guaranteed that the assignment of the constant 0 to a pointer
	will produce a null pointer distinguishable from a pointer to
	any object."

This says to me that the sizes must be the same.  Changing the size is
a conversion in my eye.  I believe you when you say that there are
compilers in wide use that do this, but I have heard lots of weird
stuff about what someone's compiler does.  Brain damage is everywhere!
*/
guy@rlgvax.UUCP (Guy Harris) (01/19/85)
> K&R page 192 first paragraph:
>
> "The compilers currently allow a pointer to be assigned to an integer,
> an integer to a pointer, and a pointer to a pointer of another type.
> THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. ...
>
> This says to me that the sizes must be the same. Changing the size is
> a conversion in my eye. ...

Under "Explicit pointer conversions", p. 210:

	A pointer may be converted *to any of the integral types large
	enough to hold it.  Whether an "int" or "long" is required is
	machine dependent.*

("Italics" mine.)  Note that "integer" does not mean "int".  "4. What's
in a name", last paragraph, p. 182:

	Up to three sizes of integer, declared "short int", "int", and
	"long int", are available.

So what they meant to say on p. 192 was that a pointer may be assigned
to an integer large enough to hold it.  On some machines, "int" may not
be large enough to hold a pointer, and "long int" is the only integer
to which a pointer may be assigned.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
thomas@utah-gr.UUCP (Spencer W. Thomas) (01/20/85)
In article <7527@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
>
> K&R page 192 first paragraph:
>
> "The compilers currently allow a pointer to be assigned to an integer,
                 *********
> an integer to a pointer, and a pointer to a pointer of another type.
> THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. This usage is
                                                               **** ***** **
> nonportable, and may produce pointers which cause addressing exceptions
  ***********
>
> This says to me that the sizes must be the same. Changing the size is
> a conversion in my eye.

Note the words I have underlined above.  Nowhere in this paragraph does
it say that this is a feature which a C compiler MUST have, only that
this is a feature of CURRENT compilers.
--
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
	<<< Silly quote of the week >>>
jim@ISM780B.UUCP (01/21/85)
> K&R page 192 first paragraph:
>
> "The compilers currently allow a pointer to be assigned to an integer,
> an integer to a pointer, and a pointer to a pointer of another type.
> THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. This usage is
> nonportable, and may produce pointers which cause addressing exceptions
> when used. However, it is guaranteed that the assignment of the constant
> 0 to a pointer will produce a null pointer distinguishable from a pointer
> to any object."

K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!
K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!
K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!

It is old, it is outdated, it is wrong.  Those compilers that allow
structure assignment, separate name spaces for structure members, and
enums, that is, the compilers that implement the real C language, the
one described by the C reference manual that AT&T distributes, and the
one on which the ANSI standard is being based, generate a warning when
such assignments occur.  The C reference manual does not allow them;
values must be explicitly cast before being assigned.

> This says to me that the sizes must be the same. Changing the size is
> a conversion in my eye.

Have you seen an ophthalmologist lately? :-)  Even given K&R, with its
cavalier approach toward formal specification, this is not a reasonable
interpretation, because the above paragraph says "integer", not "int",
and a pointer can't be the same size as a char, a short, an int, and a
long all at the same time.
--
Jim Balter, INTERACTIVE Systems (ima!jim)
peterc@ecr.UUCP (Peter Curran) (01/22/85)
Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
However, due to one point in the Reference Manual, and K&R (and, I
assume, the standard, although I haven't checked), they are actually
required to be equal.  The problem is that "0" is defined to be the
Null pointer constant.  When "0" is passed as a parameter to a
function, the compiler cannot tell whether an int or an int * is
intended.  The effect of this is that sizeof(int) must equal
sizeof(int *), and even more, the value of the Null address constant
must be bit-for-bit identical to the value of ((int) 0).

Of course, many compilers do not conform to this requirement.  The
problem can be avoided by, for example, always using (say) NULL as the
Null address constant, where NULL is #defined as something like
((char *) 0).  Doing so conforms to the Reference Manual, but not
doing so also conforms (and of course many, if not most, C programs
don't follow this practice).

The real solution, of course, would be to introduce a new keyword, say
"null", which represents the Null address constant, with an
implementation-defined value.  However, I doubt that that will ever
come about.

Anyone who has made much effort at porting C code has no doubt
encountered this problem, but I don't think it is as well known as it
should be.
crandell@ut-sally.UUCP (Jim Crandell) (01/23/85)
> Even given K&R, with its cavalier approach toward formal specification,
> this is not a reasonable interpretation, because the above paragraph says
> "integer", not "int", and a pointer can't be the same size as a char,
> a short, an int, and a long all at the same time.

Unless, of course, char, short, int and long int are all the same size,
which K&R also fails to proscribe.
--
Jim Crandell, C. S. Dept., The University of Texas at Austin
	{ihnp4,seismo,ctvax}!ut-sally!crandell
quenton@ecr.UUCP (Dennis Smith) (01/23/85)
The problem of passing 0 for a null pointer (as a parameter), and the
solution of "#define NULL ...", as pointed out by P. Curran, is valid.
However, the use of

	#define NULL ((char *)0)

although portable, will cause many compilers to complain about
differing pointer types, and will also cause lint to generate many
additional useless messages.  The only generally usable solution that
I know of is

	#define NULL 0		/** when sizeof(xxx *) == sizeof(int) **/
	#define NULL 0L		/** when sizeof(xxx *) == sizeof(long) **/

This unfortunately means that the "define" must be changed whenever
the target machine/compiler/environment changes.  One possible
solution for the future could be the use of

	#define NULL ((void *)0)

which seems compatible with the notion of (void *) being a generic
pointer type.

It might also be noted, although I have had no experience with them,
that some compilers for certain older generations of computers
generate pointers of differing sizes.  This occurs when the machine is
not byte addressable, so that a pointer to a word-aligned item might
be "n" bits long, but a pointer to a character must point to the word
and also indicate which character within the word.  This would create
the even more disastrous situation of

	sizeof(char *) != sizeof(int *)

making the definition of something like NULL even more incomprehensible.
chris@umcp-cs.UUCP (Chris Torek) (01/27/85)
> [...] C SHOULD be defined to allow sizeof(int) != sizeof(int *).
> However, due to one point in the Reference Manual, [...] they are
> actually required to be equal. The problem is that "0" is defined to
> be the Null pointer constant. When "0" is passed as a parameter to a
> function, the compiler cannot tell whether an int or an int * is
> intended. The effect of this is that sizeof(int) must equal
> sizeof(int *), and even more, the value of the Null address constant
> must be bit-for-bit identical to the value of ((int) 0).

NO! NO! and NO!

[please turn your volume control way up]

PASSING AN UNCAST ZERO TO A ROUTINE THAT EXPECTS A POINTER IS NOT
PORTABLE, AND IS JUST PLAIN WRONG.  GET THAT STRAIGHT *NOW*!

[you can turn your volume control back down]

The following code is NOT portable and probably fails on half the
existing implementations of C:

	#define NULL 0		/* this from <stdio.h> */

	f()
	{
		g(NULL);
	}

	g(p)
		int *p;
	{
		if (p == NULL)
			do_A();
		else
			do_B();
	}

The value ``f'' passes to ``g'' is the integer zero.  What that
represents inside g is completely undefined.  It is not the nil
pointer, unless your compiler just happens to work that way (not
uncommon but not universal).  It may not even be the same size (in
bits or bytes or decidigits or whatever your hardware uses).  One tiny
little simple change fixes it:

	f()
	{
		g((int *)NULL);
	}

It is now portable, and all that good stuff.  You can write the first
and hope real hard, or you can write the second and know.

The point is that the zero value and the nil pointer are two
completely different things, and the compiler happens to be obliged to
convert the former to the latter in expressions where this is forced
(e.g., casts or comparison with another pointer).  It is NOT forced in
function calls (though under the ANSI standard it would be in some
cases).  (I claim that it IS forced in expressions such as

	if (p)

where p is a pointer; this is "if (p != 0)" where type-matching p and
0 forces the conversion.)
(Now I WILL agree that if you have the option of making the nil
pointer and the zero bit pattern the same, then you will have less
trouble with existing programs if you do....)
--
(This line accidently left nonblank.)
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland
Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/27/85)
Bogus, bogus. sizeof (int) is not required to be the same as sizeof (int *). sizeof (int *) is also not necessarily the same as sizeof (char *). 0 is not the same as (char *)0.
guy@rlgvax.UUCP (Guy Harris) (01/28/85)
> Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
> However, due to one point in the Reference Manual, and K&R (and, I assume,
> the standard, although I haven't checked), they are actually required to
> be equal. The problem is that "0" is defined to be the Null pointer
> constant.

This has been said about 1.0E6 times before, but "0" is NOT defined to
be the null pointer constant.  K&R says:

	...it is guaranteed that *ASSIGNMENT* (emphasis mine) of the
	constant 0 to a pointer will produce a null pointer
	distinguishable from a pointer to any object.

Passing something as an argument doesn't behave like an assignment,
precisely because the compiler can't perform the proper type
coercions.  You have to do so yourself; if you want to pass an "int"
to a routine that expects a "double", you have to say "(double) foo",
whereas if you wanted to assign that int to a double you could omit
the cast and the compiler would perform the coercion for you.  The
same holds for passing 0 to a routine expecting a pointer and
assigning 0 to a pointer.  You have to cast it.

"lint" complains, and rightly so, if you pass an object of one type to
a routine which expects an object of another type.  It's *VERY EASY*
to catch this kind of problem - just run "lint" every once in a while.

We support 16-bit "int"s and 32-bit pointers on our machines; if this
causes somebody's code to have problems because it says things like

	execl("/usr/local/foo", "foo", "bar", 0);

instead of

	execl("/usr/local/foo", "foo", "bar", (char *)0);

it's because the code is incorrect.  Period.

> Of course, many compilers do not conform to this requirement.

It's not a requirement, so compilers don't have to conform.

> The problem can be avoided by, for example, always using (say) NULL as
> the Null address constant, where NULL is #defined as something like
> ((char *) 0).

OK.  Everybody take a deep breath and repeat after me:

	THERE IS NO ONE NULL ADDRESS CONSTANT IN C!
There is no such thing as a generic "pointer" in C.  There are
pointers to characters, pointers to "int"s, pointers to "struct proc",
pointers to...  As such, there is no such thing as a generic null
pointer.  There are null pointers to no character, null pointers to no
"int", etc..  As such, the problem should not be avoided by the above
trick.  "lint" will complain, and the code will choke on
implementations where "char *" and "int *" have different
representations (a word-addressed machine where a byte pointer would
take more bits than fit in a word pointer REQUIRES such an
implementation).

> The real solution, of course, would be to introduce a new keyword, say
> "null", which represents the Null address constant, with an
> implementation-defined value. However, I doubt that that will ever come
> about.

Let's hope not.  It is an incorrect solution for the very reason
mentioned above.  The closest thing to a correct solution is the
introduction of declarations of functions that declare the argument
types; thus, the compiler could perform the necessary coercions just
as it can do so in expressions.

> Anyone who has made much effort at porting C code has no doubt
> encountered this problem, but I don't think it is as well known as it
> should be.

Anyone who has made much effort at porting C code has encountered lots
of problems, all too many of which are due to people misusing the
language.  Many of those can be avoided by using "lint".  Go forth and
do so.

Let's hope this kills this discussion off until the next time it shows
up (which will probably be in another couple of months - it keeps
returning like a bad penny).

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
g-frank@gumby.UUCP (01/28/85)
> Anyone who has made much effort at porting C code has encountered lots of
> problems, all too many of which are due to people misusing the language.
> Many of those can be avoided by using "lint". Go forth and do so.

The whole point of languages where the compiler does strong type
checking is that no one gets to misuse the language, at least without
making a conscious effort to do so.  As long as it is easier to avoid
a cast than to use one, and the compiler doesn't complain, lazy or
rushed or habit-bound programmers will do so.

With regard to lint:

   1) Most people working in a Unix environment never use it, because
      they don't have to.

   2) I have been desperately searching for an implementation for my
      own programming environment (PC-DOS and QNX on the IBM PC), thus
      far without any luck.  It just doesn't seem to be very available
      in any but orthodox Unix systems.  This should say something
      about the great esteem in which the C programming community
      holds lint.

Human nature being what it is, "go forth and use lint" should get
approximately the same enthusiastic response as "go forth and sin no
more."
--
	Dan Frank
	"good news is just life's way of keeping you off balance."
mwm@ucbtopaz.CC.Berkeley.ARPA (01/29/85)
In article <347@ecr.UUCP> peterc@ecr.UUCP (Peter Curran) writes:
> Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
> However, due to one point in the Reference Manual, and K&R (and, I assume,
> the standard, although I haven't checked), they are actually required to
> be equal. The problem is that "0" is defined to be the Null pointer
> constant.

Sorry, but that's not quite right.  Quoting K&R, page 192, first
paragraph, last sentence:

	However, it is guaranteed that assignment of the constant 0 to
	a pointer will produce a null pointer distinguishable from a
	pointer to any object.

In other words, "0" is not the null pointer constant, but coerces to
it on assignment to a pointer.

> When "0" is passed as a parameter to a function, the compiler
> cannot tell whether an int or an int * is intended.

Yup.  That's why you need to cast NULL parameters to the right type.
Not doing the cast is a bug that will work on some machines, but not
on all machines.

> The effect of this is
> that sizeof(int) must equal sizeof(int *), and even more, the value of
> the Null address constant must be bit-for-bit identical to the value of
> ((int) 0).

No.  The effect is that the null address constant of type (type) must
be bit-for-bit identical to ((type *) 0).  ((int) 0) and ((type *) 0)
don't even have to be the same size.

> Of course, many compilers do not conform to this requirement. The problem
> can be avoided by, for example, always using (say) NULL as the Null
> address constant, where NULL is #defined as something like ((char *) 0).

I've done that, but it's a kludge.  The code will still be buggy, and
the bugs will manifest themselves on any machine where
(sizeof (char *)) != (sizeof (int *)) != (sizeof (struct gort *)).
This is one of the reasons for adding the parameter types to the
declaration of an external (or forward-referenced) function.
> The real solution, of course, would be to introduce a new keyword, say
> "null", which represents the Null address constant, with an
> implementation-defined value. However, I doubt that that will ever come
> about.

Sounds like a good idea to me.  Trouble is, you still have the problem
of figuring out which "null" to pass to an external procedure.

	<mike
mjl@ritcv.UUCP (Mike Lutz) (01/29/85)
> OK. Everybody take a deep breath and repeat after me:
>
> THERE IS NO ONE NULL ADDRESS CONSTANT IN C!
>
> There is no such thing as a generic "pointer" in C.

Guy's right, of course.  For those who want null pointers of various
types, might I suggest the following macro:

	/*
	 * Make a Null Pointer for objects of type 't'
	 */
	#define NullPtr(t) ((t *) 0)

This permits code like:

	execl( "/bin/foo", "foo", "bar", NullPtr(char) ) ;

This can be quickly fixed for those who object to creating pointers to
objects of type 't' but want Null pointers for (pointer) type t.  With
a bit of imagination, you can create macros to allocate & free objects
of type 't' in a type-safe fashion, using malloc/calloc/free.
--
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:	{allegra,seismo}!rochester!ritcv!mjl
CSNET:	mjl%rit@csnet-relay.ARPA
Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/29/85)
"lint" is invaluable and we use it frequently in our application development work. It picks up many common mistakes, some of which are unintentional and some of which are due to misunderstandings. The rule in my team is "make the code pass lint completely or else explain why it can't possibly".
friesen@psivax.UUCP (Stanley Friesen) (01/29/85)
In article <351@ecr.UUCP> quenton@ecr.UUCP (Dennis Smith) writes:
> It might also be noted, although I have had no experience with them,
> some compilers for certain older generations of computers generate
> pointers of differing sizes. This occurs when the machine is not
> byte addressable, so that a pointer to a word aligned item might
> be "n" bits long, but a pointer to a character must point to the
> word and also indicate which character within the word.
> This would make the even more disastrous situation of
>	sizeof(char *) != sizeof(int *)
> making the definition of something like NULL even more incomprehensible.

And not only "older" computers; the current Honeywell mainframe has an
architecture which works like this.  Of course that is because they
decided to maintain code compatibility with their old 600 series from
the mid-60s.
--
Sarima (Stanley Friesen)
{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or  quad1!psivax!friesen
guy@rlgvax.UUCP (Guy Harris) (01/30/85)
> The problem of passing 0 for a null pointer (as a parameter), and
> the solution of "#define NULL ...", as pointed out by P. Curran,
> is valid.

As you state below, there are machines where
sizeof(char *) != sizeof(int *).  This solution
(#define NULL ((char *)0)) is NOT valid on those machines.

> The only generally usable solution
> that I know of is -
>	#define NULL 0	/** when sizeof(xxx *) == sizeof(int) **/
>	#define NULL 0L	/** when sizeof(xxx *) == sizeof(long) **/

This isn't a solution.  There may be machines in which the bit pattern
that (char *)0 represents is something other than N zeros, where N is
the number of bits in an "int" or a "long int".

> One possible solution for the future could be the use of -
>	#define NULL ((void *)0)
> which seems compatible with the notion of (void *) being a generic
> pointer type.

This won't work either.  If a routine expects an "int *", dammit, it
expects an "int *", not an "int", not a "long int", not a "char *",
and not a "void *".

What is so d*mn difficult about putting in pointer casts?  It's second
nature to me now, and has been for several years - dating back to
PDP-11 days when it wasn't a problem.  Don't think of C as structured
assembler, where you "know" what's "really happening".  Use it as a
typed language, albeit with weak type checking.  Trust me, you'll be
happier for doing so.

Can we put this discussion to bed now, with the conclusion that the
only correct solution to the problem, pending ANSI Standard C with the
ability to import the declaration of the arguments to a routine, is to
put the ****** pointer casts in?

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
jack@vu44.UUCP (Jack Jansen) (01/30/85)
A while ago, I tried something that I was almost sure would fail
(which it did), namely:

	#if sizeof(struct foobar) != BLKSIZ

I *know* why this fails, but still, the most recent definition I saw
of #if is

	#if <constant-expression>

which includes sizeof().  Does anyone know whether the new standard
has changed this, or changed the definition of constant-expression not
to include sizeof()?  Or is everyone supposed to integrate the
preprocessor into the compiler (yuck)?
--
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or ...!vu44!htsa!jack
Help!  How do I make a cup of tea while working with an acoustic modem?
cottrell@nbs-vms.ARPA (02/01/85)
/*
Bob Larson has a machine with 48 bit ptr's & 32 bit int's & long's.
What is this beast?  Someone also said that ptr's to different objex
may be different sizes.  Where & why?  I realize that a certain
machine may desire this to make implementation as efficient as
possible, but I think the designers should just bite the bullet and
make all ptr's the same size.  The machine is probably brain damaged
anyway.  Any machine not byte addressable is an inhospitable host for
C at best.

As I said before, my model is the pdp-11/vax architecture.  The 68000
and 32032 fall into this category.  I really don't care if my code is
not portable to some weird architecture I have never seen or do not
wish to see again (u1108).
*/
gam@amdahl.UUCP (gam) (02/01/85)
> > Anyone who has made much effort at porting C code has encountered lots
> > of problems, all too many of which are due to people misusing the
> > language. Many of those can be avoided by using "lint". Go forth and
> > do so.
>
> With regard to lint:
>
> 1) Most people working in a Unix environment never use it, because they
>    don't have to.
>
> Human nature being what it is, "go forth and use lint" should get
> approximately the same enthusiastic response as "go forth and sin no
> more."

Lint is widely used here.  On more than one occasion a casual misuse
of pointers or arrays was pointed out by lint.  Also, lint gets almost
as much attention as the C compiler as far as program maintenance is
concerned.  And porting programs from other systems would be a painful
task without lint.

People here aren't using lint because it is a good upright moral thing
to do; they use it because it helps to solve problems.  That's what a
good tool is for.
--
Gordon A. Moffett		...!{ihnp4,hplabs,sun}!amdahl!gam
guy@rlgvax.UUCP (Guy Harris) (02/02/85)
> Someone also said that ptr's to different objex may be different sizes.
> Where & why?

Where:

	On a Zilog 8000 running in segmented mode (24-bit pointers).
	On a Motorola 68000.
	On an Intel 802*8[68] running large-model code.

Just because a machine supports large pointers doesn't mean that it
supports 32-bit arithmetic well.  The Z8000 probably does 32-bit
arithmetic 16 bits at a time.  The 68000 definitely does, and can't do
32-bit multiplies or divides conveniently at all.

Why:

	Because it doesn't say anywhere that you can't.  Because you
	may not want to pay the penalty for 32-bit arithmetic.

> I realize that a certain machine may desire this to make implementation
> as efficient as possible, but I think the designers should just bite
> the bullet and make all ptr's the same size.

On this point, I agree.  16-bit "int"s make techniques like using
"malloc" and "realloc" to grow a large table (used by such obscure
programs as "nm") lose big.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy
henry@utzoo.UUCP (Henry Spencer) (02/03/85)
> With regard to lint:
>
> 1) Most people working in a Unix environment never use it, because they
>    don't have to.

Most people working in a Unix environment write cruddy code as a
result.  Those of us with sense use lint at the drop of a bit, and
delint other people's code routinely [SIGH].

> 2) I have been desperately searching for an implementation for my own
>    programming environment (PC-DOS and QNX on the IBM PC), thus far
>    without any luck. It just doesn't seem to be very available in any
>    but orthodox Unix systems. This should say something about the great
>    esteem in which the C programming community holds lint.

Not quite true.  The relevant issue is not the value of lint, but the
total lack of any sort of public specifications for it.  The only way
to find out what lint does is to read the AT&T code, after which
writing one of your own is legally tricky.

> Human nature being what it is, "go forth and use lint" should get
> approximately the same enthusiastic response as "go forth and sin no
> more."

Generally, that's about the sort of response it gets from sloppy
coders.  Those who take the advice to heart generally come to
appreciate its value.
--
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
ksbszabo@wateng.UUCP (Kevin Szabo) (02/03/85)
As Guy says, sizeof( int ) != sizeof( int * ).  However, a lot of code
depends on the two sizes being the same (unfortunately).

We have a 68k beast with a Microsoft port of System III.  On this
machine an int is 32 bits.  Isn't an int supposed to be the natural
word size for a machine?  Is 32 bits a natural size for a 68000?  I
guess it will be for the 68k's which have 32-bit busses, but what
about the machines with the 16-bit data bus?  I have a feeling that
the integer size was picked more for porting convenience than anything
else; of course, I have been wrong many times before.
--
Kevin Szabo  watmath!wateng!ksbszabo
	(U of Waterloo VLSI Group, Waterloo Ont.)
friesen@psivax.UUCP (Stanley Friesen) (02/04/85)
In article <7904@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
> Bob Larson has a machine with 48 bit ptr's & 32 bit int's & long's.
> What is this beast? Someone also said that ptr's to different objex
> may be different sizes. Where & why? I realize that a certain machine
> may desire this to make implementation as efficient as possible, but I
> think the designers should just bite the bullet and make all ptr's
> the same size.

I don't know about Bob Larson's machine, but I used to use a Honeywell
mainframe.  The architecture dates back to the 60's, and Honeywell has
kept it around in the name of upward compatibility.

The machine is a *word* oriented machine with a 36 bit word (yes 36,
not 32); each instruction is one word in length, and contains 1-1/2
addresses: a memory address and a register specifier.  A word pointer
is simply a word containing a memory address in the same bits holding
one in an instruction word, the last 17 bits of the word.  Such a word
can be used with any normal indirection mode (there are three on these
machines).

If you want byte addressing (9-bit bytes *or* 6-bit bytes), you must
use a special "tagged" indirect mode in which the indirect word (the
pointer) contains a normal address plus a special field specifying the
byte within the destination word.  Because of the way "tagged"
indirection works, it may be necessary to make a *copy* of the pointer
to use for dereferencing if you intend to re-use it!  This is what I
call brain-damaged, but it is *real*, and a Honeywell C compiler must
put up with different types of pointers for ints and chars, *or* make
chars 36 bits.
--
Sarima (Stanley Friesen)
{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or  quad1!psivax!friesen
cottrell@nbs-vms.ARPA (02/05/85)
/*
> > Someone also said that ptr's to different objex may be different sizes.
> > Where & why?
>
> Where:
>
>	On a Zilog 8000 running in segmented mode (24-bit pointers).
>	On a Motorola 68000.
>	On an Intel 802*8[68] running large-model code.
>
> Just because a machine supports large pointers doesn't mean that it
> supports 32-bit arithmetic well. The Z8000 probably does 32-bit
> arithmetic 16 bits at a time. The 68000 definitely does, and can't do
> 32-bit multiplies or divides conveniently at all.

The 68000 only uses 24 bits for addressing, but uses them either 1) as
a 32 bit item in the instruxion stream, & 2) in a 32 bit register.
While it would be possible for an implementor to use only 3 bytes, the
space saved would be offset by the overhead in loading into a four
byte register & masking.  The 24 bit restrixion is only temporary
anyway.  Future versions will probably allow 32 bits.  I think the SUN
mmu axually uses these bits.  I think the z8000 has 32 bit regs too.

> Why:
>
> Because it doesn't say anywhere that you can't. Because you may not want
> to pay the penalty for 32-bit arithmetic.

If you have a machine with an address space > 64k, you probably have
32 bit registers.

> > I realize that a certain machine may desire this to make implementation
> > as efficient as possible, but I think the designers should just bite
> > the bullet and make all ptr's the same size.
>
> On this point, I agree. 16-bit "int"s make techniques like using "malloc"
> and "realloc" to grow a large table (used by such obscure programs as
> "nm") lose big.

I just read a news item from yourself which stated:

	"THERE IS NO SUCH THING AS A GENERIC NULL POINTER"

Presumably because of different length pointers.  Which way do you
want it?
*/
Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA> (02/05/85)
Does anybody other than Cottrell have difficulty coping with various sizes of pointers?
henry@utzoo.UUCP (Henry Spencer) (02/05/85)
> Does anyone know whether the new standard has ...
> changed the definition of constant-expression not to include
> sizeof()?

Recent drafts of the ANSI standard outlaw sizeof() in #if.
--
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
henry@utzoo.UUCP (Henry Spencer) (02/05/85)
> I have a feeling that [this] integer size [32 bits on 68000] was picked
> more for porting convenience than anything else...

Very probably.  While rules like "don't assume pointers and integers
are the same size" and "don't assume *(char *)0 == '\0'" are good
advice for writing new code, an unfortunate amount of old code breaks
them.  You get a choice of having to fix it all, or arranging for the
dubious assumptions to remain true.  For obvious reasons, many people
with a product to get out the door have taken the latter approach.
--
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry
henry@utzoo.UUCP (Henry Spencer) (02/05/85)
> ... Someone also said that ptr's to different objex > may be different sizes. Where & why? I realize that a certain machine > may desire this to make implementation as efficient as possible, but I > think the designers should just bite the bullet and make all ptr's > the same size. The machine is probably brain damaged anyway. Any > machine not byte addressable is an inhospitable host for C at best. Quite true, but often a poor C implementation is better than none. The idea is to make it no poorer than you have to. This often means that whatever kludges are needed for "char *" really shouldn't have to be applied to all pointers. > As I said before, my model is the pdp-11/vax architecture. The 68000 > and 32032 fall into this category. I really don't care if my code > is not portable to some weird architecture I have never seen or > do not wish to see again (u1108). I'm afraid what you are really saying is that you don't really care about portability at all. "If the machine is very similar to mine, maybe it'll run; if not, tough." I understand but can't sympathize. It's not that much harder to do it right. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
breuel@harvard.ARPA (Thomas M. Breuel) (02/05/85)
> Does anybody other than Cottrell have difficulty coping > with various sizes of pointers? Yes, I do, and looking at UN*X source code for the VAX and the PDP I would believe that most of Berkeley's programmers/students would also have problems. Of course, you can always work around sizeof(char *)!=sizeof(int *) (or sizeof(char *)!=sizeof(int)), but often it is a hassle, and it makes porting old (4.2BSD :-) source code very difficult. Thomas.

mark@tove.UUCP (Mark Weiser) (02/05/85)
In article <1071@amdahl.UUCP> gam@amdahl.UUCP (gam) writes: >> > Anyone who has made much effort at porting C code has encountered lots of >> > problems, all too many of which are due to people misusing the language. >> > Many of those can be avoided by using "lint". Go forth and do so. >> > >> With regard to lint: >> >> 1) Most people working in a Unix environment never use it, because they >> don't have to. >> Human nature being what it is, "go forth and use lint" should get approximately the same enthusiastic response as "go forth and sin no more." > >Lint is widely used here....porting programs from other systems would be >a painful task without lint. > We have Pyramids and Vaxes as our main machines. If your code passes lint, it is likely to run on both, in spite of the fact that the stack is used differently, the byte orders are reversed, and they require different word alignments when accessing structures. Lint is handy. -- Spoken: Mark Weiser ARPA: mark@maryland Phone: +1-301-454-7817 CSNet: mark@umcp-cs UUCP: {seismo,allegra}!umcp-cs!mark USPS: Computer Science Dept., University of Maryland, College Park, MD 20742
guy@rlgvax.UUCP (Guy Harris) (02/05/85)
> The 68000 only uses 24 bits for addressing, but uses them either > 1) as a 32 bit item in the instruxion stream, & 2) in a 32 bit register. 10 points for originality, but that's NOT why I said a Motorola 68000-based machine may have sizeof (int) != sizeof (int *). The reason why they may (and do - our current ones do; it's a pain in the *ss, and I prefer 32-bit "int"s, but what we did was not illegal, immoral, or fattening) be different is: > > > > Because it doesn't say anywhere that you can't. Because you may not want > > to pay the penalty for 32-bit arithmetic. > > If you have a machine with an address space > 64k, you probably have > 32 bit registers. Who said anything about the register size? The 68000 has 32-bit registers; anybody who claims there's no speed penalty for 32-bit arithmetic on the 68000 has been smoking those funny cigarettes too long. 32-bit by 32-bit divides are NOT cheap on the 68000 (32-bit by 16-bit ones aren't cheap either, but they're a lot cheaper than 32-bit by 32-bit ones). > > On this point, I agree. 16-bit "int"s make techniques like using "malloc" > > and "realloc" to grow a large table (used by such obscure programs as "nm") > > lose big. > > I just read a news item from yourself which stated: > "THERE IS NO SUCH THING AS A GENERIC NULL POINTER" > Presumably because of different length pointers. Which way do you want it? 1) It's not just because of different length pointers; the representation could be different. 2) I prefer "int"s to be able to hold the size, in bytes, of an object of approximately the same size as the address space. I don't give a tinker's damn whether different kinds of pointers are congruent or not. One can imagine a machine where "int"s are big enough to hold the size of the aforementioned object in bytes, but where "char *" and "int *" are different sizes. An example would be a word-addressed machine with a 64KW address space. 
Choose sizeof (int) == 32 bits (remember, sizes are in bytes), sizeof (int *) == 16 bits (large enough to hold the largest possible pointer to "int"), sizeof (char *) == 32 bits (at least 17 bits are necessary). Possibly a dumb machine, but it points out that the question "Which way do (I) want it" is meaningless, given that your two choices are not mutually exclusive. Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
guy@rlgvax.UUCP (Guy Harris) (02/05/85)
> Of course, you can always work around sizeof(char *)!=sizeof(int *) > (or sizeof(char *)!=sizeof(int)), but often it is a hassle, and > it makes porting old (4.2BSD :-) source code very difficult. Changing the implementation of, say, "getpwent" and the password file can make porting programs that rummage through the password file directly difficult. This is NOT an argument against changing the implementation. It is an argument against writing such programs in the future, and for dedicating time to clean up those fossils if you make such a change. The same applies to implementations of C on machines that don't encourage the same sorts of laxity as "reasonable" machines do. If expedience is VERY important, you might consider doing the wrong thing; however, I think you're better off biting the bullet and fixing the code (and reporting fixes to AT&T, UCB, or whoever wrote it - maybe they'll take the hint). Think of it as doing a good deed for the next person who has to move that software to a machine which isn't a warmed-over PDP-11. Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
Larry Carroll <LARRY@JPL-VLSI.ARPA> (02/06/85)
>Does anybody other than Cottrell have difficulty coping >with various sizes of pointers? "Trouble"? No, but it's a detail that everyone has to cope with, and even experienced C programmers sometimes forget to do so. Especially if you're someone like me who's been away from C for a while, discussions like these help me keep in mind practical matters that I may forget. Larry @ jpl-vlsi ------
jss@sjuvax.UUCP (J. Shapiro) (02/06/85)
[Aren't you hungry...?] It occurs to me that the length of a pointer simply being required to be constant (even within the same data type) presents problems. Many microprocessors now implement span-dependent addressing, and if your implementation allows passing of an array wholesale, and that array is small, there is no reason why one shouldn't be able to optimize the pointer size as being relative to some register variable which points to the current function's bottom of passed data. Is this a problem in practice - are pointers in fact obliged to be the same size everywhere, or am I missing something? On the topic of sizeof(int) == sizeof(int *), I refer you to K&R p. 210, which says: 1. A pointer may be converted to any of the integral types long enough to hold it. Whether an int or a long is required is machine dependent. 2. An object of integral type may be explicitly converted to a pointer... Since compilers need to do type checking anyway, passing 0 instead of NULL should always be valid. Note that K&R says that assigning 0 to an integer generates the appropriate NULL pointer. This type conversion (it is implied) is automagic, and thus there *is* a generic NULL, which is the integer 0. It is also mentioned that "The mapping function... is intended to be unsurprising to those who know the addressing structure of the machine," which is a loophole big enough to fly a barn through. Jon Shapiro Haverford College
bsa@ncoast.UUCP (Brandon Allbery) (02/08/85)
> Article <7810@brl-tgr.ARPA>, from Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> +---------------- | The rule in my team is "make the code pass lint completely or else | explain why it can't possibly". STANDARD RESPONSE: Is it reasonable that I should have to write: char *foo = "/tmp"; chdir(foo); instead of chdir("/tmp"); just to satisfy lint? It gets impossible to trace through the garbage our lint puts out. (Before you tell me to fix lint, give me a Xenix source license.) Brandon (bsa@ncoast.UUCP) -- Brandon Allbery, decvax!cwruecmp!ncoast!bsa, "ncoast!bsa"@case.csnet (etc.) 6504 Chestnut Road, Independence, Ohio 44131 +1 216 524 1416 (or what have you)
garys@bunker.UUCP (Gary M. Samuelson) (02/08/85)
> On the topic of sizeof(int) == sizeof(int *), I refer you to K&R p. > 210, which says: > > 1. A pointer may be converted to any of the integral types > long enough to hold it. Whether an int or a long is required > is machine dependent. > > 2. An object of integral type may be explicitly converted to a > pointer... > > Since compilers need to do type checking anyway, passing 0 instead of NULL > should always be valid. Note that K&R says that assigning 0 to an integer > generates the appropriate NULL pointer. This type conversion (it is > implied) is automagic, and thus there *is* a generic NULL, which is the > integer 0. Your reasoning breaks down at the implicit assumption that passing 0 as an argument to a function constitutes an assignment. It doesn't; the compiler does not know the types of function arguments where the function is called. E.g., when you write foo(bar), the compiler knows what "bar" is, but has no idea what type foo's formal parameter has. > It is also mentioned that "The mapping function... is intended to be > unsurprising to those who know the addressing structure of the machine," > which is a loophole big enough to fly a barn through. > Jon Shapiro > Haverford College Gary Samuelson
henry@utzoo.UUCP (Henry Spencer) (02/08/85)
> 2. An object of integral type may be explicitly converted to a > pointer... > > Since compilers need to do type checking anyway, passing 0 instead of NULL > should always be valid. Wrong, you forgot the word "explicit". That means *you* have to do it. The compiler won't do it for you in parameter-passing. Remember that current C compilers do not (cannot) check types of parameters. > Note that K&R says that assigning 0 to an integer > generates the appropriate NULL pointer. This type conversion (it is > implied) is automagic, and thus there *is* a generic NULL, which is the > integer 0. For the 157th time, WRONG. The only generic NULL pointer in C is the *literal* integer *constant* 0. An integer *value* equal to 0 is *not* a NULL pointer; only the constant 0 will do. Unless the character "0" appears -- either explicitly or via "#define NULL 0" -- at the place where the conversion to pointer is being performed, then it's not a real NULL pointer. If you read K&R carefully, you will discover that this is what it really says. -- Henry Spencer @ U of Toronto Zoology {allegra,ihnp4,linus,decvax}!utzoo!henry
ndiamond@watdaisy.UUCP (Norman Diamond) (02/08/85)
> I used to use > a Honeywell mainframe. The architecture dates back to the 60's, > and Honeywell has kept it around in the name of upward compatibility. > The machine is a *word* oriented machine with a 36 bit word(yes 36, > not 32), ... > This is what I call brain-damaged, but it is *real*, and a Honeywell > C-compiler must put up with different types of pointers for ints and > chars, *or* make chars 36 bits. > -- Sarima (Stanley Friesen) Not only that, but its characteristics are quoted in a table appearing TWICE in K&R. Therefore EVERYONE has already been warned about such kinds of brain damage. -- Norman Diamond UUCP: {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond CSNET: ndiamond%watdaisy@waterloo.csnet ARPA: ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa "Opinions are those of the keyboard, and do not reflect on me or higher-ups."
guy@rlgvax.UUCP (Guy Harris) (02/09/85)
> It occurs to me that the length of a pointer simply being required to > be constant (even within the same data type) presents problems. Many > microprocessors now implement span-dependent addressing, and if your > implementation allows passing of an array wholesale, and that array is > small, there is no reason why one shouldn't be able to optimize the pointer > size as being relative to some register variable which points to the > current function's bottom of passed data. > > Is this a problem in practice - are pointers in fact obliged to be the > same size everywhere, or am I missing something? No, you are not missing anything - there is only one representation of a "foo *" in a C program. In the example you give, the called routine would have to know that the pointer was relative to that particular register (or, if the pointer indicated that fact, would at least have to know that the pointer in question was a relative pointer). This means that you'd either have to introduce a character replacing "*" to indicate this new flavor of pointer, or introduce a pragma which said "you can use a relative pointer here". (Which microprocessor are you referring to?) > Since compilers need to do type checking anyway, passing 0 instead of NULL > should always be valid. The only problem is that C compilers do *not* do type checking in function calls, because there's no way in the current C language to say that a particular function's third argument is a "char *". As such, passing 0 instead of NULL is not valid (well, passing 0 is the same as passing NULL, and both are invalid; passing (char *)NULL is valid). > Note that K&R says that assigning 0 to an integer generates the appropriate > NULL pointer. This type conversion (it is implied) is automagic, and thus > there *is* a generic NULL, which is the integer 0. No, there is no "generic NULL", there are several "appropriate NULL"s. 
The integer 0 just happens to be a way of telling the compiler to generate whatever null pointer is appropriate for the pointer type that appears in the expression. Maybe if the word "nil" had been a reserved word in C, and C used "nil" instead of "0" for this purpose, a lot of the confusion that null pointers cause might never have happened. Guy Harris {seismo,ihnp4,allegra}!rlgvax!guy
Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (02/10/85)
There is no way, with separate compilation of modules, that a current C compiler can determine what pointer type to coerce a 0 function argument to, which is why the programmer must do this himself. In the draft ANSI C standard, if a function prototype is specified then it will indeed be possible (and required) that the compiler coerce an argument to the right type. Actually, some of us don't like this since it hides coding errors; it would be nice if the compiler (or at least lint) could give a warning when this coercion was done.
robert@gitpyr.UUCP (Robert Viduya) (02/10/85)
>< Posted from Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA> > Does anybody other than Cottrell have difficulty coping > with various sizes of pointers? I don't have difficulty with it, but I do feel that all pointers should be the same size. A pointer is a pointer, regardless of what it points to. It's a datatype all by itself; it isn't a mutation of the datatype it points to. Perhaps an addition to the language is in order (gotta have something to handle those Intel chips). Well, since C allows you to have 'long int', 'int' and 'short int', what about long pointers, pointers and short pointers? Don't ask me how they would be declared; I'll leave that up to someone else. robert -- Robert Viduya Georgia Institute of Technology ...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!robert ...!{rlgvax,sb1,uf-cgrl,unmvax,ut-sally}!gatech!gitpyr!robert
Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (02/11/85)
If your "lint" is indeed broken, and you don't maintain your own system, then GET YOUR VENDOR TO FIX "lint". The more people just take whatever garbage the so-called UNIX vendors dish out, the worse the situation will become.
ndiamond@watdaisy.UUCP (Norman Diamond) (02/11/85)
> I don't have difficulty with it [various sizes of pointers], but I do > feel that all pointers should be the same size. A pointer is a pointer, > regardless of what it points to. It's a datatype all by itself; it isn't > a mutation of the datatype it points to. > > Perhaps an addition to the language is in order (gotta have something to > handle those Intel chips). Well, since C allows you to have 'long int', > 'int' and 'short int', what about long pointers, pointers and short pointers? > Don't ask me how they would be declared; I'll leave that up to someone > else. > -- Robert Viduya Then no one will know when to declare a long pointer or short pointer. They know they need a (struct xxx *) or a (char *), they should have a compiler that's bright enough to figure out whether a long pointer or short pointer is needed, for each machine they want to run their program on. In PL/I, a pointer is a datatype all by itself. On some machines, in order to be able to "point" to either integers or characters, you have to waste 3/4 of the memory your strings are stored in, and you can't use the machine's string instructions. On Intel, you can make all pointers the same size by using long pointers for everything, whether they're needed or not. Or, you can use a language that has a little bit of flexibility, and lets the compiler figure out such things. These are the reasons that Pascal, despite all of its shortcomings, is more portable in some ways than C is. People in net.lang.pascal are complaining about the same things, not being able to assign pointers to ints. Sure, let's reduce the portability of every existing language, and give more jobs to portability and languages people so that they can repeat the cycle, eh? -- Norman Diamond UUCP: {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond CSNET: ndiamond%watdaisy@waterloo.csnet ARPA: ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa "Opinions are those of the keyboard, and do not reflect on me or higher-ups."
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (02/20/85)
> A pointer is a pointer, regardless of what it points to. > It's a datatype all by itself; it isn't a mutation of the datatype it points > to. You must be thinking of some other language, Algol perhaps.
gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (02/21/85)
> .... Or, you can use a language > that has a little bit of flexibility, and lets the compiler figure out > such things. > > These are the reasons that Pascal, despite all of its shortcomings, is > more portable in some ways than C is. ??? Conclusion does not follow. Please do not confuse the complaints from people who want C to be different with what C is.
ndiamond@watdaisy.UUCP (Norman Diamond) (02/24/85)
> > .... Or, you can use a language > > that has a little bit of flexibility, and lets the compiler figure out > > such things. > > > > These are the reasons that Pascal, despite all of its shortcomings, is > > more portable in some ways than C is. > > ??? Conclusion does not follow. Please do not confuse the complaints > from people who want C to be different with what C is. In Pascal, if you want variables to be able to hold integers of certain sizes, you specify the bounds. The compiler figures out if it needs a short, long, etc. Same for sizes of sets (though a few early brain-damaged implementations of Pascal created non-believers). Both Pascal and the present definition of C do this for pointers. -- Norman Diamond UUCP: {decvax|utzoo|ihnp4|allegra}!watmath!watdaisy!ndiamond CSNET: ndiamond%watdaisy@waterloo.csnet ARPA: ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa "Opinions are those of the keyboard, and do not reflect on me or higher-ups."
dir@obo586.UUCP (Dan Rosenblatt) (05/03/85)
[] A function I wrote for some graphics software looked something like:

	vecmul(invec,inmat,outvec)
	double invec[3],inmat[3][3],outvec[3];
	{
		int i,j;
		double tmpvec[3];

		for (i=0;i<3;++i) {
			tmpvec[i] = 0.;
			for (j=0;j<3;++j)
				tmpvec[i] += invec[j] * inmat[i][j];
		}
		bcopy((char *)outvec,(char *)tmpvec,sizeof(outvec));
	}

The calling sequence for bcopy was: (dst,src,size_in_bytes). The problem is that 'sizeof(outvec)' produced the value 2 instead of what I expected - 24. The reason (as I kick myself around the room :-}) is that an array which is a parameter to a function becomes a pointer to its first element. The 2 comes from the fact that I'm running on a 16-bit 8086 chip. 'nuf said. Dan Rosenblatt obo Systems, Inc. ...{ihnp4!denelcor,nbires!gangue}!obo586!dir
grayson@uiucuxc.CSO.UIUC.EDU (09/29/86)
It is interesting that the expression sizeof (int) - 1 is ambiguous in C, for it can be parsed as sizeof ((int)(- 1)) or as (sizeof(int)) - 1 Think about it! The unix compiler does it the second way, for when it sees the '-' it sets the precedence for that character ASSUMING it will be used as a binary operator.
vedm@hoqam.UUCP (BEATTIE) (09/30/86)
> > It is interesting that the expression > sizeof (int) - 1 > is ambiguous in C, for it can be parsed as > sizeof ((int)(- 1)) > or as > (sizeof(int)) - 1 > > Think about it! > > The unix compiler does it the second way, for when it sees the '-' it > sets the precedence for that character ASSUMING it will be used as a > binary operator. There is nothing ambiguous about it. K&R p188: "The construction sizeof(type) is taken to be a unit, so the expression sizeof(type)-2 is the same as (sizeof(type))-2." Tom. ...!{decvax | ucbvax}!ihnp4!hoqax!twb
garys@bunker.UUCP (Gary M. Samuelson) (10/01/86)
>> It is interesting that the expression >> sizeof (int) - 1 >> is ambiguous in C, for it can be parsed as >> sizeof ((int)(- 1)) >> or as >> (sizeof(int)) - 1 >> The unix compiler does it the second way, for when it sees the '-' it ------------- (speaking of ambiguous) >> sets the precedence for that character ASSUMING it will be used as a >> binary operator. >There is nothing ambiguous about it. >K&R p188: >"The construction sizeof(type) is taken to be a unit, so the >expression sizeof(type)-2 is the same as (sizeof(type))-2." You're both partially right, and partially wrong. The expression would be ambiguous, if not for the disambiguating rule where the compiler is not *assuming* that the minus sign is binary, but *deciding* that it is. Gary Samuelson
drw@cullvax.UUCP (Dale Worley) (10/01/86)
> It is interesting that the expression > sizeof (int) - 1 > is ambiguous in C, for it can be parsed as > sizeof ((int)(- 1)) > or as > (sizeof(int)) - 1 > > Think about it! Both K&R (App. A, 7.2) and Harbison&Steele (7.4.2) note that it is ambiguous on the face of it, and that it is to be resolved in favor of (sizeof (int)) - 1 Dale
pedz@bobkat.UUCP (Pedz Thing) (10/02/86)
In article <102500008@uiucuxc> grayson@uiucuxc.CSO.UIUC.EDU writes: > >It is interesting that the expression > sizeof (int) - 1 >is ambiguous in C, for it can be parsed as > sizeof ((int)(- 1)) >or as > (sizeof(int)) - 1 > At first, I thought this note was really stupid. I had the idea that unary minus was down at a different level from the other unary operators. Then as I looked more and more into it, I came to these conclusions. First, what the compiler does is correct because this exact case is mentioned in the K & R (Last paragraph of section 7.2 in Appendix A, page 188). Second, this is a special case. Sizeof, type cast, and unary minus are all at the same precedence and they associate right to left. Thus the normal interpretation would be (sizeof ((int) (-1))). This is not the correct interpretation however as I just mentioned. -- Perry Smith ctvax ---\ megamax --- bobkat!pedz pollux---/
rgenter@labs-b.bbn.com (Rick Genter) (10/03/86)
The expression sizeof (int) - 1 is not ambiguous. The operand of "sizeof" must either be a typecast or an lvalue. "(int) -1" is neither. -------- Rick Genter BBN Laboratories Inc. (617) 497-3848 10 Moulton St. 6/512 rgenter@labs-b.bbn.COM (Internet new) Cambridge, MA 02238 rgenter@bbn-labs-b.ARPA (Internet old) linus!rgenter%BBN-LABS-B.ARPA (UUCP)
ark@alice.UucP (Andrew Koenig) (10/04/86)
> The expression > > sizeof (int) - 1 > > is not ambiguous. The operand of "sizeof" must either be a typecast or > an lvalue. "(int) -1" is neither. Nope. The operand of "sizeof" can be an rvalue.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/03/86)
In article <663@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes: >True, true. But ANSI is likely to decide that sizeof(char) MUST ALWAYS >BE one (and I think this is universally true on existing >implementations... if I'm wrong, somebody let me know). X3J11 as it stands requires sizeof(char)==1. I have proposed that this requirement be removed, to better support applications such as Asian character sets and bitmap display programming. Along with this, I proposed a new data type such that sizeof(short char)==1. It turns out that the current draft proposed standard has to be changed very little to support this distinction between character objects (char) and smallest-addressable objects (short char). This is much better, I think, than a proposal that introduced (long char) for text characters. Unfortunately, much existing C code believes that "char" means "byte". My proposal would allow implementors the freedom to decide whether supporting this existing practice is more important than the benefits of making a distinction between the two concepts. It is possible to write code that doesn't depend on sizeof(char)==1, and some C programmers are already careful about this. Transition to the more general scheme would occur gradually (if at all) for existing C implementations, with only implementors of systems for the Asian market and of bitmap display architectures initially taking advantage of the opportunity to make these types different sizes.
guy@sun.uucp (Guy Harris) (11/05/86)
> X3J11 as it stands requires sizeof(char)==1. I have proposed that > this requirement be removed, to better support applications such as > Asian character sets and bitmap display programming. Along with > this, I proposed a new data type such that sizeof(short char)==1. > It turns out that the current draft proposed standard has to be > changed very little to support this distinction between character > objects (char) and smallest-addressable objects (short char). This > is much better, I think, than a proposal that introduced (long char) > for text characters. Why? If this is the AT&T proposal, it did *not* "introduce (long char) for text characters"; it introduced (long char) for *long* text characters. "char" is still to be used when processing text that does not include long (16-bit) characters. I believe the theory here was that requiring *all* programs that process text ("cat" doesn't count; it doesn't - or, at least, shouldn't - process text) to process them in 16-bit blocks might cut their performance to a degree that customers who would not use the ability to handle Kanji would find unacceptable. I have seen no data to confirm or disprove this. (Changing the meaning of "char" does not directly affect the support of "bitmap display programming" at all. It only affects applications that display things like Asian character sets on bitmap displays, but it doesn't affect them any differently than it affects applications that display them on "conventional" terminals that support those character sets.) > Unfortunately, much existing C code believes that "char" means "byte". > My proposal would allow implementors the freedom to decide whether > supporting this existing practice is more important than the benefits > of making a distinction between the two concepts. 
Both "short char"/"char" and "char"/"long char" make a distinction between the two concepts; one may have aesthetic objections with the way the latter scheme draws the distinction, but that's another matter. (Is 16 bits enough if you want to give every single character a code of its own?) > It is possible to write code that doesn't depend on sizeof(char)==1, > and some C programmers are already careful about this. It is possible to write *some* code so that it doesn't depend on sizeof(char)==1. Absent a data type one byte long, other code is difficult at best to write this way. > Transition to the more general scheme would occur gradually (if at all) for > existing C implementations, with only implementors of systems for > the Asian market and of bitmap display architectures initially taking > advantage of the opportunity to make these types different sizes. I think "if at all" is appropriate here. There are a *lot* of interfaces that think that "char" is a one-byte data type; e.g., "read", "write", etc.. I see no evidence that converting existing code and data structures to use "short char" would be anything other than highly disruptive. Adding "long char" would permit new programs to be written to support long characters, and permit existing programs to be rewritten to support them, without breaking existing programs; this indicates to me that it would make it much more likely that "long char" would be widely adopted and used than that "short char" would. I see no reason why a proposal that would, quite likely, lead to two different C-language environments existing in parallel for a long time to come is superior to one that would permit environments to add on the ability to handle long characters and thus would make it easier for them to do so and thus more likely that they would. 
(This is especially true when you consider that most of the programs in question would have to be changed quite a bit to support Asian languages *anyway*; just widening "char" to 16 bits, recompiling them, and linking them with a library with a brand new standard I/O, etc. would barely begin to make them support those languages.) -- Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com (or guy@sun.arpa)
levy@ttrdc.UUCP (Daniel R. Levy) (11/05/86)
In article <5141@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: >X3J11 as it stands requires sizeof(char)==1. I have proposed that >this requirement be removed, to better support applications such as >Asian character sets and bitmap display programming. Along with >this, I proposed a new data type such that sizeof(short char)==1. >It turns out that the current draft proposed standard has to be >changed very little to support this distinction between character >objects (char) and smallest-addressable objects (short char). This >is much better, I think, than a proposal that introduced (long char) >for text characters. > >Unfortunately, much existing C code believes that "char" means "byte". > >It is possible to write code that doesn't depend on sizeof(char)==1, >and some C programmers are already careful about this. A question: what about the jillions of C programs out there which declare "char *malloc()"? Will they all need to be changed? Common sense says no, since malloc() is supposed to return a "maximally aligned" address anyhow, so as far as anyone cares it could be declared float * or double * or short int * or (anything else)* if malloc() in the malloc() code itself were declared the same way. So if "char" happened to be a two byte quantity, no sweat, right? Or was there any particular reason for declaring malloc() to be a "char *"? And thus, might something break in malloc() or the usage thereof if char might no longer be the smallest addressable quantity? -- ------------------------------- Disclaimer: The views contained herein are | dan levy | yvel nad | my own and are not at all those of my em- | an engihacker @ | ployer or the administrator of any computer | at&t computer systems division | upon which I may hack. | skokie, illinois | -------------------------------- Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa, go for it! allegra,ulysses,vax135}!ttrdc!levy
kimcm@olamb.UUCP (Kim Chr. Madsen) (11/06/86)
In article <5141@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
]
] X3J11 as it stands requires sizeof(char)==1. I have proposed that
] this requirement be removed, to better support applications such as
] Asian character sets and bitmap display programming. Along with
] this, I proposed a new data type such that sizeof(short char)==1.
] It turns out that the current draft proposed standard has to be
] changed very little to support this distinction between character
] objects (char) and smallest-addressable objects (short char). This
] is much better, I think, than a proposal that introduced (long char)
] for text characters.
]
] Unfortunately, much existing C code believes that "char" means "byte".
] My proposal would allow implementors the freedom to decide whether
] supporting this existing practice is more important than the benefits
Why not take the full step and let the datatype char be of variable size,
like int's and other types. Then invent the datatype ``byte'' which is exactly
8 bits long.
Do I hear you say it would break existing C code?  Well, so would the
introduction of ``short char''....
<Kim Chr. Madsen>
mwm@eris.BERKELEY.EDU (Mike (Don't have strength to leave) Meyer) (11/07/86)
In article <126@olamb.UUCP> kimcm@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types.  Then invent the datatype ``byte'' which is
>exactly 8 bits long.

Ok, so what should those with C compilers on the QM/C (18-bit words, word
addressable) or the C/70 (20-bit words, two 10-bit address units per word)
do, hmmm?  And yes, there are C compilers for those two machines.

Not only is all the world not a VAX, it isn't even all addressable in
eight-bit units!

	<mike
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)
Guy missed the meaning of my reference to bitmap display programming.
What I really care about in this context is support for direct bit
addressing.  I know for a fact that one reason we don't HAVE this on some
current architectures is the lack of access to the facility from
high-level languages.  I would like it to be POSSIBLE for some designer
of an architecture likely to be used for bit-mapped systems to decide to
make bits directly addressable.  I know I have often wished that I had
bit arrays in C when programming bitmap display applications.

The 8-bit byte was an arbitrary packaging decision (made by IBM for the
System/360 family, by DEC for the PDP-11, and by some others, but
definitely not by EVERY vendor).  There are already some 9-, 10-, and
12-bit oriented C implementations; I would like to give implementors the
OPTION of choosing to use 16-bit (char)s even if their machine can
address individual 8-bit bytes or even individual bits.

The idea of a "character" is that of an individually manipulable
primitive unit of text.  The idea of a "byte" is that of an individually
addressable unit of storage.  From one point of view, it doesn't matter
what the two basic types would be called if and when this distinction is
made in the C language.  However, in X3J11 practically everything that
now refers to (char) arrays is designed principally for text application,
while practically everything that refers to arbitrary storage uses
(void *), not (char *).  (The one exception is strcoll(), which
specifically produces a (char[]) result; Prosser and I discussed this and
agreed that this was acceptable for its intended use.  In a good
implementation using my (char)/(short char) distinction, it would be
POSSIBLE to maintain a reasonable default collating sequence for (char)s
so that a kludge like strcoll() would not normally be necessary.)
Using (long char) for genuine text characters would conflict with
existing definitions for text-oriented functions, which is the main
reason I decided that (char) is STILL the proper type for text units.

I realize that many major vendors in the international UNIX market have
already adopted "solutions" to the problem of "international character
sets"; however, each has taken a different approach!  There is nothing in
my proposal to preclude an implementor from continuing to force
sizeof(char)==sizeof(short char) and preserving his previous
vendor-specific "solution"; however, what I proposed ALLOWS an
implementor to choose a much cleaner solution if he so desires, without
forcing him to if he prefers other methods, and it also allows nybble- or
bit-addressable architectures to be nicely supported at the C language
level.

The trade-off is between more compact storage (as in AT&T's approach)
requiring kludgery to handle individual textual units, versus a clean,
simple model of characters and storage cells that supports uncomplicated,
straightforward programming.

It happens that the text/binary stream distinction of X3J11 fits the
corresponding character/byte distinction very nicely.  The only wart is
for systems like UNIX that allow mixing of text-stream operations, such
as scanf(), with binary-stream operations, such as fread(); there is a
potential alignment problem in doing this.  (By the way, I also propose
new functions [f]getsc()/[f]putsc() for getting/putting single
(short char)s; this is necessary for the semantic definition of
fread()/fwrite() on binary streams.  In my original proposal these were
called [f]getbyte()/[f]putbyte(), but the new names are better.)

ANY C implementation that makes a real distinction between characters and
bytes is going to cause problems for people porting their code to it.
The choices are, first, whether to ever make such a distinction, and
second, if so, how to do so.
I believe the distinction is important, and much prefer a clean solution
over one that requires programmers to convert text data arrays back and
forth, or to keep track of two sets of otherwise identical library
functions.  As with function prototypes, a transition period can exist
during which (char) and (short char) have the same size, which is no
worse than the current situation, and implementors could choose when if
ever to split these types apart.

Please note that there is not much impact of my proposal on current good
C coding practice; for example, the following continue to work no matter
what choices the C implementor has made:

	struct foo bar[SIZE], barcpy;
	unsigned nelements = sizeof bar / sizeof bar[0];

	fread( bar, sizeof(struct foo), SIZE, fp );
	fread( bar, sizeof bar, 1, fp );
	memcpy( &barcpy, &bar[3], sizeof(struct foo) );
	/* the above requires casting anyway if prototype not in scope */

	char str[] = "text";
	printf( "\"%s\" contains %d characters\n", str, strlen( str ) );

While it is POSSIBLE to run into problems, such as in using the result of
strlen() as the length of a memcpy() operation, these don't arise so
often that it is hopeless to make the transition.  One thing for sure: if
we don't make the character/byte distinction POSSIBLE in the formal ANSI
C standard, it will be too late to do it later.  The absolute minimum
necessary is to remove the requirement that sizeof(char)==1 from the
standard, although this opens up a hole in the spec that needs plugging
by a proposal like mine (X3J11/86-136, revised to fit the latest draft
proposed standard and to change the names of the primitive byte get/put
functions).
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)
In article <1294@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>A question: what about the jillions of C programs out there which
>declare "char *malloc()"?  Will they all need to be changed?  Common
>sense says no, since malloc() is supposed to return a "maximally aligned"
>address anyhow, so as far as anyone cares it could be declared float * or
>double * or short int * or (anything else)* if malloc() in the malloc()
>code itself were declared the same way.  So if "char" happened to be a
>two byte quantity, no sweat, right?  Or was there any particular reason
>for declaring malloc() to be a "char *"?  And thus, might something break
>in malloc() or the usage thereof if char might no longer be the smallest
>addressable quantity?

X3J11 malloc() returns type (void *) anyway, so this is already an issue
independent of the multi-byte (char) issue.  The answer is, on most
machines the old (char *) declaration of malloc() will not result in
broken code under X3J11, but it is POSSIBLE that it would break under
some X3J11 implementations (one assumes that the implementer will take
pains to keep this from happening if at all possible).

Under the multi-byte (char) proposal, malloc() still returns (void *) and
is not affected at all by the proposal.  sizeof() still returns the
number of primitive storage cells occupied by a data object, which is
still the right information to feed malloc() as a parameter.

The X3J11 draft proposed standard as it now stands has actually managed
to enforce a rather clean distinction between (char) data and arbitrary
data.  The additional changes to the draft to introduce a separate data
type for the smallest addressable storage unit are really very minor.
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)
In article <126@olamb.UUCP> kimcm@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types.  Then invent the datatype ``byte'' which is
>exactly 8 bits long.

When fully elaborated to address the related issues, this idea differs
from what I have proposed in only two fundamental ways:
	(1) no support for smallest-addressable chunk sizes other than
	    8 bits;
	(2) introduction of a new keyword, one likely to be in heavy use
	    as an identifier in existing carefully-written C code.
guy@sun.uucp (Guy Harris) (11/08/86)
> Guy missed the meaning of my reference to bitmap display programming.
> What I really care about in this context is support for direct bit
> addressing.

I am not at all convinced that anybody *should* care about this, at least
from the standpoint of bitmap display programming.  If a vendor permits
you to bang bits on a display, they should provide you with routines to
do this; frame buffers are not all the same, and code that works well on
one display may not work well at all on another.  Furthermore, some
hardware may do some bit-banging operations for you; if you approach the
display at the right level of abstraction, this can be done
transparently, but not if you just write into a bit array.

Furthermore, it's not clear that displays should be programmed at the
bit-array level anyway; James Gosling and David Rosenthal have made what
I consider a very good case against doing this (and no, I don't consider
it a good case just because I work at Sun and we're trying to push NeWS).

> I know for a fact that one reason we don't HAVE this on some current
> architectures is the lack of access to the facility from
> high-level languages.

If that is the case, then the architect made a mistake.  If it's really
important, they can extend the language.  Yes, this means a non-standard
extension; however, the only way to get it to be a standard extension is
to get *every* vendor to adopt it, regardless of whether they support bit
addressing or not.  In the case of C, this means longer-than-32-bit
"void *" on lots of *existing* machines; I don't think the chances of
this happening are very good at all.

> I would like it to be POSSIBLE for some designer of an architecture
> likely to be used for bit-mapped systems to decide to make bits directly
> addressable.

It is ALREADY possible to do this.  The architect merely has to avoid
thinking "if I can't get at this feature from unextended ANSI C, I
shouldn't put it in."
The chances are very slim indeed that there will be a standard way to do
bit addressing in ANSI C, since this would require ANSI C to mandate that
all implementations support it, and would require ANSI C to be rather
more different from current C implementations than most vendors would
like.

> The idea of a "character" is that of an individually manipulable
> primitive unit of text.

As I've already pointed out, it is quite possible that there may be more
than one such notion on a system.

> However, in X3J11 practically everything that now refers to (char)
> arrays is designed principally for text application, while practically
> everything that refers to arbitrary storage uses (void *), not (char *).

However, you're now introducing a *third* type; when you are dealing with
arbitrary storage, sometimes you use "void *" as a pointer to arbitrary
storage and sometimes you use "short char" as an element of arbitrary
storage.

> In a good implementation using my (char)/(short char) distinction, it
> would be POSSIBLE to maintain a reasonable default collating sequence
> for (char)s so that a kludge like strcoll() would not normally be
> necessary.

This is simply not true, unless the "normally" here is being used as an
escape clause to dismiss many natural languages as abnormal.  Some
languages do *not* sort words with a character-by-character comparison
(e.g., German).  One *might* give ligatures like "SS" "char" codes of
their own - but you'd have to deal with existing documents with two "S"es
in them, and you'd either have to convert them "on the fly" in standard
I/O (in which case you'd have to have standard I/O know what language the
file was in) or convert them *en bloc* when you brought the document over
from a system with 8-bit "char"s.
(Oh, yes, you'd still have to have standard I/O handle 8-bit and 16-bit
"char"s, and conversion between them, unless you propose to make this new
whizzy machine require text file conversion when you bring files from or
send files to machines with boring obsolete old 8-bit "char"s.)
Furthermore, I don't know how you sort words in Oriental languages,
although I remember people saying there *is* no unique way of sorting
them.

> Using (long char) for genuine text characters would conflict with
> existing definitions for text-oriented functions, which is the main
> reason I decided that (char) is STILL the proper type for text units.

If you're going to internationalize an existing program, changing it to
use "lstrcpy" instead of "strcpy" is the least of your worries.  I see no
problem whatsoever with having the existing text-oriented functions
handle 8-bit "char"s.  Furthermore, since not every implementation that
supports large character sets is going to adopt 16-bit "char"s, you're
going to need two sets of text-oriented functions in the specification
anyway.

> The trade-off is between more compact storage (as in AT&T's approach)
> requiring kludgery to handle individual textual units, versus a clean,
> simple model of characters and storage cells that supports
> uncomplicated, straightforward programming.

What is this "kludgery"?  You need two classes of string manipulation
routines.  Big Deal.  You need to convert some encoded representation in
a file to a 16-bit-character representation when you read the file, and
convert it back when you write it back.  Big Deal.  This would presumably
be handled by library routines.  If you're going to read existing text
files without requiring them to be blessed by a conversion utility,
you'll have to do that in your scheme as well.  You need to remember to
properly declare "char" and "long char" variables, and arrays and
pointers to same.  Big Deal.
I am not convinced that the "char"/"long char" scheme is significantly
less "clean", "simple", "uncomplicated", or "straightforward" than the
"short char"/"char" scheme.

> While it is POSSIBLE to run into problems, such as in using the
> result of strlen() as the length of a memcpy() operation, these
> don't arise so often that it is hopeless to make the transition.

Sigh.  No, it isn't necessarily HOPELESS; however, you have not provided
ANY evidence that the various problems caused by changing the meaning of
"char" would be preferable to any disruption to the "clean" models caused
by adding "long char".  (Frankly, I'd rather keep track of two types of
string copy routines and character types than keep track of all the
*existing* code that would have to have "char"s changed to "short char".)
--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/08/86)
Guy is still missing my point about bitmap display programming; I have
NOT been arguing for a GUARANTEED PORTABLE way to handle individual bits,
but rather for the ability to do so directly in real C on specific
machines/implementations WITH THE FACILITY:

	typedef short char Pixel;	/* one bit for B&W displays */
		/* fancy color frame buffers wouldn't use (short char)
		   for this, but an inexpensive "home" model might */

	typedef struct { short x, y; } Point;
	typedef struct { Point origin, corner; } Rectangle;

	typedef struct {
		Pixel *base;		/* NOT (Word *) */
		unsigned width;		/* in Bits, not Words */
		Rectangle rect;
		/* obscured-layer chain really goes here */
	} Bitmap;			/* does this look familiar? */

Direct use of Pixel pointers/arrays tremendously simplifies coding for
such applications as "dmdp", where one typically has to pick up six bits
at a time from a rectangle for each printer byte being assembled
(sometimes none of the six bits are in the same "word", no matter how
bits may have been clumped into words by the architect).  Now, MC68000
and WE32000 architectures do not support this (except for (short char)s
that are multi-bit pixels).  But I definitely want the next generation of
desktop processors to support bit addressing.

I am fully aware that programming at this level of detail is
non-portable, but portable graphics programming SUCKS, particularly at
the interactive human interface level.  Programmers who try that are
doing their users a disservice.  I say this from the perspective of one
who is considered almost obsessively concerned with software portability
and who has been the chief designer of spiffy commercial graphic systems
(and who currently programs DMDs and REAL frame buffers, not Suns).

I'm well aware of the use of packed-bit access macros, thank you.  That
is exactly what I want to get away from!  The BIT is the basic unit of
information, not the "byte", and there is nothing particularly sacred
about the number 8, either.
I agree that if you want to write PORTABLE bit-accessing code, you'll
have to use macros or functions, since SOME machines/implementations will
not directly support one-bit data objects.  That wasn't my concern.

Due to all the confusion, I'm recapitulating my proposal briefly:

ESSENTIAL:
	(1) New type: (short char), signedness as for (char).
	(2) sizeof(short char) == 1.
	(3) sizeof(char) >= sizeof(short char).
	(4) Clean up wording slightly to improve the byte (storage cell)
	    vs. character distinction.

RECOMMENDED:
	(5) Fix character \-escapes so that larger numeric values are
	    permitted in character/string constants on implementations
	    where that is needed.  The current 9/12-bit limit is a botch
	    anyway.
	(6) Text streams read/write/seek (char)s, and binary streams
	    read/write/seek (short char)s.  This requires addition of
	    fgetsc(), fputsc(), which are routines I think most system
	    programmers have already invented under names like
	    get_byte().
	(7) Add a `b' size modifier for fscanf().

I've previously pointed out that this has very little impact on most
existing code, although I do know of exceptions.  (Actually, until the
code is ported to a sizeof(short char) != sizeof(char) environment, it
wouldn't break in this regard.  That port is likely to be a painful one
in any case, since it would probably be to a multi-byte character
environment, and SOMEthing would have to be done anyway.  The changes
necessary to accommodate this are generally fewer and simpler under my
proposal than under a (long char)/lstrcpy() approach.)

As to whether I think that mapping to/from 16-bit (char) would be done by
the I/O support system rather than the application code, my answer is:
Absolutely!  That's where it belongs.  (AT&T has said this too, on at
least one occasion, taking it even so far as to suggest that the device
driver should be doing this.  I assume they meant a STREAMS module.)
I won't bother responding in detail on other points, such as use of
reasonable default "DP shop" collating sequences analogous to ASCII
without having to pack/unpack multi-byte strings.  (Yes, it's true that
machine collating sequence isn't always appropriate -- but does that mean
that one never encounters computer output that IS ordered by internal
collating sequence?  Also note that strcoll() amounts to a declaration
that there IS a natural multibyte collating sequence for any single
environment.)  Instead I will simply assure you that I have indeed
thought about all those things (and more), have read the literature, have
talked with people working on internationalization, and have even been in
internationalization working groups.

I spent the seven hours driving back from the Raleigh X3J11 meeting
analyzing why people were finding these issues so complex, and discovered
that much of it was due to the unquestioned assumption that "16-bit" text
had to be considered as made of individual 8-bit (char)s.  If one starts
to write out a BNF grammar for what text IS, it becomes obvious very
quickly that that is an unnatural constraint.  Before glibly dismissing
this as not well thought out, give it a genuine try and see what it is
like for actual programming; then try ANY alternative approach and see
how IT works in practice.

If you prefer, don't consider my proposal as a panacea for such issues,
but rather as a simple extension that permits some implementers to choose
comparatively straightforward solutions while leaving all others no worse
off than before (proof: if one were to decide to make sizeof(char) ==
sizeof(short char), that is precisely where we are now.)

What I DON'T want to see is a klutzy solution FORCED on all implementers,
which is what standardizing a bunch of simultaneous (long char) and
(char) string routines (lstrcpy(), etc.) would amount to.
If vendors think it is necessary to take the (long char) approach, the
door is still open for them to do so under my proposal (without X3J11's
blessing), but vendors who really don't care about 16-bit chars (yes,
there are vendors like that!) are not forced to provide that extra
baggage in their libraries and documentation.

The fact that more future CPU architectures may support tiny data types
directly in standard C than at present is an extra benefit from my
approach to the "multi-byte character" problem; it wasn't my original
motivation, but I'm happy that it turned out that way.  (You can bet that
(short char) would be heavily used for Boolean arrays, for example, if my
proposal makes it into the standard; device-specific bitmap display
programming is by no means the only application that could benefit from
availability of a shorter type.  I've seen many people #define TINY for
nybble-sized quantities, usually having to use a larger size (e.g.,
(char)) than they really wanted.)

From the resistance he's been putting up, I doubt that I will convert Guy
to my point of view, and I'm fairly sure that many people who have
already settled on some strategy to address the multi-byte character
issue are not eager to back out the work they've already put into it.
However, since I've shown that a clean conceptual model for such text IS
workable, there's no excuse for continued claims that explicit
byte-packing and unpacking is the only way to go.