[net.lang.c] sizeof

notes@ucbcad.UUCP (10/26/83)

ucbesvax!turner    Oct 26 05:51:00 1983

	I have a question about enum types.  What size are they?
Ritchie says that his compiler treats them as ints.  But what about
pcc?  Are they sizeof int or sizeof char *?  The latter would be
preferable to me, since I am using enum's to hide pointers from the
users of a package; for this to work across all machines, enum
types must be (at least) as large as the largest possible pointer-
size.

Thanks in advance,
    Michael Turner (ucbvax!ucbesvax.turner)

mjs@rabbit.UUCP (10/27/83)

The type of an enum is int, though some compilers may shorten that to
short or char if the enumerated values fit.
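
For instance, a quick check on any particular compiler (the output is,
of course, implementation-dependent):

	#include <stdio.h>

	enum color { RED, GREEN, BLUE };

	main()
	{
		printf("%d %d %d\n", (int) sizeof (enum color),
			(int) sizeof (int), (int) sizeof (char *));
	}
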
-- 
	Marty Shannon
UUCP:	{alice,rabbit,research}!mjs
Phone:	201-582-3199

mrm@datagen.UUCP (06/12/84)

The X3J11 draft that is being worked on this week (6/11 -- 6/15) will state
that:

        sizeof expression

does not cause any side-effects to occur.
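
In other words, the operand of sizeof is not even evaluated, so in
something like

	int i = 0;
	int n;

	n = sizeof (i++);	/* n gets sizeof (int) */

the increment never happens and i is still 0 afterwards.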

	Michael Meissner
	Data General Corporation
	...{ allegra, ihpn4, rocky2, decvax!ittvax }!datagen!mrm

cottrell@nbs-vms.ARPA (01/19/85)

 K&R page 192 first paragraph:

   "The compilers currently allow a pointer to be assigned to an integer,
an integer to a pointer, and a pointer to a pointer of another type.
THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. This usage is
nonportable, and may produce pointers which cause addressing exceptions
when used. However, it is guaranteed that the assignment of the constant
0 to a pointer will produce a null pointer distinguishable from a pointer
to any object."

This says to me that the sizes must be the same. Changing the size is
a conversion in my eye. I believe you when you say that there are compilers
in wide use that do this, but I have heard lots of weird stuff about
what someone's compiler does. Brain damage is everywhere!

guy@rlgvax.UUCP (Guy Harris) (01/19/85)

>  K&R page 192 first paragraph:
> 
>    "The compilers currently allow a pointer to be assigned to an integer,
> an integer to a pointer, and a pointer to a pointer of another type.
> THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. ...
> 
> This says to me that the sizes must be the same. Changing the size is
> a conversion in my eye. ...

Under "Explicit pointer conversions", p. 210:

	A pointer may be converted *to any of the integral types
	large enough to hold it.  Whether an "int" or "long" is required
	is machine dependent.*  ("Italics" mine.)

Note that "integer" does not mean "int".  "4. What's in a name", last
paragraph, p. 182:

	Up to three sizes of integer, declared "short int", "int", and
	"long int", are available.

So what they meant to say on p. 192 was that a pointer may be assigned
to an integer large enough to hold it.  On some machines, "int" may not
be large enough to hold a pointer, and "long int" is the only integer
to which a pointer may be assigned.
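
In code, the (machine-dependent) round trip looks like this; on some
machines a plain "int" would do, on others only "long" is wide enough:

	char buf[1];
	char *p = buf, *q;
	long l;

	l = (long) p;		/* assumes "long" can hold a "char *" here */
	q = (char *) l;		/* converting back normally recovers p, but
				   the mapping is machine-dependent */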

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

thomas@utah-gr.UUCP (Spencer W. Thomas) (01/20/85)

In article <7527@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
>
> K&R page 192 first paragraph:
>
>   "The compilers currently allow a pointer to be assigned to an integer,
		   *********
>an integer to a pointer, and a pointer to a pointer of another type.
>THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. This usage is
							      **** ***** **
>nonportable, and may produce pointers which cause addressing exceptions
 ***********
>
>This says to me that the sizes must be the same. Changing the size is
>a conversion in my eye. 

Note the words I have underlined above.  Nowhere in this paragraph does
it say that this is a feature which a C compiler MUST have, only that
this is a feature of CURRENT compilers.

-- 
=Spencer
	({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
		<<< Silly quote of the week >>>

jim@ISM780B.UUCP (01/21/85)

> K&R page 192 first paragraph:
>
>   "The compilers currently allow a pointer to be assigned to an integer,
>an integer to a pointer, and a pointer to a pointer of another type.
>THE ASSIGNMENT IS A PURE COPY OPERATION, WITH NO CONVERSION. This usage is
>nonportable, and may produce pointers which cause addressing exceptions
>when used. However, it is guaranteed that the assignment of the constant
>0 to a pointer will produce a null pointer distinguishable from a pointer
>to any object."

K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!
K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!
K&R is obsolete!  K&R is obsolete!  K&R is obsolete!  K&R is obsolete!

It is old, it is outdated, it is wrong.
Those compilers that allow structure assignment, separate name spaces
for structure members, and enums, that is, the compilers that implement
the real C language, the one described by the C reference manual that
AT&T distributes, and the one on which the ANSI standard is being based,
generate a warning when such assignments occur.  The C reference manual
does not allow them; values must be explicitly cast before being assigned.

>This says to me that the sizes must be the same. Changing the size is
>a conversion in my eye.

Have you seen an ophthalmologist lately?  :-)
Even given K&R, with its cavalier approach toward formal specification,
this is not a reasonable interpretation, because the above paragraph says
"integer", not "int", and a pointer can't be the same size as a char,
a short, an int, and a long all at the same time.

-- Jim Balter, INTERACTIVE Systems (ima!jim)

peterc@ecr.UUCP (Peter Curran) (01/22/85)

Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
However, due to one point in the Reference Manual, and K&R (and, I assume,
the standard, although I haven't checked), they are actually required to
be equal.  The problem is that "0" is defined to be the Null pointer
constant.  When "0" is passed as a parameter to a function, the compiler
cannot tell whether an int or an int * is intended. The effect of this is
that sizeof(int) must equal sizeof(int *), and even more, the value of the
Null address constant must be bit-for-bit identical to the value of ((int) 0).

Of course, many compilers do not conform to this requirement.  The problem
can be avoided by, for example, always using (say) NULL as the Null
address constant, where NULL is #defined as something like ((char *) 0).
Doing so conforms to the Reference Manual, but not doing so also conforms
(and of course many, if not most, C programs don't follow this practice).

The real solution, of course, would be to introduce a new keyword, say "null",
which represents the Null address constant, with an implementation-
defined value.  However, I doubt that that will ever come about.

Anyone who has made much effort at porting C code has no doubt encountered
this problem, but I don't think it is as well known as it should be.

crandell@ut-sally.UUCP (Jim Crandell) (01/23/85)

> Even given K&R, with its cavalier approach toward formal specification,
> this is not a reasonable interpretation, because the above paragraph says
> "integer", not "int", and a pointer can't be the same size as a char,
> a short, an int, and a long all at the same time.

Unless, of course, char, short, int and long int are all the same, which
K&R also fails to proscribe.
-- 

    Jim Crandell, C. S. Dept., The University of Texas at Austin
               {ihnp4,seismo,ctvax}!ut-sally!crandell

quenton@ecr.UUCP (Dennis Smith) (01/23/85)

The problem of passing 0 for a null pointer (as a parameter), and
the solution of   "#define NULL ...", as pointed out by P.Curran,
is valid.  However, the use of -
  #define NULL ((char *)0)
although portable will cause many compilers to complain about
differing pointer types, and will also cause lint to generate many
additional useless messages.  The only generally useable solution
that I know of is -
  #define NULL 0	/** when sizeof(xxx *) == sizeof(int) **/
  #define NULL 0L       /** when sizeof(xxx *) == sizeof(long) **/
This unfortunately means that the "define" must be changed whenever
the target machine/compiler/environment changes.

One possible solution for the future could be the use of -
  #define NULL ((void *)0)
which seems compatible with the notion of (void *) being a generic
pointer type.

It might also be noted, although I have had no experience with them,
some compilers for certain older generations of computers, generate
pointers of differing sizes.  This occurs when the machine is not
byte addressable, so that a pointer to a word aligned item might
be "n" bits long, but a pointer to a character must point to the
word and also indicate which character within the word.
This would make the even more disastrous situation of
  sizeof(char *) != sizeof(int *)
making the definition of something like NULL even more incomprehensible.

chris@umcp-cs.UUCP (Chris Torek) (01/27/85)

> [...] C SHOULD be defined to allow sizeof(int) != sizeof(int *).
> However, due to one point in the Reference Manual, [...] they are
> actually required to be equal.  The problem is that "0" is defined to
> be the Null pointer constant.  When "0" is passed as a parameter to a
> function, the compiler cannot tell whether an int or an int * is
> intended. The effect of this is that sizeof(int) must equal
> sizeof(int *), and even more, the value of the Null address constant
> must be bit-for-bit identical to the value of ((int) 0).

NO! NO! and NO!

[please turn your volume control way up]  PASSING AN UNCASTED ZERO
TO A ROUTINE THAT EXPECTS A POINTER IS NOT PORTABLE, AND IS JUST PLAIN
WRONG.  GET THAT STRAIGHT *NOW*!

[you can turn your volume control back down]

The following code is NOT portable and probably fails on half the
existing implementations of C:

	#define NULL 0		/* this from <stdio.h> */

	f() {
		g(NULL);
	}

	g(p) int *p; {
		if (p == NULL)
			do_A();
		else
			do_B();
	}

The value ``f'' passes to ``g'' is the integer zero.  What that
represents inside g is completely undefined.  It is not the nil
pointer, unless your compiler just happens to work that way (not
uncommon but not universal).  It may not even be the same size (in
bits or bytes or decidigits or whatever your hardware uses).

One tiny little simple change fixes it:

	f() {
		g((int *)NULL);
	}

It is now portable, and all that good stuff.  You can write the
first and hope real hard, or you can write the second and know.

The point is that the zero value and the nil pointer are two completely
different things, and the compiler happens to be obliged to convert the
former to the latter in expressions where this is forced (e.g., casts
or comparison with another pointer).  It is NOT forced in function
calls (though under the ANSI standard it would be in some cases).
(I claim that it IS forced in expressions such as if (p) where p is a
pointer; this is "if (p != 0)" where type-matching p and 0 forces the
conversion.)
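
To restate that in code (using the same ``g'' that expects an "int *"):

	int *p;

	p = 0;			/* conversion forced: assignment to a pointer */
	if (p == 0)		/* conversion forced: comparison with a pointer */
		do_A();
	g((int *)0);		/* conversion forced: the explicit cast */
	g(0);			/* NOT forced: g just receives an integer zero */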

(Now I WILL agree that if you have the option of making the nil pointer
and the zero bit pattern the same, then you will have less trouble with
existing programs if you do....)
-- 
(This line accidently left nonblank.)

In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/27/85)

Bogus, bogus.

sizeof (int) is not required to be the same as sizeof (int *).
sizeof (int *) is also not necessarily the same as sizeof (char *).
0 is not the same as (char *)0.

guy@rlgvax.UUCP (Guy Harris) (01/28/85)

> Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
> However, due to one point in the Reference Manual, and K&R (and, I assume,
> the standard, although I haven't checked), they are actually required to
> be equal.  The problem is that "0" is defined to be the Null pointer
> constant.

This has been said about 1.0E6 times before, but "0" is NOT defined to be
the null pointer constant.  K&R says:

	...it is guaranteed that *ASSIGNMENT* (emphasis mine) of the
	constant 0 to a pointer will produce a null pointer distinguishable
	from a pointer to any object.

Passing something as an argument doesn't behave like an assignment, precisely
because the compiler can't perform the proper type coercions.  You have
to do so yourself; if you want to pass an "int" to a routine that expects
a "double", you have to say "(double) foo", whereas if you wanted to assign
that int to a double you could omit the cast and the compiler would perform
the coercion for you.  The same holds for passing 0 to a routine expecting
a pointer and assigning 0 to a pointer.  You have to cast it.

"lint" complains, and rightly so, if you pass an object of one type to
a routine which expects an object of another type.  It's *VERY EASY* to
catch this kind of problem - just run "lint" every once in a while.

We support 16-bit "int"s and 32-bit pointers on our machines; if this
causes somebody's code to have problems because it says things like

	execl("/usr/local/foo", "foo", "bar", 0);

instead of

	execl("/usr/local/foo", "foo", "bar", (char *)0);

it's because the code is incorrect.  Period.

> Of course, many compilers do not conform to this requirement.

It's not a requirement, so compilers don't have to conform.

> The problem can be avoided by, for example, always using (say) NULL as
> the Null address constant, where NULL is #defined as something like
> ((char *) 0).

OK.  Everybody take a deep breath and repeat after me:

THERE IS NO ONE NULL ADDRESS CONSTANT IN C!

There is no such thing as a generic "pointer" in C.  There are pointers to
characters, pointers to "int"s, pointers to "struct proc", pointers to...
As such, there is no such thing as a generic null pointer.  There are null
pointers to no character, null pointers to no "int", etc..  As such, the
problem should not be avoided by the above trick.  "lint" will complain,
and the code will choke on implementations where "char *" and "int *" have
different representations (a word-addressed machine where a byte pointer
would take more bits than fit in a word pointer REQUIRES such an
implementation).

> The real solution, of course, would be to introduce a new keyword, say "null",
> which represents the Null address constant, with an implementation-
> defined value.  However, I doubt that that will ever come about.

Let's hope not.  It is an incorrect solution for the very reason mentioned
above.  The closest thing to a correct solution is the introduction of
declarations of functions that declare the argument types; thus, the
compiler could perform the necessary coercions just as it can do so in
expressions.

> Anyone who has made much effort at porting C code has no doubt encountered
> this problem, but I don't think it is as well known as it should be.

Anyone who has made much effort at porting C code has encountered lots of
problems, all too many of which are due to people misusing the language.
Many of those can be avoided by using "lint".  Go forth and do so.

Let's hope this kills this discussion off until the next time it shows up
(which will probably be in another couple of months - it keeps returning
like a bad penny).

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

g-frank@gumby.UUCP (01/28/85)

> Anyone who has made much effort at porting C code has encountered lots of
> problems, all too many of which are due to people misusing the language.
> Many of those can be avoided by using "lint".  Go forth and do so.
> 

   The whole point of languages where the compiler does strong type checking
is that no one gets to misuse the language, at least without making a conscious
effort to do so.  As long as it is easier to avoid a cast than to use one,
and the compiler doesn't complain, lazy or rushed or habit-bound programmers
will do so.

   With regard to lint:

   1) Most people working in a Unix environment never use it, because they
      don't have to.

   2) I have been desperately searching for an implementation for my own
      programming environment (PC-DOS and QNX on the IBM PC), thus far
      without any luck.  It just doesn't seem to be very available in any
      but orthodox Unix systems.  This should say something about the great
      esteem in which the C programming community holds lint.

   Human nature being what it is, "go forth and use lint" should get approx-
imately the same enthusiastic response as "go forth and sin no more."



-- 
      Dan Frank

	"good news is just life's way of keeping you off balance."

mwm@ucbtopaz.CC.Berkeley.ARPA (01/29/85)

In article <347@ecr.UUCP> peterc@ecr.UUCP (Peter Curran) writes:
>Of course, C SHOULD be defined to allow sizeof(int) != sizeof(int *).
>However, due to one point in the Reference Manual, and K&R (and, I assume,
>the standard, although I haven't checked), they are actually required to
>be equal.  The problem is that "0" is defined to be the Null pointer
>constant.

Sorry, but that's not quite right. Quoting K&R, page 192, first paragraph,
last sentence:

	However, it is guaranteed that assignment of the constant
	0 to a pointer will produce a null pointer distinguishable
	from a pointer to any object.

In other words, "0" is not the null pointer constant, but coerces to it
on assignment to a pointer.

>	   When "0" is passed as a parameter to a function, the compiler
>cannot tell whether an int or an int * is intended.

Yup. That's why you need to cast NULL parameters to the right type. Not doing
the cast is a bug that will work on some machines, but not on all machines.

>						      The effect of this is
>that sizeof(int) must equal sizeof(int *), and even more, the value of the
>Null address constant must be bit-for-bit identical to the value of ((int) 0).

No. The effect is that the null address constant of type (type) must be
bit-for-bit identical to ((type *) 0). ((int) 0) and ((type *) 0) don't
even have to be the same size.

>Of course, many compilers do not conform to this requirement.  The problem
>can be avoided by, for example, always using (say) NULL as the Null
>address constant, where NULL is #defined as something like ((char *) 0).

I've done that, but it's a kludge. The code will still be buggy, and the
bugs will manifest themselves on any machine where (sizeof (char *)) !=
(sizeof (int *)) != (sizeof (struct gort *)). This is one of the reasons
adding the parameters to the declaration of an external (or
forward-referenced) function.

>The real solution, of course, would be to introduce a new keyword, say "null",
>which represents the Null address constant, with an implementation-
>defined value.  However, I doubt that that will ever come about.

Sounds like a good idea to me. Trouble is, you still have the problem of
figuring out which "null" to pass to an external procedure.

	<mike

mjl@ritcv.UUCP (Mike Lutz) (01/29/85)

> OK.  Everybody take a deep breath and repeat after me:
> 
> THERE IS NO ONE NULL ADDRESS CONSTANT IN C!
> 
> There is no such thing as a generic "pointer" in C.

Guy's right, of course.  For those who want null pointers of various
types, might I suggest the following macro:

/*
 * Make a Null Pointer for objects of type 't'
 */

#define NullPtr(t)	( (t *) 0 )

This permits code like:

	execl( "/bin/foo", "foo", "bar", NullPtr(char) ) ;

This can be quickly fixed for those who object to creating pointers to
objects of type 't' but want Null pointers for (pointer) type t.

With a bit of imagination, you can create macros to allocate & free
objects of type 't' in a type safe fashion, using malloc/calloc/free.
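
For instance (the names are just suggestions, and the usual
"char *malloc()" declaration is assumed):

extern char *malloc();

#define New(t)		( (t *) malloc( sizeof(t) ) )
#define NewArray(t,n)	( (t *) malloc( (unsigned) (n) * sizeof(t) ) )
#define Free(p)		free( (char *) (p) )

so that something like "sp = New(struct symbol);" gets the cast and the
sizeof right in one place.
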
-- 
Mike Lutz	Rochester Institute of Technology, Rochester NY
UUCP:		{allegra,seismo}!rochester!ritcv!mjl
CSNET:		mjl%rit@csnet-relay.ARPA

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (01/29/85)

"lint" is invaluable and we use it frequently in our application
development work.  It picks up many common mistakes, some of which
are unintentional and some of which are due to misunderstandings.
The rule in my team is "make the code pass lint completely or else
explain why it can't possibly".

friesen@psivax.UUCP (Stanley Friesen) (01/29/85)

In article <351@ecr.UUCP> quenton@ecr.UUCP (Dennis Smith) writes:
>It might also be noted, although I have had no experience with them,
>some compilers for certain older generations of computers, generate
>pointers of differing sizes.  This occurs when the machine is not
>byte addressable, so that a pointer to a word aligned item might
>be "n" bits long, but a pointer to a character must point to the
>word and also indicate which character within the word.
>This would make the even more disastrous situation of
>  sizeof(char *) != sizeof(int *)
>making the definition of something like NULL even more incomprehensible.

	And not only "older" computers: the current Honeywell
mainframe has an architecture which works like this.  Of course
that is because they decided to maintain code compatibility with
their old 600 series from the mid-60s.
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
 or
quad1!psivax!friesen

guy@rlgvax.UUCP (Guy Harris) (01/30/85)

> The problem of passing 0 for a null pointer (as a parameter), and
> the solution of   "#define NULL ...", as pointed out by P.Curran,
> is valid.

As you state below, there are machines where sizeof(char *) != sizeof(int *).
This solution (#define NULL ((char *)0)) is NOT valid on those machines.

> The only generally useable solution
> that I know of is -
>   #define NULL 0	/** when sizeof(xxx *) == sizeof(int) **/
>   #define NULL 0L       /** when sizeof(xxx *) == sizeof(long) **/

This isn't a solution.  There may be machines in which the bit pattern
that (char *)0 represents is something other than N zeros, where N is
the number of bits in an "int" or a "long int".

> One possible solution for the future could be the use of -
>   #define NULL ((void *)0)
> which seems compatible with the notion of (void *) being a generic
> pointer type.

This won't work either.  If a routine expects an "int *", dammit, it
expects an "int *", not an "int", not a "long int", not a "char *", and
not a "void *".  What is so d*mn difficult about putting in pointer casts?
It's second nature to me now, and has been for several years - dating
back to PDP-11 days when it wasn't a problem.

Don't think of C as structured assembler, where you "know" what's
"really happening".  Use it as a typed language, albeit with weak type
checking.  Trust me, you'll be happier for doing so.

Can we put this discussion to bed now, with the conclusion that the only
correct solution to the problem, pending ANSI Standard C with the ability
to import the declaration of the arguments to a routine, is to put the
****** pointer casts in?

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

jack@vu44.UUCP (Jack Jansen) (01/30/85)

A while ago, I tried something that I was almost sure would
fail (which it did), being:

#if sizeof(struct foobar) != BLKSIZ

I *know* why this fails, but still the most recent definition
I saw of #if is
#if <constant-expression>
which includes sizeof().
Does anyone know whether the new standard has changed this, or
changed the definition of constant-expression not to include
sizeof()?
Or is everyone supposed to integrate the preprocessor into the
compiler (yuck)?
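
One workaround I know of is to move the check past the preprocessor and
let the compiler proper reject the bad case, by declaring an array whose
size is illegal exactly when the assumption fails (some compilers may be
fussy about the conditional expression in an array bound):

	static char foobar_check[sizeof(struct foobar) == BLKSIZ ? 1 : -1];

Ugly, but it does stop the compilation right there if the sizes ever
drift apart.
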
-- 
	Jack Jansen, {seismo|philabs|decvax}!mcvax!vu44!jack
	or				       ...!vu44!htsa!jack
Help! How do I make a cup of tea while working with an acoustic modem?

cottrell@nbs-vms.ARPA (02/01/85)

/*
Bob Larson has a machine with 48 bit ptr's & 32 bit int's & long's.
What is this beast? Someone also said that ptr's to different objex
may be different sizes. Where & why? I realize that a certain machine
may desire this to make implementation as efficient as possible, but I
think the designers should just bite the bullet and make all ptr's
the same size. The machine is probably brain damaged anyway. Any
machine not byte addressable is an inhospitable host for C at best.
As I said before, my model is the pdp-11/vax architecture. The 68000
and 32032 fall into this category. I really don't care if my code
is not portable to some weird architecture I have never seen or
do not wish to see again (u1108).
*/

gam@amdahl.UUCP (gam) (02/01/85)

> > Anyone who has made much effort at porting C code has encountered lots of
> > problems, all too many of which are due to people misusing the language.
> > Many of those can be avoided by using "lint".  Go forth and do so.
> > 
>    With regard to lint:
> 
>    1) Most people working in a Unix environment never use it, because they
>       don't have to.
>    Human nature being what it is, "go forth and use lint" should get approx-
> imately the same enthusiastic response as "go forth and sin no more."

Lint is widely used here.  On more than one occasion a casual misuse
of pointers or arrays was pointed out by lint.  Also lint gets almost
as much attention as the C compiler, as far as program maintenance
is concerned.  And porting programs from other systems would be
a painful task without lint.

People here aren't using lint because it is a good upright moral
thing to do; they use it because it helps to solve problems.  That's
what a good tool is for.
-- 
Gordon A. Moffett		...!{ihnp4,hplabs,sun}!amdahl!gam

guy@rlgvax.UUCP (Guy Harris) (02/02/85)

> Someone also said that ptr's to different objex may be different sizes.
> Where & why?

Where:

On a Zilog 8000 running in segmented mode (24-bit pointers).
On a Motorola 68000.
On an Intel 802*8[68] running large-model code.

Just because a machine supports large pointers doesn't mean that it
supports 32-bit arithmetic well.  The Z8000 probably does 32-bit
arithmetic 16 bits at a time.  The 68000 definitely does, and can't do
32-bit multiplies or divides conveniently at all.

Why:

Because it doesn't say anywhere that you can't.  Because you may not want
to pay the penalty for 32-bit arithmetic.

> I realize that a certain machine may desire this to make implementation
> as efficient as possible, but I think the designers should just bite
> the bullet and make all ptr's the same size.

On this point, I agree.  16-bit "int"s make techniques like using "malloc"
and "realloc" to grow a large table (used by such obscure programs as "nm")
lose big.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

henry@utzoo.UUCP (Henry Spencer) (02/03/85)

>    With regard to lint:
> 
>    1) Most people working in a Unix environment never use it, because they
>       don't have to.

Most people working in a Unix environment write cruddy code as a result.
Those of us with sense use lint at the drop of a bit, and delint other
people's code routinely [SIGH].

>    2) I have been desperately searching for an implementation for my own
>       programming environment (PC-DOS and QNX on the IBM PC), thus far
>       without any luck.  It just doesn't seem to be very available in any
>       but orthodox Unix systems.  This should say something about the great
>       esteem in which the C programming community holds lint.

Not quite true.  The relevant issue is not the value of lint, but the
total lack of any sort of public specifications for it.  The only way to
find out what lint does is to read the AT&T code, after which writing
one of your own is legally tricky.

>    Human nature being what it is, "go forth and use lint" should get approx-
> imately the same enthusiastic response as "go forth and sin no more."

Generally, that's about the sort of response it gets from sloppy coders.
Those who take the advice to heart, generally come to appreciate its value.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

ksbszabo@wateng.UUCP (Kevin Szabo) (02/03/85)

As Guy says, sizeof( int ) need not equal sizeof( int * ).

However, a lot of code depends on the two sizes being the same
(unfortunately). We have a 68k beast with a Micrsoft port of 
systemIII. On this machine an Int is 32 bits. Isn't an int supposed
to be the natural word size for a machine? Is 32 bits a natural
size for a 68000? I guess it will be for the 68k which have
32 bit busses, but what about the machine with the 16 bit data bus?

I have a feeling that the integer size was picked more for porting
convenience than anything else, of course I have been wrong many times
before.
-- 
Kevin Szabo  watmath!wateng!ksbszabo (U of Waterloo VLSI Group, Waterloo Ont.)

friesen@psivax.UUCP (Stanley Friesen) (02/04/85)

In article <7904@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
>/*
>Bob Larson has a machine with 48 bit ptr's & 32 bit int's & long's.
>What is this beast? Someone also said that ptr's to different objex
>may be different sizes. Where & why? I realize that a certain machine
>may desire this to make implementation as efficient as possible, but I
>think the designers should just bite the bullet and make all ptr's
>the same size.

	I don't know about Bob Larson's machine, but I used to use
a Honeywell mainframe.  The architecture dates back to the 60's,
and Honeywell has kept it around in the name of upward compatibility.
The machine is a *word* oriented machine with a 36 bit word (yes 36,
not 32), each instruction is one word in length, and contains 1-1/2
addresses, a memory address and a register specifier.   A word pointer
is simply a word containing a memory address in the same bits holding
one in an instruction word, the last 17 bits of the word.  Such a word
can be used with any normal indirection mode (there are three on these
machines). If you want byte addressing (9-bit bytes *or* 6-bit bytes),
you must use a special "tagged" indirect mode in which the indirect
word (the pointer) contains a normal address plus a special field
specifying the byte within the destination word. Because of the
way "tagged" indirection works it may be necessary to make a *copy*
of the pointer to use for dereferencing if you intend to re-use it!
This is what I call brain-damaged, but it is *real*, and a Honeywell
C-compiler must put up with different types of pointers for ints and
chars, *or* make chars 36 bits.
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
 or
quad1!psivax!friesen

cottrell@nbs-vms.ARPA (02/05/85)

/*
> > Someone also said that ptr's to different objex may be different sizes.
> > Where & why?
> 
> Where:
> 
> On a Zilog 8000 running in segmented mode (24-bit pointers).
> On a Motorola 68000.
> On an Intel 802*8[68] running large-model code.
> Just because a machine supports large pointers doesn't mean that it
> supports 32-bit arithmetic well.  The Z8000 probably does 32-bit
> arithmetic 16 bits at a time.  The 68000 definitely does, and can't do
> 32-bit multiplies or divides conveniently at all.

The 68000 only uses 24 bits for addressing, but uses them either
1) as a 32 bit item in the instruxion stream, & 2) in a 32 bit register.
While it would be possible for an implementor to use only 3 bytes, the 
space saved would be offset by the overhead in loading into a four byte
register & masking. The 24 bit restrixion is only temporary anyway.
Future versions will probably allow 32 bits. I think the SUN mmu
axually uses these bits. I think the z8000 has 32 bit regs too.

> Why:
> 
> Because it doesn't say anywhere that you can't.  Because you may not want
> to pay the penalty for 32-bit arithmetic.

If you have a machine with an address space > 64k, you probably have
32 bit registers.

> > I realize that a certain machine may desire this to make implementation
> > as efficient as possible, but I think the designers should just bite
> > the bullet and make all ptr's the same size.
> 
> On this point, I agree.  16-bit "int"s make techniques like using "malloc"
> and "realloc" to grow a large table (used by such obscure programs as "nm")
> lose big.

I just read a news item from yourself which stated:
	"THERE IS NO SUCH THING AS A GENERIC NULL POINTER"
Presumably because of different length pointers. Which way do you want it?
> 
> 	Guy Harris
> 	{seismo,ihnp4,allegra}!rlgvax!guy
*/

Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA> (02/05/85)

Does anybody other than Cottrell have difficulty coping
with various sizes of pointers?

henry@utzoo.UUCP (Henry Spencer) (02/05/85)

> Does anyone know whether the new standard has ...
> changed the definition of constant-expression not to include
> sizeof()?

Recent drafts of the ANSI standard outlaw sizeof() in #if.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (02/05/85)

> I have a feeling that [this] integer size [32 bits on 68000] was picked
> more for porting convenience than anything else...

Very probably.  While rules like "don't assume pointers and integers
are the same size" and "don't assume *(char *)0 == '\0'" are good
advice for writing new code, an unfortunate amount of old code breaks
them.  You get a choice of having to fix it all, or arranging for the
dubious assumptions to remain true.  For obvious reasons, many people
with a product to get out the door have taken the latter approach.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

henry@utzoo.UUCP (Henry Spencer) (02/05/85)

> ... Someone also said that ptr's to different objex
> may be different sizes. Where & why? I realize that a certain machine
> may desire this to make implementation as efficient as possible, but I
> think the designers should just bite the bullet and make all ptr's
> the same size. The machine is probably brain damaged anyway. Any
> machine not byte addressable is an inhospitable host for C at best.

Quite true, but often a poor C implementation is better than none.
The idea is to make it no poorer than you have to.  This often means
that whatever kludges are needed for "char *" really shouldn't have
to be applied to all pointers.

> As I said before, my model is the pdp-11/vax architecture. The 68000
> and 32032 fall into this category. I really don't care if my code
> is not portable to some weird architecture I have never seen or
> do not wish to see again (u1108).

I'm afraid what you are really saying is that you don't really care
about portability at all.  "If the machine is very similar to mine,
maybe it'll run; if not, tough."  I understand but can't sympathize.
It's not that much harder to do it right.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

breuel@harvard.ARPA (Thomas M. Breuel) (02/05/85)

> Does anybody other than Cottrell have difficulty coping
> with various sizes of pointers?

Yes, I do, and looking at UN*X source code for the VAX and the PDP
I would believe that most of Berkeley's programmers/students would
also have problems.

Of course, you can always work around sizeof(char *)!=sizeof(int *)
(or sizeof(char *)!=sizeof(int)), but often it is a hassle, and
it makes porting old (4.2BSD :-) source code very difficult.

					Thomas.

mark@tove.UUCP (Mark Weiser) (02/05/85)

In article <1071@amdahl.UUCP> gam@amdahl.UUCP (gam) writes:
>> > Anyone who has made much effort at porting C code has encountered lots of
>> > problems, all too many of which are due to people misusing the language.
>> > Many of those can be avoided by using "lint".  Go forth and do so.
>> > 
>>    With regard to lint:
>> 
>>    1) Most people working in a Unix environment never use it, because they
>>       don't have to.
>>    Human nature being what it is, "go forth and use lint" should get approx-
>> imately the same enthusiastic response as "go forth and sin no more."
>
>Lint is widely used here....porting programs from other systems would be
>a painful task without lint.
>

We have Pyramids and Vaxes as our main machines.  If your code passes
lint, it is likely to run on both, in spite of the fact that the
stack is used differently, the byte orders are reversed, and they
require different word alignments when accessing structures.  Lint
is handy.
-- 
Spoken: Mark Weiser 	ARPA:	mark@maryland	Phone: +1-301-454-7817
CSNet:	mark@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!mark
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

guy@rlgvax.UUCP (Guy Harris) (02/05/85)

> The 68000 only uses 24 bits for addressing, but uses them either
> 1) as a 32 bit item in the instruxion stream, & 2) in a 32 bit register.

10 points for originality, but that's NOT why I said a Motorola 68000-based
machine may have sizeof (int) != sizeof (int *).  The reason why they
may (and do - our current ones do; it's a pain in the *ss, and I prefer
32-bit "int"s, but what we did was not illegal, immoral, or fattening)
be different is:
> > 
> > Because it doesn't say anywhere that you can't.  Because you may not want
> > to pay the penalty for 32-bit arithmetic.
> 
> If you have a machine with an address space > 64k, you probably have
> 32 bit registers.

Who said anything about the register size?  The 68000 has 32-bit registers;
anybody who claims there's no speed penalty for 32-bit arithmetic on
the 68000 has been smoking those funny cigarettes too long.  32-bit by
32-bit divides are NOT cheap on the 68000 (32-bit by 16-bit ones aren't
cheap either, but they're a lot cheaper than 32-bit by 32-bit ones).

> > On this point, I agree.  16-bit "int"s make techniques like using "malloc"
> > and "realloc" to grow a large table (used by such obscure programs as "nm")
> > lose big.
> 
> I just read a news item from yourself which stated:
> 	"THERE IS NO SUCH THING AS A GENERIC NULL POINTER"
> Presumably because of different length pointers. Which way do you want it?

1) It's not just because of different length pointers; the representation
could be different.

2) I prefer "int"s to be able to hold the size, in bytes, of an object
of approximately the same size as the address space.  I don't give a
tinker's damn whether different kinds of pointers are congruent or not.
One can imagine a machine where "int"s are big enough to hold the size
of the aforementioned object in bytes, but where "char *" and "int *" are
different sizes.  An example would be a word-addressed machine with a 64KW
address space.  Choose sizeof (int) == 32 bits (remember, sizes are in bytes),
sizeof (int *) == 16 bits (large enough to hold the largest possible pointer
to "int"), sizeof (char *) == 32 bits (at least 17 bits are necessary).
Possibly a dumb machine, but it points out that the question
"Which way do (I) want it" is meaningless, given that your two choices
are not mutually exclusive.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (02/05/85)

> Of course, you can always work around sizeof(char *)!=sizeof(int *)
> (or sizeof(char *)!=sizeof(int)), but often it is a hassle, and
> it makes porting old (4.2BSD :-) source code very difficult.

Changing the implementation of, say, "getpwent" and the password file
can make porting programs that rummage through the password file directly
difficult.  This is NOT an argument against changing the implementation.
It is an argument against writing such programs in the future, and for
dedicating time to clean up those fossils if you make such a change.

The same applies to implementations of C on machines that don't encourage
the same sorts of laxity as "reasonable" machines do.  If expedience
is VERY important, you might consider doing the wrong thing; however,
I think you're better off biting the bullet and fixing the code (and
reporting fixes to AT&T, UCB, or whoever wrote it - maybe they'll take
the hint).  Think of it as doing a good deed for the next person who
has to move that software to a machine which isn't a warmed-over PDP-11.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

Larry Carroll <LARRY@JPL-VLSI.ARPA> (02/06/85)

>Does anybody other than Cottrell have difficulty coping
>with various sizes of pointers?

"Trouble"?  No, but it's a detail that everyone has to cope with and 
even experienced C programmers sometimes forget to do it.  Especially 
if you're someone like me who's been away from C for a while, discussions
like these help me keep in mind practical matters that I may forget.

				Larry @ jpl-vlsi
------

jss@sjuvax.UUCP (J. Shapiro) (02/06/85)

[Aren't you hungry...?]
	It occurs to me that the length of a pointer simply being required to
be constant (even within the same data type) presents problems. Many
microprocessors now implement span-dependent addressing, and if your
implementation allows passing of an array wholesale, and that array is
small, there is no reason why one shouldn't be able to optimize the pointer
size as being relative to some register variable which points to the
current function's bottom of passed data.

	Is this a problem in practice - are pointers in fact obliged to be the
same size everywhere, or am I missing something?

	On the topic of sizeof(int) == sizeof(int *), I refer you to K&R p.
210, which says:
	
		1. A pointer may be converted to any of the integral types long
		enough to hold it. Whether an int or a long is required is machine
		dependent.

		2. An object of integral type may be explicitly converted to a
		pointer...

Since compilers need to do type checking anyway, passing 0 instead of NULL
should always be valid.  Note that K&R says that assigning 0 to a pointer
generates the appropriate NULL pointer.  This type conversion (it is
implied) is automagic, and thus there *is* a generic NULL, which is the
integer 0.

It is also mentioned that "The mapping function... is intended to be
unsurprising to those who know the addressing structure of the machine,"
which is a loophole big enough to fly a barn through.

Jon Shapiro
Haverford College

bsa@ncoast.UUCP (Brandon Allbery) (02/08/85)

> Article <7810@brl-tgr.ARPA>, from Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA>
+----------------
| The rule in my team is "make the code pass lint completely or else
| explain why it can't possibly".

STANDARD RESPONSE:

Is it reasonable that I should have to write:

	char *foo = "/tmp";
	chdir(foo);

instead of

	chdir("/tmp");

just to satisfy lint?  It gets impossible to trace through the garbage
our lint puts out.  (Before you tell me to fix lint, give me a Xenix
source license.)

Brandon (bsa@ncoast.UUCP)
-- 
Brandon Allbery, decvax!cwruecmp!ncoast!bsa, "ncoast!bsa"@case.csnet (etc.)
6504 Chestnut Road, Independence, Ohio 44131 +1 216 524 1416 (or what have you)

garys@bunker.UUCP (Gary M. Samuelson) (02/08/85)

> 	On the topic of sizeof(int) == sizeof(int *), I refer you to K&R p.
> 210, which says:
> 	
> 		1. A pointer may be converted to any of the integral types
>		long enough to hold it. Whether an int or a long is required
>		is machine dependent.
> 
> 		2. An object of integral type may be explicitly converted to a
> 		pointer...
> 
> Since compilers need to do type checking anyway, passing 0 instead of NULL
> should always be valid.  Note that K&R says that assigning 0 to a pointer
> generates the appropriate NULL pointer.  This type conversion (it is
> implied) is automagic, and thus there *is* a generic NULL, which is the
> integer 0.

Your reasoning breaks down at the implicit assumption that passing
0 as an argument to a function constitutes an assignment.  It doesn't;
the compiler does not know the types of function arguments where the
function is called.  E.g., when you write foo(bar), the compiler
knows what "bar" is, but has no idea what type foo's formal parameter
has.

> It is also mentioned that "The mapping function... is intended to be
> unsurprising to those who know the addressing structure of the machine,"
> which is a loophole big enough to fly a barn through.

> Jon Shapiro
> Haverford College

Gary Samuelson

henry@utzoo.UUCP (Henry Spencer) (02/08/85)

> 		2. An object of integral type may be explicitly converted to a
> 		pointer...
> 
> Since compilers need to do type checking anyway, passing 0 instead of NULL
> should always be valid.

Wrong, you forgot the word "explicit".  That means *you* have to do it.
The compiler won't do it for you in parameter-passing.  Remember that
current C compilers do not (cannot) check types of parameters.

> Note that K&R says that assigning 0 to a pointer
> generates the appropriate NULL pointer.  This type conversion (it is
> implied) is automagic, and thus there *is* a generic NULL, which is the
> integer 0.

For the 157th time, WRONG.  The only generic NULL pointer in C is the
*literal* integer *constant* 0.  An integer *value* equal to 0 is *not*
a NULL pointer; only the constant 0 will do.  Unless the character
"0" appears -- either explicitly or via "#define NULL 0" -- at the place
where the conversion to pointer is being performed, then it's not a real
NULL pointer.  If you read K&R carefully, you will discover that this is
what it really says.
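
To put the distinction in code:

	char *p;
	int zero = 0;

	p = 0;			/* the literal constant 0: a real NULL pointer */
	p = (char *) zero;	/* a zero-valued int: the result is machine-
				   dependent and need not be a NULL pointer */
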
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

ndiamond@watdaisy.UUCP (Norman Diamond) (02/08/85)

> I used to use
> a Honeywell mainframe.  The architecture dates back to the 60's,
> and Honeywell has kept it around in the name of upward compatibility.
> The machine is a *word* oriented machine with a 36 bit word (yes 36,
> not 32), ...
> This is what I call brain-damaged, but it is *real*, and a Honeywell
> C-compiler must put up with different types of pointers for ints and
> chars, *or* make chars 36 bits.
> -- Sarima (Stanley Friesen)

Not only that, but its characteristics are quoted in a table appearing
TWICE in K&R.  Therefore EVERYONE has already been warned about such
kinds of brain damage.
-- 

   Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."

guy@rlgvax.UUCP (Guy Harris) (02/09/85)

> 	It occurs to me that the length of a pointer simply being required to
> be constant (even within the same data type) presents problems. Many
> microprocessors now implement span-dependent addressing, and if your
> implementation allows passing of an array wholesale, and that array is
> small, there is no reason why one shouldn't be able to optimize the pointer
> size as being relative to some register variable which points to the
> current function's bottom of passed data.
> 
> 	Is this a problem in practice - are pointers in fact obliged to be the
> same size everywhere, or am I missing something?

No, you are not missing anything - there is only one representation of a
"foo *" in a C program.  In the example you give, the called routine would
have to know that the pointer was relative to that particular register
(or, if the pointer indicated that fact, would at least have to know that
the pointer in question was a relative pointer).  This means that you'd
either have to introduce a character replacing "*" to indicate this new
flavor of pointer, or introduce a pragma which said "you can use a relative
pointer here".  (Which microprocessor are you referring to?)

> Since compilers need to do type checking anyway, passing 0 instead of NULL
> should always be valid.

The only problem is that C compilers do *not* do type checking in function
calls, because there's no way in the current C language to say that a
particular function's third argument is a "char *".  As such, passing 0
instead of NULL is not valid (well, passing 0 is the same as passing NULL,
and both are invalid; passing (char *)NULL is valid).

> Note that K&R says that assigning 0 to a pointer generates the appropriate
> NULL pointer.  This type conversion (it is implied) is automagic, and thus
> there *is* a generic NULL, which is the integer 0.

No, there is no "generic NULL", there are several "appropriate NULL"s.
The integer 0 just happens to be a way of telling the compiler to generate
whatever null pointer is appropriate for the pointer type that appears
in the expression.  Maybe if the word "nil" had been a reserved word in
C, and C used "nil" instead of "0" for this purpose, a lot of the confusion
that null pointers cause might never have happened.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (02/10/85)

There is no way, with separate compilation of modules, that a current
C compiler can determine what pointer type to coerce a 0 function
argument to, which is why the programmer must do this himself.  In the
draft ANSI C standard, if a function prototype is specified then it
will indeed be possible (and required) that the compiler coerce an
argument to the right type.  Actually, some of us don't like this
since it hides coding errors; it would be nice if the compiler (or
at least lint) could give a warning when this coercion was done.
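
A sketch of what that looks like (the function name here is invented):

	extern int lookup(char *key);	/* prototype declares the argument type */

	int
	f()
	{
		return lookup(0);	/* with the prototype in scope, the 0 is
					   coerced to (char *)0 just as in an
					   assignment -- silently, which is the
					   objection above */
	}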

robert@gitpyr.UUCP (Robert Viduya) (02/10/85)

Posted from Doug Gwyn (VLD/VMB) <gwyn@BRL-VLD.ARPA>:
> Does anybody other than Cottrell have difficulty coping
> with various sizes of pointers?

I don't have difficulty with it, but I do feel that all pointers should
be the same size.  A pointer is a pointer, regardless of what it points to.
It's a datatype all by itself; it isn't a mutation of the datatype it points
to.

Perhaps an addition to the language is in order (gotta have something to
handle those Intel chips).  Well, since C allows you to have 'long int',
'int' and 'short int', what about long pointers, pointers and short pointers?
Don't ask me how they would be declared; I'll leave that up to someone
else.
			robert
-- 
Robert Viduya
Georgia Institute of Technology

...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!robert
...!{rlgvax,sb1,uf-cgrl,unmvax,ut-sally}!gatech!gitpyr!robert

Doug Gwyn (VLD/VMB) <gwyn@Brl-Vld.ARPA> (02/11/85)

If your "lint" is indeed broken, and you don't maintain your own system,
then GET YOUR VENDOR TO FIX "lint".  The more people just take whatever
garbage the so-called UNIX vendors dish out, the worse the situation
will become.

ndiamond@watdaisy.UUCP (Norman Diamond) (02/11/85)

> I don't have difficulty with it [various sizes of pointers], but I do
> feel that all pointers should be the same size.  A pointer is a pointer,
> regardless of what it points to.  It's a datatype all by itself; it isn't
> a mutation of the datatype it points to.
>
> Perhaps an addition to the language is in order (gotta have something to
> handle those Intel chips).  Well, since C allows you to have 'long int',
> 'int' and 'short int', what about long pointers, pointers and short pointers?
> Don't ask me how they would be declared; I'll leave that up to someone
> else.
> --  Robert Viduya

Then no one will know when to declare a long pointer or short pointer.
They know they need a (struct xxx *) or a (char *); they should have a
compiler that's bright enough to figure out whether a long pointer or
short pointer is needed, for each machine they want to run their program
on.

In PL/I, a pointer is a datatype all by itself.  On some machines, in
order to be able to "point" to either integers or characters, you have
to waste 3/4 of the memory your strings are stored in, and you can't use
the machine's string instructions.

On Intel, you can make all pointers the same size by using long pointers
for everything, whether they're needed or not.  Or, you can use a language
that has a little bit of flexibility, and lets the compiler figure out
such things.

These are the reasons that Pascal, despite all of its shortcomings, is
more portable in some ways than C is.

People in net.lang.pascal are complaining about the same things, not being
able to assign pointers to ints.  Sure, let's reduce the portability of
every existing language, and give more jobs to portability and languages
people so that they can repeat the cycle, eh?
-- 

   Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra|clyde}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (02/20/85)

> A pointer is a pointer, regardless of what it points to.
> It's a datatype all by itself; it isn't a mutation of the datatype it points
> to.

You must be thinking of some other language, Algol perhaps.

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (02/21/85)

> ....  Or, you can use a language
> that has a little bit of flexibility, and lets the compiler figure out
> such things.
> 
> These are the reasons that Pascal, despite all of its shortcomings, is
> more portable in some ways than C is.

??? Conclusion does not follow.  Please do not confuse the complaints
from people who want C to be different with what C is.

ndiamond@watdaisy.UUCP (Norman Diamond) (02/24/85)

> > ....  Or, you can use a language
> > that has a little bit of flexibility, and lets the compiler figure out
> > such things.
> > 
> > These are the reasons that Pascal, despite all of its shortcomings, is
> > more portable in some ways than C is.
> 
> ??? Conclusion does not follow.  Please do not confuse the complaints
> from people who want C to be different with what C is.

In Pascal, if you want variables to be able to hold integers of certain
sizes, you specify the bounds.  The compiler figures out if it needs a
short, long, etc.  Same for sizes of sets (though a few early brain-damaged
implementations of Pascal created non-believers).

Both Pascal and the present definition of C do this for pointers.
-- 

   Norman Diamond

UUCP:  {decvax|utzoo|ihnp4|allegra}!watmath!watdaisy!ndiamond
CSNET: ndiamond%watdaisy@waterloo.csnet
ARPA:  ndiamond%watdaisy%waterloo.csnet@csnet-relay.arpa

"Opinions are those of the keyboard, and do not reflect on me or higher-ups."

dir@obo586.UUCP (Dan Rosenblatt) (05/03/85)

[]
A function I wrote for some graphics software looked something like:

vecmul(invec,inmat,outvec)
double invec[3],inmat[3][3],outvec[3];
{
	int i,j;
	double tmpvec[3];

	for (i=0;i<3;++i) {
		tmpvec[i] = 0.;
		for (j=0;j<3;++j)
			tmpvec[i] += invec[j] * inmat[i][j];
	}
	bcopy((char *)outvec,(char *)tmpvec,sizeof(outvec));
}

The calling sequence for bcopy was: (dst,src,size_in_bytes).
The problem is that 'sizeof(outvec)' produced the value 2
instead of what I expected - 24.  The reason (as I kick
myself around the room :-}) is that an array which is a parameter
to a function becomes a pointer to that array.  The 2 comes
from the fact that I'm running on a 16-bit 8086 chip. 'nuf said.
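
For the record, the obvious repairs, with the same (dst,src,size)
calling sequence, are either to spell the size out or to take sizeof
of the local array, which really is an array (24 bytes here, assuming
8-byte doubles):

	bcopy((char *)outvec,(char *)tmpvec,3 * sizeof(double));
	bcopy((char *)outvec,(char *)tmpvec,sizeof(tmpvec));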


Dan Rosenblatt
obo Systems, Inc.
...{ihnp4!denelcor,nbires!gangue}!obo586!dir

grayson@uiucuxc.CSO.UIUC.EDU (09/29/86)

It is interesting that the expression
	sizeof (int) - 1
is ambiguous in C, for it can be parsed as
	sizeof ((int)(- 1))
or as
	(sizeof(int)) - 1

Think about it!

The unix compiler does it the second way, for when it sees the '-' it
sets the precedence for that character ASSUMING it will be used as a
binary operator.
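
For what it's worth, either reading can be forced with explicit
parentheses:

	int n;

	n = (sizeof(int)) - 1;	/* what the compiler does anyway */
	n = sizeof((int) -1);	/* the other reading: sizeof applied to an
				   int-valued expression, i.e. sizeof(int) */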

vedm@hoqam.UUCP (BEATTIE) (09/30/86)

> 
> It is interesting that the expression
> 	sizeof (int) - 1
> is ambiguous in C, for it can be parsed as
> 	sizeof ((int)(- 1))
> or as
> 	(sizeof(int)) - 1
> 
> Think about it!
> 
> The unix compiler does it the second way, for when it sees the '-' it
> sets the precedence for that character ASSUMING it will be used as a
> binary operator.

There is nothing ambiguous about it.
K&R p188:
"The construction   sizeof(type)  is taken to be a unit, so the
expression  sizeof(type)-2  is the same as (sizeof(type))-2."
Tom.
...!{decvax | ucbvax}!ihnp4!hoqax!twb

garys@bunker.UUCP (Gary M. Samuelson) (10/01/86)

>> It is interesting that the expression
>> 	sizeof (int) - 1
>> is ambiguous in C, for it can be parsed as
>> 	sizeof ((int)(- 1))
>> or as
>> 	(sizeof(int)) - 1

>> The unix compiler does it the second way, for when it sees the '-' it
       ------------- (speaking of ambiguous)
>> sets the precedence for that character ASSUMING it will be used as a
>> binary operator.

>There is nothing ambiguous about it.
>K&R p188:
>"The construction   sizeof(type)  is taken to be a unit, so the
>expression  sizeof(type)-2  is the same as (sizeof(type))-2."

You're both partially right, and partially wrong.  The expression
would be ambiguous if not for the disambiguating rule, whereby the
compiler is not *assuming* that the minus sign is binary, but
*deciding* that it is.

Gary Samuelson

drw@cullvax.UUCP (Dale Worley) (10/01/86)

> It is interesting that the expression
> 	sizeof (int) - 1
> is ambiguous in C, for it can be parsed as
> 	sizeof ((int)(- 1))
> or as
> 	(sizeof(int)) - 1
> 
> Think about it!

Both K&R (App. A, 7.2) and Harbison&Steele (7.4.2) note that it is
ambiguous on the face of it, and that it is to be resolved in favor of
	(sizeof (int)) - 1

Dale

pedz@bobkat.UUCP (Pedz Thing) (10/02/86)

In article <102500008@uiucuxc> grayson@uiucuxc.CSO.UIUC.EDU writes:
>
>It is interesting that the expression
>	sizeof (int) - 1
>is ambiguous in C, for it can be parsed as
>	sizeof ((int)(- 1))
>or as
>	(sizeof(int)) - 1
>

At first, I thought this note was really stupid.  I had the idea that
unary minus was down at a different level from the other unary
operators.  Then as I looked more and more into it, I came to these
conclusions.  First, what the compiler does is correct because this
exact case is mentioned in K & R (last paragraph of section 7.2 in
Appendix A, page 188).  Second, this is a special case.  Sizeof, type
cast, and unary minus are all at the same precedence and they
associate right to left.  Thus the normal interpretation would be
(sizeof ((int) (-1))).  This is not the correct interpretation, however,
as I just mentioned.
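
The special case is easiest to see side by side (a hypothetical
fragment; the values in the comments assume 4-byte ints):

	int a = sizeof (int) - 1;	/* 3: sizeof(type) is a unit      */
	int b = sizeof ((int) -1);	/* 4: cast applied, then sizeof   */
	int c = sizeof -1;		/* 4: sizeof of an int expression */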
-- 
Perry Smith
ctvax ---\
megamax --- bobkat!pedz
pollux---/

rgenter@labs-b.bbn.com (Rick Genter) (10/03/86)

The expression

	sizeof (int) - 1

is not ambiguous.  The operand of "sizeof" must be either a typecast or
an lvalue.  "(int) -1" is neither.
--------
Rick Genter 				BBN Laboratories Inc.
(617) 497-3848				10 Moulton St.  6/512
rgenter@labs-b.bbn.COM  (Internet new)	Cambridge, MA   02238
rgenter@bbn-labs-b.ARPA (Internet old)	linus!rgenter%BBN-LABS-B.ARPA (UUCP)

ark@alice.UucP (Andrew Koenig) (10/04/86)

> The expression
>
>	sizeof (int) - 1
>
> is not ambiguous.  The operand of "sizeof" must be either a typecast or
> an lvalue.  "(int) -1" is neither.

Nope.  The operand of "sizeof" can be an rvalue.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/03/86)

In article <663@dg_rtp.UUCP> throopw@dg_rtp.UUCP (Wayne Throop) writes:
>True, true.  But ANSI is likely to decide that sizeof(char) MUST ALWAYS
>BE one (and I think this is universally true on existing
>implementations... if I'm wrong, somebody let me know).

X3J11 as it stands requires sizeof(char)==1.  I have proposed that
this requirement be removed, to better support applications such as
Asian character sets and bitmap display programming.  Along with
this, I proposed a new data type such that sizeof(short char)==1.
It turns out that the current draft proposed standard has to be
changed very little to support this distinction between character
objects (char) and smallest-addressable objects (short char).  This
is much better, I think, than a proposal that introduced (long char)
for text characters.

Unfortunately, much existing C code believes that "char" means "byte".
My proposal would allow implementors the freedom to decide whether
supporting this existing practice is more important than the benefits
of making a distinction between the two concepts.

It is possible to write code that doesn't depend on sizeof(char)==1,
and some C programmers are already careful about this.  Transition
to the more general scheme would occur gradually (if at all) for
existing C implementations, with only implementors of systems for
the Asian market and of bitmap display architectures initially taking
advantage of the opportunity to make these types different sizes.
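
For illustration only (a hypothetical sketch, not part of the proposal,
written with X3J11-style headers for concreteness), code that is careful
in this way scales allocation sizes by sizeof(char) and reckons array
lengths in elements rather than bytes:

	#include <stdio.h>
	#include <stdlib.h>

	main()
	{
		char line[128];
		char *buf = (char *)malloc(100 * sizeof(char)); /* not malloc(100) */

		if (buf == NULL)
			return 1;
		/* element count, which equals the byte count only when
		   sizeof(char) == 1 */
		if (fgets(line, sizeof line / sizeof line[0], stdin) != NULL)
			fputs(line, stdout);
		free(buf);
		return 0;
	}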

guy@sun.uucp (Guy Harris) (11/05/86)

> X3J11 as it stands requires sizeof(char)==1.  I have proposed that
> this requirement be removed, to better support applications such as
> Asian character sets and bitmap display programming.  Along with
> this, I proposed a new data type such that sizeof(short char)==1.
> It turns out that the current draft proposed standard has to be
> changed very little to support this distinction between character
> objects (char) and smallest-addressable objects (short char).  This
> is much better, I think, than a proposal that introduced (long char)
> for text characters.

Why?  If this is the AT&T proposal, it did *not* "introduce (long char) for
text characters"; it introduced (long char) for *long* text characters.
"char" is still to be used when processing text that does not include long
(16-bit) characters.  I believe the theory here was that requiring *all*
programs that process text ("cat" doesn't count; it doesn't - or, at least,
shouldn't - process text) to process that text in 16-bit blocks might cut their
performance to a degree that customers who would not use the ability to
handle Kanji would find unacceptable.  I have seen no data to confirm or
disprove this.

(Changing the meaning of "char" does not directly affect the support of
"bitmap display programming" at all.  It only affects applications that
display things like Asian character sets on bitmap displays, but it doesn't
affect them any differently than it affects applications that display them
on "conventional" terminals that support those character sets.)

> Unfortunately, much existing C code believes that "char" means "byte".
> My proposal would allow implementors the freedom to decide whether
> supporting this existing practice is more important than the benefits
> of making a distinction between the two concepts.

Both "short char"/"char" and "char"/"long char" make a distinction between
the two concepts; one may have aesthetic objections with the way the latter
scheme draws the distinction, but that's another matter.  (Is 16 bits enough
if you want to give every single character a code of its own?)

> It is possible to write code that doesn't depend on sizeof(char)==1,
> and some C programmers are already careful about this.

It is possible to write *some* code so that it doesn't depend on
sizeof(char)==1.  Absent a data type one byte long, other code is difficult
at best to write this way.
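
A hypothetical fragment of the sort of code meant here (names are
illustrative): packing a two-byte field into an externally defined
header, one 8-bit unit at a time, presumes some type that names exactly
one byte.

	/* store a 16-bit length in big-endian order; with 16-bit
	   (char)s there is no standard type for hdr to point at */
	void
	put_length(hdr, len)
	unsigned char *hdr;
	unsigned len;
	{
		hdr[0] = (len >> 8) & 0xff;	/* high byte */
		hdr[1] = len & 0xff;		/* low byte  */
	}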

> Transition to the more general scheme would occur gradually (if at all) for
> existing C implementations, with only implementors of systems for
> the Asian market and of bitmap display architectures initially taking
> advantage of the opportunity to make these types different sizes.

I think "if at all" is appropriate here.  There are a *lot* of interfaces
that think that "char" is a one-byte data type; e.g., "read", "write", etc.
I see no evidence that converting existing code and data structures to use
"short char" would be anything other than highly disruptive.

Adding "long char" would permit new programs to be written to support long
characters, and permit existing programs to be rewritten to support them,
without breaking existing programs; this indicates to me that it would make
it much more likely that "long char" would be widely adopted and used than
that "short char" would.  I see no reason why a proposal that would, quite
likely, lead to two different C-language environments existing in parallel
for a long time to come is superior to one that would permit environments to
add on the ability to handle long characters and thus would make it easier
for them to do so and thus more likely that they would.  (This is especially
true when you consider that most of the programs in question would have to
be changed quite a bit to support Asian languages *anyway*; just widening
"char" to 16 bits, recompiling them, and linking them with a library with a
brand new standard I/O, etc. would barely begin to make them support those
languages.)
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

levy@ttrdc.UUCP (Daniel R. Levy) (11/05/86)

In article <5141@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>X3J11 as it stands requires sizeof(char)==1.  I have proposed that
>this requirement be removed, to better support applications such as
>Asian character sets and bitmap display programming.  Along with
>this, I proposed a new data type such that sizeof(short char)==1.
>It turns out that the current draft proposed standard has to be
>changed very little to support this distinction between character
>objects (char) and smallest-addressable objects (short char).  This
>is much better, I think, than a proposal that introduced (long char)
>for text characters.
>
>Unfortunately, much existing C code believes that "char" means "byte".
>
>It is possible to write code that doesn't depend on sizeof(char)==1,
>and some C programmers are already careful about this.

A question:  what about the jillions of C programs out there which
declare "char *malloc()"?  Will they all need to be changed?  Common
sense says no, since malloc() is supposed to return a "maximally aligned"
address anyhow, so as far as anyone cares it could be declared float * or
double * or short int * or (anything else)*  if malloc() in the malloc() code
itself were declared the same way.  So if "char" happened to be a two byte
quantity, no sweat, right?  Or was there any particular reason for declaring
malloc() to be a "char *"?   And thus, might something break in malloc() or
the usage thereof if char might no longer be the smallest addressable quantity?
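
(For reference, a hypothetical fragment showing the declaration style in
question -- the traditional pre-(void *) idiom:)

	char *malloc();			/* the traditional declaration */

	struct thing { int count; double value; };	/* hypothetical type */

	struct thing *
	new_thing()
	{
		return (struct thing *)malloc(sizeof (struct thing));
	}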
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
	   go for it!  			allegra,ulysses,vax135}!ttrdc!levy

kimcm@olamb.UUCP (Kim Chr. Madsen) (11/06/86)

In article <5141@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
] 
] X3J11 as it stands requires sizeof(char)==1.  I have proposed that
] this requirement be removed, to better support applications such as
] Asian character sets and bitmap display programming.  Along with
] this, I proposed a new data type such that sizeof(short char)==1.
] It turns out that the current draft proposed standard has to be
] changed very little to support this distinction between character
] objects (char) and smallest-addressable objects (short char).  This
] is much better, I think, than a proposal that introduced (long char)
] for text characters.
] 
] Unfortunately, much existing C code believes that "char" means "byte".
] My proposal would allow implementors the freedom to decide whether
] supporting this existing practice is more important than the benefits

Why not take the full step and let the datatype char be of variable size,
like int's and other types. Then invent the datatype ``byte'' which is exactly
8 bits long.

Do I hear you say it would break existing C code?  Well, so would the
introduction of ``short char''....

					<Kim Chr. Madsen>

mwm@eris.BERKELEY.EDU (Mike (Don't have strength to leave) Meyer) (11/07/86)

In article <126@olamb.UUCP> kimcm@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types. Then invent the datatype ``byte'' which is exactly
>8 bits long.

Ok, so what should those with C compilers on the QM/C (18 bit words,
word addressable) or the C/70 (20 bit words, two 10-bit address units
per word) do, hmmm? And yes, there are C compilers for those two
machines.

Not only is all the world not a VAX, it all isn't even addressable in
eight-bit units!

	<mike

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)

Guy missed the meaning of my reference to bitmap display programming.
What I really care about in this context is support for direct bit
addressing.  I know for a fact that one reason we don't HAVE this on
some current architectures is the lack of access to the facility from
high-level languages.  I would like it to be POSSIBLE for some designer
of an architecture likely to be used for bit-mapped systems to decide
to make bits directly addressable.  I know I have often wished that I
had bit arrays in C when programming bitmap display applications.

The 8-bit byte was an arbitrary packaging decision (made by IBM for
the System/360 family, by DEC for the PDP-11, and by some others, but
definitely not by EVERY vendor).  There are already some 9-, 10-, and
12-bit oriented C implementations; I would like to give implementors
the OPTION of choosing to use 16-bit (char)s even if their machine can
address individual 8-bit bytes or even individual bits.

The idea of a "character" is that of an individually manipulable
primitive unit of text.  The idea of "byte" is that of an individually
addressable unit of storage.  From one point of view, it doesn't matter
what the two basic types would be called if and when this distinction is
made in the C language.  However, in X3J11 practically everything that
now refers to (char) arrays is designed principally for text application,
while practically everything that refers to arbitrary storage uses
(void *), not (char *).  (The one exception is strcoll(), which
specifically produces a (char[]) result; Prosser and I discussed this
and agreed that this was acceptable for its intended use.  In a good
implementation using my (char)/(short char) distinction, it would be
POSSIBLE to maintain a reasonable default collating sequence for (char)s
so that a kludge like strcoll() would not normally be necessary.)
Using (long char) for genuine text characters would conflict with
existing definitions for text-oriented functions, which is the main
reason I decided that (char) is STILL the proper type for text units.

I realize that many major vendors in the international UNIX market
have already adopted "solutions" to the problem of "international
character sets"; however, each has taken a different approach!  There
is nothing in my proposal to preclude an implementor from continuing
to force sizeof(char)==sizeof(short char) and preserving his previous
vendor-specific "solution"; however, what I proposed ALLOWS an
implementor to choose a much cleaner solution if he so desires,
without forcing him to if he prefers other methods, and it also allows
nybble- or bit-addressable architectures to be nicely supported at the
C language level.  The trade-off is between more compact storage (as
in AT&T's approach) requiring kludgery to handle individual textual
units, versus a clean, simple model of characters and storage cells
that supports uncomplicated, straightforward programming.

It happens that the text/binary stream distinction of X3J11 fits the
corresponding character/byte distinction very nicely.  The only wart
is for systems like UNIX that allow mixing of text-stream operations,
such as scanf(), with binary-stream operations, such as fread(); there
is a potential alignment problem in doing this.  (By the way, I also
propose new functions [f]getsc()/[f]putsc() for getting/putting single
(short char)s; this is necessary for the semantic definition of
fread()/fwrite() on binary streams.  In my original proposal these
were called [f]getbyte()/[f]putbyte(), but the new names are better.)

ANY C implementation that makes a real distinction between characters
and bytes is going to cause problems for people porting their code
to it.  The choices are, first, whether to ever make such a distinction,
and second, if so, how to do so.  I believe the distinction is
important, and much prefer a clean solution over one that requires
programmers to convert text data arrays back and forth, or to keep
track of two sets of otherwise identical library functions.  As with
function prototypes, a transition period can exist during which (char)
and (short char) have the same size, which is no worse than the current
situation, and implementors could choose when if ever to split these
types apart.

Please note that there is not much impact of my proposal on current
good C coding practice; for example, the following continue to work
no matter what choices the C implementor has made:

	struct foo bar[SIZE], barcpy;
	unsigned nelements = sizeof bar / sizeof bar[0];
	fread( bar, sizeof(struct foo), SIZE, fp );
	fread( bar, sizeof bar, 1, fp );
	memcpy( &barcpy, &bar[3], sizeof(struct foo) );
	/* the above requires casting anyway if prototype not in scope */

	char str[] = "text";
	printf( "\"%s\" contains %d characters\n", str, strlen( str ) );

While it is POSSIBLE to run into problems, such as in using the
result of strlen() as the length of a memcpy() operation, these
don't arise so often that it is hopeless to make the transition.
One thing for sure, if we don't make the character/byte distinction
POSSIBLE in the formal ANSI C standard, it will be too late to do
it later.  The absolute minimum necessary is to remove the
requirement that sizeof(char)==1 from the standard, although this
opens up a hole in the spec that needs plugging by a proposal like
mine (X3J11/86-136, revised to fit the latest draft proposed standard
and to change the names of the primitive byte get/put functions).
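
The strlen()/memcpy() problem mentioned above, as a hypothetical
fragment (not part of the proposal):

	char src[] = "text";
	char dst[sizeof src];

	/* counts characters, not storage cells; copies too few
	   cells once sizeof(char) > 1 */
	memcpy( dst, src, strlen( src ) + 1 );

	/* scaling by sizeof(char) keeps it right either way */
	memcpy( dst, src, (strlen( src ) + 1) * sizeof(char) );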

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)

In article <1294@ttrdc.UUCP> levy@ttrdc.UUCP (Daniel R. Levy) writes:
>A question:  what about the jillions of C programs out there which
>declare "char *malloc()"?  Will they all need to be changed?  Common
>sense says no, since malloc() is supposed to return a "maximally aligned"
>address anyhow, so as far as anyone cares it could be declared float * or
>double * or short int * or (anything else)*  if malloc() in the malloc() code
>itself were declared the same way.  So if "char" happened to be a two byte
>quantity, no sweat, right?  Or was there any particular reason for declaring
>malloc() to be a "char *"?   And thus, might something break in malloc() or
>the usage thereof if char might no longer be the smallest addressable quantity?

X3J11 malloc() returns type (void *) anyway, so this is already an issue
independently of the multi-byte (char) issue.  The answer is, on most
machines the old (char *) declaration of malloc() will not result in
broken code under X3J11, but it is POSSIBLE that it would break under
some X3J11 implementations (one assumes that the implementer will take
pains to keep this from happening if at all possible).

Under the multi-byte (char) proposal, malloc() still returns (void *)
and is not affected at all by the proposal.  sizeof() still returns
the number of primitive storage cells occupied by a data object, which
is still the right information to feed malloc() as a parameter.
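
That is, the usual idiom keeps working; a hypothetical fragment, with
struct foo standing in for any object type:

	extern void *malloc();		/* X3J11-style declaration */

	struct foo { int a; double b; };

	struct foo *
	make_foos(n)
	unsigned n;
	{
		/* sizeof counts storage cells; malloc() takes storage
		   cells, so this is right whether or not sizeof(char) == 1 */
		return (struct foo *)malloc(n * sizeof(struct foo));
	}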

The X3J11 draft proposed standard as it now stands has actually managed
to enforce a rather clean distinction between (char) data and arbitrary
data.  The additional changes to the draft to introduce a separate data
type for the smallest addressable storage unit are really very minor.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/07/86)

In article <126@olamb.UUCP> kimcm@olamb.UUCP (Kim Chr. Madsen) writes:
>Why not take the full step and let the datatype char be of variable size,
>like int's and other types. Then invent the datatype ``byte'' which is exactly
>8 bits long.

When fully elaborated to address the related issues, this idea differs
from what I have proposed in only two fundamental ways:
	(1) no support for smallest addressable chunk sizes other than
		8 bits;
	(2) introduction of a new keyword, one likely to be in heavy
		use in existing carefully-written C code.

guy@sun.uucp (Guy Harris) (11/08/86)

> Guy missed the meaning of my reference to bitmap display programming.
> What I really care about in this context is support for direct bit
> addressing.

I am not at all convinced that anybody *should* care about this, at least
from the standpoint of bitmap display programming.  If a vendor permits you
to bang bits on a display, they should provide you with routines to do this;
frame buffers are not all the same, and code that works well on one display
may not work well at all on another.  Furthermore, some hardware may do some
bit-banging operations for you; if you approach the display at the right
level of abstraction, this can be done transparently, but not if you just
write into a bit array.

Furthermore, it's not clear that displays should be programmed at the
bit-array level anyway; James Gosling and David Rosenthal have made what I
consider a very good case against doing this (and no, I don't consider it a
good case just because I work at Sun and we're trying to push NeWS).

> I know for a fact that one reason we don't HAVE this on some current
> architectures is the lack of access to the facility from
> high-level languages.

If that is the case, then the architect made a mistake.  If it's really
important, they can extend the language.  Yes, this means a non-standard
extension; however, the only way to get it to be a standard extension is to
get *every* vendor to adopt it, regardless of whether they support bit
addressing or not.  In the case of C, this means longer-than-32-bit "void *"
on lots of *existing* machines; I don't think the chances of this happening
are very good at all.

> I would like it to be POSSIBLE for some designer of an architecture
> likely to be used for bit-mapped systems to decide to make bits directly
> addressable.

It is ALREADY possible to do this.  The architect merely has to avoid
thinking "if I can't get at this feature from unextended ANSI C, I shouldn't
put it in."  The chances are very slim indeed that there will be a standard
way to do bit addressing in ANSI C, since this would require ANSI C to
mandate that all implementations support it, and would require ANSI C to be
rather more different from current C implementations than most vendors would
like.

> The idea of a "character" is that of an individually manipulable
> primitive unit of text.

As I've already pointed out, it is quite possible that there may be more
than one such notion on a system.

> However, in X3J11 practically everything that now refers to (char)
> arrays is designed principally for text application, while practically
> everything that refers to arbitrary storage uses (void *), not (char *).

However, you're now introducing a *third* type; when you are dealing with
arbitrary storage, sometimes you use "void *" as a pointer to arbitrary
storage and sometimes you use "short char" as an element of arbitrary
storage.

> In a good implementation using my (char)/(short char) distinction, it
> would be POSSIBLE to maintain a reasonable default collating sequence
> for (char)s so that a kludge like strcoll() would not normally be
> necessary.)

This is simply not true, unless the "normally" here is being used as an
escape clause to dismiss many natural languages as abnormal.  Some languages
do *not* sort words with a character-by-character comparison (e.g., German).
One *might* give ligatures like "SS" "char" codes of their own - but you'd
have to deal with existing documents with two "S"es in them, and you'd
either have to convert them "on the fly" in standard I/O (in which case
you'd have to have standard I/O know what language the file was in) or
convert them *en bloc* when you brought the document over from a system with
8-bit "char"s.  (Oh, yes, you'd still have to have standard I/O handle 8-bit
and 16-bit "char"s, and conversion between them, unless you propose to make
this new whizzy machine require text file conversion when you bring files
from or send files to machines with boring obsolete old 8-bit "char"s.)

Furthermore, I don't know how you sort words in Oriental languages, although
I remember people saying there *is* no unique way of sorting them.

> Using (long char) for genuine text characters would conflict with
> existing definitions for text-oriented functions, which is the main
> reason I decided that (char) is STILL the proper type for text units.

If you're going to internationalize an existing program, changing it to use
"lstrcpy" instead of "strcpy" is the least of your worries.  I see no
problem whatsoever with having the existing text-oriented functions handle
8-bit "char"s.  Furthermore, since not every implementation that supports
large character sets is going to adopt 16-bit "char"s, you're going to need
two sets of text-oriented functions in the specification anyway.

> The trade-off is between more compact storage (as in AT&T's approach)
> requiring kludgery to handle individual textual units, versus a clean,
> simple model of characters and storage cells that supports uncomplicated,
> straightforward programming.

What is this "kludgery"?  You need two classes of string manipulation
routines.  Big Deal.  You need to convert some encoded representation in a
file to a 16-bit-character representation when you read the file, and
convert it back when you write it back.  Big Deal.  This would presumably be
handled by library routines.  If you're going to read existing text files
without requiring them to be blessed by a conversion utility, you'll have
to do that in your scheme as well.  You need to remember to properly declare
"char" and "long char" variables, and arrays and pointers to same.  Big Deal.

I am not convinced that the "char"/"long char" scheme is significantly less
"clean", "simple", "uncomplicated", or "straightforward" than the "short
char"/"char" scheme.

> While it is POSSIBLE to run into problems, such as in using the
> result of strlen() as the length of a memcpy() operation, these
> don't arise so often that it is hopeless to make the transition.

Sigh.  No, it isn't necessarily HOPELESS; however, you have not provided ANY
evidence that the various problems caused by changing the meaning of "char"
would be preferable to any disruption to the "clean" models caused by adding
"long char".  (Frankly, I'd rather keep track of two types of string copy
routines and character types than keep track of all the *existing* code that
would have to have "char"s changed to "short char".)
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

gwyn@brl-smoke.ARPA (Doug Gwyn ) (11/08/86)

Guy is still missing my point about bitmap display programming;
I have NOT been arguing for a GUARANTEED PORTABLE way to handle
individual bits, but rather for the ability to do so directly
in real C on specific machines/implementations WITH THE FACILITY:
	typedef short char	Pixel;	/* one bit for B&W displays */
				/* fancy color frame buffers wouldn't
				   use (short char) for this, but an
				   inexpensive "home" model might */
	typedef struct
		{
		short	x, y;
		}	Point;
	typedef struct
		{
		Point	origin, corner;
		}	Rectangle;
	typedef struct
		{
		Pixel	*base;		/* NOT (Word *) */
		unsigned width;		/* in Bits, not Words */
		Rectangle rect;
		/* obscured-layer chain really goes here */
		}	Bitmap;	/* does this look familiar? */
Direct use of Pixel pointers/arrays tremendously simplifies coding for
such applications as "dmdp", where one has to pick up typically six
bits at a time from a rectangle for each printer byte being assembled
(sometimes none of the six bits are in the same "word", no matter how
bits may have been clumped into words by the architect).
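
(A hypothetical sketch of that inner loop on an implementation that
accepted the proposed one-bit (short char), using the Bitmap
declarations above; all widths and offsets are in bits:)

	int
	gather6(bm, x, y)
	Bitmap *bm;
	int x, y;
	{
		/* six consecutive pixels from one scan line become the
		   low six bits of one printer byte, with no word masking
		   or shifting in the application code */
		Pixel *p = bm->base + (long)y * bm->width + x;
		int byte = 0;
		int i;

		for (i = 0; i < 6; i++)
			byte = (byte << 1) | p[i];
		return byte;
	}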

Now, MC68000 and WE32000 architectures do not support this (except for
(short char)s that are multi-bit pixels).  But I definitely want the
next generation of desktop processors to support bit addressing.  I am
fully aware that programming at this level of detail is non-portable,
but portable graphics programming SUCKS, particularly at the interactive
human interface level.  Programmers who try that are doing their users
a disservice.  I say this from the perspective of one who is considered
almost obsessively concerned with software portability and who has been
the chief designer of spiffy commercial graphic systems (and who
currently programs DMDs and REAL frame buffers, not Suns).

I'm well aware of the use of packed-bit access macros, thank you.  That
is exactly what I want to get away from!  The BIT is the basic unit of
information, not the "byte", and there is nothing particularly sacred
about the number 8, either.  I agree that if you want to write PORTABLE
bit-accessing code, you'll have to use macros or functions, since SOME
machines/implementations will not directly support one-bit data objects.
That wasn't my concern.

Due to all the confusion, I'm recapitulating my proposal briefly:
	ESSENTIAL:
		(1) New type: (short char), signedness as for (char).
		(2) sizeof(short char) == 1.
		(3) sizeof(char) >= sizeof(short char).
		(4) Clean up wording slightly to improve the
		    byte (storage cell) vs. character distinction.
	RECOMMENDED:
		(5) Fix character \-escapes so that larger numeric
		    values are permitted in character/string constants
		    on implementations where that is needed.  The
		    current 9/12 bit limit is a botch anyway.
		(6) Text streams read/write/seek (char)s, and
		    binary streams read/write/seek (short char)s.
		    This requires addition of fgetsc(), fputsc(),
		    which are routines I think most system programmers
		    have already invented under names like get_byte().
		(7) Add `b' size modifier for fscanf().
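
(For concreteness, a hypothetical use of the routines in item (6);
fgetsc()/fputsc() exist in no current library, and the signatures here
are guessed by analogy with fgetc()/fputc().)

	#include <stdio.h>

	/* copy a binary stream one (short char) at a time */
	copybin(in, out)
	FILE *in, *out;
	{
		int c;

		while ((c = fgetsc(in)) != EOF)
			fputsc(c, out);
	}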

I've previously pointed out that this has very little impact on most
existing code, although I do know of exceptions.  (Actually, until the
code is ported to a sizeof(short char) != sizeof(char) environment,
it wouldn't break in this regard.  That port is likely to be a painful
one in any case, since it would probably be to a multi-byte character
environment, and SOMEthing would have to be done anyway.  The changes
necessary to accommodate this are generally fewer and simpler under my
proposal than under a (long char)/lstrcpy() approach.)

As to whether I think that mapping to/from 16-bit (char) would be done
by the I/O support system rather than the application code, my answer
is:  Absolutely!  That's where it belongs.  (AT&T has said this too,
on at least one occasion, taking it even so far as to suggest that the
device driver should be doing this.  I assume they meant a STREAMS
module.)

I won't bother responding in detail on other points, such as use of
reasonable default "DP shop" collating sequences analogous to ASCII
without having to pack/unpack multi-byte strings.  (Yes, it's true
that machine collating sequence isn't always appropriate -- but does
that mean that one never encounters computer output that IS ordered by
internal collating sequence?  Also note that strcoll() amounts to a
declaration that there IS a natural multibyte collating sequence for
any single environment.)  Instead I will simply assure you that I
have indeed thought about all those things (and more), have read the
literature, have talked with people working on internationalization,
and have even been in internationalization working groups.  I spent the
seven hours driving back from the Raleigh X3J11 meeting analyzing why
people were finding these issues so complex, and discovered that much
of it was due to the unquestioned assumption that "16-bit" text had to
be considered as made of individual 8-bit (char)s.  If one starts to
write out a BNF grammar for what text IS, it becomes obvious very
quickly that that is an unnatural constraint.  Before glibly dismissing
this as not well thought out, give it a genuine try and see what it is
like for actual programming; then try ANY alternative approach and see
how IT works in practice.

If you prefer, don't consider my proposal as a panacea for such issues,
but rather as a simple extension that permits some implementers to
choose comparatively straightforward solutions while leaving all others
no worse off than before (proof: if one were to decide to make
sizeof(char) == sizeof(short char), that is precisely where we are now.)
What I DON'T want to see is a klutzy solution FORCED on all implementers,
which is what standardizing a bunch of simultaneous (long char) and (char)
string routines (lstrcpy(), etc.) would amount to.  If vendors think it
is necessary to take the (long char) approach, the door is still open
for them to do so under my proposal (without X3J11's blessing), but
vendors who really don't care about 16-bit chars (yes, there are vendors
like that!) are not forced to provide that extra baggage in their
libraries and documentation.

The fact that more future CPU architectures may support tiny data types
directly in standard C than at present is an extra benefit from my
approach to the "multi-byte character" problem; it wasn't my original
motivation, but I'm happy that it turned out that way.  (You can bet
that (short char) would be heavily used for Boolean arrays, for example,
if my proposal makes it into the standard; device-specific bitmap
display programming is by no means the only application that could
benefit from availability of a shorter type.  I've seen many people
#define TINY for nybble-sized quantities, usually having to use a
larger size (e.g., (char)) than they really wanted.)

From the resistance he's been putting up, I doubt that I will convert
Guy to my point of view, and I'm fairly sure that many people who have
already settled on some strategy to address the multi-byte character
issue are not eager to back out the work they've already put into it.
However, since I've shown that a clean conceptual model for such text
IS workable, there's no excuse for continued claims that explicit
byte-packing and unpacking is the only way to go.