[net.lang.c] sizeof, ptrs, lint, portability

cottrell@nbs-vms.ARPA (02/08/85)

/*
-) This discussion is getting ugly. I am not a `dedicated amateur hacker.'
   I am quite good as a matter of fact. I care a lot about quality code,
   more than most people. I will therefore try & wrap this up.

-) I hope that ANSI gets the standard out soon. Please remember we are
   a privileged group in that we get to hear many viewpoints, including
   such annoying ones as myself. Many people don't even know there will
   be a standard. Lots of people work in a vacuum.

-) I am shocked to find out that pointers may be of different sizes, &
   may be of a different size than int. This has been true for so long
   many people just assumed it. I believe it should be true wherever possible
   for the following reason: if a default exists, it should be a useful one.
   Defining different sizes for these items gives credibility to the
   claim that C is dangerous. Just another accident waiting to happen.
   
-) Perhaps you forgot some of the attraxions of the language: a terse
   language that modelled the machine closely. No stupid booleans.
   All those casts are ugly. How are we to convert people to C if they
   must put up with all that verbosity? Shouldn't the compiler know?
   Okay, automatic casting will help *execution* correctness, but with
   the declarations for funxions in another file, the code will still
   read the same (casts optional). Mostly no one looks at header files.

-) I apologize for calling (presumably someone's favorite) machines `weird'
   or `braindamaged'. Let's say `less capable'. The pdp-11 was a landmark
   in machine design. It was byte addressable, had a stack, memory mapped
   i/o, an orthogonal instruxion set, and useful addressing modes. The
   vax continued this trend. Most micros (all?) are byte addressable.
   Most have an address space two to the size-of-register-in-bits power.
   Most of the machines designed before this were not byte addressable.
   Most of these machines had some very strange glitches. In short weird.
   Some minis continued the mainframe trend and only addressed words.
   This sort of machine is an inhospitable host for the C language and
   some implementations are downright kluges. I claim that they don't
   run C but another language I would call `C--'.

-) While you are claiming that it is MY CODING PRACTICES (and evidently
   hordes of others, including 4.2bsd & sys III implementors) that are
   nonportable, I am claiming that it is THOSE WEIRD MACHINES that are
   nonportable. By changing the rules in the middle of the game, you
   are depriving me (and others) of the time honored tradition of punning.
   I know it is easier to change the language than the machines. I say
   don't do it. Why encourage the production of out-dated hardware?

-) I still maintain that assigning zero to a pointer stores an unspecified
   number of zero bits. The nil/null ptr is a convention, just like
   null terminated strings. We all agree that zero is special because
   there is not likely to be real data there. The null ptr is an
   out-of-band value. We agreed to represent it in-bound. Still, a piece
   of kernel code should be able to pick up the byte at address zero by:
		int j; char *p; p = 0; j = *p;
   Allowing any other value to actually be stored breaks this. Besides,
   SHOW ME A C THAT USES ANYTHING OTHER THAN ZERO ON ANY MACHINE!!!
   K&R says of pointer to integer conversions: "The mapping funxion is
   also machine dependent, but is intended to be unsurprising to those
   who know the machine." I would be surprised at nonzero null ptrs.

-) Guy: if I want the square root of four, I do sqrt(4.0); NO CAST!

-) As for the Honeywell x16 & CDC 17xx which were 16 bit word-addressable
   only (I presume a word pointer in one word, one bit representing
   left/rite in the second word?) there is another solution, albeit klugy:
   put the whole thing in one word & restrict character addressing to the
   lower half of the address space. The Varian V7x series (upgraded 620i)
   uses this format, altho words can only reference 32k words because
   bit 15 is used for indirexion and byte addressing will not indirect.
   Yeah, I know, gross. This can be mitigated if they have memory
   management. Why would you need 64k words instead of 32k? Hey, it's
   finite. Get a bigger machine if you need one.

-) Perhaps I forgot the :-) on my `swapping by xor' article. Like
   Guy said, "cute", but not very user friendly.

-) How many of you port what percentage of programs? I thought the
   intent of the standard was to not break existing programs. I claim
   that the standard should recognize the existing idioms. Languages
   are defined by usage as well as by specification. Pretty soon you
   will be passing descriptors around instead of objex!

-) I will present (in another article, this one's getting too long)
   a good reason for type punning. Stay tuned to this channel...

-) I started out to be reconciliatory; I am unfortunately (for y'all)
   more convinced of my position. Eat flaming death fascist media pigs!
*/

jsdy@SEISMO.ARPA (02/08/85)

I'm kinda tired of cottrell's insistent flaming.  I am almost convinced
that he does a lot of it just to get under the skins of people like Guy,
who can get very righteously provoked.	;-)/;-S

I think most folk would agree that we like C's terse style, and that
once we have its capabilities on our particular machine down pat, we can
write incredibly clever and, yes, punning programs that do all manner of
wonderful things.  We can do this quickly, brightly, and with beauty,
but not necessarily portably.  And I'm using that word in a highly
literal sense there!  If it ports to 99.9% of correct compilers but not
to 0.1%, then it is not portable.  That is  n o t  to say it's bad code
-- it does what you want, if you don't want to run on the 0.1%.  I know
that cottrell has said any number of times he doesn't, so his code is
still GOOD CODE.  At least, as far as this criterion goes.

However, there is an important subgroup of us to whom it is important
to know exactly what is legal for 100% of all correct compilers, and
what is not.  This is a subgroup, not the whole.  And to that group,
it matters much that a program to be 100% portable must take into
account "weird" machines and odd-sized word/byte/???? sizes.  And, yes,
pointers like the DEC-10 byte pointers! (18 bits address, 18 bits byte
specifier, remember?)

There is room in this world for all of us.  I fall into each group, on
occasion, although I must admit that Guy sounds much purer than anything
I've ever written.  (Of course, I've never written ANSI code ...).
However, there is not room in my notesfiles for so many flames!  [;-)]
I am not "the legendary Loren", and have trouble keeping up with my
limited number of newsgroups.  So, let's keep it to more light than
heat, OK?	;-);-);-)

Joe Yao		hadron!jsdy@seismo.{ARPA,UUCP}

guy@rlgvax.UUCP (Guy Harris) (02/09/85)

> -) I am shocked to find out that pointers may be of different sizes, &
>    may be of a different size than int. This has been true for so long
>    many people just assumed it. I believe it should be true wherever possible
>    for the following reason: if a default exists, it should be a useful one.
>    Defining different sizes for these items gives credibility to the
>    claim that C is dangerous. Just another accident waiting to happen.

The existence of automobiles is also "an accident waiting to happen" (although
it didn't wait very long) in those terms.  I don't blame the automobile,
I blame the driver.  I have no interest in seeing a governor placed on
all cars that limits speed to 35MPH.  Nor do I have an interest in seeing
the requirement that all pointer types must be represented the same way
placed on the C language.  If people can't cope with machines that require
(or, at least, strongly prefer) different pointer representations, that's
their problem, not C's.

> -) Perhaps you forgot some of the attraxions of the language: a terse
>    language that modelled the machine closely. No stupid booleans.
>    All those casts are ugly. How are we to convert people to C if they
>    must put up with all that verbosity? Shouldn't the compiler know?

That's why the ANSI C standard improved the declaration syntax for functions;
yes, the compiler should know, and ANSI C compilers do know (except for
functions with a variable number of arguments; the prime offender, "execl",
is just syntactic sugar for something that can be done equally well with
"execv").

> -) I apologize for calling (presumably someone's favorite) machines `weird'
>    or `braindamaged'. Let's say `less capable'. The pdp-11 was a landmark
>    in machine design. It was byte addressable, had a stack, memory mapped
>    i/o, an orthogonal instruxion set, and useful addressing modes. The
>    vax continued this trend. Most micros (all?) are byte addressable.

According to the Stanford MIPS people (see "Hardware/Software Tradeoffs
for Increased Performance" in the Proceedings of the Symposium on Architectural
Support for Programming Languages and Operating Systems, SIGARCH Computer
Architecture News V10#2 and SIGPLAN Notices V17#4), you may be better off
if you have a word-addressed machine and special pointers for accessing
bytes.  (In their case, byte and word pointers are both 32 bits long, but
coercions are still not copies.)

>    Most have an address space two to the size-of-register-in-bits power.

As has been said more times than I care to count, the 68000's registers are
32 bits long, but 32-bit arithmetic is less efficient than 16-bit arithmetic.
I think that this is unfortunate, but it's a fact of life.  There are good
things to be said for 16-bit "int"s on a 68000.

>    This sort of machine is an inhospitable host for the C language and
>    some implementations are downright kluges. I claim that they don't
>    run C but another language I would call `C--'.

You aren't the arbiter of the C language; if you want to hold that opinion
you're welcome to it, but I suspect most people wouldn't agree.  UNIX runs
on the Sperry 1100; if users of UNIX on that machine (or other putatively
"inhospitable" machines) have any comments on that point, I'd like to hear
them.

> -) While you are claiming that it is MY CODING PRACTICES (and evidently
>    hordes of others, including 4.2bsd & sys III implementors) that are
>    nonportable, I am claiming that it is THOSE WEIRD MACHINES that are
>    nonportable. By changing the rules in the middle of the game, you
>    are depriving me (and others) of the time honored tradition of punning.

Aside from any semantic quibbles about the meaning of "nonportable", I
object to the reference to the "time honored tradition of punning".
Lots of traditions, like self-modifying code, were "time-honored" in the
days of small slow machines which "needed" that sort of stuff.  I can
get away without punning 99.9% of the time; the other .1% of code can
be "#ifdef"ed, or written in assembly language, or...

> -) I still maintain that assigning zero to a pointer stores an unspecified
>    number of zero bits.

Maintain what you will, the *language spec*, such as it is, says no such
thing.  Your statement is merely a statement of preference, which people
are at leisure to ignore.

> The null ptr is an out-of-band value. We agreed to represent it in-bound.

Who's "we"?  On many machines, there *is* no out-of-band value.  On the
VAX, 0xffffffff is arguably an out-of-band value, while on most UNIXes
on the VAX 0x0 is an in-band value.  On other machines, there *is* an
out-of-band value, specified by the architectural spec as "how to represent
a null pointer", and it need not consist of N 0 bits.

> Still, a piece  of kernel code should be able to pick up the byte at
> address zero by:
> 		int j; char *p; p = 0; j = *p;
>    Allowing any other value to actually be stored breaks this.

However, it doesn't break

	int j; char *p; j = 0; p = j; j = *p;

Admittedly, this is slightly less efficient, but the number of times when
you execute code that is intended *only* to fetch the contents of location
0 (as opposed to code that fetches the contents of an arbitrary location;

	peek(addr)
	int addr;
	{
		return(*(char *)addr);
	}

even works if you say "j = peek(0)") is very small.

> Besides, SHOW ME A C THAT USES ANYTHING OTHER THAN ZERO ON ANY MACHINE!!!

Hello?  Anybody from the Lawrence Livermore Labs S-1 project out there?
Don't you have a special bit pattern for the null pointer?

>    K&R says of pointer to integer conversions: "The mapping funxion is
>    also machine dependent, but is intended to be unsurprising to those
>    who know the machine." I would be surprised at nonzero null ptrs.

A subtle point; given a "char *" variable "p", the statement

	p = 0;

is different in character from both the statements

	p = 1;

and the statements

	i = 0;
	p = i;

given an "int" variable "i".  Arguably, this is confusing and a mistake, but
it is the clearest (and, probably, only correct) interpretation of what
K&R says on the subject.  The latter two sets of statements do this
particular mapping; the former one is a special case which shoves a null
character pointer into "p".  The mapping function in the third set of
statements is unsurprising.  If I ran the zoo, there would have been a
special keyword "nil" or "null", and THAT would have been the way to
specify null pointers; 50% of all these discussions wouldn't have occurred
if that had been done.  Unfortunately, it's too late for that.

> -) Guy: if I want the square root of four, I do sqrt(4.0); NO CAST!

That's because the C language has a way of representing floating-point
constants directly.  It doesn't have a way of representing null pointers
directly; instead, it has a sneaky language rule that says the symbol
"0", when used in conjunction with a cast to a pointer or an expression
involving pointers, is interpreted as a null pointer of the appropriate
type.  If there were, say, a null pointer operator like "sizeof", like

	null(char *)

you could pass null(char *) to a routine.  Alternatively, if the language
had permitted you to declare the types of the arguments to a function
since Day 1, calling a function which expects a "char *" as an argument
would be an expression involving pointers and the 0 (or "nil" or "null")
would be interpreted as a null character pointer.

> -) How many of you port what percentage of programs? I thought the
>    intent of the standard was to not break existing programs. I claim
>    that the standard should recognize the existing idioms.

No, the intent of the standard is not to break existing *correct*
programs.  There exist programs, written by people at, among other places,
a certain large West Coast university, which assume that location 0
contains a null string (although that crap seems to have disappeared as
of 4.2BSD).  Does this mean that all implementations of C must map
location 0 into the address space and must put a zero byte there?

"=+" was a legal part of the language once.  It has now disappeared; the
System V compiler now only accepts "+=".  More and more programs are properly
declaring functions, casting pointers, etc.  As such, I see no point in
treating an undecorated 0, passed to a function whose argument types are
undeclared, as the passing of a null pointer.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

kpmartin@watmath.UUCP (Kevin Martin) (02/10/85)

In article <8121@brl-tgr.ARPA> cottrell@nbs-vms.ARPA writes:
>-) I apologize for calling (presumably someone's favorite) machines `weird'
>   or `braindamaged'. Let's say `less capable'.
Let's say 'capable of doing things other than you want done'.
I doubt that a pianist would like being called 'weird', 'braindamaged',
or even 'less capable' just because you happen to want to hear her play a
trombone.

>   Allowing any other value to actually be stored breaks this. Besides,
>   SHOW ME A C THAT USES ANYTHING OTHER THAN ZERO ON ANY MACHINE!!!
>   K&R says of pointer to integer conversions: "The mapping funxion is
>   also machine dependent, but is intended to be unsurprising to those
>   who know the machine." I would be surprised at nonzero null ptrs.
Then you obviously don't know the machine.
A Honeywell DPS8, running CP-6, uses the bit pattern 06014 (36 bits)
as the null pointer (byte offset zero in segment number 014).
Pointer-to-int casts (and back) are accomplished by exclusive-oring
with this value.
The value 0 CAN'T be used, since this is a valid pointer.

>-) How many of you port what percentage of programs? I thought the
>   intent of the standard was to not break existing programs.
I don't think it does break programs (except those which use =<op>
and initializers without the '='). Your programs will continue
to run on systems on which they previously ran. You may get
more warnings when you compile them, though.

                     Kevin Martin, UofW Software Development Group

jdb@mordor.UUCP (John Bruner) (02/10/85)

> Hello?  Anybody from the Lawrence Livermore Labs S-1 project out there?
> Don't you have a special bit pattern for the null pointer?

I had prepared this reply with the intention of avoiding a reference
to the S-1 Mark IIA.  I've mentioned it several times recently and
I wondered if people were getting tired of hearing about it.  However,
since you asked, the answer is YES.  Our machine has a 36-bit word
and a 31-bit virtual address space.  The 5 high-order bits of a
pointer constitute its "tag" which specifies an assortment of
things.  Two values, 0 and 31, are invalid tags.  An attempt to
use a pointer manipulation instruction on words containing these
tags will cause a trap.  (This allows easy detection of indirection
through integers if the integer is in the range -2**31..2**31-1.)
A tag of 2 indicates a NIL pointer, which can be copied but not
dereferenced.

There are two operating system projects here.  Amber, which is based
quite a bit on the MULTICS model, is written in Pastel and uses the
NIL pointer tag.  The other operating system is UNIX.  After a lot
of grief because the tide of sloppy programs was too great, we decided
to hack the microcode to allow 0-tagged pointers and use an integer
zero as our NULL pointer.  (There is a special microcode patch which
must be applied before we boot UNIX.) We all regard this as WRONG, WRONG,
WRONG, WRONG, WRONG, WRONG.  It means that C and Pastel cannot easily
share data structures, and it defeats a lot of the useful hardware
type checking.  We hope to develop a C front-end for our Pastel
compiler so that C programs which run under Amber can use the
NIL pointer properly.

(int)0 vs. (int *)0 has become a very sore point with me for this
reason.  I am firmly convinced that they are NOT the same and I
am unhappy that we had to contort our implementation to match an
assumption that is valid on simpler architectures.


Now, for the reply I just finished editing:

Although it is tempting to comment on the assertion that machines
which differ from a PDP-11 or a VAX are "less capable", I'm not
going to respond to that in this posting.  Instead, I'd like to
take the notion of "portability" in terms of "changing the rules
in the middle of the game" a little bit further.  Instead of
starting with C under VAX UNIX, however, I want to start with the
oldest C that I'm familiar with: the C compiler that came with
the Sixth Edition of UNIX.  (I trust that any unwarranted assumptions
that I make about C based upon the Sixth Edition can be corrected
by others who have used even earlier versions.)

Let me cite from the C Reference Manual for the Sixth Edition:

  2.3.2 Character constants [third paragraph]
  
  Character constants behave exactly like integers (not, in particular,
  like objects of character type).  In conformity with the addressing
  structure of the PDP-11, a character constant of length 1 has the code
  for the given character in the low-order byte and 0 in the high-order
  byte; a character constant of length 2 has the code for the first
  character in the low byte and that for the second character in the
  high-order byte.  Character constants with more than one character
  are inherently machine-dependent and should be avoided.

Nonetheless, programs used multi-character constants.  One in
particular that I'm very familiar with was APL\11.  [The author of
APL\11 was Ken Thompson (a.k.a. "/usr/sys/ken"), who I think we
can agree is rather knowledgeable about C and UNIX.]  Unfortunately,
PCC generated two-character character constants in the opposite
order from Ritchie's CC.  The manual doesn't say that the results
are compiler-dependent, so one should expect them to be the same
for both compilers on the same machine.  Hence, PCC (and thus the VAX
"cc") is nonportable.  (The first time I tried to move APL from a
V6 PDP-11 to a 32/V VAX I had to find and fix 800 "new" errors.)


It is interesting that the assertion has been raised that -1 has
always been the standard error return.  Here's a simple program
for copying standard input to standard output, from "Programming
in C -- A Tutorial", section 7.
	
	main() {
		char c;
		while( (c = getchar()) != '\0' )
			putchar(c);
	}

This worked before the Standard I/O library "broke" it.  getchar()
used to return '\0' on EOF.  Also, programs which had previously used
the old I/O library (with "fin" and "fout" -- anyone remember what

	fout = dup(1);

did?) or the old Portable C library had to be changed to
accommodate STDIO.  I guess STDIO is nonportable too.


Several programs that I worked with assumed that integers were
two bytes long.  I guess the VAX is nonportable.


[Gee, this is fun!]  Back in V6 there was no "/usr/include" --
you had to code the definitions of the system structures directly
in your program or hunt down the kernel include files.  The
advent of "/usr/include" and the changes in the system calls
broke several programs that coded these things directly.  I
guess even V7 is nonportable.


Then, of course, there are the totally unnecessary additions to
C when it was hacked up for the phototypesetter 7 release.  To
take one example, consider "unsigned".  Who needs "unsigned"?
V6 was written without it -- if you needed an unsigned integer
you could always use a character pointer.  And I, for one, was
quite happy to put a "#" on the first line of my C program if
[if!] there were any #include or #define statements in my program.
[Actually, sometimes I still do this, just to be obstinate!]


[I think I'm getting carried away.  Time to come back to earth.]
As C has developed, it has provided more and more facilities for
approaching problems in an abstract, machine-independent way.
I for one applaud this growth.  I *want* to plan my programs
carefully, think about the issues involved, and have utilities
like "lint" tell me when I'm being careless.  I want to be able
to move my programs to new machines without having to rewrite
them.  As much as I like PDP-11's, I no longer use them (at
least, not with UNIX).  Eventually I'll log off of a VAX for the
last time.  Computer architectures are changing, and someday even
the assumption of a classical von Neumann architecture will be
invalid.  (This is already true for some machines.)  If C continues
to evolve, when that day comes C may still be around (in some form).
I am certain that if it sticks to a rigid PDP/VAX view of the
world it will be left behind in the dust.
-- 
  John Bruner (S-1 Project, Lawrence Livermore National Laboratory)
  MILNET: jdb@mordor.ARPA [jdb@s1-c]	(415) 422-0758
  UUCP: ...!ucbvax!dual!mordor!jdb 	...!decvax!decwrl!mordor!jdb