[net.lang.c] if sizeof

ok@edai.UUCP (Richard O'Keefe) (03/13/84)

     The question about passing NULL as a parameter having been settled,
(though on p163 of the Kernighan & Ritchie book you will find two
instances of NULL being passed where (char*) was expected), there
remains another problem with the size of pointers versus the size of
'int'.  The program I was trying to make portable has a function
something like
	structp consVect(n, p)
	    long n;
	    structp p[];
	    /* build an object containing n things */
which is called in several places:
	long n = ...;
	p = consVect(n, table);
which works fine, and also:
	q = table;
	while (...) *q++ = ...;
	p = consVect(q-table, table);
WHICH LINT REJECTS.  The reason is that I declared the argument "n" to
be 'long' (which it can be on a 32-bit machine, and if the compiler
treats int as 16 bits I don't want a structure with 100,000 elements
mysteriously truncated).  However, Lint is convinced that
		pointer-pointer='int'.
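
     A minimal sketch of the call Lint objects to, with the cast that
quiets it.  The names count_seen, fill_and_count and TABLE_MAX are
hypothetical stand-ins (count_seen plays the part of consVect), and the
later ANSI function syntax is assumed so that a current compiler will
accept it.  Without a prototype in scope the difference would be passed
as whatever type the compiler gives it, so the cast to long is spelled
out:

```c
#define TABLE_MAX 16            /* hypothetical table size */

static int table[TABLE_MAX];

/* Hypothetical stand-in for consVect: it just hands back the count it
   was given, so the value that actually arrived can be checked. */
long count_seen(long n, const int *p)
{
    (void)p;                    /* the elements are not inspected */
    return n;
}

/* Fill up to `used` slots (never more than TABLE_MAX), then pass the
   pointer difference on with an explicit cast to long. */
long fill_and_count(int used)
{
    int *q = table;

    while (q - table < used && q - table < TABLE_MAX)
        *q++ = 0;

    return count_seen((long)(q - table), table);
}
```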

     This time I decided to read Kernighan and Ritchie with the utmost
care myself before going off half-cocked, and I found that Lint has
The Book on its side: page 189 (in the "C Reference Manual, 7.4") says

	If two pointers to objects of the same type are subtracted, the
	result is converted (by division by the length of the object) to
	an INT representing the number of objects separating the pointed
	to objects.  This conversion will in general give unexpected
	results unless the pointers point to objects in the same array,
	since pointers, even to objects of the same type, do not
	necessarily differ by a multiple of the object-length.

(As an aside: while K&R p98 explicitly permits p==0 and p!=0, I can find
nothing in K&R which requires p>0 or p-(char*)0 to be defined, where
char *p;)

     There would appear to be two courses of action open to someone who
wants his C compiler for a 32-bit machine to conform to K&R.

EITHER make sizeof (int) == 2 AND ALSO make sure that no array can
contain more than 32,767 bytes (that was 'int' remember, not 'unsigned
int').  This is clearly silly.

OR make sizeof (int) big enough to hold the largest possible difference
between two char* pointers into the user's address space.  Thus if the
user's address space is 16 Mbytes, sizeof (int) has to be 4 (3 won't do,
the difference has to be POSITIVE).
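
     The arithmetic behind "3 won't do" can be sketched as follows,
assuming two's-complement integers and 8-bit bytes.  The function names
are made up for illustration.  A 16 Mbyte space has differences up to
2^24 - 1 in either direction, which takes 24 magnitude bits plus a sign
bit:

```c
/* Smallest number of bits in a two's-complement integer that can hold
   every difference between two byte addresses in a space of `span`
   bytes, i.e. every value in -(span-1) .. +(span-1). */
int signed_bits_for_span(unsigned long span)
{
    unsigned long max;
    int bits = 1;               /* start with the sign bit */

    if (span == 0)
        return 0;               /* no addresses, nothing to represent */

    for (max = span - 1; max != 0; max >>= 1)
        bits++;                 /* one more magnitude bit */

    return bits;
}

/* Whole 8-bit bytes needed for that many bits. */
int bytes_for_bits(int bits)
{
    return (bits + 7) / 8;
}
```

For a span of 2^24 bytes this gives 25 bits, hence 4 bytes; for a span
of 32768 bytes it gives exactly 16 bits.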

     If a program is to be moved from a 16-bit machine to a 32-bit
machine, obviously differences between pointers will have been 'int'
rather than 'long', and by the book this is guaranteed safe on any
machine!  There really doesn't seem to be any way of escaping from the
conclusion that sizeof (int) == sizeof (long*).

     On pp 182-183 we find the following:

	Up to three sizes of integer, declared SHORT INT, INT, and LONG
	INT, are available.  Longer integers provide no less storage
	than shorter ones, but the implementation may make either short
	integers, or long integers, or both, equivalent to plain
	integers.  >>>"Plain" integers have the natural size suggested
	by the host machine architecture; the other sizes are provided
	to meet special needs.<<<	(Emphasis mine.)

No explicit definition of "suggested by ..." is given, so we turn to
the table at the top of p182.  Now that only describes the PDP-11,
Honeywell 6000, IBM 370, and Interdata 8/32 implementations, and it
doesn't say what size a pointer is.  We can still draw the conclusion
that in this "approved" list,
	"int" has at least as many bits as a machine address.
I propose that the definition of "plain" integers should be read as

	"Plain" integers (declared by the type 'int' without modifiers)
	are the smallest size fully supported by the host architecture
	which is sufficient to represent the difference between any two
	addresses in the user's data space.  If no integer size is
	fully supported by the architecture, they are the smallest size
	supporting addition, subtraction, arithmetic shifts, and logical
	operations which is sufficient to represent the difference
	between any two addresses in the user's data space.

[The second sentence is meant for machines which don't fully support
multiplication and division; given the operations listed you can program
them.]

     This still leaves Lint entitled to complain about my predecessor's
passing NULL instead of (foo*)0.  While int must have enough bits to
represent any pointer, pointers are not required to be compactly
encoded.  (The lamentable PR1ME shows this.  A character pointer on the
P400 is represented by 48 bits, but one 16-bit word just holds a byte
number (i.e. 1 bit), and one bit in the remaining 32 just says whether
or not the pointer is thought to be valid (for omitted arguments), so
32 bits is all you need on the PR1ME.)  Lint is entitled to complain for
another reason too.  'int' could be LONGER than a pointer!  Oh well,
I've put casts around all the NULLs anyway.

tjt@kobold.UUCP (03/15/84)

edai!ok cites section 7.4 of the "C Reference Manual" to show that
subtraction of two pointers yields an int (not an integer, or a long,
but explicitly an int).  He is correct that the result of pointer
subtraction is a signed quantity (&p[0] - &p[1] should equal -1), but
incorrectly asserts that the compiler should therefore require that no
array contain more than 32767 bytes (assuming 16-bit int's).  Since
pointer subtraction scales the result, the compiler would have to
limit you to 32767 elements in an array, since only subtraction between
two pointers into the same array is meaningful.  This is clearly an
undesirable restriction, but not any sillier than numerous machine
architectures that make it difficult to have arbitrarily large arrays.
The 8086 with 64K byte segments is one example of this.  At the other
extreme, the early Multics hardware (and as far as I know, the current
hardware) limits segments to 256K words.
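
A small sketch of the scaling, assuming a made-up element type (struct
pair) and a four-element array; none of these names come from the
original program.  The difference between two element pointers counts
elements, and only the difference between the same addresses viewed as
char pointers counts bytes:

```c
struct pair { long a, b; };     /* hypothetical element type */

static struct pair v[4];

/* Size of one element in bytes, as a long for easy comparison. */
long elem_size(void)
{
    return (long)sizeof(struct pair);
}

/* Difference in elements between v[i] and v[j]; may be negative.
   Both indices must lie in 0..3. */
long elem_diff(int i, int j)
{
    return (long)(&v[i] - &v[j]);
}

/* The same distance measured in bytes. */
long byte_diff(int i, int j)
{
    return (long)((char *)&v[i] - (char *)&v[j]);
}
```

So a 16-bit result caps the element count at 32767, while the array
itself may occupy up to 32767 times the element size in bytes.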

Of course, like any good standard, the C reference manual says
conflicting things about anything even remotely controversial. :-)

Discussing pointers and integers in section 6.4:

	An integer or long integer may be added to or subtracted
	from a pointer; in such a case the first is converted as
	specified in the discussion of the addition operator.

	Two pointers to objects of the same type may be subtracted;
	in this case the result is converted to an integer as
	specified in the discussion of the subtraction operator.

This discussion occurs in section 7.4 (the discussion of subtraction
was already quoted by edai!ok).  It doesn't say what is supposed to
happen if the value in a long cannot be represented as an int, so
accepting long's in pointer addition may be just a courtesy measure to
save the user from an explicit cast to int.

Finally, in section 14.4 (Explicit pointer conversions) we find:

	A pointer may be converted to any of the integral types
	large enough to hold it.  Whether an int or long is
	required is machine dependent.  The mapping function is
	also machine dependent, but is intended to be
	unsurprising to those who know the addressing structure
	of the machine.  Details for some particular machines
	are given below.

This is a direct contradiction of edai!ok's claim, and is the paragraph
normally used to justify e.g. 16 bit int's and 32-bit pointers.  Since
14.4 is in conflict with 7.4, one or the other must be revised.  I
think most people would prefer to see 7.4 changed to make the result of
pointer subtraction machine dependent.
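
One rough way to see which case a given compiler falls into is to
compare sizes.  This is only a sketch: the function names are invented,
it looks at char * alone (other pointer types need not be the same
size), and it treats "large enough" as a matter of byte count:

```c
/* Nonzero if an int occupies at least as many bytes as a char *. */
int int_holds_pointer(void)
{
    return sizeof(int) >= sizeof(char *);
}

/* Nonzero if a long occupies at least as many bytes as a char *. */
int long_holds_pointer(void)
{
    return sizeof(long) >= sizeof(char *);
}

/* Name of the smallest of the two types that is big enough, or
   "neither" if even long is too small on this machine. */
const char *smallest_holder(void)
{
    if (int_holds_pointer())
        return "int";
    if (long_holds_pointer())
        return "long";
    return "neither";
}
```

A compiler with 16-bit int's and 32-bit pointers would report "long"
here, which is the situation 14.4 allows for.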

By the way, although I think edai!ok is wrong with respect to what is
stated in the C reference manual, I absolutely agree that any compiler
ought to make int's large enough to hold any pointer unless the
performance penalties are prohibitive.  Even though this assumption is
not justified by the reference manual, it is made by too many C
programs to change easily.  This is why most C compilers for the
Motorola 68000 use 32-bit int's even though using 16-bit int's would
speed up most programs substantially (~ 50%, I think).  In this case,
portability wins out over efficiency.  Besides, Motorola will be coming
out with a 32-bit 68000 pretty soon, so why worry? :-)

A third option was posted recently that I am sure has occurred to many
people independently, namely have *two* compilers -- one with long
int's for portability, and one with short int's for speed.


-- 
	Tom Teixeira,  Massachusetts Computer Corporation.  Westford MA
	...!{ihnp4,harpo,decvax}!masscomp!tjt   (617) 692-6200 x275