[comp.lang.c] Why should "sizeof" be unsigned?

hokey@plus5.UUCP (Hokey) (06/18/87)

Is there a good or overriding reason why the sizeof operator must return
an explicitly unsigned value?

We have an application which stores strings in a "local" array if the strings
are small enough, otherwise it malloc()s space and saves the string in the
malloc()ed space.  In any event, we keep track of the string length.

There are 3 interesting cases: no string, a zero-length string, and >0 length
strings.  We thought it would be a swell idea to denote the "no string" case
with a length counter of -1.  Normally, this is a neat idea.  However, there
are cases in which we only want to mess around with strings which are bigger
than the local array size.  For these cases, we cannot use the expression

	length > sizeof local_string_buffer

because the -1 length (no string available) becomes slightly larger than the
number of bytes in the local string buffer when converted to unsigned.
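
A minimal sketch of the pitfall described above, assuming a 16-byte local
buffer.  The buffer size and the function names are hypothetical and not
taken from the application; only the comparison and the cast-to-int
workaround come from the post.

```c
#include <stddef.h>

#define LOCAL_BUF_SIZE 16   /* hypothetical size, chosen for illustration */

static char local_string_buffer[LOCAL_BUF_SIZE];

/* The test as written in the post.  sizeof yields an unsigned type, so
   `length` is converted to that unsigned type before the comparison.
   A length of -1 becomes a very large value and the test succeeds. */
int exceeds_local_naive(int length)
{
    return length > sizeof local_string_buffer;
}

/* The workaround mentioned in the post: cast the sizeof result to int
   so that the comparison is done in signed arithmetic. */
int exceeds_local_cast(int length)
{
    return length > (int)sizeof local_string_buffer;
}
```

With these definitions, exceeds_local_naive(-1) reports that the "no string"
case is bigger than the buffer, while exceeds_local_cast(-1) does not.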

For what it is worth:

	This is happening in a speed-critical section of code.
	We are very thorough and very lazy.
	We *could* add 1 to all our lengths and solve this problem (and
	create some others).
	I can think of several other ways to "get around" the problem
	(presently, we cast most sizeof operators to int).

None of these issues are key - it seems to me having sizeof return an
explicitly unsigned value violates "the principle of least astonishment".

One of the primary uses of sizeof is to make code more readable, portable, and
maintainable.  If I am on a machine with 4 byte character pointers and I want
to write nonportable code by not using sizeof(char *), I am more likely to use
"4" than "4U" (which will only work on "new" compilers anyway).

Then again, I think the size_t is pretty useless and that the proposed C
standard contains far too many typedefs that work against the good programmer.
-- 
Hokey

gwyn@brl-smoke.ARPA (Doug Gwyn) (06/19/87)

In article <1748@plus5.UUCP> hokey@plus5.UUCP (Hokey) writes:
>None of these issues are key - it seems to me having sizeof return an
>explicitly unsigned value violates "the principle of least astonishment".

(a) AT&T "sizeof" has been this way for many years now.
(b) Since sizeof(thing) inherently cannot be negative, an unsigned
integer value for the sizeof operator seems exactly right.
(c) Your example wasn't very convincing (to me at least).

karl@haddock.UUCP (Karl Heuer) (06/19/87)

In article <1748@plus5.UUCP> hokey@plus5.UUCP (Hokey) writes:
>[In our application] there are 3 interesting cases: no string, a zero-length
>string, and >0 length strings.  We thought it would be a swell idea to denote
>the "no string" case with a length counter of -1.  [But we sometimes want to
>make the test] "length > sizeof local_string_buffer" [which causes problems
>because the special-case value of length is coerced to (unsigned)-1].  It
>seems to me having sizeof return an explicitly unsigned value violates "the
>principle of least astonishment".

I think the crux of the problem is that your variable "length" is logically a
union of "size_t" and a single out-of-band value which has nothing to do with
sizes.  It is not all that astonishing that you get incorrect results if you
use an untested variable containing the OOB value as if it were a size_t.
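
One way to read this advice, as a sketch only: test for the out-of-band
value first, and only then treat the variable as a size.  The names
NO_STRING, LOCAL_BUF_SIZE and needs_heap_copy are hypothetical and do not
appear in the original application.

```c
#include <stddef.h>

#define NO_STRING      (-1)  /* hypothetical out-of-band marker */
#define LOCAL_BUF_SIZE 16    /* hypothetical local buffer size */

/* Returns 1 if a string of this length does not fit in the local buffer.
   The out-of-band value is handled before `length` is used as a size,
   so the unsigned comparison below only ever sees real lengths. */
int needs_heap_copy(int length)
{
    if (length == NO_STRING)
        return 0;                      /* no string: nothing to store */
    return (size_t)length > (size_t)LOCAL_BUF_SIZE;
}
```

The explicit conversion to size_t is safe here only because the negative
marker has already been excluded.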

To answer the title question, size_t is unsigned because (a) it's always
nonnegative, and (b) the corresponding signed datatype may not be wide enough.
Having it be unsigned only on those machines where it's necessary would cause
even more astonishment when trying to port code.

>Then again, I think the size_t is pretty useless and that the proposed C
>standard contains far too many typedefs that work against the good programmer.

If size_t were not part of the standard, what type would you use for, say, the
argument to malloc()?

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

wesommer@athena.mit.edu (William Sommerfeld) (06/21/87)

In article <6001@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>In article <1748@plus5.UUCP> hokey@plus5.UUCP (Hokey) writes:
>>None of these issues are key - it seems to me having sizeof return an
>>explicitly unsigned value violates "the principle of least astonishment".
>
>(a) AT&T "sizeof" has been this way for many years now.
But Berkeley's hasn't.

>(b) Since sizeof(thing) inherently cannot be negative, an unsigned
>integer value for the sizeof operator seems exactly right.

This can run into problems in existing code.  For example, a couple
of months ago, I spent several hours helping someone track down what
appeared to be a DBM bug on the IBM RT/PC under IBM's BSD port.

I finally tracked it down to the following statement:

	if (i1 <= (i2+3) * sizeof(short))
		return (0);

(in additem(), in lib/libc/gen/ndbm.c).  By examining the compiler's
assembler output, I determined that it was probably generating an
unsigned compare for that test (I say "probably" because I can't read
RT assembler very well), and the behavior of the program was consistent
with an unsigned test being generated there.

It turns out that the compiler was an ANSI C compiler.  I mistakenly
reported it as a compiler bug, and was corrected on it by someone
who had actually done work on an ANSI C interpreter.
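
A sketch of how the two readings of that statement can differ.  The post
does not say which values triggered the failure; this assumes i1 had gone
negative, and the function names are hypothetical.

```c
#include <stddef.h>

/* The test as written in ndbm.c.  With an unsigned sizeof, i1 is
   converted to the unsigned type before the comparison, so a negative
   i1 turns into a very large value. */
int no_room_mixed(int i1, int i2)
{
    return i1 <= (i2 + 3) * sizeof(short);
}

/* The same test with the sizeof result cast to int, which is how it
   behaves on a compiler where sizeof yields an int. */
int no_room_signed(int i1, int i2)
{
    return i1 <= (i2 + 3) * (int)sizeof(short);
}
```

For non-negative i1 the two functions agree.  For a negative i1, the signed
version reports "no room" and the mixed version does not, so the early
return is skipped.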

					Bill Sommerfeld
					wesommer@athena.mit.edu

hokey@plus5.UUCP (Hokey) (06/25/87)

sizeof is an operator, not a function.

It doesn't "return" something in the classic sense.

I see no reason to have this operator have a "side effect" of an explicit
cast of its "value", especially when this cast is hidden and arguably
unnecessary.

One can write subroutines like malloc without a size_t.  Use function
prototypes.  This has the added benefit of permitting the compiler to
warn you if you will lose precision as part of the cast.

If one wishes, one could say our desire to treat a "length" counter of -1
is out of band data.  Using that logic, a nil pointer is out of band data, too.

Perhaps we should get rid of the nil pointer, and use a structure of a flag
value and a pointer, and only use the pointer if the flag value is true.
-- 
Hokey

mpl@sfsup.UUCP (M.P.Lindner) (06/26/87)

In article <1750@plus5.UUCP>, hokey@plus5.UUCP writes:
> sizeof is an operator, not a function.
	correct, go on... 
> It doesn't "return" something in the classic sense.
	au contraire, operators return values, otherwise 1+1 wouldn't be 2.
	However, if you mean they don't have an explicit "return" statement,
	granted (although this is not relevant).
> I see no reason to have this operator have a "side effect" of an explicit
> cast of its "value", especially when this cast is hidden and arguably
> unnecessary.
	No "side effect" is necessary.  Just as a == b returns an int regardless
	of the type of its operands, sizeof returns an unsigned.  No cast is
	involved, just as you don't have to say (int) (a == b).
> One can write subroutines like malloc without a size_t.  Use function
> prototypes.  This has the added benefit of permitting the compiler to
> warn you if you will lose precision as part of the cast.
	Right on! 
> If one wishes, one could say our desire to treat a "length" counter of -1
> is out of band data.  Using that logic, a nil pointer is out of band data, too.
	The fact is, a "length" of -1 is not just out of band data, it's data
	that can't be held by the type of the sizeof operator.  "nil" is out of
	band, but can still be represented in a pointer.  For an analogy, "not
	a number" is out of band for a float, but is still representable.
> Perhaps we should get rid of the nil pointer, and use a structure of a flag
> value and a pointer, and only use the pointer if the flag value is true.
> -- 
> Hokey
	I'm not sure I follow this, but I will assume it's sarcasm.

guy%gorodish@Sun.COM (Guy Harris) (06/27/87)

> 	The fact is, a "length" of -1 is not just out of band data, it's data
> 	that can't be held by the type of the sizeof operator.

You're missing the point.

The fact is, there are C compilers where the type of "sizeof" is
"int", not "unsigned int", and thus -1 is NOT "data that can't be
held by the type of the 'sizeof' operator".  (See the document "The C
Environment of UNIX/TS", supplied with the System III documentation.
It states:

	3.2.2 Unsigned numbers

	The value returned by "sizeof" is now "unsigned" rather than
	"int", so care must be exercised in the use of "sizeof" in a
	few strange cases.

I seem to remember seeing this change mentioned elsewhere, but I
don't remember where.  I don't know when this change was made; was it
made in V7 or afterwards?  The odd thing is that the 4BSD C compiler
is based on the System III VAX C compiler, but has "sizeof" yield an
"int".  I don't know if 1) the VAX C compiler hadn't been changed as
of System III, 2) the change antedated V7, and Berkeley changed the
compiler back to the V7 rules for backward compatibility, or 3)
something else happened.  I seem to remember seeing *something* about
such a change in or before V7, but I also seem not to remember
noticing the type of "sizeof" differing between the System III and
4BSD C compiler.)

The complaint being made is that having "sizeof" yield a value of
type "unsigned int", rather than "int", precludes having -1 as an
out-of-band value for routines that normally take a "sizeof".  Saying
"'sizeof' yields a value of type 'unsigned int', so you can't use -1
as an out-of-band value anyway" doesn't argue that the complaint is
invalid, it just points out why the complaint is being made in the
first place!  (I have no strong opinion either way on this.  I merely
point out that "the type of 'sizeof' is 'unsigned int'" is not, by
itself, a valid argument against the proposition that it should be
"int".)
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

franka@mmintl.UUCP (Frank Adams) (07/02/87)

In article <22242@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
|The complaint being made is that having "sizeof" yield a value of
|type "unsigned int", rather than "int", precludes having -1 as an
|out-of-band value for routines that normally take a "sizeof".  Saying
|"'sizeof' yields a value of type 'unsigned int', so you can't use -1
|as an out-of-band value anyway" doesn't argue that the complaint is
|invalid, it just points out why the complaint is being made in the
|first place!  (I have no strong opinion either way on this.  I merely
|point out that "the type of 'sizeof' is 'unsigned int'" is not, by
|itself, a valid argument against the proposition that it should be
|"int".)

The point nobody seemed to notice is that on some machines, sizeof *has* to
be unsigned -- "int" isn't big enough.  Thus code which uses -1 as an out-
of-band value in an integer holding a sizeof result is not portable.  A
standardization which causes non-portable code to no longer compile is a
good thing.
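
A small sketch of how one might check this on a given implementation.  It
uses SIZE_MAX from <stdint.h>, which comes from a later C standard than the
one under discussion; the function name is hypothetical.

```c
#include <limits.h>
#include <stdint.h>

/* Returns 1 if every value a sizeof expression could yield also fits
   in an int, and 0 if some sizes are too large for int to represent. */
int int_holds_every_size(void)
{
    return SIZE_MAX <= INT_MAX;
}
```

On implementations where size_t is at least as wide as int, this returns 0,
which is the situation described above: an int holding a sizeof result has
no spare value left over for -1 without giving up part of the range.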
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108