[comp.std.c] size_t

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (06/19/89)

What exactly does pANS say about size_t?  In Turbo C 2.0 it is
defined as unsigned int in all memory models, yet in huge model
array indices are long.  Is this a bug?  
(If not, what is size_t good for, anyway?)
-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

dfp@cbnewsl.ATT.COM (david.f.prosser) (06/20/89)

In article <934@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>What exactly does pANS say about size_t?  In Turbo C 2.0 it is
>defined as unsigned int in all memory models, yet in huge model
>array indices are long.  Is this a bug?  
>(If not, what is size_t good for, anyway?)
>-- 
>Tapani Tarvainen                 BitNet:    tarvainen@finjyu
>Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

Section 4.1.5 of the pANS:

	The types [defined in <stddef.h>] are ...

		size_t

	which is the unsigned integral type of the result of the sizeof
	operator ...

It is also defined in <stdio.h>, <stdlib.h>, <string.h> and <time.h>.

Dave Prosser	...not an official X3J11 answer...

karl@haddock.ima.isc.com (Karl Heuer) (06/20/89)

In article <934@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>What exactly does pANS say about size_t?

It says (among other things) that |size_t| is big enough to hold the size of
the largest declarable object.

>In Turbo C 2.0 it is defined as unsigned int in all memory models, yet in
>huge model array indices are long.  Is this a bug?

Yes.  In huge model, |size_t| should be |unsigned long int|, and |ptrdiff_t|
should be |long int|.
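
For instance, a corrected <stddef.h> might select the types per memory
model (a sketch only; it assumes Borland's predefined model macros such
as __HUGE__, and is not Borland's actual header):

	#ifdef __HUGE__
	typedef unsigned long size_t;	/* objects may exceed 65535 bytes */
	typedef long ptrdiff_t;		/* differences may exceed 32767 */
	#else
	typedef unsigned int size_t;
	typedef int ptrdiff_t;
	#endif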

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

manderso@ugly.cs.ubc.ca (mark c anderson) (06/20/89)

In article <934@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>What exactly does pANS say about size_t?  In Turbo C 2.0 it is
>defined as unsigned int in all memory models, yet in huge model
>array indices are long.  Is this a bug?  

As has already been noted, size_t is defined as the "unsigned integral type
of the result of the sizeof operator", i.e. unsigned int (at least in this
case).

I'm not sure how Turbo C handles the huge memory model, but I was interested
to read how Microsoft deals with it:  if you cast the result of a sizeof
operation on a huge array to unsigned long, the correct result is produced.
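
For example (a sketch of the extension as described; the array and its
size are illustrative):

	char huge big[100000L];		/* an object larger than 65535 bytes */
	unsigned long n = (unsigned long) sizeof big;	/* yields 100000 */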

A similar extension allows you to cast the result of a pointer-difference
operation on huge pointers to long, and get the desired result.

i.e.
	char huge *p;
	char huge *q;	/* both pointers must be huge for a full-range difference */
	long size;
	...
	size = (long) (p - q);	/* the cast invokes the extension described above */
---
Mark Anderson <manderso@ugly.cs.ubc.ca>
{att!alberta,uw-beaver,uunet}!ubc-cs!{good,bad,ugly}!manderso
Am I suspended in Gaffa?

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (06/20/89)

In article <845@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>Section 4.1.5 of the pANS:
>
>	The types [defined in <stddef.h>] are ...
>
>		size_t
>
>	which is the unsigned integral type of the result of the sizeof
>	operator ...
>
>It is also defined in <stdio.h>, <stdlib.h>, <string.h> and <time.h>.

If this is all the pANS says about it, Turbo C's behaviour is legal.
Since static objects can't exceed 64K, the sizeof result will always
fit in an unsigned int; but in huge model one can have a dynamic
array with more than 64K elements.  Therefore:

What should be done when one has an array whose indices may not fit
in an int?  Is there a suitable type for that in pANS?

If I use long, TC will warn "Conversion may lose significant digits"
every time in models other than huge, not to mention that it is a
waste of resources; but in huge model int won't do.
Suggestions, anyone?

(BTW, thank you David for informative answers, even though the info on
realloc was somewhat unfortunate: it seems I'll need an extra variable
for every pointer to the buffer that is reallocated.)


-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

geoff@cs.warwick.ac.uk (Geoff Rimmer) (06/21/89)

In article <845@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
> Section 4.1.5 of the pANS:
> 
> 	The types [defined in <stddef.h>] are ...
> 
> 		size_t
> 
> 	which is the unsigned integral type of the result of the sizeof
> 	operator ...
> 
> It is also defined in <stdio.h>, <stdlib.h>, <string.h> and <time.h>.

Does this mean that size_t should be a #define rather than a typedef?
If it were a typedef, and I #include <stdio.h> AND <stdlib.h> (which
is a perfectly reasonable thing to do!), I would get errors.

> Dave Prosser	...not an official X3J11 answer...

	/---------------------------------------------------------------\
	|	GEOFF RIMMER						|
	|	email	: geoff@cs.warwick.ac.uk			|
	|		  geoff@uk.ac.warwick.cs			|
	|	address : Computer Science Dept, Warwick University, 	|
	|		  Coventry, England.				|
	|	PHONE	: +44 203 692320				|
	|	FAX	: +44 865 726753				|
	\---------------------------------------------------------------/

"First one I've had in twenty years and I won't be here to enjoy it."
	- Filthy, "Filthy Rich and Catflap" (best comedy series EVER)

karl@haddock.ima.isc.com (Karl Heuer) (06/27/89)

In article <2284@ubc-cs.UUCP> manderso@ugly.cs.ubc.ca (mark c anderson) writes:
>size_t is defined as the "unsigned integral type of the result of the sizeof
>operator", i.e. unsigned int (at least in this case).

Which just means that (in the case of TC huge model) both |size_t| and
|sizeof| are wrong.  If the compiler allows you to create an object
larger than 65535 bytes, then |size_t| should not be a 16-bit type.

>I'm not sure how Turbo C handles the huge memory model, but I was interested
>to read how Microsoft deals with it:  if you cast the result of a sizeof
>operation on a huge array to unsigned long, the correct result is produced.

A hollow voice says, "Kludgh".

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

gwyn@smoke.BRL.MIL (Doug Gwyn) (06/28/89)

In article <GEOFF.89Jun21011005@onyx.cs.warwick.ac.uk> geoff@cs.warwick.ac.uk (Geoff Rimmer) writes:
>Does this mean that size_t should be a #define rather than a typedef?

No, size_t must be a genuine type name.

>If it were a typedef, and I #include <stdio.h> AND <stdlib.h> (which
>is a perfectly reasonable thing to do!), I would get errors.

It is the implementor's job to make sure that there is no such problem.
I think it makes an interesting exercise to figure out how this can be
implemented.
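
One well-known technique (a sketch of the idea, not any vendor's actual
header): every header that must define size_t wraps the typedef in a
private guard macro, so whichever header is included first defines the
type and the rest become no-ops:

	/* in each of <stddef.h>, <stdio.h>, <stdlib.h>, <string.h>, <time.h> */
	#ifndef _SIZE_T_DEFINED
	#define _SIZE_T_DEFINED
	typedef unsigned int size_t;	/* whatever type sizeof yields here */
	#endif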

dfp@cbnewsl.ATT.COM (david.f.prosser) (07/01/89)

In article <941@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>In article <845@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>>Section 4.1.5 of the pANS:
>>
>>	The types [defined in <stddef.h>] are ...
>>
>>		size_t
>>
>>	which is the unsigned integral type of the result of the sizeof
>>	operator ...
>>
>>It is also defined in <stdio.h>, <stdlib.h>, <string.h> and <time.h>.
>
>If this is all pANS says about it, TurboC's behaviour is legal.
>As static objects can't exceed 64K, sizeof result will always
>fit in unsigned int; but in huge model one can have a dynamic
>array with more than 64K elements.

That is essentially all that the pANS says directly about size_t, but
that doesn't mean the conclusions above can be drawn.  The
sizeof operator must be able to express the size, in bytes, of any
object created by a strictly conforming program.  If such a program can
allocate an object at runtime through malloc, calloc, or realloc that
is too big for the sizeof operator, then the implementation is not
conforming.  Since the constraints for size_t are the same as for
sizeof (at least in terms of representation), size_t must be big enough
to hold the number of bytes of any validly created object.

>Therefore:
>
>What should be done when one has an array whose indices may not fit
>in an int?  Is there a suitable type for that in pANS?

By the pANS, size_t should work, for any strictly conforming program.
Of course, if there is a "hugealloc()" function provided which is the
only access to objects that are bigger than what sizeof or size_t can
describe, this is still a conforming implementation.  If a program
makes use of such a function, then a larger than size_t integral type
would be necessary.
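
Such a function's declaration might look like this (purely hypothetical;
the name comes from the example above, and the types are illustrative):

	void huge *hugealloc(unsigned long nbytes);	/* size wider than size_t */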

>
>If I use long, TC will warn that "Conversion may lose significant digits"
>every time in models other than huge, not to mention that it is waste
>of resources, but in huge model int won't do.
>Suggestions, anyone?

I, personally, dislike compilers that try to be "lint" at the same time,
but that may be my UNIX system biases showing through.  A conversion of
a larger integral type to a smaller unsigned type (such as size_t) is
well defined, even if the value being converted doesn't "fit".  It may
be that TC will be quiet if you put an explicit cast on the assignments.
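
That is (a minimal sketch; whether TC actually suppresses the warning
for it is an assumption on my part):

	long nbytes = 100000L;
	size_t n = (size_t) nbytes;	/* well defined: the value is reduced
					   modulo (largest size_t value + 1) */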

>
>(BTW, thank you David for informative answers, even though the info on
>realloc was somewhat unfortunate: it seems I'll need an extra variable
>for every pointer to the buffer that is reallocated.)

I've found that often it is better to use pointers to such buffers only
in local situations and to keep only offsets in the shared (file scope)
parts.  When a buffer is then realloc()ed, there are no readjustments of
a slew of pointers.
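
A sketch of that arrangement (the names are illustrative):

	#include <stdlib.h>

	static char *buf;	/* the only stored pointer into the buffer */
	static size_t bufsize;
	static size_t mark;	/* an offset; still valid after realloc() */

	int grow(size_t newsize)
	{
		char *tmp = realloc(buf, newsize);
		if (tmp == NULL)
			return -1;	/* old buffer remains intact */
		buf = tmp;		/* rebuild pointers as buf + mark */
		bufsize = newsize;
		return 0;
	}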

Another favorite approach is to determine, if possible, the maximum
extent necessary for the buffer based on early information (for example,
by knowing the size of the input file), and never needing to grow the
buffer at all.  This can save a lot of program complexity.
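
For example (a sketch; it assumes the input is a seekable binary file,
so that ftell() yields a usable byte count):

	#include <stdio.h>
	#include <stdlib.h>

	char *slurp(FILE *fp, long *len)
	{
		char *buf = NULL;
		if (fseek(fp, 0L, SEEK_END) == 0 && (*len = ftell(fp)) >= 0) {
			rewind(fp);
			buf = malloc((size_t) *len);	/* beware: assumes the
							   length fits in a size_t */
			if (buf != NULL)
				*len = (long) fread(buf, 1, (size_t) *len, fp);
		}
		return buf;
	}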

>
>
>-- 
>Tapani Tarvainen                 BitNet:    tarvainen@finjyu
>Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

Dave Prosser	...not an official X3J11 answer...

dfp@cbnewsl.ATT.COM (david.f.prosser) (07/06/89)

In article <971@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>Something related which I would call a bug is the behaviour of
>calloc() that e.g., calloc(1000,1000) won't give an error or NULL but
>silently truncates the product to 16960 (== 1000000 & 0xFFFF) and
>allocates that amount.  What does the pANS say about overflow handling
>in this situation?
>-- 
>Tapani Tarvainen                 BitNet:    tarvainen@finjyu
>Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi

There is a general statement in section 4.1.6 for the arguments to the
library functions.  It allows undefined behavior in the library if a
function is passed arguments with invalid values, or values outside of
the function's domain.  Since calloc() must produce an object with no
more bytes than can be counted in a size_t, a pair of arguments that
are individually valid, but whose product does not fit in a size_t,
results in undefined behavior for calloc().
If there were some special part of calloc()'s description that
constrained the function to handle this case, the behavior would be
otherwise.
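
A caller can stay within defined behavior by rejecting such argument
pairs itself (a sketch; (size_t)-1 is the largest value a size_t can
hold):

	#include <stdlib.h>

	void *checked_calloc(size_t nmemb, size_t size)
	{
		if (size != 0 && nmemb > (size_t)-1 / size)
			return NULL;	/* nmemb * size would overflow size_t */
		return calloc(nmemb, size);
	}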

Dave Prosser	...not an official X3J11 answer...

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (07/11/89)

In article <1003@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>In article <971@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>>Something related which I would call a bug is the behaviour of
>>calloc() that e.g., calloc(1000,1000) won't give an error or NULL but
>>silently truncates the product to 16960 (== 1000000 & 0xFFFF) and
>>allocates that amount.  What does the pANS say about overflow handling
>>in this situation?
>>-- 
>
>There is a general statement in section 4.1.6 for the arguments to the
>library functions.  It allows undefined behavior in the library if a
>function is passed arguments with invalid values, or values outside of
>the function's domain.  

True but not particularly relevant (methinks), since each argument _is_
valid and well within (I hope!) the domain that size_t can handle.

>Since calloc() must produce an object with no more bytes than can be 
>counted in a size_t, 

Care to quote the relevant sentence? Doesn't seem to have made it into my
Jan 88 draft :-).


Quoting from the Mark Williams Ansi C - A lexical guide:

calloc allocates a portion of memory large enough to hold count items,
each of which is size bytes long. It then initializes every byte within
the portion to zero.

calloc returns a pointer to the portion allocated. The pointer is aligned for
any type of object. If it cannot allocate the amount of memory requested,
it returns NULL.

My guess is that the above implementation is broken.

In the above case, if you're using calloc to allocate memory for an array then
you can't use sizeof to find the size of your array (in bytes) since sizeof 
returns size_t. Could this be the source of confusion?
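
After all, sizeof applied to the pointer measures only the pointer
itself (a two-line sketch):

	char *p = calloc(1000, 1000);
	/* sizeof p is the size of the pointer (2 or 4, say), not the
	   1000000 bytes requested; no expression applies sizeof to the
	   allocated object itself */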

>Dave Prosser	...not an official X3J11 answer...

Roelof Vuurboom ...still not an official X3J11 answer...
-- 
Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

dfp@cbnewsl.ATT.COM (david.f.prosser) (07/11/89)

In article <149@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
>In article <1003@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>>In article <971@tukki.jyu.fi> tarvaine@tukki.jyu.fi (Tapani Tarvainen) writes:
>>>Something related which I would call a bug is the behaviour of
>>>calloc() that e.g., calloc(1000,1000) won't give an error or NULL but
>>>silently truncates the product to 16960 (== 1000000 & 0xFFFF) and
>>>allocates that amount.  What does the pANS say about overflow handling
>>>in this situation?
>>>-- 
>>
>>There is a general statement in section 4.1.6 for the arguments to the
>>library functions.  It allows undefined behavior in the library if a
>>function is passed arguments with invalid values, or values outside of
>>the function's domain.  
>
>True but not particularly relevant (me thinks) since each argument _is_
>valid and well within (I hope!) the domain that size_t can handle.

True, each argument value is within the range of values representable by
size_t, but that's not sufficient in this case.

>
>>Since calloc() must produce an object with no more bytes than can be 
>>counted in a size_t, 
>
>Care to quote the relevant sentence? Doesn't seem to have made it into my
>Jan 88 draft :-).
>

I did slightly misstate the above.  Let me go into the argument in more
detail:

What you are probably looking for is a statement somewhere in the memory
allocation portion of the pANS (section 4.10.3) that explicitly requires
that any allocated object's size must be no bigger than can be sized by
size_t or that the multiplication of the arguments must not be bigger than
a size_t.  You won't find one, simply because this is not the way the pANS
is written.  Instead, the pANS handles dynamically allocated objects just
the same as other objects, as much as possible.  The relevant part of the
pANS is that section 3.3.3.4 requires that the sizeof operator evaluate
to the size of its operand in bytes, and that the type of a sizeof
expression is an unsigned integral type--the same as the typedef size_t.

It is possible for calloc to allocate an object bigger than can be described
by size_t, but it is not required to do so, just as an implementation can
choose not to accept a request for a statically allocated object bigger than
can be described by size_t.  (There is a requirement that objects of at
least 32767 bytes must be accepted.)

As a consequence, a strictly conforming program cannot request an object
bigger than 32767 bytes.  Given the semantics of C, calloc(1000,1000) is
such a request.  (It takes a bunch of references before this can be
fully supported, but I'm taking this as given for this discussion.)
Therefore, the portable domain of calloc has been left behind.  Since there
are no explicit statements that override the generic "behavior is undefined
given out-of-bounds arguments" for calloc, an implementation has no
behavior constraints.  In fact, a valid implementation can "choose" to
"core dump" when the multiplication overflows in calloc!

Thus, my statement that calloc cannot allocate an object larger than can
be sized by size_t was inaccurate: rather, a strictly conforming program
cannot attempt to allocate an object bigger than size_t can describe, as
in the example, because the behavior of the library is undefined.

>
>Quoting from the Mark Williams Ansi C - A lexical guide:
>
>calloc allocates a portion of memory large enough to hold count items,
>each of which is size bytes long. It then initializes every byte within
>the portion to zero.
>
>calloc returns a pointer to the portion allocated. The pointer is aligned for
>any type of object. If it cannot allocate the amount of memory requested,
>it returns NULL.

This is how calloc must behave when given strictly conforming argument
values.  Since calloc(1000,1000) cannot be part of a strictly conforming
program, the implementation can choose to behave in virtually any manner,
including exec'ing "rogue", as has been noted in earlier postings.

>
>My guess is that the above implementation is broken.
>
>In the above case, if you're using calloc to allocate memory for an array then
>you can't use sizeof to find the size of your array (in bytes) since sizeof 
>returns size_t. Could this be the source of confusion?

In any strictly conforming program, sizeof *must* be able to return the
number of bytes in *any* object.  The pANS only describes the behavior of
strictly conforming programs, and translators that accept all strictly
conforming programs.  Since, as I have argued above, a program that contains
a call to calloc(1000,1000) that is executed is not strictly conforming,
the pANS does not constrain calloc's behavior.

>
>>Dave Prosser	...not an official X3J11 answer...
>
>Roelof Vuurboom ...still not an official X3J11 answer...
>-- 
>Roelof Vuurboom  SSP/V3   Philips TDS Apeldoorn, The Netherlands   +31 55 432226
>domain: roelof@idca.tds.philips.nl             uucp:  ...!mcvax!philapd!roelof

All of this does not mean that I believe that calloc(1000,1000) should "not
work"; this has all been in the realm of what does the pANS require if
calloc(1000,1000) occurs.  Moreover, as the argument hinges on the less than
strictly conforming nature of the call, and since everything except the
result of the multiplication is strictly conforming, the argument may well
be tenuous.  I, nevertheless, am sticking by my guns.

Dave Prosser	...not an official X3J11 answer... (of course)

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/12/89)

In article <1062@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>It is possible for calloc to allocate an object bigger than can
>be described by size_t, but it is not required to do so, ...
>In fact, a valid implementation can "choose" to
>"core dump" when the multiplication overflows in calloc!

Hey, Dave -- let's not get carried away here!

The Standard requires that it either allocate storage or, *if the space
cannot be allocated*, return a null pointer.  The meaning of "unable to
allocate space" is not specified, which leaves it up to the implementor.

There is no direct way to feed the actual (dynamically-allocated) object
pointed to by the pointer returned from one of the *alloc() routines to
the sizeof operator, so there is no operational way that the semantics
of sizeof can be "probed" by the huge object that we're presuming may be
actually allocated.  One can try to cast the pointer from void* into a
pointer to a huge array, but that permits compile-time determination
that the object size limit (e.g. representability in a size_t) is being
violated, and in any case is something that would only be done AFTER
*alloc() is called.  I see no reason to say that all these size
considerations exempt the implementation of *alloc() from doing its job
properly, i.e. either correctly allocating contiguous storage or else
returning a null pointer.

>... a strictly conforming program cannot attempt to allocate an object
>bigger than size_t as in the example because the behavior of the library
>is undefined.

I don't see that at all, unless related to the 32,767 bytes mentioned in
2.2.4.1 (about which more below).  sizeof is not required to work
properly when applied to huge objects, but if a program avoids trying to
do that I don't see that it becomes non-conforming just because it
handles huge objects (which WOULD break sizeof IF they were fed to it,
but they AREN'T).

>In any strictly conforming program, sizeof *must* be able to return the
>number of bytes in *any* object

... to which it is applied!  I don't see how a program could be
considered to be in violation for something that it doesn't do.

Re: 2.2.4.1:

The 32,767-byte object size can be argued to be among the "minimum
implementation limits" that a strictly conforming program shall not
exceed.  On the other hand, one can argue that *alloc() is obliged
to return a null pointer if an implementation "hard limit" really
would be exceeded by the *alloc() request.

This issue probably deserves consideration by X3J11 for a formal ruling.
I would be most upset if *alloc() gave me a non-null pointer which then
wouldn't work right.  It is simpler to let calloc() tell me when at run
time I've exceeded such a limit than to have to keep performing such
checks myself in my application code.
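
That is, the idiom I want to be able to rely on (a sketch;
handle_failure() is a hypothetical handler):

	long *table = calloc(40000, sizeof(long));	/* 160000 bytes requested */
	if (table == NULL)
		handle_failure();	/* the null return is the only signal
					   I should need */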

roelof@idca.tds.PHILIPS.nl (R. Vuurboom) (07/13/89)

In article <1062@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
>In article <149@ssp1.idca.tds.philips.nl> roelof@idca.tds.PHILIPS.nl (R. Vuurboom) writes:
> [Interesting and intricate argument why calloc(1000,1000) could even
>  dump its core on the floor]

phew!

>calloc(1000,1000) occurs.  Moreover, as the argument hinges on the less than
>strictly conforming nature of the call, and since everything except the
>result of the multiplication is strictly conforming, the argument may well
>be tenuous.  I, nevertheless, am sticking by my guns.

Well...as long as you don't shoot yourself in the foot :-)

I think your train of argument is (formally) correct.
I agree with Doug Gwyn, however, that making calloc fail just to ensure
that a _possible_ sizeof call would succeed seems to be putting the cart
before the horse.  It makes more sense to me to simply declare sizeof
nonconforming for objects larger than size_t can handle.

>
>Dave Prosser	...not an official X3J11 answer... (of course)

Of course :-)

tarvaine@tukki.jyu.fi (Tapani Tarvainen) (08/09/89)

In article <975@cbnewsl.ATT.COM> dfp@cbnewsl.ATT.COM (david.f.prosser) writes:
...
> size_t must be big enough
>to hold the number of bytes of any validly created object.
...
>Of course, if there is a "hugealloc()" function provided which is the
>only access to objects that are bigger than what sizeof or size_t can
>describe, this is still a conforming implementation.  If a program
>makes use of such a function, then a larger than size_t integral type
>would be necessary.

It turns out this is exactly the case with Turbo C: malloc(), calloc()
and realloc() won't allocate blocks bigger than 64K.  If you need such
blocks, you must use farmalloc(), farcalloc() or farrealloc(), which
expect the block size as a long, so TC appears to be conforming in
this respect after all.
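
For reference, a sketch of the non-standard route (farmalloc() is
declared in Turbo C's <alloc.h>; the huge cast is what makes the >64K
pointer arithmetic come out right):

	#include <alloc.h>	/* Turbo C specific, not ANSI */

	char huge *big;
	...
	big = (char huge *) farmalloc(100000L);	/* size is a long, not size_t */
	if (big != NULL)
		big[99999L] = '\0';	/* huge arithmetic carries past 64K */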

Unfortunately this apparently means there is no standard-conforming
way to create objects bigger than 64K in TC, or indeed to use the huge
model in any useful way at all.  I do hope Borland does something about
this in a future version of TC: either change the behaviour of huge, or
provide a separate ANSI-huge model in which everything that needs to be
long is long, and pointer declarations and arithmetic automatically work
correctly, so that I can take a conforming program that needs big blocks
and compile it without any changes, just by setting a compiler option.

Something related which I would call a bug is the behaviour of
calloc() that e.g., calloc(1000,1000) won't give an error or NULL but
silently truncates the product to 16960 (== 1000000 & 0xFFFF) and
allocates that amount.  What does the pANS say about overflow handling
in this situation?
-- 
Tapani Tarvainen                 BitNet:    tarvainen@finjyu
Internet:  tarvainen@jylk.jyu.fi  -- OR --  tarvaine@tukki.jyu.fi