[net.unix] get size of malloc'd object

misha@daisy.UUCP (Mike Umansky) (06/13/86)

	I have a need in my application to find the size of a
	previously malloc'd object (maybe an array or structure).
	For example, one routine may allocate some array or just
	one big chunk of memory and pass a pointer to the chunk to
	another routine.  This other routine now needs to know the
	size of the area pointed to by this pointer.  Also, this
	other routine has no idea about the structure of this area
	and doesn't even care.  Now, the question, how can this
	other routine get the size of the area pointed to by passed
	pointer.  Would (sizeof(*ptr)) work or will it just give
	me the size of the pointer type?  Another possibility would
	be to write a routine which would scan malloc's allocation
	array and try to match the pointers; then the size is
	easily obtained.  Any help and comments on this would be
	greately appreciated.  Either post your answers or mail them
	directly please.  Thanks for consideration.
-- 
--
		NAME:	Michael Umansky (misha)
		E-MAIL: ucbvax!hplabs!nsc!daisy!misha
		WORK:
			Daisy Systems Corp.
			700B Middlefield Road
			Mountain View, CA  94039-7006
			(415) 960-7166 (work)
		HOME:
			94 Cassia Ct.
			Hayward, CA  94544
			(415) 886-4805 (home)

thomas@utah-gr.UUCP (Spencer W. Thomas) (06/13/86)

In article <165@daisy.UUCP> misha@daisy.UUCP (Mike Umansky) writes:
>	I have a need in my application to find the size of a
>	previously malloc'd object (maybe an array or structure).

We had this same need, and for many years satisfied it by writing a
macro SIZE(ptr) that `knew' how malloc saved the block size (usually in
the word preceding ptr).  Finally, with the 4.3 release, and in the
interests of easier portability (we were distributing our own malloc
with our code), we went to a different scheme:  We now do storage
allocation via two routines `new' and `dispose'.  The function `new'
asks malloc for the size requested plus one word, puts the size in the
extra word, and returns a pointer to the next word (obviously, it has to
worry about alignment, too, so it may not be quite this simple). The
function `dispose' readjusts the pointer passed to it and calls free.  A
third routine, `size', can then easily look at the word preceding the
pointer passed to it and return the size of the block.  This scheme has
the advantage of being blissfully independent of the underlying storage
allocation management.

-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)

ed@valid.UUCP (Ed Patriquin) (06/13/86)

> Xref: pesnta net.unix:21451 net.unix-wizards:22133 net.arch:21118 net.lang.c:21930
> 
> 
> 	I have a need in my application to find the size of a
> 	previously malloc'd object (maybe an array or structure).
> 	Also, this other routine has no idea about the structure 
>	of this area and doesn't even care.  Now, the question, 
>	how can this other routine get the size of the area pointed
>	to by passed pointer.
> 		NAME:	Michael Umansky (misha)
> 			Daisy Systems Corp.

	Well in UNIX(TM) one could write their own memory allocator
	and include a chunksize (ptr) function,  I don't know how you
	would do it in DNIX(TM).  But really, why not establish a protocol
	between the offending functions.  Since some function somewhere
	knows how big the chunk is, create a data structure which contains
	the information and make it available to the rest of the program.
	
	It is really hard to come up which general purpose algorithmic 
	solutions to problems without having even a glimmer of what
	environment you are talking about, i.e. passing around display
	lists, or strings or matrix inversion.
-- 

	IRS:	Ed Patriquin
	USENET:	...!ucbvax!hplabs!pesnta!valid!ed
	USPS:	Valid Logic, 2820 Orchard Parkway, San Jose, Calif 95134
	ATT:	(408) 945-9400

This disclaimer is to disclaim any knowledge of a disclaimer.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/15/86)

In article <165@daisy.UUCP> misha@daisy.UUCP (Mike Umansky) writes:
-	I have a need in my application to find the size of a
-	previously malloc'd object (maybe an array or structure).
-	For example, one routine may allocate some array or just
-	one big chunk of memory and pass a pointer to the chunk to
-	another routine.  This other routine now needs to know the
-	size of the area pointed to by this pointer.  Also, this
-	other routine has no idea about the structure of this area
-	and doesn't even care.  Now, the question, how can this
-	other routine get the size of the area pointed to by passed
-	pointer.  Would (sizeof(*ptr)) work or will it just give
-	me the size of the pointer type?  Another possibility would
-	be to write a routine which would scan malloc's allocation
-	array and try to match the pointers; then the size is
-	easily obtained.

Sorry to be the bearer of bad tidings, but there is NO reliable
way to do what you're attempting.  malloc() may even have allocated
more space than you asked for, but in any case there's no portable
way to find out how much.  You'll just have to keep track of the
size of your malloc()ed objects yourself.

jer@peora.UUCP (J. Eric Roskos) (06/17/86)

	It is really hard to come up which general purpose algorithmic 
	solutions to problems without having even a glimmer of what
	environment you are talking about ...

Sure it is.  Just write your own routine to call malloc; allocate a
sizeof(int) worth of extra space, then store the size of the thing you
malloc'ed in the int at the front of the allocated block, advance the
pointer past the place where your stored the size, and return that as
the pointer to the block you allocated.  The size of the object is then
found in the word preceeding the location pointed to by the object pointer.

Also provide a routine to deallocate the block by backing the pointer up
before calling free().

This approach is portable, simple, and easy to understand.
Also it doesn't require any assumptions about what kind of objects are
being allocated.
-- 
E. Roskos
    "They are a nexus of commercially produced artifact and individual
     message that expresses a dialect central to popular culture."
				-- Sociologist explaining the reason for
				   having Florida postcards.

chris@umcp-cs.UUCP (Chris Torek) (06/18/86)

In article <2206@peora.UUCP>, jer@peora.UUCP (J. Eric Roskos) writes:
>... allocate a sizeof(int) worth of extra space, then store the
>size of the thing you malloc'ed in the int at the front of the
>allocated block, advance the pointer past the place where your
>stored the size, and return that as the pointer to the block you
>allocated.
>
>This approach is portable, simple, and easy to understand.  Also
>it doesn't require any assumptions about what kind of objects are
>being allocated.

Unfortunately, it is not necessarily portable.  I can imagine a
machine where `double' arguments must be aligned on an eight-byte
boundary, but integers are only four bytes wide.  In this case
calling the new routine to allocate doubles would result in an
odd-address-style trap when the returned value is used.

What can be done is this:

	#include <stdio.h>

	/* change all `char *'s to `void *'s in ANSI C */
	struct mymem {
		struct	mymem *m_next;	/* linked list */
		struct	mymem *m_prev;	/* doubly, for easy removal */
		unsigned m_size;	/* size of mem, in bytes */
		char	*m_mem;		/* storage */
	};

	/* queue head */
	static struct mymem m_base = { &m_base, &m_base };

	extern char *malloc();

	/*
	 * Allocate some memory, remembering its size.
	 */
	char *
	myalloc(size)
		unsigned size;
	{
		register struct mymem *m;

		if ((m = (struct mymem *) malloc(sizeof (*m))) == NULL)
			return (NULL);
		if ((m->m_mem = malloc(size)) == NULL) {
			free((char *) m);
			return (NULL);
		}
		m->m_size = size;
		/*
		 * Insert at head: change to tail if mem usage is more
		 * `queueish' than `stackish'.
		 */
		m->m_next = m_base.m_next;
		m->m_prev = &m_base;
		m_base.m_next->m_prev = m;
		m_base.m_next = m;
		return (m->m_mem);
	}

	/*
	 * Locate the queue element corresponding to the given memory
	 * region.
	 */
	static struct mymem *
	myfind(mem)
		register char *mem;
	{
		register struct mymem *m = &m_base;

		while ((m = m->m_next) != &m_base)
			if (m->m_mem == mem)
				return (m);
		return (NULL);
	}

	/*
	 * Free a region, and drop it from the queue.
	 */
	myfree(mem)
		char *mem;
	{
		register struct mymem *m = myfind(mem);

		if (m == NULL) {
			/*
			 * Complain?  Abort?  Or what?
			 * Change to suit local tastes...
			 */
			(void) fprintf(stderr, "myfree: invalid free\n");
			(void) fflush(stderr);
			abort();
		}
		/* remove from queue */
		m->m_next->m_prev = m->m_prev;
		m->m_prev->m_next = m->m_next;
		/* and discard */
		free(m->m_mem);
		free((char *) m);
	}

	unsigned
	mysize(mem)
		char *mem;
	{
		struct mymem *m;

		if ((m = myfind(mem)) == NULL)
			return (0);	/* not allocated */
		return (m->m_size);
	}

This definitely *is* portable, if slow for large remembrance queues.

Of course, malloc() knows all along how big the allocated region
is (though the number it knows might be larger than the original
request); so it is a shame to have to go through all this effort.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

jer@peora.UUCP (J. Eric Roskos) (06/18/86)

> Sure it is.  Just write your own routine to call malloc; allocate a
> sizeof(int) worth of extra space, then store the size of the thing you
> malloc'ed in the int at the front of the allocated block, advance the
> pointer past the place where your stored the size, and return that as
> the pointer to the block you allocated.  The size of the object is then
> found in the word preceeding the location pointed to by the object pointer.

hansen@pegasus points out that for this to work, the distance you advance
the pointer by has to be sufficient to put the pointer on the boundary
malloc usually aligns on.  So you have to allocate sizeof(ALIGN) extra
space for it to work for all possible types of data, where ALIGN is the
largest alignment boundary required for the machine.  E.g., on a typical
machine it might work for chars and ints, but not double-precision
floating point numbers.
-- 
E. Roskos

    "Hold on, Figment."
    "Why?"
    "The Idea Bag is full."
    "Then, let's imagine--"
    "--No, wait."
	    -- Dialogue from "Journey Into Imagination,"
	       Kodak exhibit, Epcot Center, Orlando, FL.

tps@sdchem.UUCP (Tom Stockfisch) (06/19/86)

In article <2206@peora.UUCP> jer@peora.UUCP (J. Eric Roskos) writes:
>
>	It is really hard to come up which general purpose algorithmic 
>	solutions to problems without having even a glimmer of what
>	environment you are talking about ...
>
>Sure it is.  Just write your own routine to call malloc; allocate a
>sizeof(int) worth of extra space, then store the size of the thing you
>malloc'ed in the int at the front of the allocated block, advance the
>pointer past the place where your stored the size, and return that as
>the pointer to the block you allocated.  The size of the object is then
>found in the word preceeding the location pointed to by the object pointer.
>...
>This approach is portable, simple, and easy to understand.
>Also it doesn't require any assumptions about what kind of objects are
>being allocated.
>-- 
>E. Roskos

This is NOT portable.  Suppose your machine requires "doubles" to be aligned on
addresses divisible by 8 and sizeof(int) == 4.  You do

	double	*dp =	(double *)my_malloc( (unsigned)100 * sizeof(double) );

If my_malloc() calls malloc() and then increments the
pointer returned by malloc by 4, you will get a segmentation fault as soon
as your allocated space is accessed.  A *portable* way to do this (which
wastes a little memory, but doesn't require that you know the worst-case
alignment size) is to hand both the unit size and number of units to
my_malloc() and allocate one extra *unit* to store the size of the space in.
Of course, if the unit size < sizeof(int) you have to do a little extra
diddling.

Usually when I need to know the end of the allocated space I need to access
it too often to be done by a subroutine call so I set up a structure to
replace the pointer:
	struct foo {
		arraytype	*beg;	/* begining of array */
		arraytype	*end;	/* end of array */
	}	bar;

-- Tom Stockfisch, UCSD Chemistry

brett@wjvax.UUCP (06/20/86)

In article <2206@peora.UUCP> jer@peora.UUCP (J. Eric Roskos) writes:
>
>	It is really hard to come up which general purpose algorithmic 
>	solutions to problems without having even a glimmer of what
>	environment you are talking about ...
>
>Sure it is.  Just write your own routine to call malloc; allocate a
>sizeof(int) worth of extra space, then store the size of the thing you
>malloc'ed in the int at the front of the allocated block, advance the
>pointer past the place where your stored the size, and return that as
>the pointer to the block you allocated.  The size of the object is then
>found in the word preceeding the location pointed to by the object pointer.
>
>Also provide a routine to deallocate the block by backing the pointer up
>before calling free().
>
>This approach is portable, simple, and easy to understand.
>Also it doesn't require any assumptions about what kind of objects are
>being allocated.

In fact, this assumes that the resulting pointer

	(malloc(len+sizeof(int))+sizeof(int))

is still well-aligned.  This is only true if sizeof(int) is the most coarse
alignment boundary on the machine.  If, for example, structures require more
coarse alignment, attempting to cast the above pointer to a structure pointer
will fail.

-------------
Brett Galloway
{pesnta,twg,ios,qubix,turtlevax,tymix,vecpyr,certes,isi}!wjvax!brett

jer@peora.UUCP (06/23/86)

> Unfortunately, it is not necessarily portable.  I can imagine a
> machine where `double' arguments must be aligned on an eight-byte
> boundary, but integers are only four bytes wide.

That's a good point.  It's unfortunate that in all the system-specific
#defines that one can include, there isn't one that specifies the alignment
to be used in malloc's, etc; malloc has it #defined in it, as do a number
of other programs (e.g., Peter Honeyman's "pathalias" program).

Personally I still favor putting the size up at the front, eventhough it
requires now making the alignment a configuration parameter (i.e., the user
has to specify it for various machines), because maintaining a linked list
or other external record of the size introduces a consistency problem -- you
have to keep your external size information consistent with what malloc
knows the size to be, i.e., the physical separation of the allocated block
and the array or linked list that maintains the information makes it more
likely that they can become inconsistent.

(Notice, incidentally, that this is essentially a flaw in C -- there is an
essential piece of information, the alignment constant, that is not
available from within the language, eventhough the language allows one
to do things that require this information.)

Actually there's a trick you could use to get the alignment constant, maybe,
however:

	char *p, *q;
	int diff;

	p = malloc(1);
	q = malloc(1);

	diff = q - p;
	if (diff < 0) diff = p - q;

But it's not really portable either.  Although the argument to malloc is
defined to be a number of "bytes", and although in C a "byte" is an
"unspecified unit ... which is the same size as a char" (K&R p. 126), and
subtracting two pointers gives "the number of elements between p and q"
(K&R p. 98), where in this case an "element" is a char which is the same
size as a "byte", which means the distance computation itself is correct,
you're not guaranteed that the two consecutive mallocs will give
consecutive blocks of memory -- they certainly won't be likely to after
some other mallocs and frees have been done.  It's hard to see a situation
where the initial allocations wouldn't be consecutive, but it's not
guaranteed, and that's sufficient reason to have doubts.  (The reason for
the "if" above is to handle mallocs that allocate consecutively upwards
or downwards; both of which, like stacks that grow upwards or downwards,
do exist.)

Of course, our "compiler wizard" who is sitting here in my office reading
the newspaper says "just align it on a quadword".  :-)
-- 
E. Roskos

zben@umd5.UUCP (Ben Cranston) (06/23/86)

In article <2081@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:

> In article <2206@peora.UUCP>, jer@peora.UUCP (J. Eric Roskos) writes:
>> ... allocate a sizeof(int) worth of extra space, then store the
>> size of the thing you malloc'ed in the int at the front of the
>> allocated block, advance the pointer past the place where your
>> stored the size, and return that as the pointer to the block you
>> allocated.

> Unfortunately, it is not necessarily portable.  I can imagine a
> machine where `double' arguments must be aligned on an eight-byte
> boundary, but integers are only four bytes wide.  In this case
> calling the new routine to allocate doubles would result in an
> odd-address-style trap when the returned value is used.

I can see three solutions to this problem.  In order of lowest to highest
degree of hacking to Un*x itself:

1. Erect a superstructure upon Malloc that maintains the required additional
   information (this is what Chris did in the deleted part of his posting).
   Requires *no* changes to Un*x, best when the additional gore of the new
   superstructure can be swamped in the gore of a large system within which
   the my-malloc is being distributed (hi Chris!).

2. Require something like malloc.h to include not only the function headers
   for malloc and free, but either one of these two:

#define ALIGNTYPE double
   -or-
#define ALIGNSIZE 8

   The definitions should be obvious.  This requires only the provenance
   of the appropriate header file (and unfortunately standardization of the
   particulars of the implementation... :-).

3. Define the Malloc package portably in C, so the information required can
   be extracted from the underlying data base.  The problem with this
   approach is that the primitives upon which Malloc is based, (e.g. sbrk() )
   would then have to be standardized (hey, maybe this posting belongs in
   this [net.arch] group after all :-).

If anybody is going to do this, and would like to use a Fibonacci number
variant of the binary buddy system, see Rick's and my article in CACM:
"A Simpified Recombination Scheme for the Fibonacci Buddy System", I think
it was either June or July of 1975...

> Of course, malloc() knows all along how big the allocated region
> is (though the number it knows might be larger than the original
> request); so it is a shame to have to go through all this effort.

Yes, but I think there is a question whether you want to tell the user
about 1. how much he ASKED for, or 2. how much happens to BE there...
I think there is an argument for either side, so perhaps there needs to
be two ways of doing this.

Feeping Creaturism???

-- 
                    umd5.UUCP    <= {seismo!umcp-cs,ihnp4!rlgvax}!cvl!umd5!zben
Ben Cranston zben @ umd2.UMD.EDU    Kingdom of Merryland Sperrows 1100/92
                    umd2.BITNET     "via HASP with RSCS"

desj@brahms.BERKELEY.EDU (David desJardins) (06/24/86)

   I can't believe all of this discussion about clever tricks to try to guess
how malloc does alignment, or what the maximum alignment boundary is, or
whatever.  No matter what you do along these lines, it is going to be possible
to devise a perfectly valid C implementation in which it will fail.
   It seems obvious that the solution to the whole problem is to define a
structure which contains two fields: the pointer to memory returned by malloc,
and an int (or long int, or whatever) which contains the size of the block
requested from malloc.  This is guaranteed to work, as opposed to the many
alternate implementations that have been proposed and are guaranteed to fail
on at least some architectures.
   Note that in a strongly-typed language you would have had to do the right
thing from the beginning...

   -- David desJardins

ark@alice.UucP (Andrew Koenig) (06/24/86)

There is a very simple way to do this portably:

	struct chunk {
		unsigned size;
		char *mem;
	};

	struct chunk
	challoc (n)
		unsigned n;
	{
		struct chunk ch;
		ch.size = n;
		ch.mem = malloc (n);
		return ch;
	}

	void
	chfree (ch)
		struct chunk ch;
	{
		if (ch.mem)
			free (ch.mem);
	}

Now rewrite your program to deal with memory through chunk structures
instead of through simple pointers.  It should be possible to make this
easier by using macros.

franka@mmintl.UUCP (Frank Adams) (06/24/86)

In article <2216@peora.UUCP> jer@peora.UUCP writes:
~>determine alignment by difference between consecutive malloc'ed objects

>you're not guaranteed that the two consecutive mallocs will give
>consecutive blocks of memory -- they certainly won't be likely to after
>some other mallocs and frees have been done.  It's hard to see a situation
>where the initial allocations wouldn't be consecutive, but it's not
>guaranteed, and that's sufficient reason to have doubts.

(1) If you are using the "buddy system"[1] for storage allocation, and the
total storage available is not a power of two, initial allocations might
well not be consecutive.

(2) Some version of malloc have header information about the allocated block
immediately before the block.  For such versions, the distance between blocks
always includes at least one such header.

(3) Storage may be allocated and freed in the process of starting the
program, before your code gets control.

(4) Storage for all programs may be allocated from a single heap, so that
other tasks would have done malloc's and free's before (or even between)
yours.

In short, this is completely unreliable.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

[1] For a description of the buddy system, see Knuth, _The_Art_of_Computer
_Programming,_vol._1,_Fundamental_Algorithms_.

tanner@ki4pv.UUCP (Tanner Andrews) (06/24/86)

The problem with alignments, it would appear to me, would be easily
solved by allowing for the largest machine type, which should be
double.  Thus (cast the routine type as needed when you call it):

 /*  allocate core -- must free using my_free()
  */
char *my_malloc(sz)
short int sz;
	{
	register char
		*p;

	if  (p = malloc(sz+sizeof(double)))  {
		*((short int *)p) = sz;
		p += sizeof(double);
		}
	return(p);
	}

 /*  size of allocated object -- works for any p such that p is return
  *  value from my_malloc
  *  which has not been my_free'd
  */
short int my_sizeof(p)
char *p;
	{

	return(*((short int *)(p - sizeof(double)))

	}

-- 
<std dsclm, copies upon request>	   Tanner Andrews

jiml@uwslh.UUCP (Jim Leinweber) (06/24/86)

One nasty efficiency quirk probably ought to be mentioned.  The schemes which
add extra user information to remember the size of the object are okay,
modulo some pointer alignment details.  Care with the size of the objects
may be needed if you are allocating lots of them, though.  Some memory
allocators allocate things in fixed sizes, such as power-of-two minus some
internal overhead. If a program is allocating funny sized buffers scaled to
cooperate with brain damaged allocators, and then adds extra overhead, the
resulting size may cause poor allocators to waste lots of space.  Of course,
'knowing' the size of blocks actually allocated is not portable. But if you
are tight on address space or physical memory, it could matter.

Jim Leinweber   			usenet:  jiml@uwslh.uucp
State Hygiene Laboratory        	  {seismo, harvard, topaz, ihnp4,...}!
University of Wisconsin			  uwvax!uwslh!jiml
465 Henry Mall				internet:    uwslh!jiml@wisc.edu
Madison, Wisconsin, USA  53706		phone:   (608) 262-8092
-- 
James E. Leinweber   ...!{seismo,harvard,topaz,ihnp4}!uwvax!uwslh!jiml
Wisconsin State Hygiene Lab          (608) 262-8092
University of Wisconsin; 465 Henry Mall; Madison, WI  53706

arnold@ucsfcgl.UUCP (Ken Arnold%CGL) (06/25/86)

In article <2081@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>In article <2206@peora.UUCP>, jer@peora.UUCP (J. Eric Roskos) writes:
>>... allocate a sizeof(int) worth of extra space, then store the
>>size of the thing you malloc'ed in the int at the front of the
>>allocated block, advance the pointer past the place where your
>>stored the size, and return that as the pointer to the block you
>>allocated.
>>
>
>Unfortunately, it is not necessarily portable.  I can imagine a
>machine where `double' arguments must be aligned on an eight-byte
>boundary, but integers are only four bytes wide.  In this case
>calling the new routine to allocate doubles would result in an
>odd-address-style trap when the returned value is used.

So put the int at the end of the space, after increasing the size
to be allocated by sizeof (int) + enough space to align the int
on a byte which is a multiple of sizeof (int).  Then return the
original malloc()ed pointer.

Can we say "hack"?

		Ken Arnold

ark@alice.UucP (Andrew Koenig) (06/25/86)

> One nasty efficiency quirk probably ought to be mentioned.  The schemes which
> add extra user information to remember the size of the object are okay,
> modulo some pointer alignment details.  Care with the size of the objects
> may be needed if you are allocating lots of them, though.  Some memory
> allocators allocate things in fixed sizes, such as power-of-two minus some
> internal overhead. If a program is allocating funny sized buffers scaled to
> cooperate with brain damaged allocators, and then adds extra overhead, the
> resulting size may cause poor allocators to waste lots of space.  Of course,
> 'knowing' the size of blocks actually allocated is not portable. But if you
> are tight on address space or physical memory, it could matter.

Translated into English, this says:

	Some memory allocators are broken.  Thus we should all
	warp our programs into forms that work well with these
	broken memory allocators.

jer@peora.UUCP (J. Eric Roskos) (06/25/86)

> Yes, but I think there is a question whether you want to tell the user
> about 1. how much he ASKED for, or 2. how much happens to BE there...

The only portable value is how much he asked for.  If he got more than
that, it is just an artifact of the memory allocation scheme, which would
be implementation-dependent, and thus he shouldn't use it; allowing him to
know how much extra he really got is a step in the direction of promoting
him to use it, which would seem not to be a good idea.
-- 
E. Roskos

arnold@ucsfcgl.UUCP (Ken Arnold%CGL) (06/26/86)

In article <9894@ucsfcgl.ucsfcgl.UUCP> I write:
>So put the int at the end of the space, after increasing the size
>to be allocated by sizeof (int) + enough space to align the int
>on a byte which is a multiple of sizeof (int).  Then return the
>original malloc()ed pointer.
>
>Can we say "hack"?

I don't know what came over me.  I only wish I could say that I was on
drugs at the time I wrote this, since at least my embarassment could
have been partially compensated by the entertainment.  Please forgive
my obvious error, and just chalk it up to a long, tired day.

		Ken Arnold

desj@brahms.BERKELEY.EDU (David desJardins) (06/26/86)

In article <9894@ucsfcgl.UUCP> arnold@ucsfcgl.UUCP (Ken Arnold) writes:
>So put the int at the end of the space, after increasing the size
>to be allocated by sizeof (int) + enough space to align the int
>on a byte which is a multiple of sizeof (int).  Then return the
>original malloc()ed pointer.

   This is the silliest thing I've seen yet.  If you have a pointer to the
beginning of a region, and store the size of the region at the end of the
region, how are you ever going to be able to recover this information??

   -- David desJardins

chris@umcp-cs.UUCP (Chris Torek) (06/26/86)

In article <6922@ki4pv.UUCP> tanner@ki4pv.UUCP (Tanner Andrews) writes:
>The problem with alignments, it would appear to me, would be easily
>solved by allowing for the largest machine type, which should be
>double.

Why?

If you mean to state that in every C compiler now available, the
largest machine type is `double', then I might believe you.  But
if you mean that the largest machine type in any future C compiler
will always be `double'---well, to put it mildly, I doubt it.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

flaps@utcs.UUCP (06/27/86)

A way to get a sufficient alignment constant should be as follows:

union ALIGNED {
	char garbage0;
	unsigned char garbage1;
	short garbage2;
	int garbage3;
	unsigned garbage4;
	long garbage5;
	unsigned long garbage6;
	float garbage7;
	double garbage8;
};

(sorry if I've left out any)

Then you can use sizeof(ALIGNED) as the alignment constant.  Am I mistaken?
The problem is that this number might be greater than the actual
alignment constant, for example if a "double" has length 8 but only
needs quad-word alignment.

--
Alan J Rosenthal
{decvax|cbosgd|linus}!ihnp4!utcs!flaps, utzoo!utcs!flaps

greg@utcsri.UUCP (Gregory Smith) (06/27/86)

In article <1986Jun27.01:36:11.4839@utcs.uucp> flaps@utcs.UUCP (Alan J Rosenthal) writes:
>
>A way to get a sufficient alignment constant should be as follows:
>
>union ALIGNED {
>	char garbage0;
>	unsigned char garbage1;
>	short garbage2;
>	int garbage3;
>	unsigned garbage4;
>	long garbage5;
>	unsigned long garbage6;
>	float garbage7;
>	double garbage8;
>};
>
>(sorry if I've left out any)
>
>Then you can use sizeof(ALIGNED) as the alignment constant.  Am I mistaken?
			 ^ union
>The problem is that this number might be greater than the actual
>alignment constant, for example if a "double" has length 8 but only
>needs quad-word alignment.
>
How about:
struct foo{
	char ignatz;
	union ALIGNED wumpus;	/* union ALIGNED as above */
};
then the most strict alignment is given by

	sizeof( struct foo ) - sizeof( union ALIGNED )

Interesting aside:
I first thought of declaring 'struct foo xyzzy' and using:

	((char *)&xyzzy.wumpus - (char*)&xyzzy)

The problem is that this is not a run time constant, but it could
be if the semantics of '-' in constant expressions allowed pointer
subtraction if both pointers were based at the same label. The
portability of this is dubious. The interesting part:

int i = 
	((char *)&xyzzy.wumpus - (char*)&xyzzy);

this gave the following on PCC:

"test.c", line 20: compiler error:
		expression causes compiler loop: try simplifying

-- 
"Shades of scorpions! Daedalus has vanished ..... Great Zeus, my ring!"
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg

levy@ttrdc.UUCP (Daniel R. Levy) (06/29/86)

In article <9894@ucsfcgl.ucsfcgl.UUCP>, arnold@ucsfcgl.UUCP writes:

>In article <2081@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>So put the int at the end of the space, after increasing the size
>to be allocated by sizeof (int) + enough space to align the int
>on a byte which is a multiple of sizeof (int).  Then return the
>original malloc()ed pointer.
>
>Can we say "hack"?
>
>		Ken Arnold

Now tell me please, how is one supposed to get at the int containing the
size later on (only knowing the pointer)?  You have to know the size
in order to calculate the offset to reach the size data!  Hardly sounds useful
(that's why the other suggestions have been to PREpend the size data, so its
location is known, though details have to be ironed out which has been fueling
much of the debate).

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/29/86)

In article <9895@ucsfcgl.ucsfcgl.UUCP> arnold@ucsfcgl.UUCP (Ken Arnold%CGL) writes:
-In article <9894@ucsfcgl.ucsfcgl.UUCP> I write:
->So put the int at the end of the space, after increasing the size
->to be allocated by sizeof (int) + enough space to align the int
->on a byte which is a multiple of sizeof (int).  Then return the
->original malloc()ed pointer.
-
-I don't know what came over me.  I only wish I could say that I was on
-drugs at the time I wrote this, since at least my embarassment could
-have been partially compensated by the entertainment.  Please forgive
-my obvious error, and just chalk it up to a long, tired day.

I dunno, Ken -- I thought the idea of putting the size at the end of
the data was pretty amusing.

It might be worth observing that several requests to these nets take
the general form:
	"How do I do X?  Here's what I tried so far..."
where "X" is only part of a solution to the REAL problem.  Usually,
there are other solutions to the REAL problem that do not have "X"
as a subcomponent.  It often happens that solving "X" is much harder
than solving the REAL problem.  So, if you find yourself having
difficulties like this, it may be that you've prematurely settled on
a difficult approach and that it might be worthwhile to step back
and consider whether there is some simpler, more straightforward
method.

Now, if you find yourself truly needing to keep track of the size of
dynamically-allocated chunks of data space, you can take either of
two good general approaches.  The first is:

	typedef struct
		{
		void	*loc;
		size_t	size;
		}	alloc_info;	/* this should be in a header file */

	alloc_info *
	my_alloc( size_t length )	/* use this instead of malloc */
		{
		extern void	free(void *), *malloc(size_t);
		register alloc_info	*p;

		if ( (p = (alloc_info *)malloc( sizeof(alloc_info) )) != 0
		  && (p->loc = malloc( p->size = length )) == 0
		   )	{
			free( (void *)p );
			p = 0;
			}
		return p;
		}

and the second is to supply one's own memory allocator (Knuth's
"boundary tag" method is particularly suitable for this application).
Make sure that your allocator can coexist with malloc(), since the C
library may have to invoke malloc().  One way to guarantee this is to
rip off a large pool of memory with malloc() early on, then have one's
own allocator restricted to allocating from this pool.  If you're more
ambitious, you could try implementing your own virtual memory scheme
using buffers into a disk file (this makes data access more complicated
than simply using pointers, however).  The best implementation I ever
had was under single-job RT-11, where I could directly manipulate the
KT-11 memory management hardware registers.  (Not feasible under UNIX,
but perhaps worth considering for stand-alone applications.)

jer@peora.UUCP (J. Eric Roskos) (06/29/86)

>    It seems obvious that the solution to the whole problem is to define a
> structure which contains two fields: the pointer to memory returned by malloc,
> and an int (or long int, or whatever) which contains the size of the block
> requested from malloc.  This is guaranteed to work, as opposed to the many
> alternate implementations that have been proposed and are guaranteed to fail
> on at least some architectures.
>    Note that in a strongly-typed language you would have had to do the right
> thing from the beginning...

Wouldn't malloc() be something of a hole in a strongly-typed language
anyway?  I mean, if you are going to allocate an object in a strongly-typed
language, you should have to allocate an object of a particular type, not
just "a block of memory that is aligned on the worst-case alignment
boundary".

The above "obvious" solution has the shortcoming that either you return a
pointer to the structure (in which case the memory allocator doesn't work
like standard malloc), or you have to have a way to coordinate the structure
with the allocated block of memory: when the user frees a block, you have
to find the structure given only the address of the block -- which is
stored in the structure, but not in the block you're freeing.

From this discussion I have become firmly convinced (well, semi-firmly)
that this reflects a problem in the C language.  Specifically, C does
allow constraints on alignment, which is reasonable, but there isn't any
way to find out anything about alignment from within the language.
The ANSI C folks should look into that.  Maybe there should be an operator
"alignment" that works the same as "sizeof" but gives the alignment required
for a specified type; and some defined constant (like MAXALIGN) that gives
the worst-case alignment.  For example, how can you write a portable
malloc?  After all, that's really what we've been trying to do here.  You
can argue "malloc is sort of part of the language so it is OK for it to
have to have changes made to it when you port it to a different machine,"
(or put "is too close to the hardware" in place of "is part of the
language", or some similar argument) but the problem is, that seems to give
malloc an unduly privileged status.  What if you do want to write your
own malloc?  Malloc really isn't "part of the language," and there's no
reason why it should have to be considered "too close to the hardware"
except inasmuch as there is this problem with C that you have to tell your
program something you've determined from an outside source that couldn't
be found out from within the language itself.  But of course the
compiler knows about alignment requirements, and we have here a 
demonstrated "need to know," only the compiler won't tell you what it
knows because there's nothing in C to let it do so! 

Well, the above is my argument for adding information about alignment
to the C language.
-- 
E. Roskos

mash@mips.UUCP (John Mashey) (06/30/86)

Although the topic is interesting, I suggest that it's not so crucial as
to be posted to to 4 groups, whose readership overlaps heavily anyway.
The discussion has gotten especially far from anything related to net.arch.
How about moving this to net.lang.c?
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

aglew@ccvaxa.UUCP (07/01/86)

... > alignment

E. Rosko is right - C needs to be able to specify alignment better than
double. Of course it's non-portable, but C is used to write one hell of
a lot of machine dependent programs in an almost portable manner. I've
needed to be able to specify an alignment spec in programs where it really
made a big difference to have data structures aligned on cache boundaries.

Doesn't Ada have an alignment pragma? If not Ada, then some flavour of 
Pascal that I've seen somewhere in my travels.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

franka@mmintl.UUCP (Frank Adams) (07/01/86)

In article <2219@peora.UUCP> jer@peora.UUCP writes:
>> Yes, but I think there is a question whether you want to tell the user
>> about 1. how much he ASKED for, or 2. how much happens to BE there...
>
>The only portable value is how much he asked for.  If he got more than
>that, it is just an artifact of the memory allocation scheme, which would
>be implementation-dependent, and thus he shouldn't use it; allowing him to
>know how much extra he really got is a step in the direction of promoting
>him to use it, which would seem not to be a good idea.

Depends on what you mean by "portable".  Certainly, it is possible to use
this number in an implementation-dependent way; but there are natural and
non-implementation-dependent ways to use the information.  E.g., if one is
allocating character strings and one of them grows, one can try to fit the
expanded string into the allocated size, and only reallocate (and reset all
the pointers to the data) if it won't fit.

The philosphy of C is to give the programmer access to information if there
are reasonable ways to use it, not to try to withhold the information if
there are unreasonable ways to use it.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

jer@peora.UUCP (J. Eric Roskos) (07/03/86)

> Depends on what you mean by "portable".  Certainly, it is possible to use
> this number in an implementation-dependent way; but there are natural and
> non-implementation-dependent ways to use the information.  E.g., if one is
> allocating character strings and one of them grows, one can try to fit the
> expanded string into the allocated size, and only reallocate (and reset all
> the pointers to the data) if it won't fit.

Well, actually it depends on some other factors, just fairly abstract ones
(and thus debatable).  According to the System V manual, "Malloc returns a
pointer to a block of at least _size_ bytes suitably aligned for any use."

Notice that this places no upper bound on the size that's actually allocated.
Now, you can argue "well, obviously it can't be more than the number of
bytes representable in an int", or something of that nature, but such an
argument wouldn't hold water in any formal proof.  The only thing it says
is *at least* that many bytes.

The problem is that, unless the definition is constrained more, it makes
it impossible to prove a program that uses the "actual size allocated"
correct.  Suppose you store the actual size allocated in an int in your
program.  Then I can argue that on some implementation, the maximum
actual size allocated requires a long int.  If you use a long int, I
can argue that it requires 2* the number of bits in a long int.  And so
on.  Since there's no upper bound specified for the size, there's always
a risk of truncation.

This seems to be somewhat abstract and theoretical, but the problem is
really that there's an "unknown variable" involved that's outside
the program, and it makes it hard to make assertions about the program's
portability to other environments since the variable changes according
to the environment.  You can think of more realistic examples; e.g.,
the situation where you want to save space by storing the size in a small
number of bits in a word, and it'll work fine for one implementation but
not another; but that gets into other portability problems of C (e.g.,
not knowing how many bits is really in an "int", etc.) and it wasn't
my intent to be arguing over the existing problems with C, just the
ones that might be avoided in a new feature.

I agree, however, that for "most cases" the "actual size allocated" might
be useful.  But it's difficult to see why, other than to avoid wasted space,
you would really need it; and my original reasoning in my original comment
was that you shouldn't provide features that introduce these sorts of
problems just out of a "liberty and freedom" approach to C -- I think you
shouldn't introduce arbitrary constraints, but there's a fine but definite
line between arbitrary constraints, in many cases, and purposeful ones.
It's just that the purposeful ones are sometimes esoteric.

If you wonder why anyone would be concerned with formal proofs, it's just
that more and more "real world" applications (especially those for the
gov't) require them, and the inability to do them well for C is one of
the reasons so many people favor Ada.  Personally I like C and would
rather have a "disciplined C".
-- 
E. Roskos
(See net.net-people.)

franka@mmintl.UUCP (Frank Adams) (07/09/86)

In article <2233@peora.UUCP> jer@peora.UUCP writes:
>[Frank Adams]
>> Depends on what you mean by "portable".  Certainly, it is possible to use
>> this number in an implementation-dependent way; but there are natural and
>> non-implementation-dependent ways to use the information.  E.g., if one is
>> allocating character strings and one of them grows, one can try to fit the
>> expanded string into the allocated size, and only reallocate (and reset all
>> the pointers to the data) if it won't fit.
>
>Well, actually it depends on some other factors, just fairly abstract ones
>(and thus debatable).  According to the System V manual, "Malloc returns a
>pointer to a block of at least _size_ bytes suitably aligned for any use."
>
>The problem is that, unless the definition is constrained more, it makes
>it impossible to prove a program that uses the "actual size allocated"
>correct.

So constrain the definition more.  It isn't written in stone.  I dare say
that there is no system in existence that would fail under the assumption
that the actual size allocated fits in a long -- probably none would fail
assuming an int.

Compared to the total task of dealing with the C environment in such a way
that correctness proofs become possible, this is a very tiny issue, easily
dealt with.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108