[comp.lang.c++] fragmentation of free store

jean@paradim.UUCP (Jean Pierre LeJacq) (04/02/91)

The free store memory managment system can coalesce
deleted memory to form larger blocks.  However, it
does not appear possible to defragment this memory
due to direct references to free store memory.

If this is the case free store memory allocation
could fail even when plenty of fragmented memory is
available.  Thus even programs that carefully delete
memory when no longer required could fail if run
for a sufficiently long period of time.

It would thus appear necessary to provide class
specific new and delete for any real application.
Am I missing something in this analysis (e.g. only
indirect references to free store memory)?

                Jean Pierre LeJacq

                uucp:     uunet!paradim!jean
                internet: paradim!jean@uunet.uu.net
                voice:    203 295 8185

gwu@nujoizey.tcs.com (George Wu) (04/03/91)

In article <285@paradim.UUCP>, jean@paradim.UUCP (Jean Pierre LeJacq) writes:
|> The free store memory managment system can coalesce
|> deleted memory to form larger blocks.  However, it
|> does not appear possible to defragment this memory
|> due to direct references to free store memory.
|> 
|> If this is the case free store memory allocation
|> could fail even when plenty of fragmented memory is
|> available.  Thus even programs that carefully delete
|> memory when no longer required could fail if run
|> for a sufficiently long period of time.
|> 
|> It would thus appear necessary to provide class
|> specific new and delete for any real application.
|> Am I missing something in this analysis (e.g. only
|> indirect references to free store memory)?

     Am I dated, here?  Jean Pierre's article seems to indicate there is
an implemented solution to memory fragmentation, but for some reason I don't
understand, this solution is not wholey effective.  Anyways, the remainder
of this message is based upon the assumption that no one has yet bothered
to implement a way to coalesce multiple memory fragments into a larger
single chunk of memory.

     Fragmented memory has long been a problem in UNIX, not just C++.  At
least up through BSD 4.3, the only reason our VAXen at C-MU ever came down
was because the kernel's free list was full of tiny fragments rather than
larger, usable chunks of memory.  After a couple of months, we'd simply have
to reboot the machine because the system had no way of coalescing the pieces.

     While I don't know for certain, nearly all other UNIX variants must
have the same problem.  The fault tolerant flavors are the most likely to
have attacked this.

     Therefore, while overloading new and delete to coalesce freed bits of
memory will help, to really solve the problem, you need to fix the problem
in the kernel as well (unless your application is clearly *the* primary,
almost sole, program to be run on the machine).

							George

----
George J Wu, Software Engineer        | gwu@tcs.com or uunet!tcs!gwu
Teknekron Communications Systems, Inc.| (415) 649-3752
2121 Allston Way, Berkeley, CA, 94704 | Quit reading news.  Get back to work.

smith@glinda.ctron.com (Larry Smith) (04/05/91)

In article <1986@godzilla.tcs.com> gwu@nujoizey.tcs.com (George Wu) writes:
>In article <285@paradim.UUCP>, jean@paradim.UUCP (Jean Pierre LeJacq) writes:
>|> The free store memory managment system can coalesce
>|> deleted memory to form larger blocks.  However, it
>|> does not appear possible to defragment this memory
>|> due to direct references to free store memory.
>
>     Am I dated, here?  Jean Pierre's article seems to indicate there is
>an implemented solution to memory fragmentation, but for some reason I don't
>understand, this solution is not wholey effective.  Anyways, the remainder
>of this message is based upon the assumption that no one has yet bothered
>to implement a way to coalesce multiple memory fragments into a larger
>single chunk of memory.

Actually, I believe the problem is due to the fact that *currently allocated*
chunks of memory are fragging the free space, and there is nothing that can
be done about this.  The only fix is to move the allocated blocks around
until they are all contiguous, and if you move them the zillions of pointers
imbedded in the data structures on the stack and in the heap will be wrong.

More modern systems use double-indirection to solve this problem.  Basically,
the memory manager keeps a list of master pointers and only gives out pointers
(in most cases, actually indexes) to these master pointers.  When the blocks
are moved, the master pointers are updated, and, since the master pointers
don't move, the program's pointers remain valid.  This scheme is no end of
useful - you can, for example, augment each master pointer with a count.  You
increment the count each time the master pointer is allocated.  When an
actual pointer is assigned it gets a copy of this count and each time it is
dereferenced the counts are checked.  If they don't match - BAM! - dangling
pointer, the program is trying to reference freed (and maybe *reallocated*)
memory.

There are lots of other things you can do with this type of scheme, but I won't
go into them now.  Sadly, few compilers use this mechanism to implement
pointers.  C, C++, Pascal, etc all implement pointers as bare addresses.  Of
course, you can implement the above scheme yourself in C, etc, but then you'll
have to dereference twice whereever the pointer is used.  This is why you'll
always see things like "WindowRecord^^.HorizSize" in Macintosh Pascal code -
the Mac's menu manager does this, but the compiler is unaware of it.  Result:
ugly code.

It will be a great advance for the poor overworked programmers of the world
when all compilers understand double-indirection schemes like this.  Until
then references to free memory or an over-fragmented heap will continue to
blow us out of the water - or force us to write code like the Mac's.

Larry Smith
smith@ctron.com

sabbagh@acf5.NYU.EDU (sabbagh) (04/05/91)

>     Therefore, while overloading new and delete to coalesce freed bits of
>memory will help, to really solve the problem, you need to fix the problem
>in the kernel as well (unless your application is clearly *the* primary,
>almost sole, program to be run on the machine).

Not necessarily. One could malloc a HUGE chunk of memory, then devise a memory
allocation scheme using handles (double indirection). This is how memory on
the Mac works. Using proper encapsulation, one can reduce the overhead and
make it almost completely transparent to the user.

Hadil G. Sabbagh
E-mail:		sabbagh@cs.nyu.edu
Voice:		(212) 998-3125
Snail:		Courant Institute of Math. Sci.
		251 Mercer St.
		New York,NY 10012

"Injustice anywhere is a threat to justice everywhere."
					- Martin Luther King, Jr.
Disclaimer: This is not a disclaimer.

smith@glinda.ctron.com (Larry Smith) (04/06/91)

In article <12932@goofy.Apple.COM> russell@apple.com (Russell Williams) writes:
>Re double indirection to solve free store fragmentation:; the first level 
>of pointer is often called a Handle or master pointer:

Thanks, I knew that.

>In article <1398@glinda.ctron.com> smith@glinda.ctron.com (Larry Smith) 
>writes:
>> Sadly, few compilers use this mechanism to implement
>> pointers.  C, C++, Pascal, etc all implement pointers as bare addresses. 
> Of
>> course, you can implement the above scheme yourself in C, etc, but then 
>you'll
>> have to dereference twice whereever the pointer is used.  This is why 
>you'll
>> always see things like "WindowRecord^^.HorizSize" in Macintosh Pascal 
>code
>
>First, Apple's MPW C++ and Symantec's Think C for the Mac both implement 
>this.  In MPW C++, classes derived from HandleObject get automatic double 
>indirection.  This is even more attractive in C++ than Pascal because 
>p^^.field in Pascal becomes (*p)->field or (**p).field in C.

The *compilers* do no such thing.  If you have to type (*p)->field then
you are dereferencing the handle yourself.  My point is, it's of scant use
if you have to do it yourself, as the rest of your posting pointed out.
I believe the *compiler* should handle all the ugly details.  When I say:

char *foobar = strdup("The rain in Spain falls mainly in the plain");

I want foobar to have a *handle* to the string.  If the system needs to
compact memory, fine!  Let it do so, whenever it wants.  Later, I want to
be able to say:

printf("%s\n",foobar);

And have the compiler issue the correct double-indirection instructions.

Bottom line: *pointers* should be implemented as *handles*.

>There are several costs of such schemes:
>1. Overhead is added to every object reference.  One workaround is to 
>dereference once before periods of heavy use of the referenced data 
>structure and pass a pointer.  The structure must then be locked against 
>movement and unlocked later.  This is one of the more common Mac 
>programming errors (forgetting to lock a dereferenced Handle).

And, yes, I am quite well aware that this will entail a hit in performance.
I would eagerly sacrifice more performance than this scheme entails to be able
to have a runtime errors like: ERROR - ATTEMPT TO REFER TO PREVIOUSLY FREED
MEMORY rather than having the offending line go ahead and bonk some random
data structure thereby costing me days of debugging agony.  C++ does almost
everything on the heap, it frags like nobody's business and wild pointers can
tromp over data structures at will.  Memory get used but not allocated,
freed and used again, allocated and never freed - even freed but never
allocated.  I'll take a nice clean coredump with a stack trace to the offending
line over a whacked data structure and a program that limps along and THEN
coredumps and points to a perfectly innocent line that has nothing to do with
the problem.

>2. It is difficult to make multiple inheritance work.  MI involves passing 
>around pointers to the middle of an object.  MPW C++ disallows MI for 
>HandleObjects (how would you like pointers represented as a 
><handle>,<offset> pair?).

Implementation detail.  If all pointers *were* handles this would not be a
problem, and, yes, pointers *should* be <handles><version_num> (and <offset>
if appropriate to an implementation).

>3. It is difficult to allocate such objects on the stack.  Since you can 
>take the address of a stack object, you'd have to allocate a dummy handle 
>on the stack for each stack HandleObject to provide the extra level of 
>indirection in case it was needed.  MPW C++ disallows stack allocation of 
>HandleObjects.

Again, it's the *distinction* between pointers and handles that causes the
problem.  If pointers were *implemented* as handles none of that would be
visible to the programmer.

>In sum, double indirection is a useful tool but not a panacea.  Many Mac 
>programmers long for the free and easy (as it were) memory usage style of 
>a virtual, directly addressed heap, but the handle-based heap does allow 
>Mac programs to run in a much more constrained memory environment.

It is not a panacea (you spell good for a programmer) but it is *far* better
than what we are dealing with now.  As for the "free and easy" virtually
addressed heap - THAT'S WHAT I GOT NOW AND IT'S KILLING ME WITH BUGS!  (I
work on a SPARC). Fragged heaps are a pain in the butt no matter whether
your heap is virtual or not, very large address spaces merely delay the
inevitable day when the system has enough memory but not all in one spot,
and it does NOTHING to help find dangling or wild pointers.

>*** Disclaimer: I do not represent Apple's official position on anything ***
*** Disclaimer: I do not represent Cabletron's official position on anything ***

Larry Smith
smith@ctron.com

ml27192@uxa.cso.uiuc.edu (Mark Lanett) (04/06/91)

smith@glinda.ctron.com (Larry Smith) writes:

[frag problem deleted]

>More modern systems use double-indirection to solve this problem.  Basically,
>the memory manager keeps a list of master pointers and only gives out pointers
>(in most cases, actually indexes) to these master pointers.  When the blocks
>are moved, the master pointers are updated, and, since the master pointers
>don't move, the program's pointers remain valid.  This scheme is no end of
>useful - you can, for example, augment each master pointer with a count.  You
>increment the count each time the master pointer is allocated.  When an
>actual pointer is assigned it gets a copy of this count and each time it is
>dereferenced the counts are checked.  If they don't match - BAM! - dangling
>pointer, the program is trying to reference freed (and maybe *reallocated*)
>memory.

>There are lots of other things you can do with this type of scheme, but I won't
>go into them now.  Sadly, few compilers use this mechanism to implement
>pointers.  C, C++, Pascal, etc all implement pointers as bare addresses.  Of
>course, you can implement the above scheme yourself in C, etc, but then you'll
>have to dereference twice whereever the pointer is used.  This is why you'll
>always see things like "WindowRecord^^.HorizSize" in Macintosh Pascal code -
>the Mac's menu manager does this, but the compiler is unaware of it.  Result:
>ugly code.

>It will be a great advance for the poor overworked programmers of the world
>when all compilers understand double-indirection schemes like this.  Until
>then references to free memory or an over-fragmented heap will continue to
>blow us out of the water - or force us to write code like the Mac's.

In fact, Apple's Object Pascal uses "handles" (their term for pointers to
master pointers) when dealing with objects. You say FooClass.fieldOrMember,
which then leads to an automatic double-indirection. The system gets to
move things around, and for the most part it's one happy system. To make
things even better, when you enter an object it's automatically locked so
any memory allocation you do inside the object doesn't cause it to get moved
around. Apple's version of CFront gives you these handle-based objects so
you can use them in C++ also, tho' you give up multiple inheritance (no great
loss -- yet). In C++ the class does the second dereferencing automatically so
you never see it. In both cases you may not allocate such an object on the heap.

>Larry Smith
>smith@ctron.com

--
//-----------------------------------------------------------------------------
Mark Lanett						ml27192@uxa.cs.uiuc.edu

Jim.Spencer@p510.f22.n282.z1.mn.org (Jim Spencer) (04/06/91)

Larry Smith writes in a message to All

LS> There are lots of other things you can do with this type of scheme, 
LS> but I won't go into them now. Sadly, few compilers use this mechanism 
LS> to implement pointers. C, C++, Pascal, etc all implement pointers 
LS> as bare addresses. Of course, you can implement the above scheme 
LS> yourself in C, etc, but then you'll have to dereference twice 
LS> whereever the pointer is used. This is why you'll always see 
LS> things like "WindowRecord^^.HorizSize" in Macintosh Pascal code 
LS> - the Mac's menu manager does this, but the compiler is unaware 
LS> of it. Result: ugly code. 

This isn't completely true.  Object Pascal as it has been implemented on the Mac returns handles to objects.  The methods for these objects can be called without dereferencing the handle.  MPW C++ similarly has, in order to be able to use MacApp, a handle based object.

russell@apple.com (Russell Williams) (04/08/91)

In article <1403@glinda.ctron.com> smith@glinda.ctron.com (Larry Smith) 
writes:
> >First, Apple's MPW C++ and Symantec's Think C for the Mac both 
implement 
> >this.  In MPW C++, classes derived from HandleObject get automatic 
double 
> >indirection.  This is even more attractive in C++ than Pascal because 
> >p^^.field in Pascal becomes (*p)->field or (**p).field in C.
> 
> The *compilers* do no such thing.  If you have to type (*p)->field then
> you are dereferencing the handle yourself.  My point is, it's of scant 
use
> if you have to do it yourself, as the rest of your posting pointed out.
> I believe the *compiler* should handle all the ugly details.  When I say:

Sorry to be unclear.  If you declare:
class foo : HandleObject { int i;... };
foo *pfoo = new foo;

You then refer to fields of pfoo like pfoo->i.  My comment in the previous 
posting referred to the desirability of doing this because of the extra 
awkwardness of writing the double indirection explicitly in a C-syntax 
language compared to a Pascal one.  That is, if you *don't* use 
HandleObjects, you have to write (*p)->field or (**p).field.  Think & MPW 
give you the syntactic sugar you want for double indirection.  

As for making everything double indirect, that's not a good choice, since 
of course it wouldn't be portable.  Many Mac toolbox routines need 
pointers rather than handles (WindowRecords are pointer and not 
handle-based, for instance).  If the compiler automagically dereferenced 
handles into pointers for them, it would also have to lock them to keep 
them from moving during the call.  What with callback functions, "hooks" 
and other nasties, it quickly becomes impossible for the compiler to keep 
track of what's going on.    There are also lots of folks who don't want 
to pay the overhead for double indirection all the time.

As for the difficulties of multiple inheritance and stack allocation, just 
making all pointers behave that way doesn't make things less difficult; 
those things must still be done.  Making pointers non-linear and larger 
than long will break lots of software (not to mention being slow), so you 
don't want to force it on people.  

By the way, I don't work on MPW C++, so I'm not arguing with you as to why 
these things should or shouldn't be done.  I'm just giving some reasons 
why they haven't been done.

frose@synoptics.COM (Flavio Rose) (04/10/91)

On the subject of double indirection:

I'm studying here at SynOptics the possible use of Borland
C++ to write Microsoft Windows applications in a medium
memory model. We would like to store the data for our class
instances mainly in the global heap as moveable segments.

I'm just beginning to study this issue and find it somewhat
confusing. Is there an analog to MPW C++'s HandleObject in
Borland C++? Any advice on this subject would be much
appreciated...

Flavio Rose
SynOptics Communications, Inc.