[comp.sys.amiga.programmer] How fast/slow is memory DEallocation?

barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (05/31/91)

	I wrote a program that uses a LARGE data structure with thousands of
pointers, and I'm now porting it to my Amiga 1000 (stock 68000).  I have
noticed that my data structure is being deallocated extremely slowly ---
almost excruciatingly so.

	The structure is a binary search tree in which every node contains
three linked lists and ANOTHER binary search tree (which contains three lists
and another tree, etc., recursively to any level).  When the structure has a
thousand nodes or so, and I deallocate it, it takes several minutes!
Building the structure took much less time, and there was more computation
going on then!
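
	Roughly, each node looks like this (type and field names invented
here just for illustration):

    struct ListNode {
        struct ListNode *next;
        /* ... list payload ... */
    };

    struct TreeNode {
        struct TreeNode *left, *right;  /*  binary search tree links       */
        struct ListNode *list1;         /*  the three linked lists         */
        struct ListNode *list2;
        struct ListNode *list3;
        struct TreeNode *subtree;       /*  root of the nested tree, whose */
                                        /*  nodes have this same layout    */
        /* ... key and data fields ... */
    };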

	At first, I thought the problem might be Manx's malloc()/free()
functions, so I switched to AllocMem/FreeMem.  This sped things up a bit,
but not much.

	I have a feeling that this is a system-dependent issue and not an
error in my code.  The program already runs on three other CPUs, and
deallocation is lightning fast.  I am using Manx 5.0d.

	Any suggestions?

                                                        Dan

 //////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
| Dan Barrett, Department of Computer Science      Johns Hopkins University |
| INTERNET:   barrett@cs.jhu.edu           |                                |
| COMPUSERVE: >internet:barrett@cs.jhu.edu | UUCP:   barrett@jhunix.UUCP    |
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////

barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (06/01/91)

In article <8504@jhunix.HCF.JHU.EDU> I wrote:
>I have noticed that my [LARGE] data structure is being deallocated extremely
>slowly --- almost excruciatingly so.
>At first, I thought the problem might be Manx's malloc()/free()
>functions, so I switched to AllocMem/FreeMem.  This sped things up a bit,
>but not much.

	The problem is solved.  Guess what it was??  I forgot to 
#include <functions.h>, so AllocMem/FreeMem had no prototypes.  For some
reason, this caused FreeMem to be VERY slow:  probably things were being
implicitly coerced in a bad way....

	Here's a little benchmark to tickle your fancy.  This program
does several thousand allocate/deallocate calls on a data structure with
(appropriately) several thousand nodes.  STACK was set to 100,000 just
in case.
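
	The guts of it are essentially this pattern (a simplified stand-in,
not the actual program):

    #include <exec/types.h>
    #include <exec/memory.h>
    #include <functions.h>          /*  the include I had forgotten  */

    #define NNODES 5000             /*  "several thousand" nodes     */

    struct TNode { struct TNode *left, *right; };

    static struct TNode *nodes[NNODES];

    int
    main()
    {
        long i;

        for (i = 0; i < NNODES; ++i) {      /*  build      */
            nodes[i] = (struct TNode *)
                AllocMem((long)sizeof(struct TNode), MEMF_CLEAR);
            if (nodes[i] == NULL)
                break;
        }
        while (--i >= 0)                    /*  tear down  */
            FreeMem((char *)nodes[i], (long)sizeof(struct TNode));
        return(0);
    }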

	Using AllocMem/FreeMem:

		run time == 30 seconds

	Using MANX malloc/free:

		run time == 5 minutes, 35 seconds

Well, I was certainly surprised!

                                                        Dan

 //////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
| Dan Barrett, Department of Computer Science      Johns Hopkins University |
| INTERNET:   barrett@cs.jhu.edu           |                                |
| COMPUSERVE: >internet:barrett@cs.jhu.edu | UUCP:   barrett@jhunix.UUCP    |
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////

dillon@overload.Berkeley.CA.US (Matthew Dillon) (06/01/91)

In article <8504@jhunix.HCF.JHU.EDU> barrett@jhunix.HCF.JHU.EDU (Dan Barrett) writes:
>
>	I wrote a program that uses a LARGE data structure with thousands of
>pointers, and I'm now porting it to my Amiga 1000 (stock 68000).  I have
>noticed that my data structure is being deallocated extremely slowly ---
>almost excruciatingly so.
>..
>	I have a feeling that this is a system-dependent issue and not an
>error in my code.  The program already runs on 3 other CPU's, and
>deallocation is lightning fast.  I am using Manx 5.0d.
>
>	Any suggestions?
>
>							 Dan

    If the program must allocate and deallocate many similar-sized objects,
    I generally cache deallocated objects in a list instead of calling
    free() or FreeMem().  When you want to allocate an object, check the
    appropriate list first.  If the list is not empty, simply pop an object
    off it, zero its memory (if necessary), and return it.

    Using singly linked lists for your freelist is generally the easiest
    way to implement this.  Example:

    #include <exec/types.h>
    #include <exec/memory.h>
    #include <functions.h>              /*  Manx: AllocMem()/FreeMem() declarations */

    #define CACHE_MAX   256             /*  cache objects smaller than this         */
    #define CACHE_SIZE  (CACHE_MAX/4)   /*  one freelist per longword size          */

    void *CacheAry[CACHE_SIZE];         /*  heads of the singly linked freelists    */

    void *
    MyAlloc(bytes)
    long bytes;
    {
        void **ptr;
        long lws = (bytes + 3) >> 2;    /*  # of longwords  */

        if (lws == 0)                   /*  need room for the freelist link  */
            lws = 1;

        if (lws < CACHE_SIZE) {
            void *res;

            ptr = CacheAry + lws;
            if (res = *ptr) {           /*  if list not empty   */
                *ptr = *(void **)res;   /*  unlink from list    */

                /*
                 *  zero object here (??)
                 */

                return(res);
            }
            /*
             *  list empty, fall through to normal allocation
             */
        }

        /*
         *  normal allocation of (lws * 4) bytes.  MEMF_CLEAR zeroes the
         *  object; drop it if you don't need that.
         */
        return((void *)AllocMem(lws << 2, MEMF_CLEAR));
    }

    void
    MyFree(res, bytes)
    void *res;
    long bytes;
    {
        long lws = (bytes + 3) >> 2;    /*  # of longwords      */

        if (lws == 0)                   /*  must match MyAlloc's rounding  */
            lws = 1;

        if (lws < CACHE_SIZE) {         /*  add to linked list  */
            void **ptr = CacheAry + lws;

            *(void **)res = *ptr;       /*  link into list      */
            *ptr = res;
        } else {
            FreeMem(res, lws << 2);     /*  normal deallocation */
        }
    }


    Now, obviously, that is kind of rough.  For one thing, if you are doing
    a *lot* of allocations you can cut the overhead further by changing the
    'normal allocation' at the bottom of MyAlloc() to carve objects out of
    a large block, allocated in, say, 4K chunks.
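
    Something along these lines could stand in for that normal-allocation
    call (again untested; ChunkAlloc() and its statics are just sketch names):

    #define CHUNK_SIZE  4096L

    static char *ChunkPtr;              /*  free space in the current chunk     */
    static long  ChunkLeft;             /*  number of bytes left in it          */

    void *
    ChunkAlloc(bytes)                   /*  bytes already rounded to longwords  */
    long bytes;
    {
        void *res;

        if (bytes >= CACHE_MAX)         /*  big objects are not worth pooling   */
            return((void *)AllocMem(bytes, MEMF_CLEAR));

        if (bytes > ChunkLeft) {        /*  chunk exhausted; its tail is wasted */
            ChunkPtr = (char *)AllocMem(CHUNK_SIZE, MEMF_CLEAR);
            if (ChunkPtr == NULL)
                return(NULL);
            ChunkLeft = CHUNK_SIZE;
        }
        res = (void *)ChunkPtr;         /*  carve the object off the chunk      */
        ChunkPtr += bytes;
        ChunkLeft -= bytes;
        return(res);
    }

    The chunks themselves are never handed back to the system here; freed
    objects just go onto the freelists as before.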

    In some cases, when you are allocating and deallocating huge numbers of
    objects, you might also want to add code that returns memory to the
    system when the freelists grow too large (i.e. too many deallocated
    objects sitting in the lists that could be given back to the system).
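
    Flushing the lists might look like this (untested, and only valid if the
    cached objects really were obtained with AllocMem() rather than carved
    out of larger chunks):

    void
    FlushCache()
    {
        long lws;
        void *res;
        void **ptr;

        for (lws = 1; lws < CACHE_SIZE; ++lws) {    /*  index 0 is never used   */
            ptr = CacheAry + lws;
            while (res = *ptr) {                    /*  pop each cached object  */
                *ptr = *(void **)res;               /*  unlink from the list    */
                FreeMem(res, lws << 2);             /*  hand it back to Exec    */
            }
        }
    }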

    But, essentially, the MyAlloc()/MyFree() pair above is pretty much what
    you want.  Note especially that allocations are rounded up to the
    nearest four-byte boundary.  Do NOT make the mistake of allocating only
    enough to satisfy 'bytes' in the normal-allocation path; that is a
    quick way to crash, because the deallocator will cache the object on
    the freelist for its rounded size, and it may later be handed out to
    satisfy a request up to three bytes larger.  That's about the only
    sneakiness in the code.

    P.S. I haven't tested the above.

					    -Matt


--

    Matthew Dillon	    dillon@Overload.Berkeley.CA.US
    891 Regal Rd.	    uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA

barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (06/01/91)

In article <dillon.8233@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>    If the system must allocate & deallocate similar-sized objects then
>    I generally cache deallocated objects in a list instead of calling
>    free() or FreeMem().

	Won't work for my application, unfortunately.  I do all the
allocations first, traverse the data structure, and then deallocate the
whole thing.  There are no intermediate dealloc/allocs.

	See my previous post for the solution to what was wrong:  with
no prototypes for AllocMem/FreeMem, they behaved incorrectly.

                                                        Dan

 //////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
| Dan Barrett, Department of Computer Science      Johns Hopkins University |
| INTERNET:   barrett@cs.jhu.edu           |                                |
| COMPUSERVE: >internet:barrett@cs.jhu.edu | UUCP:   barrett@jhunix.UUCP    |
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////

umueller@iiic.ethz.ch (Urban Dominik Mueller) (06/07/91)

In article <8524@jhunix.HCF.JHU.EDU> barrett@jhunix.HCF.JHU.EDU (Dan Barrett) writes:
>In article <dillon.8233@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>>    [cache small similar-sized allocations]
>
>	Won't work for my application, unfortunately.  I do all the
>allocations first, traverse the data structure, and then deallocate the
>whole thing.  There are no intermediate dealloc/allocs.

The situation you describe is the classic case for a second speedup
scheme. I assume all your tree nodes are the same size; if not, you will
have to change the function below to deal out variable-sized pieces:

struct Item { struct Item *l, *r; };    /* stand-in for your fixed-size tree node */

struct Item *AllocArrayOf30Items();     /* declared (we know what happens if not!) */
void PutArrayInLinkedList();

struct Item *AllocedArray;              /* current array of 30 items    */
int NumberOfFreeItems=0;                /* items still unused in it     */

void *ItemAlloc()
{
    if (NumberOfFreeItems==0) {
        if (!(AllocedArray=AllocArrayOf30Items()))
            return NULL;                /* out of memory                */
        PutArrayInLinkedList();         /* remember it for the cleanup  */
        NumberOfFreeItems=30;
    }
    return &AllocedArray[--NumberOfFreeItems];
}

Note that there is no ItemFree(); you can only free all the items at once.
This tailored allocator costs some memory efficiency, but it speeds up
allocation and deallocation a lot.
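
For completeness, the two helpers and the all-at-once free could look
roughly like this (untested; the array header and the AllocMem()/FreeMem()
calls are just one way to do it, and they need <exec/types.h>,
<exec/memory.h> and <functions.h>):

struct ItemArray {                      /* one allocation = 30 items + a link */
    struct ItemArray *next;
    struct Item items[30];
};

struct ItemArray *LastArray;            /* array most recently allocated      */
struct ItemArray *ArrayList;            /* all arrays, for the final free     */

struct Item *AllocArrayOf30Items()
{
    LastArray = (struct ItemArray *)
                AllocMem((long)sizeof(struct ItemArray), 0L);
    return LastArray ? LastArray->items : NULL;
}

void PutArrayInLinkedList()
{
    LastArray->next = ArrayList;        /* remember the array just made       */
    ArrayList = LastArray;
}

void ItemFreeAll()                      /* the only "free": everything at once */
{
    struct ItemArray *a;

    while (a = ArrayList) {
        ArrayList = a->next;
        FreeMem((char *)a, (long)sizeof(struct ItemArray));
    }
    NumberOfFreeItems = 0;
}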

    -Dominik