barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (05/31/91)
	I wrote a program that uses a LARGE data structure with thousands of
pointers, and I'm now porting it to my Amiga 1000 (stock 68000). I have
noticed that my data structure is being deallocated extremely slowly ---
almost excruciatingly so.

	The structure is a binary search tree in which every node contains
three linked lists and ANOTHER binary search tree (which contains 3 lists
and another tree, etc., recursively to any level). When the structure has a
thousand nodes or so, and I deallocate it, it takes several minutes!
Building the structure took much less time, and there was more computation
going on then!

	At first, I thought the problem might be Manx's malloc()/free()
functions, so I switched to AllocMem/FreeMem. This sped things up a bit,
but not much.

	I have a feeling that this is a system-dependent issue and not an
error in my code. The program already runs on 3 other CPUs, and
deallocation is lightning fast. I am using Manx 5.0d.

	Any suggestions?

							Dan

 //////////////////////////////////////\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\
| Dan Barrett, Department of Computer Science    Johns Hopkins University |
| INTERNET:   barrett@cs.jhu.edu                |                         |
| COMPUSERVE: >internet:barrett@cs.jhu.edu      | UUCP: barrett@jhunix.UUCP |
 \\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\\/////////////////////////////////////
barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (06/01/91)
In article <8504@jhunix.HCF.JHU.EDU> I wrote:
>I have noticed that my [LARGE] data structure is being deallocated extremely
>slowly --- almost excruciatingly so.
>At first, I thought the problem might be Manx's malloc()/free()
>functions, so I switched to AllocMem/FreeMem. This sped things up a bit,
>but not much.

	The problem is solved. Guess what it was?? I forgot to
#include <functions.h>, so AllocMem/FreeMem had no prototypes. For some
reason, this caused FreeMem to be VERY slow: probably things were being
implicitly coerced in a bad way....

	Here's a little benchmark to tickle your fancy. This program does
several thousand allocate/deallocate calls on a data structure with
(appropriately) several thousand nodes. STACK was set to 100,000 just in
case.

	Using AllocMem/FreeMem:	run time == 30 seconds
	Using Manx malloc/free:	run time == 5 minutes, 35 seconds

	Well, I was certainly surprised!

							Dan
dillon@overload.Berkeley.CA.US (Matthew Dillon) (06/01/91)
In article <8504@jhunix.HCF.JHU.EDU> barrett@jhunix.HCF.JHU.EDU (Dan Barrett) writes:
>	I wrote a program that uses a LARGE data structure with thousands of
>pointers, and I'm now porting it to my Amiga 1000 (stock 68000). I have
>noticed that my data structure is being deallocated extremely slowly ---
>almost excruciatingly so.
>...
>	I have a feeling that this is a system-dependent issue and not an
>error in my code. The program already runs on 3 other CPUs, and
>deallocation is lightning fast. I am using Manx 5.0d.
>
>	Any suggestions?
>
>							Dan

    If the system must allocate & deallocate similar-sized objects then
    I generally cache deallocated objects in a list instead of calling
    free() or FreeMem(). When you want to allocate an object, simply
    check the appropriate list first. If the list is not empty, simply
    pop the object off of it, zero its memory (if necessary), and return
    it. Using singly linked lists for your freelists is generally the
    easiest way to implement this. Example:

	#define CACHE_MAX   256             /* cache 0-255 byte allocations */
	#define CACHE_SIZE  (CACHE_MAX/4)   /* number of entries in cache   */

	void *CacheAry[CACHE_SIZE];

	void *
	MyAlloc(bytes)
	long bytes;
	{
	    void **ptr;
	    long lws = (bytes + 3) >> 2;    /* # of longwords */

	    if (lws < CACHE_SIZE) {
		void *res;

		ptr = CacheAry + lws;
		if (res = *ptr) {           /* if list not empty */
		    *ptr = *(void **)res;   /* unlink from list  */
		    /*
		     * zero object here (??)
		     */
		    return(res);
		}
		/*
		 * list empty, fall through to normal allocation
		 */
	    }
	    /* Do Normal Allocation of (lws * 4) bytes here */
	    /*
	     * zero object here (??) then return object pointer.
	     */
	}

	void
	MyFree(res, bytes)
	void *res;
	long bytes;
	{
	    long lws = (bytes + 3) >> 2;    /* # of longwords */

	    if (lws < CACHE_SIZE) {         /* add to linked list */
		void **ptr = CacheAry + lws;

		*(void **)res = *ptr;       /* link into list */
		*ptr = res;
	    } else {
		/* Do Normal deallocation here */
	    }
	}

    Now, obviously that is kind of rough.
    For one, if you are doing a *lot* of allocations you can cut down
    by changing 'normal allocation' to allocate out of a large block
    which is allocated in, say, 4K chunks.

    In some cases, when allocating and deallocating huge numbers of
    objects, you might want to add code to return memory all the way
    back to the system when the free-list(s) get too large (too many
    deallocated objects in the freelist that could be freed back to the
    system).

    But, essentially, the above is pretty much what you want. Note
    especially the fact that allocations are rounded up to the nearest
    four-byte boundary. Do NOT make the mistake of allocating only
    enough to satisfy 'bytes' in the 'Do Normal Allocation' part above;
    this is a quick way to crash, because the deallocator will then
    cache the object, which may later be allocated from the same list
    with a request 3 bytes larger. That's about the only sneakiness in
    the code.

    P.S. I haven't tested the above.

						-Matt

--
    Matthew Dillon		dillon@Overload.Berkeley.CA.US
    891 Regal Rd.		uunet.uu.net!overload!dillon
    Berkeley, Ca. 94708
    USA
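[Matt's sketch, filled in as a compilable version: malloc()/calloc()/free()
stand in for the "normal allocation" he leaves open (on the Amiga it would
be AllocMem/FreeMem), and the rounding granularity is sizeof(void *)
rather than a fixed 4 bytes, so the freelist link still fits inside the
smallest cached object on hosts with 64-bit pointers. The bucket layout is
my own restatement of his scheme.]

```c
#include <stdlib.h>
#include <string.h>

#define GRAIN       sizeof(void *)  /* rounding granularity; the freelist
                                       link must fit in any cached object */
#define CACHE_SIZE  64              /* bucket i caches objects of
                                       i*GRAIN bytes */

static void *CacheAry[CACHE_SIZE];

void *MyAlloc(long bytes)
{
    long slots = (bytes + GRAIN - 1) / GRAIN;   /* round up */
    if (slots == 0)
        slots = 1;                  /* always room for the link pointer */

    if (slots < CACHE_SIZE) {
        void *res = CacheAry[slots];
        if (res) {                  /* freelist hit: pop and reuse */
            CacheAry[slots] = *(void **)res;
            memset(res, 0, slots * GRAIN);      /* zero object here */
            return res;
        }
    }
    /* list empty or too big: normal (zeroed) allocation */
    return calloc(slots, GRAIN);
}

void MyFree(void *res, long bytes)
{
    long slots = (bytes + GRAIN - 1) / GRAIN;
    if (slots == 0)
        slots = 1;

    if (slots < CACHE_SIZE) {       /* push onto the freelist */
        *(void **)res = CacheAry[slots];
        CacheAry[slots] = res;
    } else {
        free(res);                  /* normal deallocation */
    }
}
```

[As with FreeMem, the caller must pass MyFree the same byte count it gave
MyAlloc, since the size picks the bucket.]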
barrett@jhunix.HCF.JHU.EDU (Dan Barrett) (06/01/91)
In article <dillon.8233@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>    If the system must allocate & deallocate similar-sized objects then
>    I generally cache deallocated objects in a list instead of calling
>    free() or FreeMem().

	Won't work for my application, unfortunately. I do all the
allocations first, traverse the data structure, and then deallocate the
whole thing. There are no intermediate dealloc/allocs.

	See my previous post for the solution to what was wrong: with no
prototypes for AllocMem/FreeMem, they behaved incorrectly.

							Dan
umueller@iiic.ethz.ch (Urban Dominik Mueller) (06/07/91)
In article <8524@jhunix.HCF.JHU.EDU> barrett@jhunix.HCF.JHU.EDU (Dan Barrett) writes:
>In article <dillon.8233@overload.Berkeley.CA.US> dillon@overload.Berkeley.CA.US (Matthew Dillon) writes:
>> [cache small similar-sized allocations]
>
>	Won't work for my application, unfortunately. I do all the
>allocations first, traverse the data structure, and then deallocate the
>whole thing. There are no intermediate dealloc/allocs.

The situation you describe is the most typical application for a second
speedup scheme. I assume all your tree nodes have the same size; if not,
you have to change the function to hand out varying byte counts:

	int NumberOfFreeItems = 0;

	void *ItemAlloc()
	{
		if (NumberOfFreeItems == 0) {
			AllocArrayOf30Items();
			PutArrayInLinkedList();
			NumberOfFreeItems = 30;
		}
		return AllocedArray[--NumberOfFreeItems];
	}

Note there is no ItemFree(); you can free only all items at once. This
tailored alloc will reduce memory efficiency, but speed up allocations
and deallocations a lot.

						-Dominik
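[A compilable version of Dominik's scheme, with his placeholder names
(AllocArrayOf30Items, PutArrayInLinkedList, AllocedArray) filled in by my
own guesses and malloc standing in for the Amiga allocator: items are
carved out of chunks of 30, the chunks hang on a linked list, and the only
"free" releases everything with one call per chunk instead of one per
node.]

```c
#include <stdlib.h>

#define ITEMS_PER_CHUNK 30

/* Fixed-size items, as Dominik assumes; Dan's tree nodes qualify. */
struct Item { struct Item *left, *right; long key; };

struct Chunk {
    struct Chunk *next;                 /* linked list of chunks */
    struct Item items[ITEMS_PER_CHUNK];
};

static struct Chunk *ChunkList;
static int NumberOfFreeItems;

struct Item *ItemAlloc(void)
{
    if (NumberOfFreeItems == 0) {       /* current chunk exhausted */
        struct Chunk *c = malloc(sizeof *c);
        if (!c)
            return NULL;
        c->next = ChunkList;            /* remember it for FreeAllItems */
        ChunkList = c;
        NumberOfFreeItems = ITEMS_PER_CHUNK;
    }
    return &ChunkList->items[--NumberOfFreeItems];
}

/* No per-item free: the whole structure is released at once,
 * turning thousands of deallocations into a few dozen. */
void FreeAllItems(void)
{
    while (ChunkList) {
        struct Chunk *next = ChunkList->next;
        free(ChunkList);
        ChunkList = next;
    }
    NumberOfFreeItems = 0;
}
```

[This fits Dan's allocate-everything, traverse, free-everything pattern
exactly: teardown cost drops from one call per node to one per chunk.]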