ecc_jim@ecc.tased.oz.au (03/20/91)
I posted this way back when comp.sys.acorn started up and didn't get a reply. In case it was our news program, here it is again: We are developing a package using C 1.3B that makes extensive use of tree structures for handling indexes into data files. These structures are loaded into memory when needed (using malloc) and discarded when finished with (using free). If the current wimpslot is too small to handle the data, it is automatically extended to accomodate it (this is handled by RISC_OSLIB). Unfortunately, when the associated memory has been "free'd" it is not handed back to the operating system for general usage. The memory gets re-used within the program so when another "malloc" is done the memory is not wasted. This has the unfortunate side effect of building the wimpslot up to a certain maximum level (depending on the size of the indexes). This level is approaching the limit for 1 megabyte machines - hence the problem. The practical upshot of this is that the package starts out using around 400K and then builds (after use) up to around 600K. So when the package is "idle" there is 200K of free memory inside the wimpslot that cannot be passed back to the operating system. ANSI C release 3 has an additional memory management tool called a flex pool. This is a method of claiming free operating system memory and then passing it back when finished with (Edit does this for the files it loads). There is, however, a restriction on a flex memory allocation (from page 225 of the ANSI C release 3 guide): "...you cannot have flex pointers within blocks of memory allocated by flex." This means that you cannot have linked lists or binary trees - the main reason that memory is being used by our application. That is the problem, does anyone have a solution? _______________________________________________________________________________ Jim Palfreyman Elizabeth Computer Centre jim@r140.tased.oz.au Link Road phone: +61 02 496999 Claremont fax: +61 02 496969 TAS 7011 Australia _______________________________________________________________________________ Imagine that Cray decides to make a personal computer. It has a 150 MIPS processor, 200 Mb of RAM, 1500 Mb of disc storage, a screen resolution of 4096 x 4096 pixels with 16777216 colours, relies entirely on voice recognition for input, fits in your shirt pocket and costs $300. What's the first question the computer community asks? "Is it PC compatible?" _______________________________________________________________________________
oystein@ifi.uio.no (ystein Carl Emil Hebnes) (03/23/91)
In article <1991Mar20.143227.247@ecc.tased.oz.au>, ecc_jim@ecc.tased.oz.au writes: > We are developing a package using C 1.3B that makes extensive use of tree > structures for handling indexes into data files. > > These structures are loaded into memory when needed (using malloc) and > discarded when finished with (using free). If the current wimpslot is too > small to handle the data, it is automatically extended to accomodate it > (this is handled by RISC_OSLIB). > > Unfortunately, when the associated memory has been "free'd" it is not handed > back to the operating system for general usage. The memory gets re-used > within the program so when another "malloc" is done the memory is not > wasted. > > This has the unfortunate side effect of building the wimpslot up to a > certain maximum level (depending on the size of the indexes). This level is > approaching the limit for 1 megabyte machines - hence the problem. > [...] > ANSI C release 3 has an additional memory management tool called a flex > pool. This is a method of claiming free operating system memory and then > passing it back when finished with (Edit does this for the files it loads). > There is, however, a restriction on a flex memory allocation (from page 225 > of the ANSI C release 3 guide): > > "...you cannot have flex pointers within blocks of memory allocated by > flex." > > This means that you cannot have linked lists or binary trees - the main > reason that memory is being used by our application. > > That is the problem, does anyone have a solution? > I'll try a quick, mindless reply to this. Say you want to build a B-tree or indeed any high-level data structure dynamically allocating room for elements (nodes) as you go along, and wanting to be able to free this space at will. If you use malloc(), the memory can't be reused by the system until the application quits, even though you free() it. So, you try flex. If you try to allocate room for the individual nodes, and then keep node pointers inside these memory areas, the flex manager will fail to update any pointers to a node as it moves it around the memory. The C manual explains this in more detail. So, you're stuck, right? Well, I did a wee bit of thinking, and came up with the following: Let's assume all the nodes have the same size (nsize = sizeof (struct foo)). We can't keep pointers to the nodes themselves, so we have to be a bit more devious. The first thing to do is to allocate an array of void pointers. Call this the anchor block. It will contain anchors, as required by the flex functions. It is vital that the anchor block is never moved around, so malloc() it, or flex_alloc() it before anything else (the first call to flex_alloc() is guaranteed to return a pointer to an area which will never be moved around.) Now, you're ready to allocate your first node. The thing to do, is to allocate a block of several nodes, say 100. This is the other type of block, a node block. The first anchor in the anchor block should point to this block (you pass that pointer's address as a parameter to flex_alloc().) As you allocate more nodes, you'll fill up the first node block, and eventually, you'll allocate another node block in the same way, and it'll be pointed to by the second anchor, and so on. Just make sure that no_of_anchors * nodes_per_block gives you enough nodes. An overestimate does no harm, anchor pointers are cheap. So, what about the internal node pointers in the structure? The left/right, next or parent pointers? They won't be ordinary pointers. They must contain 2 elements. First the node block number, and then the node number in the node block. That's the trick. To access e.g. the next node in a list, then, you first read the node block number, then aquire the actual node block address by following the right pointer in the anchor block, and then calculate the node address by a formula similar to node_block_start + (node_no - 1) * nsize. As you see, following pointers requires a bit more work than usual, but you gain the ability to dynamically manage memory. Which brings me to the subject of freeing memory again. You could maintain a free list for each node block, and when a block has its last node freed, you can flex_free() the whole node block. The best strategy for freeing depends on the pattern of insertions and deletions from the structure. If you do these in a stackwise or queuewise fashion, you can put many nodes in each node block and minimize administration overhead. If, however, insertions and deletions occur in random order for random nodes, the node blocks should be kept rather small, so that you don't get stuck with a lot of node blocks with only a few nodes in each. Please feel free to comment on and correct all the inaccuracies, overkills and actual errors in this method. I didn't consult the manuals on this one, and there's probably room for some improvement. Last, but not least: where's that C++ 2.1 compiler for the Archimedes ??? Acorn ??? _/ _ | _ . _ | "There are no problems, only 8 ft thick, 20 ft high, //\ \/ |_ -|- |_| | |/ || reinforced concrete walls (i.e. challenges!) /_/ / \_| |_ |_ | | || The solution? Nuke'em or dig under them (i.e. use a ________________________| Cray or apply a bit of lateral thinking.)"
john@acorn.co.uk (John Bowler) (03/27/91)
In article <1991Mar22.171119.11147@ifi.uio.no> oystein@ifi.uio.no (ystein Carl Emil Hebnes) writes: > >In article <1991Mar20.143227.247@ecc.tased.oz.au>, ecc_jim@ecc.tased.oz.au writes: >> We are developing a package using C 1.3B that makes extensive use of tree >> structures for handling indexes into data files. [[description of problem deleted]] >> ANSI C release 3 has an additional memory management tool called a flex >> pool. >>[...] >> There is, however, a restriction on a flex memory allocation (from page 225 >> of the ANSI C release 3 guide): >> >> "...you cannot have flex pointers within blocks of memory allocated by >> flex." >> >> This means that you cannot have linked lists or binary trees - the main >> reason that memory is being used by our application. > [summary deleted] > >Let's assume all the nodes have the same size (nsize = sizeof (struct foo)). >We can't keep pointers to the nodes themselves, so we have to be a bit more >devious. The first thing to do is to allocate an array of void pointers. >Call this the anchor block. It will contain anchors, as required by the flex >functions. It is vital that the anchor block is never moved around, so >malloc() it, or flex_alloc() it before anything else (the first call to >flex_alloc() is guaranteed to return a pointer to an area which will never >be moved around.) > >Now, you're ready to allocate your first node. The thing to do, is to >allocate a block of several nodes, say 100. This is the other type of block, >a node block. The first anchor in the anchor block should point to this block >(you pass that pointer's address as a parameter to flex_alloc().) >As you allocate more nodes, you'll fill up the first node block, and >eventually, you'll allocate another node block in the same way, and it'll >be pointed to by the second anchor, and so on. Just make sure that >no_of_anchors * nodes_per_block gives you enough nodes. An overestimate >does no harm, anchor pointers are cheap. > >So, what about the internal node pointers in the structure? The left/right, >next or parent pointers? >They won't be ordinary pointers. They must contain 2 elements. First the >node block number, and then the node number in the node block. That's the >trick. To access e.g. the next node in a list, then, you first read the >node block number, then aquire the actual node block address by following >the right pointer in the anchor block, and then calculate the node address >by a formula similar to node_block_start + (node_no - 1) * nsize. > >As you see, following pointers requires a bit more work than usual, but >you gain the ability to dynamically manage memory. To (probably mis)quote someone famous ``any problem can be solved by an extra level of indirection''. This is effectively what ystein Carl Emil Hebnes's algorithm does. A simpler implementation is to use the ``normal'' malloc algorithm with flex_alloc but store pointers to the flex anchors instead of the flex anchor values in the allocated data sturctures (ie instead of the pointer to the object store a pointer to the pointer). Ie:- struct btree { struct btree *left; struct btree *right } *btree_ptr; becomes:- struct flex_btree { struct flex_btree **left; struct flex_btree **right; } *flex_btree_ptr; And:- btree_ptr->left becomes *(flex_btree_ptr->left); /* Brackets for clarity */ To use this, you must allocate two objects per node; one is an anchor which must be allocated via malloc (or allocated statically, as flex_btree_ptr above), the other is the actual object: /* Caller must allocate (*anchor) */ void allocate_node(struct flex_btree **anchor) { flex_alloc(anchor, sizeof **anchor); /* (((The following may look expensive, but CSE will speed it * up as long as no functions are called))) */ (*anchor)->left = 0; (*anchor)->right = 0; /* +Other initialisation */ } So this might be used:- ... /* New node needed */ anchor = malloc(sizeof *anchor); /* struct flex_btree **anchor */ allocate_node(anchor); (*anchor_of_parent)->left = anchor; ... There is a problem:- Allocating individual anchors using malloc() may result in (very) bad internal fragmentation of the malloc() heap and is inefficient because of the one word overhead per mallocated block in the RISC OS implementation. (Ie you only get 50% storage utilisation at best for anchors). The answer to this is to write your own anchor allocator (and this allocator *can* be used for all anchors - not just flex_btree ones); use the standard free list approach. (NB - I have *only* compiled the following code; no testing! There should be some #ifdef DEBUG consistency checking in here...) #include <stdlib.h> extern void reclaim_memory(void); static void **anchor_free_list = 0; /* Following allocates anchors 128 at a time. */ #define HANDLE_BLOCK_SIZE 128 static void make_anchors(void) { void **new = malloc(HANDLE_BLOCK_SIZE * sizeof *new); if (new) { int i; for (i=0; i<(HANDLE_BLOCK_SIZE-1); ++i) new[i] = new+i+1; /* No cast, target is (void*) */ /* Paranoia... routine may be called when we still * have anchors? */ new[i] = anchor_free_list; anchor_free_list = new; return; } /* Out of memory - reclaim_memory() tries to get mallocated * space back from the rest of the system, causing the app * to fail *gracefully* if it cannot. */ reclaim_memory(); /* External to module */ make_anchors(); } void **new_anchor(void) { void **anchor; if (anchor_free_list == 0) make_anchors(); anchor = anchor_free_list; /* Isn't (void *) wonderful - no casting required! */ anchor_free_list = *anchor; /* Unlink head of list */) *anchor = 0; /* Paranoia */ return anchor; } void free_anchor(void **anchor) { #ifdef FREEING_ANCHOR_FREES_FLEX_OBJECT /* Following code is slightly dubious; the caller must * still be very careful to remove pointers to the anchor * before freeing it! Alternatively put in a paranoid * check here and abort the program if *anchor is != 0. */ if (*anchor) flex_free(anchor); #endif *anchor = anchor_free_list; anchor_free_list = anchor; } Even using this approach anchors are malloc()ed. If the node size in the original program is very small (so the total number of allocated nodes was actually very large) the overhead of the anchor allocations may still be unacceptable. You can't get round this with the current flex implementation - you really need to rearrange the anchors themselves and free up unused memory in the anchor free list. Notice that, despite the words in the manual, the first allocated flex block *CAN* move in 3.1B; this can arise if malloc() needs to extend the heap (effectively the malloc heap is the first flex block in 3.1B...). In the original posters problem it is very likely that the first flex block will move as a result of normal malloc()ing in the program (the program uses a lot of heap!) Because the stack extension mechanism uses the heap to obtain extra stack space such a movement may occur on a function call; see the example on page 226 of the manual; never pass (*anchor) as a function argument, or assign it to a variable. Future versions of the library will make it possible to avoid this potential problem, in particular the default behaviour will be that a program which uses flex will not be permitted to extend the heap (after the call to flex_init()). In *this* environment it is possible to place the anchors in the first flex block and to allocate new anchors using flex_extend() of that block (the rest of flex space will be shunted up to make the space). This is a general and useful paradigm; either:- 1) After flex_init() allocate the initial anchor array using flex_alloc() (make an initial call to make_anchors(), which would then need to be external). In make_anchors() use flex_extend() to increase the size of the anchor array. Every now and then go through the anchor_free_list stripping off anchor slots which are at the end of the array, then use flex_extend to contract the array - a call to a function to do this should go in reclaim_memory(). *DO NOT* use heap_init/heap_alloc in the program. Or:- 2) Do the same thing, but use heap_init/heap_alloc to allocate anchor buffers in make_anchors (use heap_alloc exactly as malloc() is used in make_anchors() above). Use heap_free() when an array of anchors is emptied. You could allocate anchors individually, but there *is* a storage overhead in heap_alloc() so this is not really sensible. RISC iX users should note that the RISC iX storage manager (malloc/free - no flex in RISC iX) does *not* have this storage overhead problem; it is sensible to use malloc() to allocate individual anchors in RISC iX as it implements the block allocation algorithm internally. On the other hand writers of portable UNIX programs should certainly not rely on this. General points: i) Use RISC OS flex plus an extra level of indirection to allow the OS to reclaim program data space. ii) flex anchors must be allocated in static data space or in the malloc heap. (The stack is in the heap and is static, so you can allocate automatically, but you *MUST* flex_free the anchor before the function returns). After 3.1B you will be able to allocate in the first flex block (or the heap_alloc heap), but don't do this at the moment. iii) Use free lists to make efficient use of heap space for small extensively used data structures. With most malloc() implementations free lists also have an enormous speed benefit (sometimes several orders of magnitude, but this does *not* apply to the RISC iX malloc). They also have the normal code correctness benefits gained from object specific interfaces; the compiler can do more type checking in the allocate and free function calls and the code can do better initialisation. Free lists can be used for flex objects, but the list must be maintained as a list of handles (meaning two objects per list entry) and the algorithm defeats the object of using flex (that it be possible to return memory to the OS when an object is no longer used). John Bowler (jbowler@acorn.co.uk)
torbenm@diku.dk (Torben [gidius Mogensen) (04/17/91)
gavinw@syma.sussex.ac.uk writes: >The design of the ARM makes >the use of base+offset descriptors less natural than straight >addresses as descriptors, because writeback modifies the base >address instead of modifying the offset register. >Two enhancements to future ARMs that I would like to see are >i) 'modify offset-register' writeback addressing-modes >ii) trap register-values-outside-given-range to speed up range >checking. >-- Gavin Wraith In the LDR and STR instructions, when you use the [Rn,Rm]! or [Rn],Rm modes to get base+offset addressing, there is no requirement to have Rn be base and Rm be offset - you might as well have Rm be base and Rn be offset, and thus get offset-register writeback. However, I don't think this is what you want, as the new offset-register is now not an offset to the base register. If you want to modify the offset you would want an address calculation with three values: base+offset+index, and write offset+index back to the offset register. This requires two additions and a writeback after the first, and can thus not be accomodated in the present cycle count of the LDR and STR instructions. An alternative solution (more in the RISC philosophy) is to use two instructions: ADD offset,offset,index; LDR Rd,[base,offset]. This is likely to have the same cycle count as a single instruction would. As for trapping register-values-outside-given-range, this would require a LOT of extra hardware in the CPU: two extra registers for each present registers and extra ALUs for range checking. A much better solution is to trap addresses out of range in load and store instructions. This would be logically be handled by MEMC. An idea would be to allocate a page for each array, and let MEMC trap addresses outside this range. This would require two things not present in the current architecture: variable sized pages, and the ability of load/store instructions to specify a page number. It would, however, allow all arrays to start at logical address zero (in the page). Addressing modes could be "Page Rp,[#m]", "Page Rp,[Rm]" and "Page Rp,[Rm,shift #s]", the idea being that the Rn register in the present addressing modes would refer to a page rather than being a base address. This would require only one extra bit in the LDR/STR instructions. As for hardware, the CPU must be able to use page+offset addressing modes in addition to logical addresses, and the MEMC must be able to handle (a large number of) variable sized pages. I believe this is far out of the range of likely developments for the ARM, but could be used in less "RISCy" designs. Torben Mogensen (torbenm@diku.dk)
nbvs@cl.cam.ac.uk (Nicko van Someren) (04/17/91)
In article <4826@syma.sussex.ac.uk> gavinw@syma.sussex.ac.uk writes: >Two enhancements to future ARMs that I would like to see are > >i) 'modify offset-register' writeback addressing-modes >ii) trap register-values-outside-given-range to speed up range >checking. While they are at it, they could make it a CISC processor! ;-) +-----------------------------------------------------------------------------+ | Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903 | +-----------------------------------------------------------------------------+