[comp.sys.acorn] Memory handling

ecc_jim@ecc.tased.oz.au (03/20/91)

I posted this way back when comp.sys.acorn started up and didn't get a
reply. In case it was our news program, here it is again:

We are developing a package using C 1.3B that makes extensive use of tree
structures for handling indexes into data files.

These structures are loaded into memory when needed (using malloc) and
discarded when finished with (using free). If the current wimpslot is too
small to handle the data, it is automatically extended to accomodate it
(this is handled by RISC_OSLIB).

Unfortunately, when the associated memory has been "free'd" it is not handed
back to the operating system for general usage. The memory gets re-used
within the program so when another "malloc" is done the memory is not
wasted.

This has the unfortunate side effect of building the wimpslot up to a
certain maximum level (depending on the size of the indexes). This level is
approaching the limit for 1 megabyte machines - hence the problem.

The practical upshot of this is that the package starts out using around
400K and then builds (after use) up to around 600K. So when the package is
"idle" there is 200K of free memory inside the wimpslot that cannot be
passed back to the operating system.

ANSI C release 3 has an additional memory management tool called a flex
pool. This is a method of claiming free operating system memory and then
passing it back when finished with (Edit does this for the files it loads).
There is, however, a restriction on a flex memory allocation (from page 225
of the ANSI C release 3 guide):

   "...you cannot have flex pointers within blocks of memory allocated by
flex."

This means that you cannot have linked lists or binary trees - the main
reason that memory is being used by our application.

That is the problem, does anyone have a solution?

_______________________________________________________________________________

          Jim Palfreyman                  Elizabeth Computer Centre 
          jim@r140.tased.oz.au            Link Road
          phone: +61 02 496999            Claremont
            fax: +61 02 496969            TAS 7011  Australia

_______________________________________________________________________________

           Imagine that Cray decides to make a personal computer.

 It has a 150 MIPS processor, 200 Mb of RAM, 1500 Mb of disc storage, a screen
    resolution of 4096 x 4096 pixels with 16777216 colours, relies entirely
    on voice recognition for input, fits in your shirt pocket and costs $300.

          What's the first question the computer community asks?

                           "Is it PC compatible?"

_______________________________________________________________________________

oystein@ifi.uio.no (ystein Carl Emil Hebnes) (03/23/91)

In article <1991Mar20.143227.247@ecc.tased.oz.au>, ecc_jim@ecc.tased.oz.au writes: 
> We are developing a package using C 1.3B that makes extensive use of tree
> structures for handling indexes into data files.
> 
> These structures are loaded into memory when needed (using malloc) and
> discarded when finished with (using free). If the current wimpslot is too
> small to handle the data, it is automatically extended to accomodate it
> (this is handled by RISC_OSLIB).
> 
> Unfortunately, when the associated memory has been "free'd" it is not handed
> back to the operating system for general usage. The memory gets re-used
> within the program so when another "malloc" is done the memory is not
> wasted.
> 
> This has the unfortunate side effect of building the wimpslot up to a
> certain maximum level (depending on the size of the indexes). This level is
> approaching the limit for 1 megabyte machines - hence the problem.
> [...]
> ANSI C release 3 has an additional memory management tool called a flex
> pool. This is a method of claiming free operating system memory and then
> passing it back when finished with (Edit does this for the files it loads).
> There is, however, a restriction on a flex memory allocation (from page 225
> of the ANSI C release 3 guide):
> 
>    "...you cannot have flex pointers within blocks of memory allocated by
> flex."
> 
> This means that you cannot have linked lists or binary trees - the main
> reason that memory is being used by our application.
> 
> That is the problem, does anyone have a solution?
> 

I'll try a quick, mindless reply to this.

Say you want to build a B-tree or indeed any high-level data structure
dynamically allocating room for elements (nodes) as you go along, and
wanting to be able to free this space at will.

If you use malloc(), the memory can't be reused by the system until the
application quits, even though you free() it.

So, you try flex. If you try to allocate room for the individual nodes,
and then keep node pointers inside these memory areas, the flex manager
will fail to update any pointers to a node as it moves it around the
memory. The C manual explains this in more detail. So, you're stuck, right?
Well, I did a wee bit of thinking, and came up with the following:

Let's assume all the nodes have the same size (nsize = sizeof (struct foo)).
We can't keep pointers to the nodes themselves, so we have to be a bit more
devious. The first thing to do is to allocate an array of void pointers.
Call this the anchor block. It will contain anchors, as required by the flex
functions. It is vital that the anchor block is never moved around, so
malloc() it, or flex_alloc() it before anything else (the first call to
flex_alloc() is guaranteed to return a pointer to an area which will never
be moved around.)

Now, you're ready to allocate your first node. The thing to do, is to
allocate a block of several nodes, say 100. This is the other type of block,
a node block. The first anchor in the anchor block should point to this block
(you pass that pointer's address as a parameter to flex_alloc().)
As you allocate more nodes, you'll fill up the first node block, and
eventually, you'll allocate another node block in the same way, and it'll
be pointed to by the second anchor, and so on. Just make sure that
no_of_anchors * nodes_per_block gives you enough nodes. An overestimate
does no harm, anchor pointers are cheap.

So, what about the internal node pointers in the structure? The left/right,
next or parent pointers?
They won't be ordinary pointers. They must contain 2 elements. First the
node block number, and then the node number in the node block. That's the
trick. To access e.g. the next node in a list, then, you first read the
node block number, then aquire the actual node block address by following
the right pointer in the anchor block, and then calculate the node address
by a formula similar to node_block_start + (node_no - 1) * nsize.

As you see, following pointers requires a bit more work than usual, but
you gain the ability to dynamically manage memory. Which brings me to the
subject of freeing memory again. You could maintain a free list for each
node block, and when a block has its last node freed, you can flex_free()
the whole node block. The best strategy for freeing depends on the
pattern of insertions and deletions from the structure. If you do these
in a stackwise or queuewise fashion, you can put many nodes in each node
block and minimize administration overhead. If, however, insertions and
deletions occur in random order for random nodes, the node blocks should
be kept rather small, so that you don't get stuck with a lot of node blocks
with only a few nodes in each.

Please feel free to comment on and correct all the inaccuracies, overkills
and actual errors in this method. I didn't consult the manuals on this one,
and there's probably room for some improvement.

Last, but not least: where's that C++ 2.1 compiler for the Archimedes ???
Acorn ???

 _/     _  |   _  .   _ | "There are no problems, only 8 ft thick, 20 ft high,
//\ \/ |_ -|- |_| | |/ ||  reinforced concrete walls (i.e. challenges!)
/_/ /  \_| |_ |_  | |  ||  The solution? Nuke'em or dig under them (i.e. use a
________________________|  Cray or apply a bit of lateral thinking.)"

john@acorn.co.uk (John Bowler) (03/27/91)

In article <1991Mar22.171119.11147@ifi.uio.no> oystein@ifi.uio.no (ystein Carl Emil Hebnes) writes:
>
>In article <1991Mar20.143227.247@ecc.tased.oz.au>, ecc_jim@ecc.tased.oz.au writes: 
>> We are developing a package using C 1.3B that makes extensive use of tree
>> structures for handling indexes into data files.
[[description of problem deleted]]
>> ANSI C release 3 has an additional memory management tool called a flex
>> pool.
>>[...]
>> There is, however, a restriction on a flex memory allocation (from page 225
>> of the ANSI C release 3 guide):
>> 
>>    "...you cannot have flex pointers within blocks of memory allocated by
>> flex."
>> 
>> This means that you cannot have linked lists or binary trees - the main
>> reason that memory is being used by our application.
>
[summary deleted]
>
>Let's assume all the nodes have the same size (nsize = sizeof (struct foo)).
>We can't keep pointers to the nodes themselves, so we have to be a bit more
>devious. The first thing to do is to allocate an array of void pointers.
>Call this the anchor block. It will contain anchors, as required by the flex
>functions. It is vital that the anchor block is never moved around, so
>malloc() it, or flex_alloc() it before anything else (the first call to
>flex_alloc() is guaranteed to return a pointer to an area which will never
>be moved around.)
>
>Now, you're ready to allocate your first node. The thing to do, is to
>allocate a block of several nodes, say 100. This is the other type of block,
>a node block. The first anchor in the anchor block should point to this block
>(you pass that pointer's address as a parameter to flex_alloc().)
>As you allocate more nodes, you'll fill up the first node block, and
>eventually, you'll allocate another node block in the same way, and it'll
>be pointed to by the second anchor, and so on. Just make sure that
>no_of_anchors * nodes_per_block gives you enough nodes. An overestimate
>does no harm, anchor pointers are cheap.
>
>So, what about the internal node pointers in the structure? The left/right,
>next or parent pointers?
>They won't be ordinary pointers. They must contain 2 elements. First the
>node block number, and then the node number in the node block. That's the
>trick. To access e.g. the next node in a list, then, you first read the
>node block number, then aquire the actual node block address by following
>the right pointer in the anchor block, and then calculate the node address
>by a formula similar to node_block_start + (node_no - 1) * nsize.
>
>As you see, following pointers requires a bit more work than usual, but
>you gain the ability to dynamically manage memory.

To (probably mis)quote someone famous ``any problem can be solved
by an extra level of indirection''.  This is effectively what ystein
Carl Emil Hebnes's algorithm does.  A simpler implementation is to
use the ``normal'' malloc algorithm with flex_alloc but store
pointers to the flex anchors instead of the flex anchor values in
the allocated data sturctures (ie instead of the pointer to the object
store a pointer to the pointer).  Ie:-

struct btree {
	struct btree	*left;
	struct btree	*right
} *btree_ptr;

becomes:-

struct flex_btree {
	struct flex_btree	**left;
	struct flex_btree	**right;
} *flex_btree_ptr;

And:-

	btree_ptr->left

becomes

	*(flex_btree_ptr->left);	/* Brackets for clarity */

To use this, you must allocate two objects per node; one is an
anchor which must be allocated via malloc (or allocated statically,
as flex_btree_ptr above), the other is the actual object:

/* Caller must allocate (*anchor) */
void allocate_node(struct flex_btree **anchor) {
	flex_alloc(anchor, sizeof **anchor);
	/* (((The following may look expensive, but CSE will speed it
	 * up as long as no functions are called))) */
	(*anchor)->left = 0;
	(*anchor)->right = 0;
	/* +Other initialisation */
}

So this might be used:-

	...
	/* New node needed */
	anchor = malloc(sizeof *anchor);    /* struct flex_btree **anchor */
	allocate_node(anchor);
	(*anchor_of_parent)->left = anchor;
	...

There is a problem:-

Allocating individual anchors using malloc() may result in (very)
bad internal fragmentation of the malloc() heap and is inefficient
because of the one word overhead per mallocated block in the RISC
OS implementation.  (Ie you only get 50% storage utilisation at
best for anchors).

The answer to this is to write your own anchor allocator (and this
allocator *can* be used for all anchors - not just flex_btree ones);
use the standard free list approach.  (NB - I have *only* compiled
the following code; no testing!  There should be some #ifdef DEBUG
consistency checking in here...)

#include <stdlib.h>

extern void reclaim_memory(void);

static void **anchor_free_list = 0;

/* Following allocates anchors 128 at a time. */

#define HANDLE_BLOCK_SIZE 128

static void make_anchors(void) {
	void **new = malloc(HANDLE_BLOCK_SIZE * sizeof *new);
	if (new) {
		int i;

		for (i=0; i<(HANDLE_BLOCK_SIZE-1); ++i)
			new[i] = new+i+1;   /* No cast, target is (void*) */
		/* Paranoia... routine may be called when we still
		 * have anchors? */
		new[i] = anchor_free_list;
		anchor_free_list = new;
		return;
	}

	/* Out of memory - reclaim_memory() tries to get mallocated
	 * space back from the rest of the system, causing the app
	 * to fail *gracefully* if it cannot.
	 */
	reclaim_memory();	/* External to module */
	make_anchors();
}

void **new_anchor(void) {
	void **anchor;

	if (anchor_free_list == 0) make_anchors();
	anchor = anchor_free_list;
	/* Isn't (void *) wonderful - no casting required! */
	anchor_free_list = *anchor;	/* Unlink head of list */)
	*anchor = 0;			/* Paranoia */
	return anchor;
}

void free_anchor(void **anchor) {
#ifdef FREEING_ANCHOR_FREES_FLEX_OBJECT
	/* Following code is slightly dubious; the caller must
	 * still be very careful to remove pointers to the anchor
	 * before freeing it!  Alternatively put in a paranoid
	 * check here and abort the program if *anchor is != 0.
	 */
	if (*anchor) flex_free(anchor);
#endif
	*anchor = anchor_free_list;
	anchor_free_list = anchor;
}

Even using this approach anchors are malloc()ed.  If the node size
in the original program is very small (so the total number of
allocated nodes was actually very large) the overhead of the anchor
allocations may still be unacceptable.  You can't get round this
with the current flex implementation - you really need to rearrange
the anchors themselves and free up unused memory in the anchor free
list.

Notice that, despite the words in the manual, the first allocated
flex block *CAN* move in 3.1B; this can arise if malloc() needs to
extend the heap (effectively the malloc heap is the first flex
block in 3.1B...).  In the original posters problem it is very
likely that the first flex block will move as a result of normal
malloc()ing in the program (the program uses a lot of heap!)
Because the stack extension mechanism uses the heap to obtain extra
stack space such a movement may occur on a function call; see the
example on page 226 of the manual; never pass (*anchor) as a function
argument, or assign it to a variable.

Future versions of the library will make it possible to avoid this
potential problem, in particular the default behaviour will be that
a program which uses flex will not be permitted to extend the heap
(after the call to flex_init()).  In *this* environment it
is possible to place the anchors in the first flex block and to
allocate new anchors using flex_extend() of that block (the rest
of flex space will be shunted up to make the space).  This is a
general and useful paradigm; either:-

1) After flex_init() allocate the initial anchor array using
flex_alloc() (make an initial call to make_anchors(), which would
then need to be external).  In make_anchors() use flex_extend() to
increase the size of the anchor array.  Every now and then go
through the anchor_free_list stripping off anchor slots which are
at the end of the array, then use flex_extend to contract the array
- a call to a function to do this should go in reclaim_memory().
*DO NOT* use heap_init/heap_alloc in the program.

Or:-

2) Do the same thing, but use heap_init/heap_alloc to allocate
anchor buffers in make_anchors (use heap_alloc exactly as malloc()
is used in make_anchors() above).  Use heap_free() when an array
of anchors is emptied.  You could allocate anchors individually,
but there *is* a storage overhead in heap_alloc() so this is not
really sensible.

RISC iX users should note that the RISC iX storage manager (malloc/free
- no flex in RISC iX) does *not* have this storage overhead problem;
it is sensible to use malloc() to allocate individual anchors in
RISC iX as it implements the block allocation algorithm internally.
On the other hand writers of portable UNIX programs should certainly
not rely on this.

General points:

i) Use RISC OS flex plus an extra level of indirection to allow
the OS to reclaim program data space.

ii) flex anchors must be allocated in static data space or in the
malloc heap.  (The stack is in the heap and is static, so you can
allocate automatically, but you *MUST* flex_free the anchor before
the function returns).  After 3.1B you will be able to allocate in
the first flex block (or the heap_alloc heap), but don't do this
at the moment.

iii) Use free lists to make efficient use of heap space for small
extensively used data structures.  With most malloc() implementations
free lists also have an enormous speed benefit (sometimes several
orders of magnitude, but this does *not* apply to the RISC iX
malloc).  They also have the normal code correctness benefits gained
from object specific interfaces; the compiler can do more type
checking in the allocate and free function calls and the code can
do better initialisation.

Free lists can be used for flex objects, but the list must be
maintained as a list of handles (meaning two objects per list entry)
and the algorithm defeats the object of using flex (that it be
possible to return memory to the OS when an object is no longer
used).

John Bowler (jbowler@acorn.co.uk)

torbenm@diku.dk (Torben [gidius Mogensen) (04/17/91)

gavinw@syma.sussex.ac.uk writes:

>The design of the ARM makes
>the use of base+offset descriptors less natural than straight
>addresses as descriptors, because writeback modifies the base
>address instead of modifying the offset register.

>Two enhancements to future ARMs that I would like to see are

>i) 'modify offset-register' writeback addressing-modes
>ii) trap register-values-outside-given-range to speed up range
>checking.

>-- Gavin Wraith

In the LDR and STR instructions, when you use the [Rn,Rm]! or [Rn],Rm
modes to get base+offset addressing, there is no requirement to have
Rn be base and Rm be offset - you might as well have Rm be base and Rn
be offset, and thus get offset-register writeback. However, I don't
think this is what you want, as the new offset-register is now not an
offset to the base register. If you want to modify the offset you
would want an address calculation with three values: base+offset+index, 
and write offset+index back to the offset register. This requires two
additions and a writeback after the first, and can thus not be
accomodated in the present cycle count of the LDR and STR instructions.
An alternative solution (more in the RISC philosophy) is to use two
instructions: ADD offset,offset,index; LDR Rd,[base,offset]. This is
likely to have the same cycle count as a single instruction would.

As for trapping register-values-outside-given-range, this would
require a LOT of extra hardware in the CPU: two extra registers for
each present registers and extra ALUs for range checking. A much
better solution is to trap addresses out of range in load and store
instructions. This would be logically be handled by MEMC. An idea
would be to allocate a page for each array, and let MEMC trap
addresses outside this range. This would require two things not
present in the current architecture: variable sized pages, and the
ability of load/store instructions to specify a page number. It would,
however, allow all arrays to start at logical address zero (in the
page). Addressing modes could be "Page Rp,[#m]", "Page Rp,[Rm]" and
"Page Rp,[Rm,shift #s]", the idea being that the Rn register in the
present addressing modes would refer to a page rather than being a
base address. This would require only one extra bit in the LDR/STR
instructions. As for hardware, the CPU must be able to use page+offset
addressing modes in addition to logical addresses, and the MEMC must
be able to handle (a large number of) variable sized pages.

I believe this is far out of the range of likely developments for the
ARM, but could be used in less "RISCy" designs.

	Torben Mogensen (torbenm@diku.dk)

nbvs@cl.cam.ac.uk (Nicko van Someren) (04/17/91)

In article <4826@syma.sussex.ac.uk> gavinw@syma.sussex.ac.uk writes:
>Two enhancements to future ARMs that I would like to see are
>
>i) 'modify offset-register' writeback addressing-modes
>ii) trap register-values-outside-given-range to speed up range
>checking.

While they are at it, they could make it a CISC processor!     ;-)



+-----------------------------------------------------------------------------+
| Nicko van Someren, nbvs@cl.cam.ac.uk, (44) 223 358707 or (44) 860 498903    |
+-----------------------------------------------------------------------------+