[comp.lang.c] Memory Allocation

belanger@micomvax.UUCP (Robert Belanger) (09/06/88)

Hi everybody,

	We are in the process of speeding up memory allocation, because we
have a CPU with a linear address space but no (easy) access to disk. We have
four (4) megabytes of DRAM available. What we want to do is avoid the system
allocator, in order to speed up memory allocation and also reduce the overhead
associated with it.

	We thought of several schemes:

	1)	we could allocate a long block of memory and manage it
	ourselves (by maintaining headers to our data blocks), but two problems
	arise: 1) when we free a block, we have to scan the whole area to
	see whether we can merge two blocks together; 2) scanning the
	list (on other systems, with swap space) induces a lot of disk swapping.

	2)	we could allocate a long block of memory and also create two
	lists in memory (one for free blocks, the other for used ones). Problems:
	1) when somebody frees a block, we have to keep, in the block (or
	just before it), a pointer back to its header in the used list;
	2) it looks awfully complex to maintain a stable relation between the
	lists and the actual data.

So I am asking for your help. What do you think? Do you know (or can you
think of) any other alternative?

Thank you very much
#if enough_interest
I will post a summary of the answers
#endif
===============================================================================
Philips Electronics Ltd.			     	     Robert L. Belanger
600 Frederick Philips Blvd		     	     Advanced Development Group
St-Laurent, Quebec, CANADA					   ECHO project
H4M 2S9	(514)-744-8200-2495		  		Eternity is very long!!
			        	  	       Especially in the end!!!
E-MAIL: philabs!micomvax!belanger

"I let myself melt like a suppository into the backside of the
'Moonshine Bowling'",
	Francois, Au clair de la lune.
===============================================================================

u-dmfloy%sunset.utah.edu@utah-cs.UUCP (Daniel M Floyd) (09/07/88)

It sounds like you want to make your own allocator to bypass
a less desirable (slower) system allocator.

You might try an allocation scheme similar to DOS (i.e. MS/PC-DOS)
disk space allocation.

Divide your memory into convenient chunks. Then refer to those
chunks in some sort of lookup table. Keep a pointer to the first
used block and the last. Always allocate at the end. The table will
either contain some sort of chain (if you can segment your allocation),
or a flag that says used or not. In all probability, you're not going
to waste a lot of time searching for the next allocation (it's at
the end; oh yes, I almost forgot: when the end is at the end, you wrap
around). Most likely, your deletions will occur in a first-in, first-out
manner, or nearly enough so that your active used area will sort of crawl
toward the end, then wrap and start over. This method *will* waste memory,
and fragmentation will occur in the active used area for more waste. But,
you say, you've got memory to burn. If you end up with insufficient space
at the end pointer, then search for the next unused block (in the active
used area) and try it. This should happen rarely (hopefully never). I saw
an article in BYTE doing this with disks. The author showed convincingly
that fragmentation won't be a big issue this way.

For example:
Divide the 4M into 1k blocks. That's 4096 blocks, so set aside 512
bytes to use as your table (if you're paranoid about dropping a bit,
do it twice). Each bit within the 512 bytes represents a 1k block:
0 if not used, 1 if used.
beginpointer = start of the table
endpointer = start of the table

then in pseudo run-time:

allocate 2k
  set 2 bits accordingly
  endpointer = endpointer + 2

allocate 3k ... allocate ...

deallocate 3k
  unset 3 bits

allocate ...

deallocate first 2k
  unset 2 bits
  move beginpointer to first set bit
  (you'll have to search, but you can search a byte or a word
  at a time and then narrow it down to the bit.)


As you can see the allocation is quick (usually), with a small cost on
deallocation.

It's just an idea. Hope it helps. Hope I was clear.

gary@apexepa.UUCP (Gary Wisniewski) (09/09/88)

It doesn't sound like you have any special needs which demand an unusual
allocation technique---you just need a good one.

Rather than try to invent a new algorithm, why not look in Knuth's
"Fundamental Algorithms" and "Sorting and Searching".  There are plenty
of proven techniques specifically aimed at memory management.  Chances
are, you'll find that a hybrid will work perfectly.

I often prefer simplicity in memory allocation schemes.  Usually I'll choose
a first-fit or best-fit algorithm depending upon the needs of the application.
It is usually pretty easy to get excellent performance.  However, there are
some cases where the allocation demands are severe (low memory availability
combined with high-volume allocator usage).  In such cases, more complex
algorithms such as Knuth's "boundary tag" algorithm can be used.  I generally
find such efforts not worth it except in extreme cases.


-- 
Gary J. Wisniewski				  Apex Software Corporation
{allegra,bellcore,cadre}!pitt!darth!apexepa!gary  Phone: (412) 681-4343

cdold@starfish.Convergent.COM (Clarence Dold) (09/10/88)

From article <5702@utah-cs.UUCP>, by u-dmfloy%sunset.utah.edu@utah-cs.UUCP (Daniel M Floyd):
> 
> You might try an allocation scheme similar to DOS (i.e. MS/PC-DOS)
> disk space allocation.
> 
> Divide your memory into convenient chunks. Then refer to those
> chunks in some sort of lookup table. Keep a pointer to the first
> used block and the last. Always allocate at the end. The table will
If you are using a Unisys or Convergent system, look at malloc(3x) rather 
than malloc(3c).  It implements a 'convenient chunk' scheme, incredibly
faster, especially if you are paging to swap space.
-- 
Clarence A Dold - cdold@starfish.Convergent.COM		(408) 435-5274
		...pyramid!ctnews!mitisft!professo!dold
		P.O.Box 6685, San Jose, CA 95150-6685

bill@proxftl.UUCP (T. William Wells) (09/10/88)

In article <1262@micomvax.UUCP> belanger@micomvax.UUCP (Robert Belanger) writes:
: Hi everybody,
:
:       We are in the process of speeding up memory allocation, because we
: have a CPU with a linear address space but no (easy) access to disk. We have
: four (4) megabytes of DRAM available. What we want to do is avoid the system
: allocator, in order to speed up memory allocation and also reduce the
: overhead associated with it.

One scheme comes to mind for fast memory allocation: the "buddy
system allocator".  This is *extremely* fast.  The drawback is
that you must allocate in blocks of certain sizes.  For example,
the standard method uses blocks of size k*2**n, where k is a
constant and n is a parameter.  Look in Knuth or in "Data
Structures and Algorithms", Aho, Hopcroft, and Ullman, for a
description of this and other memory allocation systems.

bill@proxftl.UUCP (T. William Wells) (09/10/88)

In article <1262@micomvax.UUCP> belanger@micomvax.UUCP (Robert Belanger) writes:
:       1)      we could allocate a long block of memory and manage it
:       ourselves (by maintaining headers to our data blocks), but two problems
:       arise: 1) when we free a block, we have to scan the whole area to
:       see whether we can merge two blocks together; 2) scanning the
:       list (on other systems, with swap space) induces a lot of disk swapping.

Neither of these needs to be true.  The trick is to do the block
merging while allocating.  In other words, freeing a block is
just marking it as freed; when allocating a block, you look at
the current free block's successor, and if it is free, merge it
then.  The second problem can go away *if* your method of
maintaining the current allocation point is consistent with the
allocation patterns of your program.

---
Bill
novavax!proxftl!bill

g-rh@cca.CCA.COM (Richard Harter) (09/12/88)

In article <733@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>In article <1262@micomvax.UUCP> belanger@micomvax.UUCP (Robert Belanger) writes:
>:       We are in the process of speeding up memory allocation. Because we
>: have a CPU with a linear address space, but no (easy) access to disk...

>One scheme comes to mind for fast memory allocation: the "buddy
>system allocator".  This is *extremely* fast.  The drawback is
>that you must allocate in blocks of certain sizes.  For example,
>the standard method uses blocks of size k*2**n, where k is a
>constant and n is a parameter...

Best fit is also very fast, albeit not as fast as a carefully implemented
buddy system.  The trick is to round sizes up to a multiple of a small,
convenient power of two to reduce the range of requests to a reasonable range,
and then maintain a separate list for each size.  To access the lists
quickly, maintain a table of list head pointers.  [There are some tricks
for searching the table more efficiently than a linear search.]  For
example, the round factor might be 8 and the table size 128, which
takes care of requests up to 1024.  For blocks beyond that size you can
maintain a single "large block" list.  If you have numerous large requests
you can have an additional table for large blocks going up by powers of
two.  There is a fixed overhead price in space for each allocated block.
Writing a fast "best-fit" allocator is an interesting exercise in
programming.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

psrc@poseidon.UUCP (Paul S. R. Chisholm) (09/13/88)

< "NO toon can resist the old shave-and-a-haircut gag!" >

In article <1262@micomvax.UUCP>, belanger@micomvax.UUCP (Robert Belanger) writes:
> We are in the process of speeding up memory allocation. . . .  What
> we want to do is avoid any system in order to speed up memory
> allocation and also reduce the overhead associated with memory
> allocation. . . .
>Robert L. Belanger, philabs!micomvax!belanger

Consider this war story (private communication from Chris Van Wyk to
Jon Louis Bentley, and quoted in the latter's WRITING EFFICIENT
PROGRAMS, p. 37):

	My interpreter for IDEAL seemed to be awfully slow and to take
	a lot of space, so I profiled its execution.  It turned out
	that over a sample of ten test cases that exercise every part
	of the code, it was spending almost seventy percent of its time
	in the system's memory allocator!

	Further investigation revealed that most of this was used in
	allocating one particular kind of node (more than 68,000 times,
	with the next most popular node being allocated around 2,000
	times).  I added the minimal bookkeeping necessary to keep my
	own queue of free nodes of this particular kind, and reduced
	the memory allocator's share of the time to about thirty
	percent.

	There were three benefits of this change:

	1.  less time in the allocator (it's a circular list with a
	roving pointer),

	2.  less memory fragmentation (our allocator doesn't compact),
	and

	3.  now that the statistics aren't overwhelmed by the
	allocator's share, I can find places that need to be sped up
	with sophisticated algorithms or data structures.

	On the other hand, it would not be worth my time to provide my
	own bookkeeping for every kind of node I allocate, so I save
	programming effort on another front.

The key point here is that Van Wyk measured his program (with a
non-trivial test input), found where the program spent its time, and
*then* optimized it.

(Bentley gives another example, a Fortran compiler that had an
extremely slow routine.  The programmer spent a week speeding that one
routine up.  Ten years later, someone reported a bug; guess where?
Looking back at the code, the programmer realized that this routine
*never* worked, and must never have been called previously in ten
years of use!)

Bentley, Jon Louis, WRITING EFFICIENT PROGRAMS, Prentice-Hall,
Englewood Cliffs, NJ, 1982.  ISBN 0-13-970251-2, 0-13-970244-X (pbk).
A nice collection of ways of making software faster, smaller, or both,
with examples and real-world "war stories".  There may not be anything
you've never heard, but they're collected in one volume, and easy to
read.

Paul S. R. Chisholm, psrc@poseidon.att.com (formerly psc@lznv.att.com)
AT&T Bell Laboratories, att!poseidon!psrc, AT&T Mail !psrchisholm
I'm not speaking for the company, I'm just speaking my mind.

chase@Ozona.orc.olivetti.com (David Chase) (09/14/88)

In article <499@poseidon.UUCP> psrc@poseidon.UUCP (Paul S. R. Chisholm) writes
(quoting a letter from Van Wyk to Bentley):
>	times).  I added the minimal bookkeeping necessary to keep my
>	own queue of free nodes of this particular kind, and reduced
>	the memory allocator's share of the time to about thirty
>	percent.
>The key point here is that Van Wyk measured his program (with a
>non-trivial test input), found where the program spent its time, and
>*then* optimized it.

This is one good approach (I've used it several times); another (if
you simply must write your own memory allocator) is to use somebody
else's benchmarks to ensure that you at least implement a good
algorithm.  Before writing yet another allocator, I suggest that (as a
minimum) you read "Dynamic Memory Allocation in Computer Simulation"
by Norman R. Nielsen, in CACM 20:11 (November 1977).  I don't know if
anyone has done a similar study of allocation techniques that have
been invented since then, but I'd like to hear about it if they have.

Here, I'll even save you the trouble.  By most measures, the best
algorithm maintains a free list for each node size that is requested
(NO ROUNDING UPWARD EXCEPT FOR WORD ALIGNMENT).  This algorithm was
typically fastest (though buddy blocks has better worst-case
performance) and nearly as effective in limited memory situations
("limited memory" = "sum of user requests is 80% of what is available,
ignoring all overhead imposed by allocator").  On the "base demand
list", this algorithm required only 85.6% of available memory to
satisfy a request for 80%; the best required 83.6%, but was ten
times slower.

On a more rigorous set of 18 test loads, this algorithm was best at
50% (succeeded 18 times, max of 42 IBM 360/65 microseconds for
alloc-dealloc), best at 66.7% (succeeded 18 times, 42 usec) and
probably best at 80% (succeeded 15 times, 169 usec).  I say probably
best because only selected algorithms were given the full treatment,
and some other might have been more successful.  However, of the
algorithms given the full treatment, the most successful faster
(worst-case) algorithm only succeeded 7 times.

I should add that I've implemented a couple along these lines, and
played with a few that other people have implemented, and I am
happiest with the approach taken in the Boehm-Demers garbage collector
(available from the Rice archive server*, I think).  It uses multiple
free lists for blocks below a certain size (4K - overhead), and a
single free list for blocks larger than that.  It can easily be
convinced not to try to collect garbage, but unless you take delight
in writing truly squirrelly code, it will correctly take out the
trash.  For more information about this collector, see

  Boehm, H., and M. Weiser, "Garbage Collection in an Uncooperative
  Environment", Software Practice & Experience, to appear, September
  1988.

Weiser experimented with a special version of the collector for
tracing storage leaks; typically, the collector does a better job than
programmers.

I realize that electronic mail is at your fingertips, but I know that
there are university libraries containing the cited material in
Silicon Valley, Austin, Houston, and Boston, and probably other places
besides those.  I can't really recommend the UT or UH campuses, but
walks through Stanford, Rice, MIT, or Harvard are quite pleasant.  Get
off your computer butts and go read some of this stuff.

* mail -s "help" "archive-server@rice.edu" < /dev/null
  mail -s "send index sun-source" "archive-server@rice.edu" < /dev/null

David Chase
Olivetti Research Center, Menlo Park, CA

bill@proxftl.UUCP (T. William Wells) (09/17/88)

In article <33184@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
: In article <733@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
: >In article <1262@micomvax.UUCP> belanger@micomvax.UUCP (Robert Belanger) writes:
: >:       We are in the process of speeding up memory allocation. Because we
: >: have a CPU with a linear address space, but no (easy) access to disk...
:
: >One scheme comes to mind for fast memory allocation: the "buddy
: >system allocator". ...
:
: Best fit is also very fast, albeit not as fast a carefully implemented
: buddy system. ...
:       There is a fixed overhead price in space for each allocated block.
: Writing a fast "best-fit" allocator is an interesting exercise in
: programing.

Yes.  An exact- + worst-fit allocator (a close relative of
best-fit) that I wrote as part of a real-time kernel turned out
to be one of the largest sections of the kernel; it was easily
the most complicated.

Two disadvantages come to mind for best-fit.  The first is the
memory overhead.  If you keep track of the block size and make
your minimum block size be two pointers, you can keep the
overhead per allocated block to a single bit.  Of course, there
is also the overhead of allocated but unused space; this is often
small compared to a word per block, especially if the typical
allocation amount is small.

The second disadvantage is fragmentation: exact-fit + worst-fit
is supposed to create less fragmentation than best-fit.  I have
not done the experiment, though.

---
Bill
novavax!proxftl!bill

g-rh@cca.CCA.COM (Richard Harter) (09/18/88)

In article <778@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
>In article <33184@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:

>: Writing a fast "best-fit" allocator is an interesting exercise in
>: programing.

>Yes.  A exact- + worst-fit (a close relative of best-fit)
>allocator that I wrote as part of a real time kernel turned out
>to be one of the largest sections of the kernel; it was easily
>the most complicated.

It shouldn't be all that bad -- the best fit allocator I am using
currently runs about 200 lines of C, excluding comments.    That includes
both the allocate and the free routine.  However it doesn't have any
tuning code or any special code for handling small blocks.

>Two disadvantages come to mind for best-fit.  The first is the
>memory overhead.  If you keep track of the block size and make
>your minimum block size be two pointers, you can keep the
>overhead per allocated block to a single bit.  Of course, there
>is also the overhead of allocated but unused space; this is often
>small compared to a word per block, especially if the typical
>allocation amount is small.

I don't understand this comment.  If you mean to say that you can get
by with a single bit to mark whether a block is allocated or not, sure.
But the block size word will (on a ptr per word machine) take the same
space as a ptr.  

In any case, yes, there is more memory overhead for a best-fit allocator.
Personally I think this is a moot point.  The losses for fragmentation
and oversize allocation are much more significant unless you have a heavy
load of short requests.  However small block requests are best handled by
having dedicated blocks for the small blocks.

One point which I feel strongly about is that most allocators use the
blocks (free and allocated) to hold control information.  I think that
this is a mistake.  I put all control information in structures in a 
separate node space.  This adds overhead, but it makes life a lot safer.
Array indexing errors do not damage the allocator.  Frees can be checked
for validity.

>The second disadvantage is fragmentation: exact-fit + worst fit
>is supposed to create less fragmentation than best-fit.  I have
>not done the experiment, though.

I have -- I ran an extensive series of tests on allocator strategies a
number of years ago.  What it comes down to is that you can get the results
you want by picking your assumptions.  Exact fit (whether EF or EF+WF) is important;
it defeats the 2/3 rule.  Given that, different size distributions and 
different allocate/free patterns give different results.
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

bill@proxftl.UUCP (T. William Wells) (09/18/88)

In article <33422@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
: In article <778@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
: >In article <33184@cca.CCA.COM> g-rh@CCA.CCA.COM (Richard Harter) writes:
:
: >: Writing a fast "best-fit" allocator is an interesting exercise in
: >: programing.
:
: >Yes.  An exact- + worst-fit (a close relative of best-fit)
: >allocator that I wrote as part of a real time kernel turned out
: >to be one of the largest sections of the kernel; it was easily
: >the most complicated.
:
: It shouldn't be all that bad -- the best fit allocator I am using
: currently runs about 200 lines of C, excluding comments.    That includes
: both the allocate and the free routine.  However it doesn't have any
: tuning code or any special code for handling small blocks.

Well, mine did have that kind of code; it also had grunge for
handling buffer pools and a few hacks to make it fit with the
rest of the kernel.  Anyway, 200 lines of C code, translated into
assembler, would have been a lot of code compared to any of the
other parts of the kernel.

: >Two disadvantages come to mind for best-fit.  The first is the
: >memory overhead.  If you keep track of the block size and make
: >your minimum block size be two pointers, you can keep the
: >overhead per allocated block to a single bit.  Of course, there
: >is also the overhead of allocated but unused space; this is often
: >small compared to a word per block, especially if the typical
: >allocation amount is small.
:
: I don't understand this comment.  If you mean to say that you can get
: by with a single bit to mark whether a block is allocated or not, sure.
: But the block size word will (on a ptr per word machine) take the same
: space as a ptr.
:
: In any case, yes, there is more memory overhead for a best-fit allocator.
: Personally I think this is a moot point.  The losses for fragmentation
: and oversize allocation are much more signifigant unless you have a heavy
: load of short requests.  However small block requests are best handled by
: having dedicated blocks for the small blocks.

Let me amplify, then.  Most allocation schemes make use of a word
of data for each block.  However, with the buddy system, if you
are able to pass the size of the block to the free routine, you
can get away with one bit of information in each allocated
block.  Generally, this size is either known at compile time or
is easily computable.  And the bit can usually be buried in the
data structure without costing anything.

In comparing the two systems, assuming one has a specific
application in mind, one should count the word of overhead per
block plus the fragmentation for the usual systems, vs.  the
wasted space plus the fragmentation for the buddy system.

There are also a few minor points favoring the buddy system; one
is that it can do a much better job of reducing fragmentation
(though your mileage almost certainly will vary :-)).  Another is
that, if you know that you have a buddy system allocator, you can
sometimes choose which structures things go in so that the
structure sizes are the appropriate ones for your allocator.

Another point, which I haven't given much thought to but which I thought
I'd toss out to see what people think, is that a buddy system
allocator can go very well with a paged memory system; wasted
space at the end of small blocks is irrelevant and wasted space
at the end of large blocks wastes address space but does not
waste much working set memory.

: One point which I feel strongly about is that most allocators use the
: blocks (free and allocated) to hold control information.  I think that
: this is a mistake.  I put all control information in structures in a
: separate node space.  This adds overhead, but it makes life a lot safer.
: Array indexing errors do not damage the allocator.  Free's can be checked
: for validity.

Depends on what you want.  If you want a bulletproof system, then
you should definitely segregate the control information.  If you
want a high-performance system, then you have to put the control
information in an unsafe place.  That's life.

: >The second disadvantage is fragmentation: exact-fit + worst fit
: >is supposed to create less fragmentation than best-fit.  I have
: >not done the experiment, though.
:
: I have -- I ran an extensive series of tests on allocator strategies a
: number of years ago.  What it comes down to is that you can get the results
: you want by picking your assumptions.  Exact fit (whether EF+WF) is important;
: it defeats the 2/3 rule.  Given that, different size distributions and
: different allocate/free patterns give different results.

Yeah, and that really is the bottom line: you have to pick the
best allocator based on an analysis and evaluation of the
specific task to be performed.  That makes life difficult for
those writing general purpose allocators: they have no idea what
the allocation patterns are likely to be.  That suggests that one
in that position should find allocators with reasonable
worst-case performance, and then pick the fastest or the simplest
of those schemes.

---
Bill
novavax!proxftl!bill

g-rh@cca.CCA.COM (Richard Harter) (09/19/88)

In article <784@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:

	re space for allocator programs

]Well, mine did have that kind of code; it also had grunge for
]handling buffer pools and a few hacks to make it fit with the
]rest of the kernel.  Anyway, 200 lines of C code, translated into
]assembler, would have been a lot of code compared to any of the
]other parts of the kernel.

	Good point.  My concern has been with allocators for large
systems rather than for kernels.  I don't begrudge the time or code
spent on making an allocator hot; in many applications allocators get
a heavy workout and the payback for improving them is large.  A slow
allocator can make you look very bad.


]There are also a few minor points favoring the buddy system; one
]is that it can do a much better job of reducing fragmentation
](though your mileage almost certainly will vary :-)).  Another is
]that, if you know that you have a buddy system allocator, you can
]sometimes choose which structures things go in so that the
]structure sizes are the appropriate ones for your allocator.

	Actually, it buries the fragmentation in the oversize
allocation.  If block sizes are random, the buddy system has a 67%
storage usage efficiency.  Exact-fit allocators can get upwards of
90% if request sizes are not too small (and small request sizes are
best handled separately).  Efficiencies depend on the ratio of mean
requested size to maximum space available, the request size distribution,
and the request/free pattern characteristics.

	I can't say I'm much enthused about the notion of bending the
code structure to fit the allocator characteristics.

]Another point, which I haven't given much thought to but which I thought
]I'd toss out to see what people think, is that a buddy system
]allocator can go very well with a paged memory system; wasted
]space at the end of small blocks is irrelevant and wasted space
]at the end of large blocks wastes address space but does not
]waste much working set memory.

	Now this is a very interesting point, and one which I hadn't
thought about.  There is a clear profit to be gained in taking page
sizes into account -- page faults are a lovely way to take a performance
hit.  In a best-fit (or EF+WF) allocator one should have criteria for
not straddling page boundaries.  I'm going to have to think about that one.

]: One point which I feel strongly about is that most allocators use the
]: blocks (free and allocated) to hold control information.  I think that
]: this is a mistake.  I put all control information in structures in a
]: separate node space.  This adds overhead, but it makes life a lot safer.
]: Array indexing errors do not damage the allocator.  Free's can be checked
]: for validity.

]Depends on what you want.  If you want a bulletproof system, then
]you should definitely segregate the control information.  If you
]want a high-performance system, then you have to put the control
]information in an unsafe place.  That's life.

But note that the performance hit is on storage use and not on time.
Moreover the hit is not all that great unless there are a lot of short
requests.  I am a fan of bulletproof code.  However different situations
make for different requirements.


-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

tanner@cdis-1.uucp (Dr. T. Andrews) (09/19/88)

In article <33422@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
) One point which I feel strongly about is that most allocators use the
) blocks (free and allocated) to hold control information.  I think that
) this is a mistake.  I put all control information in structures in a 
) separate node space.

Unless you hard-code a limit of <n> allocated blocks, you have to
allocate the separate node space from the same heap, leaving the
control information somewhat vulnerable (OK, so now it's arry[-3]
instead of [-1] that trashes you).

I stuck the control information in the blocks, and can still check
my free() calls.  Two things assist in this.

On free() calls, I trace the list until I find the block to be freed.
If it isn't a valid block, it isn't free()d.  As the block is being
freed, I coalesce the adjacent free block(s).  At most 3 blocks may
be coalesced (the freed block and the ones immediately above and
below).  It seems much better to do this work when the block is freed
(if ever) than to perform trash-mashing at malloc() time.

Secondly: the third control word in the block (after a pointer to the
next block and the size of the current block) is the "XOR" of the two
other control words.  This gives a good indication of whether arry[-1]
has been tampered with.
-- 
...!bikini.cis.ufl.edu!ki4pv!cdis-1!tanner  ...!bpa!cdin-1!cdis-1!tanner
or...  {allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!tanner

bill@proxftl.UUCP (T. William Wells) (09/22/88)

In article <33451@cca.CCA.COM> g-rh@XAIT.Xerox.COM (Richard Harter) writes:
: In article <784@proxftl.UUCP> bill@proxftl.UUCP (T. William Wells) writes:
:
: ] [comments about an allocator I wrote for a kernel.]
:
:       Good point.  My concern has been with allocators for large
: systems rather than for kernels.  I don't begrudge the time or code
: spent on making an allocator hot; in many applications allocators get
: a heavy workout and the payback for improving them is large.  A slow
: allocator can make you look very bad.

Yeah.  This was a message passing kernel; you couldn't do
*anything* without passing a message.  And guess what the first
step in sending a message was!

:       I can't say I'm much enthused about the notion of bending the
: code structure to fit the allocator characteristics.

Depends on how and when you do it.  An example of where this
worked nicely is where I was building a directed graph whose
nodes pointed to another directed graph.  I had a choice of
putting a field in the pointing nodes of the first directed graph
or in the pointed to nodes of the second.

My reasoning went as follows: it doesn't matter much for the
algorithm where I put the field, so there is nothing wrong with
putting it in either place.  If the underlying allocator is
malloc, then it doesn't matter where.  If the underlying
allocator is my special allocator, then I can choose one or the
other depending on which fits the allocator better.  Of course, I
also had to contend with the matter of differing processors, so I
then asked: on what kinds of processors would this optimization
matter?  The answer to that gave me the sizes of the fields, and
the nature of the allocator let me then decide where to put the
field.  And, BTW, it went in the first graph.

: ][comment on using a buddy system with paged systems]

:       Now this is a very interesting point, and one which I hadn't
: thought about.  There is a clear profit to be gained in taking page
: sizes into account -- page faults are a lovely way to take a performance
: hit.  In a best fit (or EF+WF) allocator one should have criteria for
: not straddling page boundaries.  I'm going to have think about that one.

Studying the performance of a buddy system allocator in a paged
system might be an interesting research topic.  I know that it
would beat hell out of some of the theses I reviewed when I was
doing interviews.  Any takers?  :-)

---
Bill
novavax!proxftl!bill

g-rh@XAIT.XEROX.COM (Richard Harter) (09/23/88)

In article <7105@cdis-1.uucp> tanner@cdis-1.uucp (Dr. T. Andrews) writes:
>In article <33422@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
>) One point which I feel strongly about is that most allocators use the
>) blocks (free and allocated) to hold control information.  I think that
>) this is a mistake.  I put all control information in structures in a 
>) separate node space.

>Unless you hard-code a limit of <n> allocated blocks, you have to
>allocate the separate node space from the same heap, leaving the
>control information somewhat vulnerable (OK, so now it's arry[-3]
>instead of [-1] that trashes you).

	You are always vulnerable to wild writes unless there is hardware
memory area protection available.  However the vast majority of overwrite
problems come from running over the ends of arrays.  What you do is to
allocate a block of nodes as an array of nodes upon need.  If you're feeling
paranoid you put some buffer space at the end of each block of nodes.  The
general idea is to avoid contiguity between user data space and allocator
control space.  The problem with associating control information directly
with allocation blocks is that when you pass an address back to the
user you are, in effect, also passing back a pointer to the control
information associated with the block.  The control block is immediately
susceptible to damage by the user because you have violated physical
information hiding.
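
A minimal sketch of this separate-node-space idea in C.  The fixed-size
table and all names here are illustrative only (Harter allocates node
blocks on demand rather than using a static limit):

```c
/* Control records live in their own array, not adjacent to user data,
 * so an array overrun in a user block cannot reach them. */
#include <stddef.h>
#include <stdlib.h>

struct ctl_node {            /* control info, kept away from user data */
    void  *user_block;       /* address handed to the user             */
    size_t size;             /* size of that block                     */
    int    in_use;
};

#define MAX_NODES 1024
static struct ctl_node node_space[MAX_NODES];
static int node_count = 0;

/* Record a block; returns its control node, or NULL if out of nodes. */
static struct ctl_node *record_block(void *p, size_t n)
{
    if (node_count >= MAX_NODES)
        return NULL;
    node_space[node_count].user_block = p;
    node_space[node_count].size = n;
    node_space[node_count].in_use = 1;
    return &node_space[node_count++];
}

/* Validate a free() by searching the node space for the address. */
static struct ctl_node *find_block(void *p)
{
    int i;
    for (i = 0; i < node_count; i++)
        if (node_space[i].in_use && node_space[i].user_block == p)
            return &node_space[i];
    return NULL;             /* not a live block: reject the free()    */
}
```

Since a bogus pointer simply fails the lookup, free() of garbage can be
rejected instead of corrupting the heap.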

	This is the sort of thing that makes for mystery bugs, to coin
a phrase.  Mystery bugs are, loosely, bugs where you cannot infer the
source of the error logically from the error because the effect of the
error depends on the arcana of the implementation and the actual arrangement
of the code and data.

>I stuck the control information in the blocks, and can still check
>my free() calls.  Two things assist in this.

>On free() calls, I trace the list until I find the block to be freed.
>If it isn't a valid block, it isn't free()d.  As the block is being
>freed, I coalesce the adjacent free block(s).  At most 3 blocks may
>be collected (the freed block and the ones immediately above and
>below).  It seems much better to do this work when the block is freed
>(if ever) rather than to perform trash-mashing at malloc() time.

	Immediate coalescence is usually the right thing to do.  Your
strategy for validating freed blocks sounds like it is expensive time
wise.  
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

tanner@cdis-1.uucp (Dr. T. Andrews) (09/23/88)

In article <33692@XAIT.XEROX.COM>, g-rh@XAIT.XEROX.COM (Richard Harter) writes:
) 	Immediate coalescence is usually the right thing to do.  Your
) strategy for validating freed blocks sounds like it is expensive time
) wise.  

Not really.  As I scan and coalesce the blocks, I perform the
validation of the pointers.  It's real simple (one "xor"),
but catches most errors.  The control block contains only:
	[0]	back ptr
	[1]	size of block (is even, the low bit == "in use")
	[2]	[0] ^ [1]

I don't mind the small expense at free() time, because in general
that's not done nearly as often as malloc() in my applications.
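
The check costs one XOR per header.  A sketch in C (word layout and
names are illustrative):

```c
/* Three control words precede each block; word [2] is the XOR of the
 * other two, so one XOR at free() time detects most header damage. */
#include <stdint.h>

static void header_set(uintptr_t ctl[3], uintptr_t back, uintptr_t size)
{
    ctl[0] = back;           /* back ptr                               */
    ctl[1] = size;           /* size of block (even; low bit = in use) */
    ctl[2] = back ^ size;    /* consistency word                       */
}

static int header_ok(const uintptr_t ctl[3])
{
    /* A damaged word breaks the identity ctl[0] ^ ctl[1] == ctl[2]. */
    return (ctl[0] ^ ctl[1]) == ctl[2];
}
```
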
-- 
...!bikini.cis.ufl.edu!ki4pv!cdis-1!tanner  ...!bpa!cdin-1!cdis-1!tanner
or...  {allegra killer gatech!uflorida decvax!ucf-cs}!ki4pv!cdis-1!tanner

njk@freja.dk (Niels J|rgen Kruse) (09/27/88)

In article <33422@cca.CCA.COM>, g-rh@cca.CCA.COM (Richard Harter) writes:
> One point which I feel strongly about is that most allocators use the
> blocks (free and allocated) to hold control information.  I think that
> this is a mistake.  I put all control information in structures in a
> separate node space.  This adds overhead, but it makes life a lot safer.
> Array indexing errors do not damage the allocator.  Free's can be checked
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> for validity.

But then they will damage something else. The consequences of
this may be much more subtle and affect only the output, without
being fatal. Non-fatal bugs do not make life easier!
Corrupting the allocator's overhead information is almost
always fatal (eventually) and can be checked for with
heapwalkers, to pinpoint the cause. In particular, liberation
with boundary tags gives a heap with a lot of redundant
information that can be compared for consistency. Such a
heapwalker ought to be supplied with allocators that store
overhead information in the heap area.

        Niels J|rgen Kruse
        Stud. Scient. at DIKU (U. of Copenhagen)

bill@proxftl.UUCP (T. William Wells) (09/28/88)

In article <7105@cdis-1.uucp> tanner@cdis-1.uucp (Dr. T. Andrews) writes:
: On free() calls, I trace the list until I find the block to be freed.
: If it isn't a valid block, it isn't free()d.  As the block is being
: freed, I coalesce the adjacent free block(s).  At most 3 blocks may
: be collected (the freed block and the ones immediately above and
: below).

Do I wish I could afford this kind of strategy!  :-) However,
what you are describing is *expensive*.  One might want to do
this kind of thing in a system where catching bugs is of the
essence, but any other use is likely to cost an unacceptable
amount of machine time, if one does much allocating.

:          It seems much better to do this work when the block is freed
: (if ever) rather than to perform trash-mashing at malloc() time.

Given what you are doing, you should do the coalesce at free
time.  However, in systems where coalescing involves extra work,
it is usually better done at malloc time: this is generally more
efficient because some of the work may already be being done at
malloc time, and because the coalescing doesn't have to be done
until needed (if ever).  However, this is a general comment;
specific allocators may have different characteristics.

---
Bill
novavax!proxftl!bill

g-rh@XAIT.XEROX.COM (Richard Harter) (09/30/88)

In article <6913@cdis-1.uucp> tanner@cdis-1.uucp (Dr. T. Andrews) writes:
>In article <33692@XAIT.XEROX.COM>, g-rh@XAIT.XEROX.COM (Richard Harter) writes:
>) 	Immediate coalescence is usually the right thing to do.  Your
>) strategy for validating freed blocks sounds like it is expensive time
>) wise.  
>
>Not really.  As I scan and coalesce the blocks, I perform the
>validation of the pointers.  It's real simple (one "xor"),
>but catches most errors.  The control block contains only:
>	[0]	back ptr
>	[1]	size of block (is even, the low bit == "in use")
>	[2]	[0] ^ [1]
>
>I don't mind the small expense at free() time, because in general
>that's not done nearly as often as malloc() in my applications.

	This is reasonable -- your original posting talked about
traversing the free list, which is another matter entirely.  I am
not convinced one way or another that this is perfectly safe.  One
thing that should be done in your scheme is to check that the pointers
in the preceding and following blocks point correctly (a length is as
good as a pointer in this context.)  At a minimum you should set the
use bit to free in the returned block.  What you want to catch is 
the multiple returns of a block.  It is unsafe to test on the contents
of a freed location.  However the odds are that the contents of the
old control area in the freed block are either completely changed or
are unaltered.  Pointer checks and use bit checks will catch both.
[I am uncomfortable with "the odds are" programming.]  Much safer is to
do a two way check on pointers.  Suppose that I have three successive
blocks, A, B, and C, and that B is freed.  The check is to determine
A and C from the control area for B and then check to make sure that
A and C both point to B.  If you don't do this kind of checking, this
is what can happen.  A and C are free, B allocated.  Return B, leaving
its control area unchanged.  Now A points to D, just past C.  The
control area in C is unaltered since it is part of the free area for A.
Now erroneously free C.  Its control area points to B and D, apparently
in use.  No coalescence is done, but C is entered into the avail list.
In due course both the large block A and the false block C are allocated
with sundry disastrous results.

	The above scenario is not the only one possible.  Pointer checking
may be provably "safe" under the assumption that the control area of
a false return either contains stale pointers or garbage.  However I
don't have a proof at hand.  
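
The two-way check can be sketched in C.  Here headers carry sizes
rather than raw pointers (a length is as good as a pointer in this
context); the layout and all names are illustrative:

```c
/* Before freeing B, derive its neighbours A and C from B's header and
 * verify that both lead back to B.  A stale or garbage header fails. */
#include <stddef.h>
#include <string.h>

struct hdr {
    size_t prev_size;   /* size of the preceding block (0 = first block) */
    size_t size;        /* size of this block, header included           */
};

/* Nonzero if the block at offset `off` in `arena` is consistent with
 * both of its neighbours. */
static int twoway_ok(const char *arena, size_t arena_len, size_t off)
{
    struct hdr a, b, c;

    memcpy(&b, arena + off, sizeof b);
    if (b.prev_size != 0) {                     /* does A lead to B?  */
        memcpy(&a, arena + off - b.prev_size, sizeof a);
        if (a.size != b.prev_size)
            return 0;
    }
    if (off + b.size < arena_len) {             /* does C lead to B?  */
        memcpy(&c, arena + off + b.size, sizeof c);
        if (c.prev_size != b.size)
            return 0;
    }
    return 1;
}
```
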
-- 

In the fields of Hell where the grass grows high
Are the graves of dreams allowed to die.
	Richard Harter, SMDS  Inc.

barmar@think.COM (Barry Margolin) (09/30/88)

In article <6913@cdis-1.uucp> tanner@cdis-1.uucp (Dr. T. Andrews) writes:
>I don't mind the small expense at free() time, because in general
>that's not done nearly as often as malloc() in my applications.

Shouldn't you free everything you allocate?  Or do you just depend on
everything being freed when you exit?

Barry Margolin
Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

nevin1@ihlpb.ATT.COM (Liber) (10/01/88)

In article <28893@think.UUCP> barmar@kulla.think.com.UUCP (Barry Margolin) writes:
>In article <6913@cdis-1.uucp> tanner@cdis-1.uucp (Dr. T. Andrews) writes:
>>I don't mind the small expense at free() time, because in general
>>that's not done nearly as often as malloc() in my applications.

>Shouldn't you free everything you allocate?  Or do you just depend on
>everything being freed when you exit?

In my experience, free()s are done much less often than malloc()s (as
a matter of fact, the best case is when # free()s invoked = # malloc()s
invoked).

From what I have seen, memory is usually malloc()ed for one or more of the
following reasons (I don't necessarily agree with all of these reasons,
but this is based upon code I have seen and not just code that I have
written):

1.  The amount of space needed is variable.

2.  The amount of space needed is unknown at compile time.

3.  The data needs to be shared between different functions.

4.  The data needs to live through more than one function invocation
(and static variables may not suffice because the data used may depend
on where a given function is called from)

5.  Space for some variables may not be allocated unless they are going
to be specifically used.

If you are using malloc() for either reason #3 or reason #4, the number
of free()s will usually be much less than the number of malloc()s.
It isn't worth cleaning up the heap before exit(), since exit() (at
least on Unix) does it for you.
-- 
 _ __		NEVIN J. LIBER  ..!att!ihlpb!nevin1  (312) 979-4751  IH 4F-410
' )  )  "I catch him with a left hook. He eels over. It was a fluke, but there
 /  / _ , __o  ____  he was, lying on the deck, flat as a mackerel - kelpless!"
/  (_</_\/ <__/ / <_	These are NOT AT&T's opinions; let them make their own.

snguyen@nprdc.arpa (Son Nguyen) (02/08/89)

In article <7208@pyr.gatech.EDU>, dvu@pyr.gatech.EDU (Dinh Vu) writes:

@ I am learning C, and having difficulty with pointers, and
@ structures.  The small program below compiled fine, but
@ it gave a core dump when I ran it.  Would someone give me 
@ some light on this matter.
@ 
@ 
@ 	 #include <stdio.h>
@ 	
@ 	struct abc {
@ 		int x;
@ 		int y;
@ 	};
@ 
@ 	main()
@ 	{
@ 		struct abc *var;
@ 
@ 		var->x = 5;
@ 		var->y = 10;
@ 	}
@ 
@ 
@ 
@ Dinh Vu

	Your problem is that you are trying to assign values to the internal
	fields of structure 'abc' WITHOUT allocating memory for the pointer
	called 'var'.  To fix it, you must put the following statements above
	the two lines 'var->x = 5' and 'var->y = 10':

	   if ((var = (struct abc *) malloc (sizeof (struct abc))) == NULL)  {
		printf ("ERROR: Can't allocate memory for 'var'\n");
		exit (1);
	   }


	The statements above will solve your problem.  Remember that you
	must ALWAYS allocate memory for a pointer before assigning values
	to the fields it points to.



	Son Nguyen

	PS:  If you have more questions
	     about C, ask me.......

rns@se-sd.sandiego.ncr.com (Rick Schubert) (02/11/89)

[Sorry if this is a duplicate; I don't think my original got out.]

My standard 2-day waiting period is up, and no one else has made this
comment, so here goes:

In article <1435@arctic.nprdc.arpa> snguyen@nprdc.arpa (Son Nguyen) writes:
>	   if ((var = (struct abc *) malloc (sizeof (struct abc))) == NULL)  {
>		printf ("ERROR: Can't allocate memory for 'var'\n");
>		exit (1);
>	   }

That should be:
		fprintf (stderr, "ERROR: Can't allocate memory for 'var'\n");

Error messages should be written to stderr, not stdout (which is where printf
writes); that's what stderr is for.  This is important (on systems that have
pipes) if stdout is piped into another program or (on systems that have
file redirection) if stdout is redirected to a file.  (But it's good to see
that you've checked the return from malloc!)

-- Rick Schubert (rns@se-sd.sandiego.NCR.COM)

desnoyer@Apple.COM (Peter Desnoyers) (05/02/89)

In article <11021@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>
>Getting back to data files, it's not necessary to know how big
>they are while reading them.  Just use code like the following:
>
>	[ while data
>	    if (not room)
>		allocate another N bytes]
>

No, No. This algorithm takes O(n^2) copies in the worst case. If you
change your code to:

	while data
	  if not room
	    current_size *= 2
	    ptr = realloc( ptr, current_size)

(i.e. increase multiplicatively instead of additively) then you will
only need O(n) copies. There is a direct tradeoff between wasted memory
and cycles using this algorithm - if the multiplicative factor is N,
then :
   	copies = 1 + 1/(N-1)
	mean wasted memory (fraction) = (N-1)/2N

2 seems to be a good factor to use - you waste about 25% of your
allocated memory and the copy overhead is only equivalent to copying
the data once. If fitting as much data in memory as possible is more
important than having it work right the first time, don't use this
algorithm. Otherwise, it's fine. 
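
A sketch of the doubling idiom in C.  Note that assigning realloc()'s
result straight back to the old pointer loses the buffer if realloc()
fails, so this sketch keeps the old pointer on failure (names are
illustrative):

```c
/* Append one byte, doubling the buffer whenever it fills, so reading
 * n bytes of input costs O(n) copying overall. */
#include <stdlib.h>

/* Returns the (possibly moved) buffer, or NULL on allocation failure,
 * in which case the caller still owns the old buffer. */
static char *append_byte(char *buf, size_t *len, size_t *cap, char ch)
{
    if (*len == *cap) {
        size_t new_cap = *cap ? *cap * 2 : 16;
        char *p = realloc(buf, new_cap);
        if (p == NULL)
            return NULL;
        buf = p;
        *cap = new_cap;
    }
    buf[(*len)++] = ch;
    return buf;
}
```
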

				Peter Desnoyers

chris@mimsy.UUCP (Chris Torek) (05/03/89)

In article <29978@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers)
writes:
-... This [additive] algorithm takes O(n^2) copies in the worst case.
-If you change your code to:
-
-	while data
-	  if not room
-	    current_size *= 2
-	    ptr = realloc( ptr, current_size)
-
-... There is a direct tradeoff between wasted memory and cycles using
-this algorithm - if the multiplicative factor is N, then :
-   	copies = 1 + 1/(N-1)
-	mean wasted memory (fraction) = (N-1)/2N

This analysis is correct, if you assume no interaction with the memory
allocator itself.  But in fact the current BSD (modified Caltech buddy
system) allocator will waste much more memory than this.  (If you can
reuse the smaller pieces later, they will not be `wasted', just tying
up resources for a while.  Or is that `waste'? :-) )

I get the feeling that some future BSD will have six different malloc
library routines. . . .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

henry@utzoo.uucp (Henry Spencer) (05/04/89)

In article <17252@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>I get the feeling that some future BSD will have six different malloc
>library routines. . . .

It would be too much to ask for one *good* one!  :-) :-(
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/04/89)

In article <1989May3.201118.10221@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <17252@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>I get the feeling that some future BSD will have six different malloc
>>library routines. . . .
>It would be too much to ask for one *good* one!  :-) :-(

Most applications that really care about malloc() efficiency have already
given up and implemented their own technique, using malloc() simply as an
sbrk() surrogate.

tps@chem.ucsd.edu (Tom Stockfisch) (05/04/89)

In article <10202@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <1989May3.201118.10221@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>In article <17252@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>>>I get the feeling that some future BSD will have six different malloc
>>>library routines. . . .
>>It would be too much to ask for one *good* one!  :-) :-(

>Most applications that really care about malloc() efficiency have already
>given up and implemented their own technique, using malloc() simply as an
>sbrk() surrogate.

The problem is that a power-of-2 malloc()/realloc() cannot be
regarded as an sbrk() surrogate when considering the problem of
constructing a potentially long object whose size is not known
until you are done constructing it.

You waste up to a factor of 4 with a power-of-2 implementation,
not including fragmentation.

-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

gwyn@smoke.BRL.MIL (Doug Gwyn) (05/04/89)

In article <461@chem.ucsd.EDU> tps@chem.ucsd.edu (Tom Stockfisch) writes:
>You waste up to a factor of 4 with a power-of-2 implementation,
>not including fragmentation.

The hypothetical application gloms onto a sufficiently large chunk of
memory and doles it out itself, using whatever suballocation scheme
seems appropriate for the application.  If you mean that
	void *my_heap = malloc( 1024 * 1024 * 16 );
may over-extend the program data break by up to 16Mbytes due to using
a "buddy system" scheme, I see a factor of two (not four) here, but the
problem lies with whoever provided an allocator with such horrible
properties in the C library.  In such a case, if one knows that sbrk()
is available, it should probably be used here instead of malloc().
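
A minimal sketch of such a suballocator in C: grab one chunk, bump a
pointer through it, and release everything at once.  All names are
illustrative:

```c
/* Bump (arena) suballocator on top of a single malloc().  No per-block
 * free; the whole arena is released in one call. */
#include <stdlib.h>
#include <stddef.h>

struct arena {
    char   *base;
    size_t  used;
    size_t  cap;
};

static int arena_init(struct arena *a, size_t cap)
{
    a->base = malloc(cap);
    a->used = 0;
    a->cap = cap;
    return a->base != NULL;
}

/* Dole out n bytes, rounded up for alignment; NULL when exhausted. */
static void *arena_alloc(struct arena *a, size_t n)
{
    size_t rounded = (n + sizeof(long) - 1) & ~(sizeof(long) - 1);
    void *p;
    if (a->used + rounded > a->cap)
        return NULL;
    p = a->base + a->used;
    a->used += rounded;
    return p;
}

static void arena_release(struct arena *a)
{
    free(a->base);
    a->base = NULL;
    a->used = a->cap = 0;
}
```
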

kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) (05/04/89)

In article <10202@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> In article <1989May3.201118.10221@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> >In article <17252@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
> >>I get the feeling that some future BSD will have six different malloc
> >>library routines. . . .
> >It would be too much to ask for one *good* one!  :-) :-(
> 
> Most applications that really care about malloc() efficiency have already
> given up and implemented their own technique, using malloc() simply as an
> sbrk() surrogate.

The old malloc was based on a simple first-fit strategy with a roving pointer.
It is intolerably slow when a process needs to do many mallocs (say thousands
or tens of thousands). The recent BSD malloc is based on a power of 2 buddy
system. It is fast but wastes lots of space. The effect of wasting space is
that in many systems (like bitmap graphics programs) where many mallocs are
typical, you can run out of space even with virtual memory. Worse than that
is wasted time due to the number of page faults when data are accessed frequently.
This is a subtlety that is frequently missed when people run tests on malloc.
There is a paper that Dave Korn and I presented at the 85 Summer Usenix in
Portland Oregon that gave comparative testing results of many versions of malloc.
Sad to say, we also missed the above subtlety in the paper.
However, one can go a long way toward making a compromise between having good
time behavior and good space behavior. Locally, we've been having good
success with a modified version of the best-fit based algorithm presented
in that paper.
	Phong Vo, ulysses!kpv

henry@utzoo.uucp (Henry Spencer) (05/06/89)

In article <11486@ulysses.homer.nj.att.com> kpv@ulysses.homer.nj.att.com (Phong Vo[drew]) writes:
>... Worse than that
>is wasted time due to the number of page faults when data are accessed frequently.
>This is a subtlety that is frequently missed when people run tests on malloc.

If your system is properly designed, the effect is not to run up the page
fault rate, but to increase the demand for physical memory.  (If the
system is not properly designed, or is short of physical memory, you have
problems that malloc cannot fix.)  Frequently-accessed data should be in
physical memory and hence should not fault.
-- 
Mars in 1980s:  USSR, 2 tries, |     Henry Spencer at U of Toronto Zoology
2 failures; USA, 0 tries.      | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bill@twwells.uucp (T. William Wells) (05/06/89)

In article <29978@apple.Apple.COM> desnoyer@Apple.COM (Peter Desnoyers) writes:
: No, No. This algorithm takes O(n^2) copies in the worst case. If you
: change your code to:
:
:       while data
:         if not room
:           current_size *= 2
:           ptr = realloc( ptr, current_size)

For text input that is supposed to come from a user, I often use the
following:

	size += size < 480 ? size + 32 : 100;

It generates these numbers:

	0 32 96 224 480 580 680 ...

The theory is that many inputs are written in very short lines, most
are written in 80+, the vast bulk are written significantly under
256, and once you have lines longer than that, you have bigger
problems than the time wasted in this allocation.

Of course, it isn't a lot different from:

	0 32 64 128 256 512 ...

But it frequently does one less allocation, e.g., when reading filled
text.

If one wanted the benefits of the power of two growth, one could
write it without the conditional.
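
The schedule above, wrapped in a function:

```c
/* The growth schedule from the post.  Starting from 0 it yields
 * 32, 96, 224, 480, then grows by 100 per step. */
static unsigned next_size(unsigned size)
{
    size += size < 480 ? size + 32 : 100;
    return size;
}
```
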

---
Bill                            { uunet | novavax } !twwells!bill

tps@chem.ucsd.edu (Tom Stockfisch) (05/09/89)

In article <10206@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <461@chem.ucsd.EDU> I (Tom Stockfisch) write:
>>You waste up to a factor of 4 with a power-of-2 implementation,
>>not including fragmentation.

>..., I see a factor of two (not four) here,...

Here is the problem, at least with our BSD malloc():  I am reading in data
that needs to be put in one contiguous array.  I call malloc() with a certain
amount, fill that up, call realloc(), fill that up, etc.  Every time
that I cross a power of two boundary, realloc() puts my old memory on the
free list for the previous power of two and gets a whole new chunk.  Thus,
for a large final array size of n = (2^m) + 1 it has allocated
2^4 + 2^5 + 2^6 + 2^7 + 2^8 + ... + 2^(m+1), which is nearly 2^(m+2), which
is nearly 4*n.
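
The arithmetic checks out; a small function (name illustrative)
computes the ratio of total bytes handed out to the final size n:

```c
/* Growing through every power-of-two boundary from 2^4 up to a final
 * size n = 2^m + 1 makes the allocator hand out
 * 2^4 + 2^5 + ... + 2^(m+1) bytes in all, which approaches 4*n. */
static double total_allocated_over_n(int m)
{
    double total = 0.0;
    double n = (double)(1UL << m) + 1.0;
    int k;

    for (k = 4; k <= m + 1; k++)
        total += (double)(1UL << k);
    return total / n;
}
```
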

-- 

|| Tom Stockfisch, UCSD Chemistry	tps@chem.ucsd.edu

peter@ficc.uu.net (Peter da Silva) (05/10/89)

I have one question about all this... why this need to store the whole of
some randomly-sized chunk of data in a contiguous block of memory? Why not
use something like clists?
-- 
Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Business: uunet.uu.net!ficc!peter, peter@ficc.uu.net, +1 713 274 5180.
Personal: ...!texbell!sugar!peter, peter@sugar.hackercorp.com.

desnoyer@Apple.COM (Peter Desnoyers) (05/12/89)

In article <4132@ficc.uu.net> peter@ficc.uu.net (Peter da Silva) writes:
>I have one question about all this... why this need to store the whole of
>some randomly-sized chunk of data in a contiguous block of memory? Why not
>use something like clists?
>-- 
>Peter da Silva, Xenix Support, Ferranti International Controls Corporation.

Because for a simple program (read it in, crunch it, write it out) you
can use an allocation scheme that is so simple that it will work the
first time. Also, for some data sets the "natural" representation is a
contiguous array, rather than linked chunks, and your processing code
becomes more complicated when you use a more sophisticated allocation
method. In other words, it's a kludge. Unlike many kludges, however,
the payoff is in robustness and simplicity.

				Peter Desnoyers