[net.unix] using malloc and brk: something to watch out for

dave@smaug.UUCP (Dave Cornutt) (06/17/86)

Keywords: malloc, brk
Summary: beware of stdio doing malloc behind your back
Line eater: uh-huh

If you want to use brk() or sbrk() to do memory allocation, there is
something you have to keep in mind: the standard I/O library does
mallocs to allocate buffers for files that it handles.  The problem
is that, at least on our system (UTX/32, a 4.2BSD derivative), malloc
maintains its own notion of where the top of memory is, and it doesn't
know about brk or sbrk calls done external to it.  (Actually, this makes
sense from a performance standpoint: if malloc had to check the actual
top of memory, that would mean a system call for every malloc.)

What happens is this: say you use sbrk to allocate space at the top of
memory for something (like an indeterminately large array).  You do some
stuff in this space, and then you do some printf's or scanf's on some
file (or some stdio operation, like puts/gets, putc/getc, etc.)  If
the printf/whatever is the first I/O operation done on that file, stdio
will malloc a buffer for it at that time.  Since malloc thinks that the
top of memory is still where it was originally, it will happily allocate
a hunk of your array and let stdio scribble on it.  I ran into this
problem a few months ago, and you would not believe the amount of time
we spent running CPU and memory diags because of it.
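
The overlap described above can be modelled without touching the real
break.  This is only a sketch: sim_break, cached_top, direct_sbrk,
naive_malloc and overlaps are all hypothetical names, addresses are plain
integers, and the "allocator" is an assumption about how a malloc that
caches the top of memory might behave, not the UTX/32 source.

```c
#include <assert.h>
#include <stddef.h>

/* Simulated program break; a real program would ask the kernel via sbrk(). */
size_t sim_break = 1000;

/* The allocator's private, cached idea of where the heap ends. */
size_t cached_top = 1000;

/* Stand-in for a direct sbrk(n) call: grows the break, returns the old one. */
size_t direct_sbrk(size_t n)
{
    size_t old = sim_break;
    sim_break += n;
    return old;
}

/* Hypothetical allocator that trusts cached_top and never rechecks the break. */
size_t naive_malloc(size_t n)
{
    size_t start = cached_top;
    cached_top += n;
    if (cached_top > sim_break)
        sim_break = cached_top;   /* grow only if it believes it must */
    return start;
}

/* Nonzero if [a, a+alen) and [b, b+blen) share at least one address. */
int overlaps(size_t a, size_t alen, size_t b, size_t blen)
{
    assert(alen > 0 && blen > 0);
    return a < b + blen && b < a + alen;
}
```

Calling direct_sbrk(500) for the array and then naive_malloc(200) for a
stdio buffer hands out the same starting address twice in this model, so
overlaps() reports a collision between the two regions.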

The moral of the story is this: (1) once you have done a brk or sbrk,
don't do any mallocs, and (2) if you use stdio and brk/sbrk in a
program together, make sure that all files being handled by stdio have
buffers allocated to them prior to doing any brk/sbrk by either
doing I/O on them (which forces a buffer to be allocated), or by
calling setbuf() to explicitly allocate a buffer (alternatively,
you can set it to unbuffered, although this usually hurts performance).
Don't forget about stdin/out/err.
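
A minimal sketch of point (2), assuming a C library that provides
setvbuf() (the more general relative of setbuf()).  The helper name
pin_stdio_buffers and the static buffers are hypothetical; the idea is
only that each standard stream gets a caller-owned buffer, or is made
unbuffered, before the program starts moving the break itself.

```c
#include <assert.h>
#include <stdio.h>

/* Caller-owned buffers with static storage, so they outlive the streams. */
static char in_buf[BUFSIZ];
static char out_buf[BUFSIZ];

/*
 * Hypothetical helper: attach buffers to the standard streams up front so
 * stdio has no reason to allocate one on first use.  Returns 0 on success,
 * -1 if any setvbuf() call is refused.
 */
int pin_stdio_buffers(void)
{
    assert(sizeof in_buf == BUFSIZ && sizeof out_buf == BUFSIZ);

    if (setvbuf(stdin, in_buf, _IOFBF, sizeof in_buf) != 0)
        return -1;
    if (setvbuf(stdout, out_buf, _IOFBF, sizeof out_buf) != 0)
        return -1;
    /* stderr is made unbuffered: slower, but needs no buffer at all. */
    if (setvbuf(stderr, NULL, _IONBF, 0) != 0)
        return -1;
    return 0;
}
```

This would be called once at the top of main(), before any I/O and
before any brk/sbrk.  Files opened later with fopen() would each need the
same treatment right after the open.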

Now for the flame: There should be some way to confine malloc to a certain
heap space by setting an upper limit on how high it can allocate memory.
This way, the programmer could use malloc and still have clear memory
above the top of the heap space.
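
As a rough illustration of what such a limit might look like, here is a
sketch of an allocator confined to a fixed region.  It is not malloc and
does not replace it; bounded_alloc, bounded_remaining, align_t and
ARENA_UNITS are hypothetical names, there is no free operation, and the
region is a static array rather than memory below the break.

```c
#include <assert.h>
#include <stddef.h>

/* Unit of allocation, chosen so every returned pointer is well aligned. */
typedef union { long double ld; void *p; long long ll; } align_t;

#define ARENA_UNITS 256            /* hypothetical upper limit, in units */

static align_t arena[ARENA_UNITS]; /* the only space this allocator may use */
static size_t units_used;          /* units handed out so far */

/* Bytes still available below the limit. */
size_t bounded_remaining(void)
{
    return (ARENA_UNITS - units_used) * sizeof(align_t);
}

/* Hand out nbytes from the arena, or NULL once the limit would be passed. */
void *bounded_alloc(size_t nbytes)
{
    size_t units;
    void *p;

    if (nbytes == 0 || nbytes > sizeof arena)
        return NULL;
    units = (nbytes + sizeof(align_t) - 1) / sizeof(align_t);
    if (units > ARENA_UNITS - units_used)
        return NULL;               /* would cross the upper limit */
    p = &arena[units_used];
    units_used += units;
    assert(units_used <= ARENA_UNITS);
    return p;
}
```

A request that does not fit returns NULL instead of growing, so whatever
lies above the region is never touched by this allocator.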

Dave Cornutt
Gould Computer Systems 
Ft. Lauderdale, FL

"The opinions expressed herein are not necessarily those of my employer,
not necessarily mine, and probably not necessary."

dave@onfcanim.UUCP (Dave Martindale) (06/20/86)

In article <49@houligan.UUCP> dave@smaug.UUCP (Dave Cornutt) writes:
>Keywords: malloc, brk
>Summary: beware of stdio doing malloc behind your back
>
>If you want to use brk() or sbrk() to do memory allocation, there is
>something you have to keep in mind: the standard I/O library does
>mallocs to allocate buffers for files that it handles.
>
> .....
>
>The moral of the story is this: (1) once you have done a brk or sbrk,
>don't do any mallocs, and (2) if you use stdio and brk/sbrk in a
>program together, make sure that all files being handled by stdio have
>buffers allocated to them prior to doing any brk/sbrk by either
>doing I/O on them (which forces a buffer to be allocated), or by
>calling setbuf() to explicitly allocate a buffer (alternatively,
>you can set it to unbuffered, although this usually hurts performance).
>Don't forget about stdin/out/err.

The problem with doing things this way is 1) it is vulnerable to being broken
by the next person who works on the code and doesn't understand the
interaction and 2) it may break when you reload it with a new version of
some library that does a malloc where it formerly used a static buffer.

Much better is to use a single consistent memory allocation scheme.
If you are currently using sbrk just to get a large chunk of memory, just
malloc the chunk instead, and then there will be no conflict.

If you really need the flexibility of having a massive chunk of memory
that you can dynamically grow and shrink, then you do need to use brk;
in that case you can write your own malloc that is compatible with
whatever private memory allocation scheme you need to use, and stdio
will then use your malloc.

Either of these is a lot more robust than having two different memory
allocation strategies that you carefully arrange not to interfere with
each other, you think.

>Now for the flame: There should be some way to confine malloc to a certain
>heap space by setting an upper limit on how high it can allocate memory.
>This way, the programmer could use malloc and still have clear memory
>above the top of the heap space.

I disagree.  Malloc is designed to be the single memory manager for the
vast majority of C programs (and it's more portable than sbrk/brk).
For that it works adequately, and I don't think it should be more
complicated in order to handle an unusual situation like yours.  There
will always be some strange situation it will be unable to handle no
matter how it is written, and you can always supply your own version.

chris@umcp-cs.UUCP (Chris Torek) (06/21/86)

In article <49@houligan.UUCP> dave@smaug.UUCP (Dave Cornutt) writes:
>... at least on our system (UTX/32, a 4.2BSD derivative), malloc
>maintains its own notion of where the top of memory is, and it doesn't
>know about brk or sbrk calls done external to it.  (Actually, this makes
>sense from a performance standpoint: if malloc had to check the actual
>top of memory, that would mean a system call for every malloc.)

Not so; and someone may have changed the allocator for UTX, for I
believe that the standard 4.2 malloc did not mind programs increasing
the break.  It does not require one system call per malloc, only
one extra system call per `morecore' (an internal routine used to
get more space by calling sbrk).
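
The cost argument can be sketched as follows.  This is a simulation
under stated assumptions, not the 4.2BSD source: sim_break and sim_sbrk
stand in for the real break and sbrk(), and heap_top, foreign_moves and
this morecore are hypothetical.  The point is only that the check for an
externally moved break happens once per request for more core, not once
per malloc.

```c
#include <assert.h>
#include <stddef.h>

size_t sim_break = 1000;   /* simulated break, as the kernel would see it */
size_t heap_top = 1000;    /* where the allocator last left the break */
int foreign_moves;         /* times someone else was seen to move the break */

/* Stand-in for sbrk(incr): returns the old break, then grows it. */
size_t sim_sbrk(size_t incr)
{
    size_t old = sim_break;
    sim_break += incr;
    return old;
}

/* Get nbytes of fresh space, noticing any growth done behind our back. */
size_t morecore(size_t nbytes)
{
    size_t cur = sim_sbrk(0);      /* the one extra call per morecore */
    size_t start;

    if (cur != heap_top)
        foreign_moves++;           /* break moved externally; start above it */
    start = sim_sbrk(nbytes);
    heap_top = start + nbytes;
    assert(heap_top == sim_break);
    return start;
}
```

In this model a direct sim_sbrk(50) between two morecore(100) calls
makes the second block start above the foreign 50 bytes rather than on
top of them.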

Note, however, that decreasing the break will confuse most mallocs,
including the standard 4.2BSD version.

The `moral' in the quoted article does, with some changes, apply to
any code that is to be called `portable': mix not malloc() and brk(),
lest ye someday sorrow.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

jpl@allegra.UUCP (John P. Linderman) (06/25/86)

Chris Torek notes:
> The `moral' in the quoted article does, with some changes, apply to
> any code that is to be called `portable': mix not malloc() and brk(),
> lest ye someday sorrow.

The immediate corollary is ``Thou shalt not use brk()'', because there
is no way to avoid malloc(), at least if you use that paragon of
portability, the standard i/o package.  With 4.3 and ULTRIX, not only
i/o buffers but the FILE structures themselves are acquired with
malloc.  You can supply your own buffers, but I know of no way to avoid
the malloc for the FILE structures.

But if I can't use brk(), and I have an application that uses a LOT of
memory for a short time, then much less memory for a very long time
(sort, as always, comes to mind), then my implementation will hog
memory and perform poorly on those machines that aren't paged.
Programs don't port just because they compile and run, they have to
perform respectably.  In a very real sense, an application may be less
portable if it is denied the use of brk().

If this were just a problem with sorts and brk, I wouldn't worry much.
The more important underlying issue is ``Why should the use of brk
result in reduced portability?''.  I'll accept a response like ``Malloc
is a more general paradigm,'' (and I'll put #ifdef's in my code so it
doesn't rely on the existence of brk).  But I am troubled by responses
like ``Some implementations of malloc fail if brk is invoked outside of
malloc.'' As far as I'm concerned, such an implementation is broken.
There's no hope of writing programs that port to broken systems.  You
can add a lot of complexity to your code, and thereby make it less
likely to work on systems that aren't broken, but you can never
anticipate all the ways a library routine might misbehave.  And if you
are not careful, you can ``enshrine'' bugs in such a way that your
code will break if the bugs are fixed.

Of course, ``broken'' and ``portable'' are squishy terms.  For the most
part, I am quite happy if my programs work on BSD, System V, and ULTRIX
distributions, and, for that reason, I will accommodate ``features''
of these systems that I would regard as ``bugs'' in less widely used
systems.  But, at some point, I have to decide that my programs are
``portable enough'', and not worry about the implications of different
implementations, broken or otherwise.

John P. Linderman  Department of Ported Sorts  allegra!jpl