[comp.sys.sgi] The memory eater strikes back

XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) (10/21/89)

Hello,

  I posted this a month ago, but I don't think I ever saw an answer to
the following problem:

  To get the best performance in memory allocation I want to use the malloc(3X)
routines by using '-lmalloc' at link time. Now it seems that there is something
wrong with it, because 'free' doesn't seem to work when using '-lmalloc'. The
code fragment attached to this mail allocates and deallocates some
amount of memory in a long loop. Using the libraries

              -lgl_s -lbsd -lfastm -lm -lc_s            #(make mem)

everything is fine. Using instead the libraries

              -lgl_s -lmalloc -lbsd -lfastm -lm -lc_s   #(make meml)

the process's memory size keeps increasing, and on an 8MB GT the system is
saturated with really heavy paging after about 50 steps. What I would really
like to know is:

a) Is it my fault (using wrong library order?)?
b) Is it a bug?
   - known?
   - fixed when?
c) Is there really a performance gain when using -lmalloc (supposed it
   works properly)?

Any comments are welcome and appreciated.


Regards
Martin Knoblauch

TH-Darmstadt
Physical Chemistry 1
Petersenstrasse 20
D-6100 Darmstadt, FRG

BITNET: <XBR2D96D@DDATHD21>

-------------------------makefile-----------------------------------------
#
#  make - directives
#
CFLAGS = -g  -I/usr/include/bsd
#
# Library Selection
#
LIBRL = -lgl_s -lmalloc -lbsd -lfastm -lm -lc_s
LIBR = -lgl_s -lbsd -lfastm -lm -lc_s
#
#
#
mem:    mem.c
        cc mem.c $(CFLAGS) -o mem $(LIBR)
#
meml:   mem.c
        cc mem.c $(CFLAGS) -o meml $(LIBRL)
#
-------------------------mem.c--------------------------------------------
/*
**      MOLCAD  Version 4.1
**
**      COPYRIGHT AND ALL OTHER RIGHTS RESERVED
**
**      Contact:
**
**       Prof. Dr. J. Brickmann
**       c/o TH - Darmstadt
**       Dept. for Physical Chemistry
**       Petersenstr. 20
**       D-6100 Darmstadt, FRG
**
**       BITNET  : <XBR2D96D@DDATHD21.BITNET>
**
**
**       file    :  mem.c
**       author  :  Martin Knoblauch + Michael Teschner
**       date    :
**       purpose :  memory allocation test
**       comment :
**
**
**
**
*/

#include <stdio.h>
#include <malloc.h>

int acount,dcount;

struct Dot { struct Dot *next;
             float arr[4];
            };

extern struct Dot *mk_Newdot();

main()
{
struct Dot *first,*dot;
int i,j,count;

first = NULL;

for(j=0;j<1000;j++){

    mk_Deldots(first);
    dot = first = NULL;

    count = 0;
    for(i=0;i<5000;i++){
      dot = mk_Newdot(dot);
       count++;
      if( first == NULL )  first = dot;
      }

 printf(" loop %d count %d \n",j,count);
 }

} /* end main */



struct Dot *mk_Newdot(prev)
struct Dot *prev;
{
struct Dot *help;

if ((help = (struct Dot *)malloc(sizeof(struct Dot))) == NULL)
        return(NULL);

if (prev != NULL) prev->next = help;
help->next = NULL;
return(help);
}


mk_Deldots(start)
struct Dot *start;
{
struct Dot *help;

while (start != NULL) {
  help = start->next;
  free(start);
  start = help;
  }
}
--------------------------------------------------------------------------

moraes@CS.TORONTO.EDU (Mark Moraes) (10/22/89)

In comp.sys.sgi you write:

>a) Is it my fault (using wrong library order?)?

Nope. Even if you remove all libraries except for -lmalloc, it still
grows. You can see the problem over only a couple of iterations by
printing the value of sbrk(0) after every loop. The break will grow
steadily when using -lmalloc or amalloc/afree from -lmpc. With libc
malloc, the BSD4.3 malloc or any other working malloc, the value stays
constant after the first iteration.
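
A minimal sketch of the sbrk(0) check described above. It reuses the
struct Dot from the posted mem.c; the helper names churn() and
break_growth() are made up for illustration, and it assumes a system
where sbrk() is declared in <unistd.h>. The break is reported on stderr
so that stdio buffering does not itself allocate memory.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>   /* sbrk(); assumed to be declared here */

struct Dot { struct Dot *next; float arr[4]; };

/* Build a list of n dots, then free every one of them.
   Returns 0 on success, -1 if malloc failed. */
static int churn(int n)
{
    struct Dot *first = NULL, *last = NULL, *d;
    int i, status = 0;

    for (i = 0; i < n; i++) {
        d = malloc(sizeof *d);
        if (d == NULL) { status = -1; break; }
        d->next = NULL;
        if (last != NULL) last->next = d; else first = d;
        last = d;
    }
    while (first != NULL) {
        d = first->next;
        free(first);
        first = d;
    }
    return status;
}

/* Hypothetical helper: run `loops` allocate/free cycles, report the
   break after each one, and store in *growth how far the break moved
   between the end of the first cycle and the end of the last. */
int break_growth(int loops, int n, long *growth)
{
    char *base = NULL, *now = NULL;
    int j;

    for (j = 0; j < loops; j++) {
        if (churn(n) != 0) return -1;
        now = (char *)sbrk(0);
        if (j == 0) base = now;
        fprintf(stderr, "loop %d break %p\n", j, (void *)now);
    }
    *growth = (long)(now - base);
    return 0;
}
```

Calling break_growth(10, 5000, &g) from main() should leave g at or near
zero with a malloc that reuses freed blocks; a steadily climbing break is
the symptom reported for -lmalloc.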

>b) Is it a bug?
>   - known?
>   - fixed when?

Looks like a bug. Not fixed in Irix 3.2, it seems.

>c) Is there really a performance gain when using -lmalloc (supposed it
>   works properly)?

Not likely if it doesn't free...

The standard libc malloc is about the speed of the "fast" BSD4.3
(Caltech) malloc for your example code (which is straight allocation
followed by free -- not very demanding on most mallocs).  But the
Caltech malloc typically wastes twice as much memory, which can cause
more paging if you use a lot of memory.  (If free() doesn't work in
-lmalloc, it isn't very useful, no matter how fast it is -- on our
Power Iris, it takes about twice as long as the libc malloc...)

The libmpc amalloc and afree show the same behaviour as -lmalloc if you
modify your code to acreate an arena first, and add a grow function.

Stay with the libc malloc unless profiling your application indicates
that malloc is a bottleneck. At that point, consider custom allocation
strategies for the most frequent uses of malloc. (like preallocating
and managing memory pools of frequently used objects, using pages of
memory where only the page is freed, using stack allocators with
mark/release etc)
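
As an illustration of the first strategy (a pool of frequently used
objects), here is a rough sketch for the struct Dot of the posted
example. The names pool_alloc, pool_free, pool_chunks and the chunk size
of 1024 are assumptions chosen for the sketch, not anything from the
original program. Dots are carved out of larger malloc'ed chunks and
recycled through a free list, so malloc is called once per 1024 dots
instead of once per dot; in this simple form the chunks are never handed
back to malloc.

```c
#include <stdlib.h>

struct Dot { struct Dot *next; float arr[4]; };

#define POOL_CHUNK 1024              /* dots fetched from malloc at a time (arbitrary) */

static struct Dot *free_list = NULL; /* recycled dots, linked through ->next */
static long chunks_taken = 0;        /* number of chunks obtained from malloc */

/* Hand out one dot, refilling the free list from malloc when it is empty. */
struct Dot *pool_alloc(void)
{
    struct Dot *d;

    if (free_list == NULL) {
        struct Dot *chunk = malloc(POOL_CHUNK * sizeof *chunk);
        int i;

        if (chunk == NULL) return NULL;
        chunks_taken++;
        for (i = 0; i < POOL_CHUNK; i++) {
            chunk[i].next = free_list;
            free_list = &chunk[i];
        }
    }
    d = free_list;
    free_list = d->next;
    d->next = NULL;
    return d;
}

/* Return a dot to the pool; the memory stays with the pool, not with malloc. */
void pool_free(struct Dot *d)
{
    d->next = free_list;
    free_list = d;
}

/* How many chunks have been requested from malloc so far. */
long pool_chunks(void)
{
    return chunks_taken;
}
```

In mem.c this would mean calling pool_alloc() in mk_Newdot and
pool_free() in mk_Deldots; after the first pass over 5000 dots no further
malloc calls should be needed.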

madd@world.std.com (jim frost) (10/24/89)

About SGIs and memory leakage:

Something to remember is that the SGI graphical object library has
memory leaks.  This is a random fact that I ran into which I thought
some people might be interested in.

In article <89Oct21.211825edt.3287@neat.cs.toronto.edu> moraes@CS.TORONTO.EDU (Mark Moraes) writes:
|Stay with the libc malloc unless profiling your application indicates
|that malloc is a bottleneck. At that point, consider custom allocation
|strategies for the most frequent uses of malloc. (like preallocating
|and managing memory pools of frequently used objects [...] )

The libc malloc slows considerably when dealing with many small object
allocations and deallocations (typically a few hundred thousand if I
remember right) where the BSD malloc degrades "reasonably"; pooled
allocations will improve performance dramatically if you are doing
this type of allocation on the SGI.

The libmalloc malloc, even if broken, is good to run some tests with
because it smashes the malloc'ed area; we found many bugs because of
this behavior (and even more when running on a machine which
disallowed null pointer dereferencing :-).

jim frost
software tool & die
madd@std.com

moraes@CSRI.TORONTO.EDU (Mark Moraes) (10/24/89)

 | The libc malloc slows considerably when dealing with many small object
 | allocations and deallocations (typically a few hundred thousand if I
 | remember right) where the BSD malloc degrades "reasonably"; pooled
 | allocations will improve performance dramatically if you are doing
 | this type of allocation on the SGI.

True. I should also amend my earlier comment: For large numbers of
small allocations, -lmalloc does indeed perform much faster than even
the BSD4.3 malloc (and appears to free stuff correctly). For the
specific case posted, it does not free correctly, and runs much
slower.  There are similar cases where the BSD malloc, while not
losing performance in terms of CPU, will gobble up memory and cause
paging activity.  (eg. allocate 1000 elements of 50 bytes each, free
them all, then allocate a single element of 2000 bytes and watch it
sbrk again, unnecessarily)
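
The reason for that extra sbrk can be shown with a toy model. The sketch
below is not the BSD4.3 source; it only imitates the bookkeeping of an
allocator that rounds requests up to power-of-two size classes and keeps
freed blocks on their own class's free list without ever coalescing them.
The 8-byte header and all the toy_* names are assumptions made for the
illustration, and no real memory is handed out.

```c
#include <stddef.h>

#define NBUCKETS 28
#define OVERHEAD 8                   /* assumed per-block header size */

static long free_count[NBUCKETS];    /* free blocks held in each size class */
static long from_system;             /* bytes the model has "sbrk'ed" so far */

/* Smallest class b such that (8 << b) >= n + OVERHEAD, or -1 if too big. */
static int bucket_for(size_t n)
{
    int b = 0;
    size_t sz = 8;

    while (sz < n + OVERHEAD) {
        sz <<= 1;
        if (++b >= NBUCKETS) return -1;
    }
    return b;
}

/* "Allocate" n bytes; returns the size class used as the block handle. */
int toy_malloc(size_t n)
{
    int b = bucket_for(n);

    if (b < 0) return -1;
    if (free_count[b] > 0)
        free_count[b]--;             /* reuse a freed block of this class */
    else
        from_system += 8L << b;      /* nothing free in this class: grow */
    return b;
}

/* "Free" a block: it goes back to its own class and nowhere else. */
void toy_free(int b)
{
    free_count[b]++;
}

long toy_system_bytes(void)
{
    return from_system;
}
```

In this model 1000 requests of 50 bytes take 64000 bytes in the 64-byte
class. After all of them are freed, a 2000-byte request still has to
fetch a fresh 2048-byte block, because the freed 64-byte blocks are never
merged into anything larger.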

 | The libmalloc malloc, even if broken, is good to run some tests with
 | because it smashes the malloc'ed area; we found many bugs because of
 | this behavior (and even more when running on a machine which
 | disallowed null pointer dereferencing :-).

Huh? In Irix3.2, it doesn't necessarily smash the contents of freed
blocks (I assume you mean smash the malloc'ed area on free -- I'd be
rather displeased with a malloc that trashed the contents of the
malloc'ed blocks:-) The following program prints hello world twice
even when compiled with -lmalloc. Change that to "hello
xxxxxxxxxxxxxxxxxxxxxxxxxxxx world" and it will then smash the freed
block.

Smashing the contents of a freed block (among other things) is
desirable behaviour in a debugging malloc -- it degrades performance
enough that you probably don't want it in your final code.

-lmalloc won't work with the old kludge where you were allowed to rely
on a freed block being undamaged till the next malloc. -lmalloc also
returns NULL on malloc(0), following the SVID. Both are good for
people who care about portability.

#include <stdio.h>

#define HELLO "hello world\n"

main()
{
    extern char *malloc();
    char *cp = malloc(sizeof(HELLO));

    strcpy(cp, HELLO);
    fputs(cp, stdout);
    free(cp);
    /* deliberate use after free: shows whether free() smashed the block */
    fputs(cp, stdout);
    exit(0);
}
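
On the malloc(0) point made before the program: code that has to run
against both behaviours can avoid ever passing a zero size to malloc. A
minimal sketch follows; the wrapper name portable_malloc is hypothetical.

```c
#include <stdlib.h>

/* Never pass 0 to malloc, so that a NULL return always means
   "out of memory" regardless of how the library treats malloc(0). */
void *portable_malloc(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

The result is released with free() as usual.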

pj@fjord.sgi.com (10/24/89)

In article <8910210901.aa24434@SMOKE.BRL.MIL> XBR2D96D@DDATHD21.BITNET (Knobi der Rechnerschrat) writes:

>  I want to use the malloc(3X)
> Now it seems that there is something
> wrong with it, because 'free' doesn't seem to work when using '-lmalloc'.

Yes, the 3.1 releases of IRIX did have a memory leakage with a lot of
small (under 28 bytes ??) allocs/frees.  This was fixed in release 3.2.

We believe that libmalloc will provide substantially better CPU
performance and perhaps less memory fragmentation.

Considerable work was done on libmalloc for 3.2, and we recommend its
use for examples like yours.

I assume that you have profiled your application, so that you already
know that optimizing malloc is important to your performance.  And I
assume that you have noticed the behavioural differences between libc
malloc and libmalloc, primarily that you should not dereference a
pointer after freeing it when using libmalloc.

				Thanks, take care ...
				Paul Jackson (pj@sgi.com), x1373

zombie@voodoo.UUCP (Mike York) (10/26/89)

In article <89Oct21.211825edt.3287@neat.cs.toronto.edu> moraes@CS.TORONTO.EDU (Mark Moraes) writes:
>Nope. Even if you remove all libraries except for -lmalloc, it still
>grows. You can see the problem over only a couple of iterations by
>printing the value of sbrk(0) after every loop. The break will grow
>steadily when using -lmalloc or amalloc/afree from -lmpc. With libc
>malloc, the BSD4.3 malloc or any other working malloc, the value stays
>constant after the first iteration.
>
>>b) Is it a bug?
>>   - known?
>>   - fixed when?
>
>Looks like a bug. Not fixed in Irix 3.2, it seems.

Actually, it seems that it IS fixed in 3.2:

I'm running it right now on a 4D/70GT with 8MB and Irix 3.2 -- no
problems.  After the first iteration, the value of sbrk(0) remained
constant.  On a 4D/70G with 8MB running Irix 3.1, the value of sbrk
does indeed grow, and after 60 iterations, it REALLY slows down.
However, by inserting mallopt(M_MXFAST, 0) in mem.c before the main
loop, the program works as desired under 3.1.
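
A sketch of that workaround, assuming a <malloc.h> that provides
mallopt() and M_MXFAST. The wrapper name disable_small_block_path is made
up for illustration. The return value is passed through without being
interpreted, since the success convention for mallopt is not the same in
every library.

```c
#include <stdlib.h>
#include <malloc.h>   /* mallopt(), M_MXFAST */

/* Turn off the special handling of small blocks. Meant to be called
   once at the top of main(), before the first small allocation. */
int disable_small_block_path(void)
{
    return mallopt(M_MXFAST, 0);
}
```

In mem.c the call would go just before the for(j=0;j<1000;j++) loop.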

-- 
Mike York
Boeing Computer Services, Renton, Washington
(206) 234-7724
uw-beaver!ssc-vax!voodoo!zombie