[comp.sys.sun] Any problems with malloc on OS 4.1?

dale@jack.ci.com (Dale Gallaher) (03/19/91)

For the past few weeks we have been trying to track down a problem in our
application that started when it was moved to OS 4.1.  It is very
sensitive to the user's environment.  Briefly, we get a Segmentation
Violation in malloc when it is called from putenv.  This happens only
intermittently, and we don't have any problems elsewhere in what is a
9 Meg executable.

I have replaced putenv with an older version and I still get the problem;
however, when I use an older version of malloc everything works fine.  I
have heard that malloc was basically rewritten for 4.1, but I don't know
this for sure.  Since I don't have sources for the new malloc I cannot
debug it at the source level, and I don't really want to step through the
assembly code to figure out what is going on.

The application I am having the problem with does use sbrk to allocate
some of its own memory.  In discussions with other people it has been
suggested that this might cause problems with the new Sun malloc.

If anyone has had or knows of any problems with malloc please let me know;
otherwise, I will try to go through Sun Support for an answer. 

Dale Gallaher
dale@dandelion.ci.com
(508)667-7900 x121

dale@dandelion.ci.com (Dale Gallaher) (03/24/91)

I got a slew of replies on this problem, so I thought it would be helpful
to bring people up to date on what I know.  Many possible causes were
proposed, ranging from overwrites in the application code to known Sun
bugs that have patches for 4.1 and are fixed in 4.1.1.  Thanks to all who
replied.

First of all, the problem also exists in OS 4.1.1, so it is not fixed
there.  If anything, it seems to be a little more frequent in OS 4.1.1.
As for overwrites in the application code, I think they are rather
unlikely, since the code works fine on OS 4.0.3 and earlier and on all
other platforms.  Of course, it is always possible that there is some
overwrite that only the new Sun malloc is sensitive to.

The most likely cause is clearly stated in the Sun sbrk man page, and I
quote:

"WARNINGS
     Programs combining the brk() and  sbrk()  system  calls  and
     malloc()  will not work.  Many library routines use malloc()
     internally, so use brk() and sbrk() only when you know  that
     malloc() definitely will not be used by any library routine."

Since our application makes heavy use of both malloc and sbrk, and the man
page says programs doing this "WILL NOT WORK", this seems very likely to be
the problem.  The fact that the crash always occurs in malloc when called
from putenv, a library routine that uses malloc internally, makes it even
more likely.

A bigger problem for me is what to do now.  I have temporarily worked
around the problem by using my own version of malloc.  Unfortunately,
maintaining our own malloc could itself cause problems with future Sun
releases, for various reasons.  The use of sbrk is essential to the
application's performance because of its memory management algorithms,
so it seems I am kind of stuck with my own malloc for now.

The main thing I don't understand is how Sun can make two such basic
functions in Unix incompatible within the same program.  I would really
like to hear someone give a good explanation as to why this wasn't a
ridiculous restriction to create.

Dale Gallaher					Cognition Corp.
dale@dandelion.ci.com				
(508)667-7900 x121				Billerica, MA

barmar@bloom-beacon.mit.edu (Barry Margolin) (03/28/91)

In article <2122@brchh104.bnr.ca> dale@dandelion.ci.com (Dale Gallaher) writes:
>The most likely problem is very clearly stated in the Sun sbrk man page,
>and I quote
>
>"WARNINGS
>     Programs combining the brk() and  sbrk()  system  calls  and
>     malloc()  will not work.  Many library routines use malloc()
>     internally, so use brk() and sbrk() only when you know  that
>     malloc() definitely will not be used by any library routine."
...
>The main thing I don't understand is how Sun can make two such basic
>functions in Unix incompatible within the same program.  I would really
>like to hear someone give a good explanation as to why this wasn't a
>ridiculous restriction to create.

This was discussed a week or so ago in comp.lang.c or comp.unix.questions
or something like that.  Most Unix implementations have this limitation,
but they don't all have the warning in their man pages.

The most common problem is that malloc() can't deal with another routine
calling sbrk() with a negative argument, which deallocates data space from
the process.  Malloc's arena data structures may include pointers into the
memory that is released, and it will get a segmentation violation when it
tries to indirect through one of them.

I also wouldn't be surprised if some malloc() implementations assume that
they are the only ones calling brk() or sbrk() to grow the data space, so
they use all the data space for the arena, rather than checking before
each call to sbrk() to see whether some other routine has allocated
intervening memory.

Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar