dale@jack.ci.com (Dale Gallaher) (03/19/91)
For the past few weeks we have been trying to track down a problem in our application which started when it was used on OS 4.1. The problem is very sensitive to the user's environment. Briefly, what happens is that we get a Segmentation Violation in malloc when it is called from putenv. This only happens sometimes, and we don't have any problems elsewhere in what is a 9 Meg executable.

I have replaced putenv with an older version and I still get the problem; however, when I use an older version of malloc everything works fine. I have heard that malloc was basically rewritten for 4.1, but I don't know this for sure. Since I don't have sources for the new malloc I cannot debug through it, and I don't really want to debug through the assembler to figure out what is going on.

The application I am having the problem with does use sbrk to allocate some of its own memory. In discussions with some other people it has been mentioned that this might cause problems with the new Sun malloc.

If anyone has had or knows of any problems with malloc, please let me know; otherwise, I will try to go through Sun Support for an answer.

Dale Gallaher
dale@dandelion.ci.com
(508)667-7900 x121
dale@dandelion.ci.com (Dale Gallaher) (03/24/91)
I got a slew of replies on this problem, so I thought it would be helpful if I brought people up-to-date on what I know. Many possible causes were proposed, from overwrites in the application code to known Sun bugs that have patches for 4.1 and are fixed in 4.1.1. Thanks to all who replied.

First of all, the problem does exist in O.S. 4.1.1 also, so it is not fixed there. If anything, it seems to be a little more frequent in O.S. 4.1.1.

As far as overwrites in the application code go, I think they are rather unlikely, since the code works fine on O.S. 4.0.3 or earlier and runs fine on all other platforms. Of course, it is always possible that there is some overwrite that only the new Sun malloc is sensitive to.

The most likely problem is very clearly stated in the Sun sbrk man page, and I quote

"WARNINGS
     Programs combining the brk() and sbrk() system calls and
     malloc() will not work. Many library routines use malloc()
     internally, so use brk() and sbrk() only when you know that
     malloc() definitely will not be used by any library routine."

Since our application makes heavy use of both malloc and sbrk, and the man page claims programs doing this "WILL NOT WORK", it seems that this is very likely to be the problem. Also, since the crash is always in putenv, which is a library routine that calls malloc, it is even more likely that this is the problem.

A bigger problem for me is what to do now. I have temporarily worked around the problem by using my own version of malloc. Unfortunately, maintaining our own malloc could be a problem with future Sun releases for different reasons. The use of sbrk is essential to the performance of the application due to its memory management algorithms. It seems I am kind of stuck with my own malloc for now.

The main thing I don't understand is how Sun can make two such basic functions in Unix incompatible within the same program. I would really like to hear someone give a good explanation as to why this wasn't a ridiculous restriction to create.

Dale Gallaher
Cognition Corp., Billerica, MA
dale@dandelion.ci.com
(508)667-7900 x121
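
One way to see how the man page warning applies to a given program is to watch the program break across a library call. The following is only a minimal sketch, not anything from Sun's library: break_delta, probe_putenv and probe_sbrk are hypothetical names made up for this illustration, and whether putenv moves the break at all depends on the C library in use.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* expose sbrk() and putenv() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: run fn() and report how many bytes the program
 * break moved while it ran (0 means fn() left the break alone). */
ptrdiff_t break_delta(void (*fn)(void))
{
    assert(fn != NULL);
    char *before = sbrk(0);   /* sbrk(0) only reads the current break */
    fn();
    char *after = sbrk(0);
    return after - before;
}

/* Probe: a library call that may call malloc() internally. */
void probe_putenv(void)
{
    /* putenv() keeps the pointer it is given, so the string is static. */
    static char entry[] = "BREAK_PROBE=1";
    putenv(entry);
}

/* Probe: the application growing the data segment directly. */
void probe_sbrk(void)
{
    (void)sbrk(128);
}
```

Calling break_delta(probe_sbrk) reports the 128 bytes the application took itself. A nonzero result from break_delta(probe_putenv) would mean the library's malloc grew the same region behind the application's back, which is the combination the warning is about.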
barmar@bloom-beacon.mit.edu (Barry Margolin) (03/28/91)
In article <2122@brchh104.bnr.ca> dale@dandelion.ci.com (Dale Gallaher) writes:
>The most likely problem is very clearly stated in the Sun sbrk man page,
>and I quote
>
>"WARNINGS
>     Programs combining the brk() and sbrk() system calls and
>     malloc() will not work. Many library routines use malloc()
>     internally, so use brk() and sbrk() only when you know that
>     malloc() definitely will not be used by any library routine."
...
>The main thing I don't understand is how Sun can make two such basic
>functions in Unix incompatible within the same program. I would really
>like to hear someone give a good explanation as to why this wasn't a
>ridiculous restriction to create.

This was discussed a week or so ago in comp.lang.c or comp.unix.questions or something like that. Most Unix implementations have this limitation, but they don't all have the warning in their man pages.

The most common problem is that malloc() can't deal with another routine calling sbrk() with a negative argument, which deallocates data space from the process. Malloc's arena data structures may include pointers into the memory that is released, and it will get a segmentation violation when it tries to indirect through one of them.

I also wouldn't be surprised if some malloc() implementations assume that they are the only ones calling brk() or sbrk() to grow the data space, so they use all the data space for the arena, rather than checking before each call to sbrk() to see whether some other routine has allocated intervening memory.

Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar
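
Both failure modes described above can be sketched in a few lines of C. This is an illustration under stated assumptions, not the code of any real malloc: arena_end, arena_grow and below_break are hypothetical names. The sketch compares the address returned by sbrk() with the end of the previous grant, which amounts to the same test as checking the break before each call.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE      /* expose sbrk() on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical arena state: where our most recent sbrk() grant ended. */
static char *arena_end = NULL;

/* Grow the arena by n bytes. On success, returns the new block and sets
 * *contiguous to 1 if it directly follows our previous grant (or is the
 * first grant), or to 0 if some other caller moved the break in between.
 * Returns NULL if sbrk() fails. */
void *arena_grow(size_t n, int *contiguous)
{
    assert(contiguous != NULL);
    char *block = sbrk((intptr_t)n);   /* returns the previous break */
    if (block == (char *)-1)
        return NULL;
    *contiguous = (arena_end == NULL || block == arena_end);
    arena_end = block + n;
    return block;
}

/* Returns nonzero if p still lies below the current break. A pointer
 * that fails this test refers to data space that has been released. */
int below_break(const void *p)
{
    return (uintptr_t)p < (uintptr_t)sbrk(0);
}
```

An allocator that ignores the contiguous flag would treat a foreign sbrk() grant as part of its own arena. Likewise, if another routine calls sbrk() with a negative argument, below_break() turns false for blocks the allocator still has recorded, and indirecting through such a pointer is where the segmentation violation would come from.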