[comp.sys.sgi] Bus error DURING call to malloc

taylorr@glycine.cs.unc.edu (Russell Taylor) (06/07/90)

	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
got the proverbial 'Bus error (core dumped)' message.  The catch is that
when I run dbx and look for the error, it tells me that the error occurred
IN malloc():

...Source (of malloc.c) not available...

	There are several calls to malloc() in the code.  There have been
successful calls before this call is made.  All calls are passed constant
references, and this code compiles and runs correctly on a variety of other
machines (VAX, sun 4, DecStation).

	Is there a known bug (and hopefully fix) for this?

	Thanks,
	Russell Taylor
	taylorr@cs.unc.edu

cycy@isl1.ri.cmu.edu (Scum) (06/08/90)

In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
| 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
| got the proverbial 'Bus error (core dumped)' message.  The catch is that
| when I run dbx and look for the error, it tells me that the error occurred
| IN malloc():
| 	There are several calls to malloc() in the code.  There have been
| successful calls before this call is made.  All calls are passed constant
| references, and this code compiles and runs correctly on a variety of other
| machines (VAX, sun 4, DecStation).
| 
| 	Is there a known bug (and hopefully fix) for this?

Try linking with the malloc library. Just use -lmalloc as an argument when you
are linking; this will provide an alternative version of malloc which seems to
work better. This has solved the problem for us in similar cases.

Good luck.
						-- Chris.
-- 

                                       -- Chris. (cycy@isl1.ri.cmu.edu)
"People make me pro-nuclear." -- Margarette Smith

paquette@cpsc.ucalgary.ca (Trevor Paquette) (06/11/90)

In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
> 
> 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
> got the proverbial 'Bus error (core dumped)' message.  The catch is that
> when I run dbx and look for the error, it tells me that the error occurred
> IN malloc():
> 
> ....Source (of malloc.c) not available...
> 
> 	There are several calls to malloc() in the code.  There have been
> successful calls before this call is made.  All calls are passed constant
> references, and this code compiles and runs correctly on a variety of other
> machines (VAX, sun 4, DecStation).
> 
> 	Is there a known bug (and hopefully fix) for this?
> 
> 	Thanks,
> 	Russell Taylor
> 	taylorr@cs.unc.edu


    In the files that use malloc, add the following:
    #include <malloc.h>

    Then, when linking, add '-lmalloc' to your list of libraries. I have
    had this problem before and this cleared it up.

    Note: this does not 'fix' the problem; you are now just using a
    different malloc.

   Trev

___________________________________________/No man is a failure who has friends
Trevor Paquette  ICBM:51'03"N/114'05"W|I accept the challange, body and soul,
{ubc-cs,utai,alberta}!calgary!paquette|to seek the knowledge of the ones of old
paquette@cpsc.ucalgary.ca             | - engraved on the Kersa Blade of Esalon

yohn@tumult.asd.sgi.com (Mike Thompson) (06/12/90)

In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
> 
> 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
> got the proverbial 'Bus error (core dumped)' message.  The catch is that
> when I run dbx and look for the error, it tells me that the error occurred
> IN malloc():
> 
> ...Source (of malloc.c) not available...
> 
> 	There are several calls to malloc() in the code.  There have been
> successful calls before this call is made.  All calls are passed constant
> references, and this code compiles and runs correctly on a variety of other
> machines (VAX, sun 4, DecStation).
> 
> 	Is there a known bug (and hopefully fix) for this?
> 
> 	Thanks,
> 	Russell Taylor
> 	taylorr@cs.unc.edu

I cannot guarantee that there are no bugs in malloc (I assume you are
getting malloc from libc), but I don't know of any (besides performance
problems when allocating many memory areas).  But I have seen many,
many user programs that bomb in malloc because the user code overran
the memory allocated by a call to malloc.  malloc(strlen(s)) and
copying s is a classic way to get into trouble (user forgets that
strlen does not account for the trailing null character) -- there are
many other possibilities.
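
For illustration, here is a minimal sketch of the off-by-one described above and its fix. The helper name copy_string is made up for this example and does not come from the original program; the only point is the "+ 1" in the size passed to malloc.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy s into a freshly
 * malloc'd buffer.  Returns NULL if malloc fails. */
char *copy_string(const char *s)
{
    size_t len;
    char *p;

    assert(s != NULL);
    len = strlen(s);        /* counts the characters, NOT the trailing '\0' */
    p = malloc(len + 1);    /* the + 1 leaves room for the terminator */
    if (p == NULL)
        return NULL;
    memcpy(p, s, len + 1);  /* copies the characters and the '\0' */
    return p;
}
```

With malloc(len) instead of malloc(len + 1), the memcpy would write one byte past the end of the block, which is exactly the kind of overrun that can surface later as a crash inside malloc.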

Since malloc(3X) -- the malloc in /usr/lib/libmalloc.a -- aligns
requests to eight-byte boundaries and malloc(3C) aligns only to
four-byte boundaries, switching to libmalloc may help, if only in that
it gives the caller a little more unrequested rounding space.

This may be what's happening with the malloc calls on your Vaxen, etc.

Now if your program does make many calls to malloc, it is usually best
to link with libmalloc.  The two mallocs do have slightly different
behavior -- libmalloc will return a null pointer when asked for zero
bytes and will ignore a null pointer on free; libc malloc will not
touch the just-freed space until (at least) the next call to malloc/free.
Usually these behaviors are not a concern.

Mike Thompson

dwatts@ki.UUCP (Dan Watts) (06/12/90)

In article <1990Jun10.211156.16153@calgary.uucp> paquette@cpsc.ucalgary.ca (Trevor Paquette) writes:
>In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
>> 
>> 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
>> got the proverbial 'Bus error (core dumped)' message.  The catch is that
>> when I run dbx and look for the error, it tells me that the error occurred
>> IN malloc():
> < stuff deleted >
>   Note: this does not 'fix' the problem.. you are now using a different
>     malloc.

My experience has been that this error is caused by a program writing
outside the bounds of a malloc'd area.  This causes the hidden memory
management headers to get corrupted.  A quick hack would be to add some
constant pad to all malloc's.  Try adding 128 bytes and see if that does
it. Since my mistakes are usually in writing one byte too much, a pad
of 16 works ok.  Note that this hasn't solved the code problem, it's just
defensive programming.  You might also note failures in printf() for
the same reason.  I usually track this down by putting calls to malloc()
in other places in the code and trying to find the _bad_ code by seeing
which ones work and which ones die (remember to free() the malloc()'d
mem just after getting it).
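
A rough sketch of both tricks from this post, assuming nothing about the original program. The names padded_malloc, heap_probe and MALLOC_PAD are invented for the example, and the pad of 16 is just the value mentioned above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#define MALLOC_PAD 16   /* assumed value; the post suggests trying 128 first */

/* Hypothetical wrapper: ask for MALLOC_PAD extra bytes so that a small
 * overrun lands in the pad instead of in the allocator's bookkeeping.
 * This hides the bug; it does not fix it. */
void *padded_malloc(size_t n)
{
    if (n > (size_t)-1 - MALLOC_PAD)    /* n + MALLOC_PAD would overflow */
        return NULL;
    return malloc(n + MALLOC_PAD);
}

/* Hypothetical probe: allocate and immediately free a small block.  If the
 * heap was already damaged, an allocator that walks its own lists may die
 * here, which narrows down where the damage happened. */
void heap_probe(const char *where)
{
    void *p = malloc(64);               /* arbitrary small size */

    assert(p != NULL);
    free(p);
    fprintf(stderr, "heap probe ok at %s\n", where);
}
```

Sprinkling calls such as heap_probe("after parse") through the code and noting the last one that prints gives a rough bracket around the bad write.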
-- 
#####################################################################
# CompuServe: >INTERNET:uunet.UU.NET!ki!dwatts    Dan Watts         #
# UUCP      : ...!uunet!ki!dwatts                 Ki Research, Inc. #
############### New Dimensions In Network Connectivity ##############

swed@aerospace.aero.org (Gregory D. Swedberg) (06/12/90)

	Are you ever calling realloc?  The IRIS does not seem to
implement it correctly: when reallocing to a larger size, it seems
to just return the original pointer rather than a pointer to a new
larger block.  I have had to give up on realloc on the IRIS.

mds@mds.sgi.com (Mark D. Stadler) (06/13/90)

In article <62083@sgi.sgi.com> yohn@tumult.asd.sgi.com (Mike Thompson) writes:
>In article <14525@thorin.cs.unc.edu>, taylorr@glycine.cs.unc.edu (Russell Taylor) writes:
>> 
>> 	We are running OS 3.2.2 on an IRIS 4D/240GTX.  I ran a program and
>> got the proverbial 'Bus error (core dumped)' message.  The catch is that
>> when I run dbx and look for the error, it tells me that the error occurred
>> IN malloc():
>> ...
>> 	There are several calls to malloc() in the code.  There have been
>> successful calls before this call is made.  All calls are passed constant
>> references, and this code compiles and runs correctly on a variety of other
>> machines (VAX, sun 4, DecStation).
>> ...
>> 	Is there a known bug (and hopefully fix) for this?
>
>I cannot guarantee that there are no bugs in malloc (I assume you are
>getting malloc from libc), but I don't know of any (besides performance
>problems when allocating many memory areas).  But I have seen many,
>many user programs that bomb in malloc because the user code overran
>the memory allocated by a call to malloc.  malloc(strlen(s)) and
>copying s is a classic way to get into trouble (user forgets that
>strlen does not account for the trailing null character) -- there are
>many other possibilities.
>
>Since malloc(3X) -- the malloc in /usr/lib/libmalloc.a -- aligns
>requests to eight-byte boundaries and malloc(3C) aligns only to
>four-byte boundaries, switching to libmalloc may help, if only in that
>it gives the caller a little more unrequested rounding space.
>

i've examined a number of malloc() problems throughout the last 7 years
or so, and have always traced the problem back to the application...

there are a couple of good reasons why malloc() usage problems are masked
on some machines and by libmalloc.

first of all, i know that a number of VMS programs have malloc problems
once they are ported to unix.  the VMS malloc rounds the request up to
the nearest multiple of 512 (page size).  then it skips the next virtual
page.  this turns out to be a great debug tool since you get core dumps
when you hit the next page instead of quietly corrupting some other data
structure.  unfortunately, the granularity is only at the page level,
so small problems are masked and only surface in other environments.
VAX unix may act similar, but i don't know for sure.

the traditional libc malloc approach uses a linked list scheme where the
next pointers are embedded in the memory arena.  if you overwrite a chunk
of malloced memory, you corrupt the linked list and the next call to
malloc() will traverse into the boonies.  the libmalloc approach keeps
the pointers into the memory arena in a separate area and therefore, if
you overwrite a chunk of malloced memory, you may corrupt some other data
structure that doesn't really matter anyway... (at least not at the time).
since the next pointers are saved from corruption, malloc() won't dump
core.  but you still have a problem lurking out there somewhere.
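
a toy model of the embedded-header scheme may make this concrete. this is NOT the IRIX libc malloc, just a sketch under simple assumptions: a fixed static arena, a bump allocator with no free(), and a size header stored directly in front of each block. the names toy_alloc, toy_check and ARENA_SIZE are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ARENA_SIZE 256

static unsigned char arena[ARENA_SIZE];
static size_t arena_top = 0;    /* offset of the first unused byte */

/* Carve n bytes out of the arena.  The block's size is stored in a header
 * right in front of the returned pointer, i.e. inside the arena itself.
 * Blocks are only suitable for byte data (no alignment is attempted). */
void *toy_alloc(size_t n)
{
    unsigned char *hdr;

    if (n > ARENA_SIZE - sizeof n || sizeof n + n > ARENA_SIZE - arena_top)
        return NULL;
    hdr = arena + arena_top;
    memcpy(hdr, &n, sizeof n);  /* embedded header */
    arena_top += sizeof n + n;
    return hdr + sizeof n;
}

/* Walk the chain of headers from the start of the arena.  Returns 1 if
 * every header is plausible, 0 if the chain leads out of bounds. */
int toy_check(void)
{
    size_t off = 0;
    size_t n;

    assert(arena_top <= ARENA_SIZE);
    while (off < arena_top) {
        if (arena_top - off < sizeof n)
            return 0;
        memcpy(&n, arena + off, sizeof n);
        if (n > arena_top - off - sizeof n)
            return 0;           /* header points into the boonies */
        off += sizeof n + n;
    }
    return 1;
}
```

allocate two 8-byte blocks, write 8 + sizeof(size_t) bytes into the first, and toy_check() goes from 1 to 0: the overrun landed on the second block's header. a real allocator that trusts such a header would follow it and fault inside malloc(), far from the code that did the damage.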

i think i'd stick to the old malloc() and narrow the problem down more.
if you mask this symptom, you will make it even more difficult to isolate
a problem further down the road.

-- mds	[aka Mark D Stadler  mds@sgi.com  ...!uunet!sgi!mds  (415)335-1327]

krk@cs.purdue.EDU (Kevin Kuehl) (06/13/90)

In article <789@ki.UUCP> dwatts@ki.UUCP (Dan Watts) writes:
>in other places in the code and try to find the _bad_ code by seeing which
>ones work, and which ones die (note to free() the malloc()'d mem just

Another thing you can do (if you are fortunate enough to have access
to a Sun4) is to use the `malloc_debug(2)' on a Sun4.  This is one of
the greatest tools I have ever used.   On every call to malloc, it
checks the heap and verifies that it is not corrupted.  If it is
corrupted, the program dumps core so you can find it.

This would be great to have under Irix, don't you think?  Whatd'ya say
at SGI?  I would really appreciate this feature.

Kevin
krk@cs.purdue.edu
..!{decwrl,ucbvax,gatech}!purdue!krk

vjs@rhyolite.wpd.sgi.com (Vernon Schryver) (06/13/90)

In article <10822@medusa.cs.purdue.edu>, krk@cs.purdue.EDU (Kevin Kuehl) writes:
> In article <789@ki.UUCP> dwatts@ki.UUCP (Dan Watts) writes:
>
> Another thing you can do (if you are fortunate enough to have access
> to a Sun4) is to use the `malloc_debug(2)' on a Sun4.  This is one of
> the greatest tools I have ever used.   On every call to malloc, it
> checks the heap and verifies that it is not corrupted [...]
> 
> This would be great to have under Irix, don't you think?  Whatd'ya say
> at SGI?  I would really appreciate this feature.


We've shipped versions of malloc that did this in the distant past.  I
don't think the malloc(3) shipped on 4D's in the last 2-3 years had this.
However, consider the mallopt() function and the following paragraph, cut
from a window displaying the IRIX 3.3 malloc(3X) man page:

|    M_DEBUG  Turns debug checking on if value is not equal to 0, otherwise
|             turns debug checking off.  When debugging is on, each call to
|             malloc and free causes the entire malloc arena to be scanned and
|             checked for consistency.  This option may be invoked at any
|             time.  Note that when debug checking is on, the performance of
|             malloc is reduced considerably.
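
A minimal sketch of turning this on, assuming the malloc(3X) interface described in the man page excerpt above (header <malloc.h>, program linked with -lmalloc). The helper name set_malloc_debug is invented here, and the code is guarded so that it compiles to a no-op on systems whose mallopt() has no M_DEBUG option.

```c
#include <assert.h>
#if defined(__sgi)
#include <malloc.h>     /* malloc(3X) declarations; link with -lmalloc */
#endif

/* Hypothetical helper: request arena checking on every malloc/free.
 * Returns 1 if the request was issued, 0 if M_DEBUG is not available
 * on this platform. */
int set_malloc_debug(int on)
{
    assert(on == 0 || on == 1);
#if defined(__sgi) && defined(M_DEBUG)
    (void)mallopt(M_DEBUG, on);
    return 1;
#else
    (void)on;           /* nothing to do without M_DEBUG */
    return 0;
#endif
}
```

Calling set_malloc_debug(1) at the top of main() should make the first malloc or free after a bad write complain, at the cost of the slowdown the man page warns about.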

There have been internal discussions about possibly enhancing this feature
in a future release to make it slower and more paranoid.  (This would be good.)

If you complain enough, you might convince the powers that be to ship some
neat "memory-leak" tools developed in the window system wars.

It is worth noting that the semantics of libmalloc and ancient malloc
differ slightly.  Particularly sloppy code has trouble with libmalloc.


Vernon Schryver
vjs@sgi.com

taylorr@pooh.cs.unc.edu (Russell Taylor) (06/13/90)

	I traced the problem down by using the Saber-C product we recently
got for our Suns.  The problem (as many people responded) was that I was
doing strange things to memory that had been gotten via calls to malloc().
	The Saber-C environment checked for the strangeness and showed me
right where it was happening.  Once I fixed it, the problem went away.
	Thank you all for your suggestions!

	Russell Taylor

yohn@tumult.asd.sgi.com (Mike Thompson) (06/14/90)

In article <75367@aerospace.AERO.ORG>, swed@aerospace.aero.org (Gregory D. Swedberg) writes:
> 
> 
> 	Are you ever calling realloc?  The IRIS does not seem to
> implement it correctly, when reallocing to a larger size it seems
> to just return the original pointer rather than a pointer to a new
> larger block.  I have had to give up on realloc on the IRIS.

Are you implying that realloc isn't returning a large enough buffer for
your (new) request?  The whole purpose of realloc is to avoid copying
data around whenever possible.  To that end, realloc will check to see
if it can grow the current buffer to satisfy the request and just pass
back the same (grown) buffer.  If there isn't enough room to grow the
current buffer, a new buffer will be allocated, the data copied, and
the old buffer released.
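
As a sketch of what that means for the caller (the helper name grow_buffer is made up for this example): always continue with the pointer realloc returns, and do not read anything into whether it matches the old one.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: grow *bufp to new_size bytes.
 * Returns 0 on success, -1 on failure (the old buffer is left intact). */
int grow_buffer(char **bufp, size_t new_size)
{
    char *p;

    assert(bufp != NULL);
    assert(new_size > 0);
    p = realloc(*bufp, new_size);
    if (p == NULL)
        return -1;      /* old block is still valid and still ours to free */
    *bufp = p;          /* may or may not be the same address as before */
    return 0;
}
```

Getting the original pointer back is legitimate when the block could be grown in place. The old contents are preserved either way, so the only real error is to keep using a stale copy of the old pointer after realloc has moved the block.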

If you think that realloc is returning the same buffer and it hasn't
grown the buffer adequately, please call the customer support hot line
immediately with details.

Mike Thompson