[comp.unix.sysv386] malloc

tim@comcon.UUCP (Tim Brown) (12/10/90)

Does anyone know why this code should core dump?

-----------------------
1.
first in main():
	names = NULL;
----------------------
2.
Then:
	if(some_condition && names != NULL)
	{
		free(names);
		names = NULL;
	}
----------------------
3.
Then:
	if(names == NULL)
		if((names = (char *)malloc(BUFF_SIZE)) == NULL)
		{
			perror("malloc");
			exit(errno);
		}
-------------------------

I set this up by setting the char *names equal to NULL at run time and
then when I want to change the memory allocation, I free(names) and
once again set names = NULL, that way I can call the malloc code
repeatedly allocating a different size chunk each time.  I want to be
able to run chunks 2&3 repeatedly.
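
For reference, chunks 2 and 3 can be folded into one helper. This is only a sketch: the name resize_names and its signature are made up for illustration and are not from my real program. On its own, this free-then-malloc pattern is legal C and should not upset a correct malloc.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper combining chunks 2 and 3: release the old
 * buffer, if there is one, then allocate a fresh one of `size` bytes.
 * Returns the new buffer, or NULL (after calling perror) on failure.
 */
char *resize_names(char *names, size_t size)
{
    if (names != NULL) {
        free(names);
        names = NULL;
    }
    names = malloc(size);
    if (names == NULL)
        perror("malloc");
    return names;
}
```

Calling it as names = resize_names(names, new_size) repeatedly, with a different size each time, is the loop I am describing.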

On my system, ISC2.2, it core dumps on the third time thru.  On an
IBM6000, it works as expected.  I suspect a bug in ISC's malloc.  How
are others doing this?  

It core dumps at the malloc according to sdb.

I know this is comp.lang.c stuff but it seems to possibly be isolated
to ISC.

Thanks for any help.

-- 
Tim Brown            |
Computer Connection  |
uunet!seaeast.wa.com!comcon!tim    |

cpcahil@virtech.uucp (Conor P. Cahill) (12/12/90)

In article <537@comcon.UUCP> tim@comcon.UUCP (Tim Brown) writes:
>Does anyone know why this code should core dump?

  [description of malloc related problem deleted]
>
>On my system, ISC2.2, it core dumps on the third time thru.  On an
>IBM6000, it works as expected.  I suspect a bug in ISC's malloc.  How
>are others doing this?  

I don't suspect a bug in malloc.  Instead I expect the problem to be
in your code either with the malloc area that you are talking about, 
or with another area that is being overrun.

I have developed a debugging version of malloc (which was posted to c.s.u
back in May/June) that would probably solve this problem with just a recompile.

If you can't get the library from a nearby archive, send me email and I will
forward a copy to you.

>It core dumps at the malloc according to sdb.

This is probably due to the fact that some malloc memory has been overrun
thereby trashing the malloc chain.

>I know this is comp.lang.c stuff but it seems to possibly be isolated
>to ISC.

I doubt it is tied to ISC.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170

root@dialogic.com (Charlie Root) (04/06/91)

I have run into a problem when using malloc(3C) under Interactive
386/ix 2.2 and Esix.  In one of our library functions we are using
malloc to allocate a structure.  Occasionally we are getting a core
dump from the malloc.  Now, I am assuming that somewhere we have a
rampaging pointer that has trashed the malloc buffer pointers.
However, I have no way of tracing the malloc calls (other than using
sdb on the core dump).  Does anyone know of a package that will allow
me to trace what malloc is doing?  I have gotten a copy of
malloc-trace off of uunet, but that was written for a Sun, and I
haven't tried using it yet.


-- 
Dan Rich                    | drich@dialogic.com  || ...!uunet!dialogic!drich
UNIX Systems Administrator  | "Danger, you haven't seen the last of me!"
Dialogic Corporation        |    "No, but the first of you turns my stomach!"
(201) 334-1268 x213         | -- The Firesign Theatre's Nick Danger

cpcahil@virtech.uucp (Conor P. Cahill) (04/07/91)

root@dialogic.com (Charlie Root) writes:
>I have run into a problem when using malloc(3C) under Interactive
>386/ix 2.2 and Esix.  In one of our library functions we are using
>malloc to allocate a structure.  Occasionally we are getting a core
>dump from the malloc.  Now, I am assuming that somewhere we have a
>rampaging pointer that has trashed the malloc buffer pointers.

I put together a malloc debugging library that was posted to c.s.u last
year.  If you don't have access to it (or if you want a more up to date
copy - I sent r$ two patches last July which still haven't been posted)
send me email and I will forward it to you.

The readme from the package follows:

# (c) Copyright 1990 Conor P. Cahill. (uunet!virtech!cpcahil) 
# You may copy, distribute, and use this software as long as this
# copyright statement is not removed.

This package is a collection of routines which are a drop-in replacement
for the malloc(3), memory(3), string(3), and bstring(3) library functions.

The purpose of these programs is to aid the development and/or debugging
of programs using these functions by providing a high level of consistency
checking whenever a malloc pointer is used.  Due to this increased 
level of consistency checking, these functions have a considerably larger
overhead than the standard functions, but the extra checking should be
well worth it in a development environment.

To use these functions all you need to do is compile the library and
include it on your loader command line.  You do not need to recompile
your code, only a relink is necessary.  

Features of this library:

 1. The malloced area returned from each call to malloc is filled with
    non-null bytes.  This should catch any use of uninitialized malloc
    area.  The fill pattern for malloced area is 0x01.

 2. When free is called numerous validity checks are made on the 
    pointer it is passed.  In addition, the data in the malloc block
    beyond the size requested on the initial malloc is checked to 
    verify that it is still filled with the original fill characters.

	This is useful for catching things like:

		ptr = malloc(5);
		ptr[5] = '\0';

		/*
		 * You should note that this will be caught when the block
		 * is freed, not when the bad write is done
		 */

    And finally, the freed block is filled with a different fill pattern
    so that you can easily determine if you are still using free'd space.
    The fill pattern for free'd areas is 0x02.

	This is useful for catching things like:

		ptr = malloc(20);

		bptr = ptr+10;

		/* do something useful with bptr */

		free(ptr);

		/* 
		 * now try to do something useful with bptr, it should
		 * be trashed enough that it would cause real problems
		 * and when you went to debug the problem it would be
		 * filled with 0x02's and you would then know to look 
		 * for something free'ing what bptr points to.
		 */
		

 3. Whenever a bstring(3)/string(3)/memory(3) function is called, its 
    parameters are checked as follows:

	If they point somewhere in the malloc arena
		If the operation goes beyond requested malloc space
			call malloc_warning()

	This is useful for catching things like:

		ptr = malloc(5);
		strcpy(ptr,"abcde");
			
	
 4. Malloc_warning() and malloc_fatal() are used when an error condition
    is detected.  If the error is severe, malloc_fatal is called.  
    Malloc_warning is used otherwise.  The decision about what is fatal
    and what is a warning was made somewhat arbitrarily.

    Warning messages include:

	Calling free with a bad pointer
        Calling a bstring/string/memory (3) function which will go beyond
	    the end of a malloc block (Note that the library function is
            not modified to refuse the operation.  If malloc warnings are
	    in the default IGNORE case, the operation will continue and 
	    at some point cause a real problem).

    Fatal errors are:

	Detectable corruption to the malloc chain.
	

 5. The operations to perform when an error is detected are specified at
    run time by the use of environment variables.

	MALLOC_WARN - specifies the warning error message handling
	MALLOC_FATAL - specifies the fatal error handling


	When one of these error conditions occurs you will get an error
	message and the handler will execute based upon what setting
	is in the environment variables.  Currently understood settings
	are as follows:

		  0 - continue operations
		  1 - drop core and exit
		  2 - just exit
		  3 - drop core, but continue executing.  Core files will
			be placed into core.[PID].[counter], e.g. core.00123.001
		128 - dump malloc chain and continue
		129 - dump malloc chain, dump core, and exit
		130 - dump malloc chain, exit
		131 - dump malloc chain, dump core, continue processing
		

	There is an additional environment variable MALLOC_ERRFILE which
	is used to indicate the name of the file for error message output.

	For example, to set up the session to generate a core file for
	every malloc warning, to drop core and exit on a malloc fatal, and 
	to log all messages to the file "malloc_log" do the following:

		MALLOC_WARN=131
		MALLOC_FATAL=1
		MALLOC_ERRFILE=malloc_log

		export MALLOC_WARN MALLOC_FATAL MALLOC_ERRFILE

 6. The function malloc_dump() is available to dump the malloc chain whenever
    you might want.  Its only argument is a file descriptor to use to write
    the data.  Review the code if you need to know what data is printed.
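
To make features 1 and 2 concrete, here is a toy sketch of the fill-and-guard idea. It is not the code from the package: the names dbg_malloc, dbg_free, dbg_header and the guard size are made up for illustration, and only the fill values 0x01 and 0x02 come from the readme above.

```c
#include <stdlib.h>
#include <string.h>

#define DBG_FILL_ALLOC 0x01  /* fill for fresh blocks (value from the readme) */
#define DBG_FILL_FREE  0x02  /* fill for freed blocks (value from the readme) */
#define DBG_GUARD      8     /* guard bytes after the user area (assumed) */

/* Hypothetical header kept in front of each block. */
typedef union {
    size_t size;           /* bytes the caller asked for */
    long double align_ld;  /* the members below only force alignment */
    void *align_ptr;
} dbg_header;

/* Allocate `size` bytes plus header and guard, filled with 0x01. */
void *dbg_malloc(size_t size)
{
    dbg_header *h;

    if (size > (size_t)-1 - sizeof *h - DBG_GUARD)
        return NULL;                    /* request would overflow */
    h = malloc(sizeof *h + size + DBG_GUARD);
    if (h == NULL)
        return NULL;
    h->size = size;
    memset(h + 1, DBG_FILL_ALLOC, size + DBG_GUARD);
    return h + 1;
}

/*
 * Check the guard bytes, fill the user area with 0x02, release the block.
 * Returns 0 if the guard was intact, -1 if something wrote past the end.
 */
int dbg_free(void *ptr)
{
    dbg_header *h;
    unsigned char *user = ptr;
    size_t i, size;
    int status = 0;

    if (ptr == NULL)
        return 0;
    h = (dbg_header *)ptr - 1;
    size = h->size;
    for (i = 0; i < DBG_GUARD; i++)
        if (user[size + i] != DBG_FILL_ALLOC)
            status = -1;
    memset(user, DBG_FILL_FREE, size);
    free(h);
    return status;
}
```

With this sketch, the ptr = malloc(5); ptr[5] = '\0'; case from feature 2 lands in the guard area, so dbg_free reports it at free time rather than at the moment of the bad write.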


rwhite@nusdecs.uucp (Robert White) (04/08/91)

Regarding core dumps during malloc.

This has happened to me, but each time it happened I found the program
to be at fault and not malloc.  (This has happened to me under AT&T
SVR3 on the 3B and 386 implementations.)  If you misuse a chunk of malloced
memory (e.g. write sizeof()+n bytes to the pointer address instead of
limiting writes to sizeof(), or mangle/munge pointer dereferencing before
a write) you can damage the allocation pool structures maintained by the
malloc library.  The next time (or the Nth time) you malloc after that,
the structure-traversal-to-find-a-sufficient-size-hole-in-the-pool part
of the allocation can go springing off into places it should not be.
Reading those places is fine (isn't virtual memory wonderful) but
when it traverses the garbage and "finds" the appearance of a hole it
tries to modify the placement structures to allocate the memory.  One
of two things results:

	1)  If the region is within the legally writeable space of the
	process image you get damaged data.  A condition that can be very
	hard to detect as it can take the form of bad function return
	addresses.

	2)  If the region is within a protection area (your code region,
	a shared library mapped into your process space, the system call entry
	area, constant data space [and/or however those sorts of things
	are implemented in your implementation]) you will get a memory
	protection fault (and hence an immediate core dump) during the
	allocation call.

In short, before you go trying to reverse-engineer your malloc(3) library
you should review the pointer usages in all your source and home-grown
libraries.  Functions most likely to blame are things like strcat, getstr, 
and the like.  Anyplace you pass a pointer to an array that will be written on
without the size of the array you should be suspicious.
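
As an illustration of passing the size along with the pointer, here is a minimal sketch. The name copy_bounded is made up for this example; it is not a standard library function.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy src into dst without ever writing more
 * than dst_size bytes, always NUL-terminating when dst_size > 0.
 * Returns 0 if the whole string fit, -1 if it was truncated.
 */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    size_t len = strlen(src);

    if (dst_size == 0)
        return -1;
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);      /* includes the terminating NUL */
    return 0;
}
```

Because the callee knows how big the destination is, an oversized source is truncated and reported instead of running into malloc's bookkeeping.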
-- 
Robert C. White Jr.    |  The degree to which a language may be
Network Administrator  |   classified as a "living" language
National University    |  is best expressed as the basic ratio
crash!nusdecs!rwhite   |   of its speakers to its linguists.

cpcahil@virtech.uucp (Conor P. Cahill) (04/08/91)

rwhite@nusdecs.uucp (Robert White) writes:
>In short, before you go trying to reverse-engineer your malloc(3) library
>you should review the pointer usages in all your source and home-grown
>libraries.  Functions most likely to blame are things like strcat, getstr, 

If you read his message again, you will see that he knew that it was probably
a problem in his code, but the standard malloc did not have enough debugging
capabilities to track this down.

>and the like.  Anyplace you pass a pointer to an array that will be written on
>without the size of the array you should be suspicious.

Yes and it may take you a long time to track down (especially if it 
is not all your own code).  That is why I put together the debugging version
of the library.  It makes tracking down malloc problems much much easier.

drich@dialogic.com (Dan Rich) (04/10/91)

It looks like we may have a solution to our malloc() problems.  We
managed to track it a little further using malloc(3X), and the
debug-malloc library.  Apparently, there is a malloc somewhere in a
signal handler.  And, if a signal occurs during a malloc elsewhere in
our code, the signal handler malloc does a very good job of destroying
the malloc pointers in the application.

So, it looks like the solution to this problem is to not put mallocs
in your signal handlers.  :-(

Thanks to everyone who offered suggestions.  They helped to track this
one down!


cpcahil@virtech.uucp (Conor P. Cahill) (04/10/91)

drich@dialogic.com (Dan Rich) writes:
>So, it looks like the solution to this problem is to not put mallocs
>in your signal handlers.  :-(

Signal handlers, like the low level kernel stuff, must ensure that 
they don't do something that will affect the outside world without
ensuring that they cannot be interrupted.  This includes mallocs, 
changes to global data (especially pointers), etc.

The kernel's solution is to lock out interrupts that may collide.  C
programs can do the same with signals (put the problem signals in
a hold status - see sigset()).  However, you still end up with the 
limitation that you must be very careful about modifying global
pointers.

The complete answer to the malloc problem would include changes to 
malloc that locked out problem signals while the malloc was being 
performed.

Remember, signal handlers can be called when your code is at 
any location (although because of the way the kernel implements them
they will usually be called near a system call).
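
The idea of holding the problem signal while malloc runs can be sketched as follows. This is only an illustration: it uses POSIX sigprocmask() rather than the SVR3 sigset() family mentioned above, and the wrapper name malloc_with_signal_held is made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>

/*
 * Hypothetical wrapper: block `signo` while malloc runs, so a handler
 * for that signal cannot interrupt the allocator part-way through.
 * Returns the new block, or NULL if blocking or allocation fails.
 */
void *malloc_with_signal_held(size_t size, int signo)
{
    sigset_t block, saved;
    void *ptr;

    sigemptyset(&block);
    sigaddset(&block, signo);
    if (sigprocmask(SIG_BLOCK, &block, &saved) != 0)
        return NULL;
    ptr = malloc(size);
    /* Restore the previous mask; a pending signo is delivered after this. */
    sigprocmask(SIG_SETMASK, &saved, NULL);
    return ptr;
}
```

The same bracketing would be needed around free() and realloc(), which is why doing it inside the allocator itself is the more complete answer.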


jones@acsu.buffalo.edu (terry a jones) (04/11/91)

In article <1991Apr10.144136.13350@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes:
>drich@dialogic.com (Dan Rich) writes:
>>So, it looks like the solution to this problem is to not put mallocs
>>in your signal handlers.  :-(
>
>Signal handlers, like the low level kernel stuff, must ensure that 
>they don't do something that will affect the outside world without
>ensuring that they cannot be interrupted.  This includes mallocs, 
>changes to global data (especially pointers), etc.
>


	Or put another way, make sure that your interrupt level code never
calls routines that are not re-entrant, period.  You may get away with it
99.99% of the time, but calling a non re-entrant version of malloc() in an
interrupt thread that has interrupted another thread that was itself 
executing malloc() can give you big trouble.


	Terry


Terry Jones   				{rutgers,uunet}!acsu.buffalo.edu!jones
SUNY at Buffalo ECE Dept.	  or: rutgers!ub!jones, jones@acsu.buffalo.edu

john@jwt.UUCP (John Temples) (04/11/91)

In article <70183@eerie.acsu.Buffalo.EDU> jones@acsu.buffalo.edu (terry a jones) writes:
>	Or put another way, make sure that your interrupt level code never
>calls routines that are not re-entrant

Is it documented anywhere which system calls are reentrant?  I seem to
recall a thread in another newsgroup about what you can safely do in
a signal handler, and some people were saying "nothing other than
modifying a global flag and calling signal() to reset the handler."
-- 
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)

cpcahil@virtech.uucp (Conor P. Cahill) (04/11/91)

john@jwt.UUCP (John Temples) writes:

>In article <70183@eerie.acsu.Buffalo.EDU> jones@acsu.buffalo.edu (terry a jones) writes:
>>	Or put another way, make sure that your interrupt level code never
>>calls routines that are not re-entrant

>Is it documented anywhere which system calls are reentrant?  I seem to
>recall a thread in another newsgroup about what you can safely do in
>a signal handler, and some people were saying "nothing other than
>modifying a global flag and calling signal() to reset the handler."

You can do other things.  But you must ensure that the code you are
executing is

		1) re-entrant

or

		2) it is code that is not normally executed by the rest
		   of your program and you lock out other interrupts while
		   running it.
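
The conservative "just set a flag" approach quoted above can be sketched like this. It assumes POSIX sigaction() instead of signal(), and the names got_signal, on_signal, install_flag_handler and handle_pending_signal are all made up for the example.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>

/* Set by the handler, examined and cleared by the main program. */
static volatile sig_atomic_t got_signal = 0;

/* The handler only records that the signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler for signo; returns 0 on success, -1 on failure. */
int install_flag_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/*
 * Called from the main loop, never from a handler.  If a signal has
 * arrived, clear the flag and do the allocation here.  Returns NULL
 * when no signal is pending or when malloc fails.
 */
void *handle_pending_signal(size_t size)
{
    if (!got_signal)
        return NULL;
    got_signal = 0;
    return malloc(size);
}
```

The malloc call that used to sit in the handler now runs in ordinary program context, where it cannot interrupt another malloc.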


rstevens@noao.edu (Rich Stevens) (04/12/91)

>Is it documented anywhere which system calls are reentrant?

The POSIX.1 standard (Dec. 1990) lists the *safe* functions.
Top of p. 55.  This is the only list I've seen.

	Rich Stevens  (rstevens@noao.edu)

moore@forty2.enet.dec.com (Paul Moore) (05/23/91)

I've recently had this error occurring when malloc is called running an
application on ISC SVR3.2 (observed from the sdb debugger):

   memory fault (11) (sig 11)

The man page for signal(3) indicates that this is a segmentation violation. 

The problem only occurs when malloc() had been previously called in the code
execution path; it doesn't appear when this code path isn't executed.

The problem doesn't appear at all when I run the very same application on
Ultrix.

Any ideas, anyone?

- Paul

cpcahil@virtech.uucp (Conor P. Cahill) (05/24/91)

moore@forty2.enet.dec.com (Paul Moore) writes:

>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>   memory fault (11) (sig 11)
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.

>Any ideas, anyone?

My first bet would be that you are overrunning the malloc data that you
allocate (i.e. writing to 10 bytes when you only allocated 8).  Second
guess is that you are expecting the data to be cleared - which it isn't.

To track this down, you should get hold of the malloc debugging package
that I put together (it was posted to c.s.u last year; send email if you 
want an up-to-date copy).  With that package, the problem will probably
be caught in the function that is overrunning, and almost certainly
at the subsequent malloc call.
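
On the second guess: if the code does rely on cleared memory, calloc() provides it where malloc() does not. A minimal sketch, with a made-up struct record and helper new_record used purely for illustration:

```c
#include <stdlib.h>

/* Hypothetical structure; the fields are placeholders. */
struct record {
    int count;
    char name[16];
};

/* Allocate one record with every byte set to zero; NULL on failure. */
struct record *new_record(void)
{
    return calloc(1, sizeof(struct record));
}
```

A plain malloc(sizeof(struct record)) would leave count and name holding whatever happened to be in that memory.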


toma@swsrv1.cirr.com (Tom Armistead) (05/24/91)

In article <1991May23.094026.18969@hollie.rdg.dec.com> moore@forty2.enet.dec.com (Paul Moore) writes:
>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>
>   memory fault (11) (sig 11)
>
>The man page for signal(3) indicates that this is a segmentation violation. 
>
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.
>
>The problem doesn't appear at all when I run the very same application on
>Ultrix.
>
>Any ideas, anyone?
>
>- Paul

These types of errors are usually caused by one of two things (or both).
  1. Freeing an un-malloc'd or already-free'd pointer.
  2. Overwriting the end of a malloc'd area.

In either case, it's usually on the next malloc() call that you get the core
dump (sometimes *several* malloc's later).
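
One common guard against cause 1 is to reset the pointer whenever it is freed. A minimal sketch, using a made-up helper name free_and_null:

```c
#include <stdlib.h>

/*
 * Hypothetical helper: free the block *pp points to and reset *pp to
 * NULL, so calling it again on the same variable does nothing.
 */
void free_and_null(char **pp)
{
    if (pp != NULL && *pp != NULL) {
        free(*pp);
        *pp = NULL;
    }
}
```

This only protects the one variable passed in; any other copy of the pointer is still dangling after the call.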

And these types of errors are generally a bitch to find...

Tom
-- 
Tom Armistead - Software Services - 2918 Dukeswood Dr. - Garland, Tx  75040
===========================================================================
toma@swsrv1.cirr.com                {egsner,letni,ozdaltx,void}!swsrv1!toma

jeff@uf.msc.umn.edu (Jeff Turner) (05/25/91)

In article <1991May23.094026.18969@hollie.rdg.dec.com> moore@forty2.enet.dec.com (Paul Moore) writes:
>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>
>   memory fault (11) (sig 11)
>
>The man page for signal(3) indicates that this is a segmentation violation. 
>
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.
>
>The problem doesn't appear at all when I run the very same application on
>Ultrix.
>
>Any ideas, anyone?
>
>- Paul
>

The most frequent cause of malloc problems that I have observed is programmers
malloc'ing a buffer for a string based on the string's strlen() (rather than
its real length, which includes the terminating zero byte), and then copying
the string into it (which can overwrite malloc's tables).

What I mean is simply that if you are going to malloc a buffer for a string,
you have to make sure you allocate room for the zero byte that 
terminates the string:

Wrong:
	cp = "string";
	new_cp = malloc(strlen(cp));
	strcpy(new_cp, cp);

Right:
	cp = "string";
	new_cp = malloc(strlen(cp)+1);
	strcpy(new_cp, cp);

The fact that the problem goes away when you change hardware platforms
suggests it might be something as simple as what I described.  Different
hardware platforms (for their own reasons) will sometimes pad your request out
to some memory-specific alignment (e.g. CRAYs pad out to an 8-byte word).
So, if you ask for 4 bytes, and malloc gives you 8, you won't get caught if
you write 1 byte past what you asked for.  However, if you ask for 8 (and 
get 8) you cannot write to the next byte without stomping on malloc's
information.  Likewise, if you take your code to another machine that pads 
mallocs out to 4-byte alignments, the use of the 5th byte will stomp on
malloc's tables (i.e. this is how the same code could produce different
results on different machines).
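
The arithmetic behind that argument can be sketched as below. The helpers padded_size and slack_bytes are made up for illustration; a real malloc also keeps its own headers and may pad differently, so this models only the rounding.

```c
#include <stddef.h>

/* Round a request up to the next multiple of align (align must be > 0). */
size_t padded_size(size_t request, size_t align)
{
    return (request + align - 1) / align * align;
}

/* Bytes beyond the request that still fall inside the padded block. */
size_t slack_bytes(size_t request, size_t align)
{
    return padded_size(request, align) - request;
}
```

A 4-byte request has 4 bytes of slack under 8-byte padding but none under 4-byte padding, so the same one-byte overrun is harmless on one machine and damaging on the other.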

Most of the people I have seen do this know better, but they make the mistake
anyway.  For most people, it is more of a typo than a programming error.

Hope this helps, at least it is something to look for.

-Jeff
---
Jeff Turner                           EMAIL: jeff@msc.edu
Minnesota Supercomputer Center, Inc.  VOICE: (612) 626-0544
Minneapolis, Minnesota  55415           FAX: (612) 624-6550

campbell@redsox.bsw.com (Larry Campbell) (05/26/91)

In article <4161@uc.msc.umn.edu> jeff@uf.UUCP (Jeff Turner) writes:
-
-Wrong:
-	cp = "string";
-	new_cp = malloc(strlen(cp));
-	strcpy(new_cp, cp);
-
-Right:
-	cp = "string";
-	new_cp = malloc(strlen(cp)+1);
-	strcpy(new_cp, cp);

Better:
	cp = "string";
	new_cp = strdup(cp);
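
Note that strdup() is not part of ANSI C, so it may be missing on some systems. Where it is, a fallback along these lines should do; the name dup_string is made up to avoid clashing with a library strdup.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical fallback for systems without strdup(): return a
 * malloc'ed copy of s, or NULL if the allocation fails.
 */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;     /* +1 for the terminating NUL */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

Either way the caller still has to check for NULL and free() the copy when done.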

-- 
Larry Campbell             The Boston Software Works, Inc., 120 Fulton Street
campbell@redsox.bsw.com    Boston, Massachusetts 02109 (USA)