tim@comcon.UUCP (Tim Brown) (12/10/90)
Does anyone know why this code should core dump?

-----------------------
1. first in main():

	names = NULL;

----------------------
2. Then:

	if(some_condition && names != NULL)
	{
		free(names);
		names = NULL;
	}

----------------------
3. Then:

	if(names == NULL)
		if((names = (char *)malloc(BUFF_SIZE)) == NULL)
		{
			perror("malloc");
			exit(errno);
		}

-------------------------

I set this up by setting the char *names equal to NULL at run time and
then when I want to change the memory allocation, I free(names) and
once again set names = NULL, that way I can call the malloc code
repeatedly allocating a different size chunk each time. I want to be
able to run chunks 2&3 repeatedly.

On my system, ISC2.2, it core dumps on the third time thru. On an
IBM6000, it works as expected. I suspect a bug in ISC's malloc. How
are others doing this?

It core dumps at the malloc according to sdb.

I know this is comp.lang.c stuff but it seems to possibly be isolated
to ISC.

Thanks for any help.
--
Tim Brown | Computer Connection | uunet!seaeast.wa.com!comcon!tim |
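For reference, steps 2 and 3 above can be folded into one helper. This is only a sketch: the name `reset_buffer` and its signature are hypothetical and do not come from the post. Run on its own, the pattern is legal C, which is consistent with the crash coming from heap damage elsewhere in the program rather than from this sequence.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not from the original post): release the old
 * buffer, if there is one, and allocate a fresh one of new_size bytes.
 * Returns the new buffer, or NULL if malloc fails.
 */
char *reset_buffer(char *old, size_t new_size)
{
    /* Keep the NULL test: some pre-ANSI free() versions reject NULL. */
    if (old != NULL)
        free(old);

    old = (char *)malloc(new_size);
    if (old == NULL)
        perror("malloc");
    return old;
}
```

A caller would write `names = reset_buffer(names, BUFF_SIZE);` each time the size changes, and must stay within `new_size` bytes when writing to the result.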
cpcahil@virtech.uucp (Conor P. Cahill) (12/12/90)
In article <537@comcon.UUCP> tim@comcon.UUCP (Tim Brown) writes:
>Does anyone know why this code should core dump?

[description of malloc related problem deleted]

>
>On my system, ISC2.2, it core dumps on the third time thru. On an
>IBM6000, it works as expected. I suspect a bug in ISC's malloc. How
>are others doing this?

I don't suspect a bug in malloc. Instead, I expect the problem to be
in your code, either with the malloc area that you are talking about
or with another area that is being overrun.

I have developed a debugging version of malloc (which was posted to
c.s.u back in May/June) that would probably solve this problem with
just a recompile. If you can't get the library from a nearby archive,
send me email and I will forward a copy to you.

>It core dumps at the malloc according to sdb.

This is probably due to the fact that some malloc memory has been
overrun, thereby trashing the malloc chain.

>I know this is comp.lang.c stuff but it seems to possibly be isolated
>to ISC.

I doubt it is tied to ISC.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
root@dialogic.com (Charlie Root) (04/06/91)
I have run into a problem when using malloc(3C) under Interactive 386/ix 2.2 and Esix. In one of our library functions we are using malloc to allocate a structure. Occasionally we are getting a core dump from the malloc. Now, I am assuming that somewhere we have a rampaging pointer that has trashed the malloc buffer pointers. However, I have no way of tracing the malloc calls (other than using sdb on the core dump). Does anyone know of a package that will allow me to trace what malloc is doing? I have gotten a copy of malloc-trace off of uunet, but that was written for a Sun, and I haven't tried using it yet. -- Dan Rich | drich@dialogic.com || ...!uunet!dialogic!drich UNIX Systems Administrator | "Danger, you haven't seen the last of me!" Dialogic Corporation | "No, but the first of you turns my stomach!" (201) 334-1268 x213 | -- The Firesign Theatre's Nick Danger
cpcahil@virtech.uucp (Conor P. Cahill) (04/07/91)
root@dialogic.com (Charlie Root) writes:

>I have run into a problem when using malloc(3C) under Interactive
>386/ix 2.2 and Esix. In one of our library functions we are using
>malloc to allocate a structure. Occasionally we are getting a core
>dump from the malloc. Now, I am assuming that somewhere we have a
>rampaging pointer that has trashed the malloc buffer pointers.

I put together a malloc debugging library that was posted to c.s.u
last year. If you don't have access to it (or if you want a more up
to date copy - I sent r$ two patches last July which still haven't
been posted) send me email and I will forward it to you.

The readme from the package follows:

# (c) Copyright 1990 Conor P. Cahill. (uunet!virtech!cpcahil)
# You may copy, distribute, and use this software as long as this
# copyright statement is not removed.

This package is a collection of routines which are a drop-in
replacement for the malloc(3), memory(3), string(3), and bstring(3)
library functions.

The purpose of these programs is to aid the development and/or
debugging of programs using these functions by providing a high level
of consistency checking whenever a malloc pointer is used. Due to this
increased level of consistency checking, these functions have a
considerably larger overhead than the standard functions, but the
extra checking should be well worth it in a development environment.

To use these functions all you need to do is compile the library and
include it on your loader command line. You do not need to recompile
your code, only a relink is necessary.

Features of this library:

 1. The malloced area returned from each call to malloc is filled
    with non-null bytes. This should catch any use of uninitialized
    malloc area. The fill pattern for malloced area is 0x01.

 2. When free is called numerous validity checks are made on the
    pointer it is passed.
    In addition, the data in the malloc block beyond the size
    requested on the initial malloc is checked to verify that it is
    still filled with the original fill characters.

    This is useful for catching things like:

	ptr = malloc(5);
	ptr[5] = '\0';

	/*
	 * You should note that this will be caught when it is
	 * freed, not when it is done
	 */

    And finally, the freed block is filled with a different fill
    pattern so that you can easily determine if you are still using
    free'd space. The fill pattern for free'd areas is 0x02.

    This is useful for catching things like:

	ptr = malloc(20);

	bptr = ptr+10;

	/* do something useful with bptr */

	free(ptr);

	/*
	 * now try to do something useful with bptr, it should
	 * be trashed enough that it would cause real problems
	 * and when you went to debug the problem it would be
	 * filled with 0x02's and you would then know to look
	 * for something free'ing what bptr points to.
	 */

 3. Whenever a bstring(3)/string(3)/memory(3) function is called,
    its parameters are checked as follows:

	If they point somewhere in the malloc arena
		If the operation goes beyond requested malloc space
			call malloc_warning()

    This is useful for catching things like:

	ptr = malloc(5);
	strcpy(ptr,"abcde");

 4. Malloc_warning() and malloc_fatal() are used when an error
    condition is detected. If the error is severe, malloc_fatal is
    called. Malloc_warning is used otherwise. The decision about
    what is fatal and what is a warning was made somewhat
    arbitrarily.

    Warning messages include:

	Calling free with a bad pointer
	Calling a bstring/string/memory (3) function which will go
	beyond the end of a malloc block (Note that the library
	function is not modified to refuse the operation. If malloc
	warnings are in the default IGNORE case, the operation will
	continue and at some point cause a real problem).

    Fatal errors are:

	Detectable corruption to the malloc chain.

 5. The operations to perform when an error is detected are specified
    at run time by the use of environment variables.
	MALLOC_WARN - specifies the warning error message handling
	MALLOC_FATAL - specifies the fatal error handling

    When one of these error conditions occurs you will get an error
    message and the handler will execute based upon what setting is
    in the environment variables. Currently understood settings are
    as follows:

	  0 - continue operations
	  1 - drop core and exit
	  2 - just exit
	  3 - drop core, but continue executing. Core files will
	      be placed into core.[PID].[counter],
	      e.g. core.00123.001
	128 - dump malloc chain and continue
	129 - dump malloc chain, dump core, and exit
	130 - dump malloc chain, exit
	131 - dump malloc chain, dump core, continue processing

    There is an additional environment variable MALLOC_ERRFILE which
    is used to indicate the name of the file for error message
    output.

    For example, to set up the session to generate a core file for
    every malloc warning, to drop core and exit on a malloc fatal,
    and to log all messages to the file "malloc_log" do the
    following:

	MALLOC_WARN=131
	MALLOC_FATAL=1
	MALLOC_ERRFILE=malloc_log

	export MALLOC_WARN MALLOC_FATAL MALLOC_ERRFILE

 6. The function malloc_dump() is available to dump the malloc chain
    whenever you might want. Its only argument is a file descriptor
    to use to write the data. Review the code if you need to know
    what data is printed.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
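The fill-and-check idea in features 1 and 2 of the readme can be sketched in a few lines. This is not the code from the posted library: the names `dbg_malloc`, `dbg_free`, `union dbg_hdr`, and the 8-byte guard length are assumptions made for illustration. Only the fill values 0x01 and 0x02 are taken from the readme.

```c
#include <stdlib.h>
#include <string.h>

#define FILL_ALLOC 0x01   /* fill for fresh memory, as in the readme */
#define FILL_FREE  0x02   /* fill for freed memory, as in the readme */
#define GUARD_LEN  8      /* assumed size of the trailing guard area */

/* Hypothetical header kept in front of each block. The long double
 * member only forces a conservative alignment for the user area. */
union dbg_hdr {
    size_t size;          /* bytes the caller asked for */
    long double pad;
};

void *dbg_malloc(size_t size)
{
    unsigned char *raw;

    if (size > (size_t)-1 - sizeof(union dbg_hdr) - GUARD_LEN)
        return NULL;      /* the total would overflow size_t */
    raw = malloc(sizeof(union dbg_hdr) + size + GUARD_LEN);
    if (raw == NULL)
        return NULL;
    ((union dbg_hdr *)raw)->size = size;
    /* Fill the user area and the guard bytes that follow it. */
    memset(raw + sizeof(union dbg_hdr), FILL_ALLOC, size + GUARD_LEN);
    return raw + sizeof(union dbg_hdr);
}

/* Frees the block. Returns 0 if the guard bytes are intact, or -1 if
 * something wrote past the requested size. */
int dbg_free(void *ptr)
{
    unsigned char *user = ptr;
    unsigned char *raw = user - sizeof(union dbg_hdr);
    size_t size = ((union dbg_hdr *)raw)->size;
    int status = 0;
    size_t i;

    for (i = 0; i < GUARD_LEN; i++) {
        if (user[size + i] != FILL_ALLOC) {
            status = -1;
            break;
        }
    }
    /* Stamp the block so stale pointers see an obvious pattern. */
    memset(user, FILL_FREE, size + GUARD_LEN);
    free(raw);
    return status;
}
```

With this sketch the readme's `ptr = malloc(5); ptr[5] = '\0';` case is reported by `dbg_free`, that is, at free time rather than at the moment of the bad write, which matches the behaviour the readme describes.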
rwhite@nusdecs.uucp (Robert White) (04/08/91)
Regarding core dump during malloc.

This has happened to me, but the times that it happened I found the
program to be at fault and not malloc. (This has happened to me under
AT&T SVR3 on the 3B and 386 implementations.)

If you misuse a chunk of malloced memory (e.g. write sizeof()+n bytes
to the pointer address instead of limiting writes to sizeof(), or
mangle/munge pointer dereferencing before a write) you can damage the
allocation pool structures maintained by the malloc library.

The next time (or the Nth time) you malloc after that, the
structure-traversal-to-find-a-sufficient-size-hole-in-the-pool part of
the allocation can go springing off into places it should not be.
Reading those places is fine (isn't virtual memory wonderful) but when
it traverses the garbage and "finds" the appearance of a hole it tries
to modify the placement structures to allocate the memory. One of two
things results:

1) If the region is within the legally writeable space of the process
   image you get damaged data. A condition that can be very hard to
   detect as it can take the form of bad function return addresses.

2) If the region is within a protection area (your code region, a
   shared library mapped into your process space, the system call
   entry area, constant data space [and/or however those sorts of
   things are implemented in your implementation]) you will get a
   memory protection fault (and hence an immediate core dump) during
   the allocation call.

In short, before you go trying to reverse-engineer your malloc(3)
library you should review the pointer usages in all your source and
home-grown libraries. Functions most likely to blame are things like
strcat, getstr, and the like. Anyplace you pass a pointer to an array
that will be written on without the size of the array you should be
suspicious.
--
Robert C. White Jr.    | The degree to which a language may be
Network Administrator  | classified as a "living" language
National University    | is best expressed as the basic ratio
crash!nusdecs!rwhite   | of its speakers to its linguists.
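One way to act on the advice about passing an array without its size is to pass the size along and refuse writes that will not fit. A minimal sketch, assuming a hypothetical helper named `bounded_append` that stands in for an unchecked strcat():

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append src to the string held in dst without
 * ever writing more than dst_size bytes in total. Returns 0 on
 * success, or -1 if dst is not terminated within dst_size bytes or
 * the result would not fit (dst is left unchanged in both cases).
 */
int bounded_append(char *dst, size_t dst_size, const char *src)
{
    const char *end = memchr(dst, '\0', dst_size);
    size_t used, need;

    if (end == NULL)
        return -1;                  /* no terminator inside the buffer */
    used = (size_t)(end - dst);
    need = strlen(src);
    if (need >= dst_size - used)    /* no room for src plus its '\0' */
        return -1;
    memcpy(dst + used, src, need + 1);
    return 0;
}
```

Because the function gets the destination size from its caller, an overrun becomes an error return instead of silent damage to whatever follows the buffer.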
cpcahil@virtech.uucp (Conor P. Cahill) (04/08/91)
rwhite@nusdecs.uucp (Robert White) writes:

>In short, before you go trying to reverse-engineer your malloc(3) library
>you should review the pointer usages in all your source and home-grown
>libraries. Functions most likely to blame are things like strcat, getstr,

If you read his message again, you will see that he knew that it was
probably a problem in his code, but the standard malloc did not have
enough debugging capabilities to track this down.

>and the like. Anyplace you pass a pointer to an array that will be written on
>without the size of the array you should be suspicious.

Yes, and it may take you a long time to track down (especially if it
is not all your own code). That is why I put together the debugging
version of the library. It makes tracking down malloc problems much,
much easier.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
drich@dialogic.com (Dan Rich) (04/10/91)
It looks like we may have a solution to our malloc() problems. We managed to track it a little further using malloc(3X), and the debug-malloc library. Apparently, there is a malloc somewhere in a signal handler. And, if a signal occurs during a malloc elsewhere in our code, the signal handler malloc does a very good job of destroying the malloc pointers in the application. So, it looks like the solution to this problem is to not put mallocs in your signal handlers. :-( Thanks to everyone who offered suggestions. They helped to track this one down! -- Dan Rich | drich@dialogic.com || ...!uunet!dialogic!drich UNIX Systems Administrator | "Danger, you haven't seen the last of me!" Dialogic Corporation | "No, but the first of you turns my stomach!" (201) 334-1268 x213 | -- The Firesign Theatre's Nick Danger
cpcahil@virtech.uucp (Conor P. Cahill) (04/10/91)
drich@dialogic.com (Dan Rich) writes:

>So, it looks like the solution to this problem is to not put mallocs
>in your signal handlers. :-(

Signal handlers, like the low level kernel stuff, must ensure that
they don't do something that will affect the outside world without
ensuring that they cannot be interrupted. This includes mallocs,
changes to global data (especially pointers), etc.

The kernel's solution is to lock out interrupts that may collide. C
programs can do the same with signals (put the problem signals in a
hold status - see sigset()). However, you still end up with the
limitation that you must be very careful about modifying global
pointers.

The complete answer to the malloc problem would include changes to
malloc that locked out problem signals while the malloc was being
performed.

Remember, signal handlers can be called when your code is at any
location (although because of the way the kernel implements them they
will usually be called near a system call).
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
jones@acsu.buffalo.edu (terry a jones) (04/11/91)
In article <1991Apr10.144136.13350@virtech.uucp> cpcahil@virtech.uucp (Conor P. Cahill) writes: >drich@dialogic.com (Dan Rich) writes: >>So, it looks like the solution to this problem is to not put mallocs >>in your signal handlers. :-( > >Signal handlers, like the low level kernel stuff, must ensure that >they don't do something that will effect the outside world without >ensuring that they cannot be interrupted. This includes mallocs, >changes to global data (especially pointers), etc. > Or put another way, make sure that your interrupt level code never calls routines that are not re-entrant, period. You may get away with it 99.99% of the time, but calling a non re-entrant version of malloc() in an interrupt thread that has interrupted another thread that was itself executing malloc() can give you big trouble. Terry Terry Jones {rutgers,uunet}!acsu.buffalo.edu!jones SUNY at Buffalo ECE Dept. or: rutgers!ub!jones, jones@acsu.buffalo.edu
john@jwt.UUCP (John Temples) (04/11/91)
In article <70183@eerie.acsu.Buffalo.EDU> jones@acsu.buffalo.edu (terry a jones) writes:
> Or put another way, make sure that your interrupt level code never
>calls routines that are not re-entrant

Is it documented anywhere which system calls are reentrant? I seem to
recall a thread in another newsgroup about what you can safely do in
a signal handler, and some people were saying "nothing other than
modifying a global flag and calling signal() to reset the handler."
--
John W. Temples -- john@jwt.UUCP (uunet!jwt!john)
cpcahil@virtech.uucp (Conor P. Cahill) (04/11/91)
john@jwt.UUCP (John Temples) writes:
>In article <70183@eerie.acsu.Buffalo.EDU> jones@acsu.buffalo.edu (terry a jones) writes:
>> Or put another way, make sure that your interrupt level code never
>>calls routines that are not re-entrant

>Is it documented anywhere which system calls are reentrant? I seem to
>recall a thread in another newsgroup about what you can safely do in
>a signal handler, and some people were saying "nothing other than
>modifying a global flag and calling signal() to reset the handler."

You can do other things. But you must ensure that the code you are
executing is 1) re-entrant or 2) code that is not normally executed
by the rest of your program, and that you lock out other interrupts
while running it.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
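The conservative "only set a global flag" approach quoted above can be sketched as follows. All names here (`got_signal`, `on_signal`, `install_handler`, `service_pending`) are hypothetical. The point is that the handler does no allocation, and the malloc happens later in ordinary program context.

```c
#include <signal.h>
#include <stdlib.h>

/* Written by the handler, read by the main program. */
static volatile sig_atomic_t got_signal = 0;

/* The handler only records that the signal arrived. */
static void on_signal(int signo)
{
    (void)signo;
    got_signal = 1;
}

/* Install the handler. Returns 0 on success, -1 on failure. */
int install_handler(int signo)
{
    return signal(signo, on_signal) == SIG_ERR ? -1 : 0;
}

/*
 * Called from the main loop, outside signal context. If a signal has
 * been seen, clear the flag and do the allocation here. Returns the
 * new block, or NULL if nothing was pending or malloc failed.
 */
void *service_pending(size_t size)
{
    if (!got_signal)
        return NULL;
    got_signal = 0;
    return malloc(size);
}
```

On systems where signal() resets the disposition after each delivery, the handler would also need to reinstall itself, as the quoted advice about calling signal() suggests.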
rstevens@noao.edu (Rich Stevens) (04/12/91)
>Is it documented anywhere which system calls are reentrant?
The POSIX.1 standard (Dec. 1990) lists the *safe* functions.
Top of p. 55. This is the only list I've seen.
Rich Stevens (rstevens@noao.edu)
moore@forty2.enet.dec.com (Paul Moore) (05/23/91)
I've recently had this error occurring when malloc is called running
an application on ISC SVR3.2 (observed from the sdb debugger):

	memory fault (11) (sig 11)

The man page for signal(3) indicates that this is a segmentation
violation.

The problem only occurs when malloc() had been previously called in
the code execution path; it doesn't appear when this code path isn't
executed.

The problem doesn't appear at all when I run the very same application
on Ultrix.

Any ideas, anyone?

- Paul
cpcahil@virtech.uucp (Conor P. Cahill) (05/24/91)
moore@forty2.enet.dec.com (Paul Moore) writes:

>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>	memory fault (11) (sig 11)
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.
>Any ideas, anyone?

My first bet would be that you are overrunning the malloc data that
you allocate (i.e. writing to 10 bytes when you only allocated 8).
Second guess is that you are expecting the data to be cleared, which
it isn't.

To track this down, you should get hold of the malloc debugging
package that I put together (it was posted to c.s.u last year; send
email if you want an up to date copy). With that package, the problem
will probably be caught in the function that is overrunning, and
almost certainly at the subsequent malloc call.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
toma@swsrv1.cirr.com (Tom Armistead) (05/24/91)
In article <1991May23.094026.18969@hollie.rdg.dec.com> moore@forty2.enet.dec.com (Paul Moore) writes:
>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>
>	memory fault (11) (sig 11)
>
>The man page for signal(3) indicates that this is a segmentation violation.
>
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.
>
>The problem doesn't appear at all when I run the very same application on
>Ultrix.
>
>Any ideas, anyone?
>
>- Paul

These types of errors are usually caused by 1 of 2 things (or both).

1. Freeing an un-malloc'd or already-free'd pointer.
2. Overwriting the end of a malloc'd area.

In either case, it's usually on the next malloc() call that you get
the core dump (sometimes *several* mallocs later).

And these types of errors are generally a bitch to find...

Tom
--
Tom Armistead - Software Services - 2918 Dukeswood Dr. - Garland, Tx 75040
===========================================================================
toma@swsrv1.cirr.com {egsner,letni,ozdaltx,void}!swsrv1!toma
jeff@uf.msc.umn.edu (Jeff Turner) (05/25/91)
In article <1991May23.094026.18969@hollie.rdg.dec.com> moore@forty2.enet.dec.com (Paul Moore) writes:
>I've recently had this error occurring when malloc is called running an
>application on ISC SVR3.2 (observed from the sdb debugger):
>
>	memory fault (11) (sig 11)
>
>The man page for signal(3) indicates that this is a segmentation violation.
>
>The problem only occurs when malloc() had been previously called in the code
>execution path; it doesn't appear when this code path isn't executed.
>
>The problem doesn't appear at all when I run the very same application on
>Ultrix.
>
>Any ideas, anyone?
>
>- Paul
>

The most frequent cause of malloc problems that I have observed is
programmers malloc'ing a buffer for a string based on the string's
strlen() (rather than its real length), and then copying the string
into it (which can overwrite malloc's tables).

What I mean is simply that if you are going to malloc a buffer for a
string, you have to make sure you allocate room for the zero byte
that terminates the string:

Wrong:
	cp = "string";
	new_cp = malloc(strlen(cp));
	strcpy(new_cp, cp);

Right:
	cp = "string";
	new_cp = malloc(strlen(cp)+1);
	strcpy(new_cp, cp);

The fact that the problem goes away when you change hardware platforms
suggests it might be something as simple as what I described.
Different hardware platforms (for their own reasons) will sometimes
pad your request out to some machine-specific alignment (e.g. CRAYs
pad out to an 8-byte word). So, if you ask for 4 bytes, and malloc
gives you 8, you won't get caught if you write 1 byte past what you
asked for. However, if you ask for 8 (and get 8) you cannot write to
the next byte without stomping on malloc's information. Likewise, if
you take your code to another machine that pads mallocs out to 4-byte
alignments, the use of the 5th byte will stomp on malloc's tables
(i.e. this is how the same code could produce different results on
different machines).

Most of the people I have seen do this know better, but they make the
mistake anyway. For most people, it is more of a typo than a
programming error.

Hope this helps, at least it is something to look for.

-Jeff
---
Jeff Turner                             EMAIL: jeff@msc.edu
Minnesota Supercomputer Center, Inc.    VOICE: (612) 626-0544
Minneapolis, Minnesota 55415            FAX:   (612) 624-6550
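The padding argument above is just rounding arithmetic, and can be worked through with two small functions. The names `padded_size` and `slack_bytes` are hypothetical, and the model is deliberately simplified: it covers only the rounding of the request, not the bookkeeping a real allocator adds.

```c
#include <stddef.h>

/*
 * Round request up to the next multiple of align (align must be > 0).
 * This models only the padding a platform might apply to a request.
 */
size_t padded_size(size_t request, size_t align)
{
    return (request + align - 1) / align * align;
}

/*
 * Number of bytes past the request that can be written without
 * leaving the padded block.
 */
size_t slack_bytes(size_t request, size_t align)
{
    return padded_size(request, align) - request;
}
```

For the cases in the post: a 4-byte request with 8-byte padding leaves 4 bytes of slack, so an off-by-one write goes unnoticed, while an 8-byte request with 8-byte padding, or a 4-byte request with 4-byte padding, leaves none, so the same bug lands on the allocator's data.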
campbell@redsox.bsw.com (Larry Campbell) (05/26/91)
In article <4161@uc.msc.umn.edu> jeff@uf.UUCP (Jeff Turner) writes:
-
-Wrong:
- cp = "string";
- new_cp = malloc(strlen(cp));
- strcpy(new_cp, cp);
-
-Right:
- cp = "string";
- new_cp = malloc(strlen(cp)+1);
- strcpy(new_cp, cp);
Better:
cp = "string";
new_cp = strdup(cp);
--
Larry Campbell The Boston Software Works, Inc., 120 Fulton Street
campbell@redsox.bsw.com Boston, Massachusetts 02109 (USA)
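Not every C library of the period ships strdup(), so a fallback may be needed. A minimal sketch of one, under the hypothetical name `dup_string`. It folds the "+1" from the "Right" version into a single place and adds the check for a failed malloc that the quoted fragments leave out.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for strdup() on systems whose C library lacks
 * it. Returns a malloc'd copy of s, or NULL if malloc fails. The
 * caller is responsible for freeing the result.
 */
char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;   /* +1 for the terminating zero byte */
    char *copy = malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}
```

The same caveat applies to strdup() itself: the result can be NULL and has to be checked before use.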