toma@killer.UUCP (Tom Armistead) (05/05/88)
I've been having an *interesting* thing happen under System 5, release 3.1
Unix, on a 3b2/310 and was wondering if I could get any insight from
any one as to the problem (or cause of)???

I am getting signal 10 (bus error) in the middle of a malloc call.
It doesn't happen under any regular set of circumstances as far as I can
tell. From sdb I can tell that everything was set up ok, (but how can
you mess up on a malloc call?)

One case in mind was:

    structptr = (struct st *)malloc( sizeof( struct st ) );

structptr was NULL before the call and was still NULL after the crash.
The size of the struct is 230 bytes.

We don't have kernel source so I am not able to go into malloc except for
disassembly. The instruction the thing dies on is a BITW (I think?) maybe
something like:

    BITW 0(%r7),1

The system has 2meg of memory and at the time of the crash is running 20
processes and is very heavily loaded. This has happened in about half of
those 20 processes at random times in random places in the code (C of
course). All the processes use malloc, realloc and free a WHOLE lot.

Call me crazy, but shouldn't malloc just return me an error if there
are problems???

Any ideas??? This is about to drive me to COBOL!!!
Any help would be GREATLY appreciated!

Thanks,
Tom
--
Tom Armistead
UUCP: ...!ihnp4!killer!toma
gwyn@brl-smoke.ARPA (Doug Gwyn ) (05/05/88)
In article <3989@killer.UUCP> toma@killer.UUCP (Tom Armistead) writes:
>Call me crazy, but shouldn't malloc just return me an error if there
>are problems???

Since I haven't finished writing /dev/telepathy, I can't remotely debug
your application, but in general problems like the one you report result
from a bug in the application code.

malloc() maintains a linked list of storage (heap) blocks with "busy" bit
indicators attached to the block headers (in addition to the links). If
your application scribbles on one of these headers, or if it even frees an
already-free block, then a later (perhaps MUCH later) invocation of
malloc() can run amok.

Source code licensees can recompile libc/gen/malloc.c to enable a slew of
debugging checks that often detect such heap abuse early. This really
should be provided as a separate /usr/lib/*.[oa] for binary customers to
use, but it probably hasn't been.

One trick you can try is to provide your own simple memory allocator
called MyAlloc()/MyFree() that performs stringent consistency checks but
uses malloc() to obtain an initial large chunk of heap space to be
subdivided and reallocated by your allocator. Then recompile your
application with "-Dmalloc=MyAlloc -Dfree=MyFree" in CFLAGS in your
Makefile, link it with your debugging-allocator object, and see what turns
up. Note that the C library will continue to use the real malloc(), but
presumably it knows what it's doing and will use it correctly. (The only
way I think this checking could fail is if the malloc arena corruption is
due to abusing some other C library routine.)

Good luck.
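The MyAlloc()/MyFree() trick described above might look roughly like the following. This is only a minimal sketch, not anyone's actual debugging allocator: the arena size, the magic numbers, the guard pattern and the extra MyCheck() helper are all assumptions made for illustration, and freed blocks are marked dead rather than reused. The file holding it must itself be compiled without "-Dmalloc=MyAlloc", since it calls the real malloc() once.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ARENA_BYTES (64UL * 1024)   /* assumed size of the one big chunk */
#define GUARD_LEN   8               /* guard bytes written after each block */
#define GUARD_BYTE  0xA5            /* arbitrary guard pattern */
#define HDR_LIVE    0x4C495645UL    /* "LIVE" */
#define HDR_DEAD    0x44454144UL    /* "DEAD" */

/* The union pads the header to the strictest alignment, so the payload
   that follows it is suitably aligned for any object. */
typedef union header {
    struct { unsigned long magic; size_t size; } h;
    max_align_t align;
} header;

static unsigned char *arena;        /* obtained once from the real malloc() */
static size_t arena_used;

void *MyAlloc(size_t size)
{
    header *hp;
    size_t need;

    if (arena == NULL && (arena = malloc(ARENA_BYTES)) == NULL)
        return NULL;
    if (size > ARENA_BYTES)
        return NULL;
    /* header + payload + guard, rounded up so the next header is aligned */
    need = sizeof(header) + size + GUARD_LEN;
    need = (need + sizeof(header) - 1) / sizeof(header) * sizeof(header);
    if (need > ARENA_BYTES - arena_used)
        return NULL;
    hp = (header *)(arena + arena_used);
    arena_used += need;
    hp->h.magic = HDR_LIVE;
    hp->h.size = size;
    memset((unsigned char *)(hp + 1) + size, GUARD_BYTE, GUARD_LEN);
    return hp + 1;
}

/* Returns 0 if the block looks sane, otherwise a nonzero reason code:
   1 = not from this arena, 2 = already freed, 3 = header overwritten,
   4 = guard bytes after the block overwritten. */
int MyCheck(const void *p)
{
    const header *hp;
    const unsigned char *guard;
    size_t i;

    if (p == NULL || arena == NULL)
        return 1;
    if ((const unsigned char *)p < arena + sizeof(header) ||
        (const unsigned char *)p >= arena + arena_used)
        return 1;
    hp = (const header *)p - 1;
    if (hp->h.magic == HDR_DEAD)
        return 2;
    if (hp->h.magic != HDR_LIVE || hp->h.size > ARENA_BYTES)
        return 3;
    guard = (const unsigned char *)p + hp->h.size;
    for (i = 0; i < GUARD_LEN; i++)
        if (guard[i] != GUARD_BYTE)
            return 4;
    return 0;
}

void MyFree(void *p)
{
    static const char *const why[] = {
        "ok", "pointer not from MyAlloc", "block freed twice",
        "block header overwritten", "write past end of block"
    };
    int rc = MyCheck(p);

    if (rc != 0) {
        fprintf(stderr, "MyFree: %s\n", why[rc]);
        abort();                    /* leave a core dump near the culprit */
    }
    ((header *)p - 1)->h.magic = HDR_DEAD;   /* sketch: space is not reused */
}
```

Because the sketch never recycles freed space, a long-running program will exhaust the arena; that is acceptable for a short debugging run but a real version would need a free list.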
friedl@vsi.UUCP (Stephen J. Friedl) (05/06/88)
In article <3989@killer.UUCP>, toma@killer.UUCP (Tom Armistead) writes:
> I am getting signal 10 (bus error) in the middle of a malloc call.
> It doesn't happen under any regular set of circumstances as far as I can
> tell. From sdb I can tell that everything was set up ok, (but how can
> you mess up on a malloc call?)

It is almost certainly a corruption of malloc's arena pointers by
a program bug. Malloc keeps its blocks in a linked list, and the
word just before the pointer it returns to you points to the *next* area:

                +---------+
                | pointer |--->-\
                +---------+     |
malloc return-->|         |     |
                |  Your   |     |
                | memory  |     |
                |  chunk  |     v
                |  here   |     |
                |         |     |
                +---------+     |
                |         |<----/

If these pointers get messed up (easy to do, just overwrite a chunk
or free() a random pointer), it becomes a core-dump party.

> The instruction
> the thing dies on is a BITW (I think?) maybe something like:
>         BITW 0(%r7),1

The low bit of the "pointer" above indicates whether the block is
free or busy. This instruction is almost certainly testing this
bit on a crazy, overwritten, invalid pointer.

> All the processes use malloc, realloc and free a WHOLE lot.

Oh boy :-(. The bummer here is that the failure happens long after the
corruption occurs, and these can be the most difficult bugs to track
down. The best bet (on the 3B2, at least), is to use the specialized
malloc(3x) functions with the -lmalloc library. These are implemented
differently and may help the bugs show up in different ways.

If life gets really rough you can write a routine that will run through
the malloc chain looking for problems. This will help track down where
a random memory write is trashing the malloc chains:

        checkmalloc();
        crazy_function();
        checkmalloc();

If the first passes and the second doesn't, you're getting closer.

Good luck.
--
Steve Friedl    V-Systems, Inc. (714) 545-6442    3B2-kind-of-guy
friedl@vsi.com    {backbones}!vsi.com!friedl    attmail!vsi!friedl
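The chain layout and the checkmalloc() idea described above can be illustrated on a toy allocator. The following is a sketch under stated assumptions, not the System V libc implementation: the names toy_alloc(), toy_free() and toy_check(), the fixed pool, and the bump-only allocation are all invented for illustration. It does follow the layout in the diagram: the word before each returned pointer holds the address of the next header, with the low bit set while the block is busy.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_WORDS 256              /* assumed pool size, in words */
#define BUSY ((uintptr_t)1)         /* low bit of the header word */

static uintptr_t pool[POOL_WORDS];
static uintptr_t *pool_top = pool;  /* first unused word; end of the chain */

/* Bump-allocate nbytes. The word before the returned pointer is the
   header: address of the next header, with BUSY or'ed in. */
void *toy_alloc(size_t nbytes)
{
    size_t words;
    uintptr_t *hdr = pool_top;

    if (nbytes > sizeof pool)
        return NULL;
    words = (nbytes + sizeof(uintptr_t) - 1) / sizeof(uintptr_t);
    if (words >= (size_t)(pool + POOL_WORDS - pool_top))
        return NULL;                /* no room for header plus payload */
    pool_top = hdr + 1 + words;
    *hdr = (uintptr_t)pool_top | BUSY;
    return hdr + 1;
}

/* Mark a block free by clearing the busy bit (the toy never reuses it). */
void toy_free(void *p)
{
    uintptr_t *hdr = (uintptr_t *)p - 1;

    *hdr &= ~BUSY;
}

/* Walk the chain from the first header. Every link must be word aligned,
   must move forward, and must not run past the end of the chain.
   Returns 0 if the chain is intact, -1 at the first bad link. */
int toy_check(void)
{
    uintptr_t *hdr = pool;

    while (hdr != pool_top) {
        uintptr_t cur = (uintptr_t)hdr;
        uintptr_t nxt = *hdr & ~BUSY;

        if (nxt % sizeof(uintptr_t) != 0 ||
            nxt <= cur || nxt > (uintptr_t)pool_top)
            return -1;
        hdr = (uintptr_t *)nxt;
    }
    return 0;
}
```

Bracketing a suspect call with toy_check(), as in the checkmalloc() example, narrows down which call overwrote a header. Doing the same against the real malloc() requires knowledge of its private layout, which this sketch does not claim to have.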
gandalf@csli.STANFORD.EDU (Juergen Wagner) (05/06/88)
The most likely sources of strange effects like the ones you describe are
calls to strcpy/strcat/scanf/fgets/... which write beyond the end of some
string of chars. However, this does not necessarily have to be the cause of
your problems. Consider also the following:

  o functions using up more arguments than provided,
  o functions called with a variable number of args but popping them with
    the wrong size,
  o scanf/sscanf reading a double into a float,
  o some buffer that isn't large enough.

All this might clobber not the malloc area but the call stack, in which
case you may detect it much later, and in an unexpected manner.

Some time ago, somebody posted a malloc package with debugging aids, and
I'll be glad to forward it to you.
--
Juergen "Gandalf" Wagner, gandalf@csli.stanford.edu
Center for the Study of Language and Information (CSLI), Stanford CA
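Two of the hazards listed above, the string copy that runs past its buffer and the scanf conversion that does not match its argument, can be avoided with small wrappers. This is a sketch only; copy_bounded() and parse_double() are hypothetical helper names, and snprintf() is a later addition to the C library than the systems discussed in this thread.

```c
#include <stdio.h>
#include <string.h>

/* Copy src into dst, which holds cap bytes, always NUL-terminating.
   Returns 0 on success, -1 if src did not fit and was truncated. */
int copy_bounded(char *dst, size_t cap, const char *src)
{
    int n;

    if (cap == 0)
        return -1;
    n = snprintf(dst, cap, "%s", src);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}

/* Parse a double from text. "%lf" expects a double *, "%f" a float *;
   pairing "%lf" with a float * makes sscanf store a double into a
   float-sized object, overwriting whatever lies next to it. */
int parse_double(const char *text, double *out)
{
    return sscanf(text, "%lf", out) == 1 ? 0 : -1;
}
```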
lm@arizona.edu (Larry McVoy) (05/06/88)
In article <3989@killer.UUCP> toma@killer.UUCP (Tom Armistead) writes:
>I've been having an *interesting* thing happen under System 5, release 3.1
>Unix, on a 3b2/310 and was wondering if I could get any insight from
>any one as to the problem (or cause of)???

I dunno if this is it or not, but I have found out (the hard way) that
malloc keeps info in the memory it allocates (actually: is about to
allocate). The bottom line is that if you overrun a malloced area you will
cause crashes that seem to stem from inside the malloc lib itself. My
problems occurred on a Vax running 4.3+NFS.
--
"Peace and Unity - Neon Prophet, Tucson AZ"
Larry McVoy    lm@arizona.edu or ...!{uwvax,sun}!arizona.edu!lm
toma@killer.UUCP (Tom Armistead) (05/08/88)
I want to thank all of you for the responses...

Summary of my Quest...

I went through that stuff with a FINE tooth comb and didn't find didley -
(excuse the Texas accent...)

After much time on the phone with an AT&T techie, I was told that they had
experienced problems in malloc (core dumps and such) when it had been used
excessively with small blocks, as I was doing, and that this was something
to do with the free memory stuff getting garbled up...

He told me to use the special malloc(3X) library with
'-lmalloc' on the cc command line. I did this and the problem has gone
away!!! "Thank you Antie Em, I'm not CRAZY!!!"

Thanks again for all the help...
Tom
--
Tom Armistead
UUCP: ...!ihnp4!killer!toma
dce@mips.COM (David Elliott) (05/08/88)
In article <4016@killer.UUCP> toma@killer.UUCP (Tom Armistead) writes:
>He told me to use the special malloc(3X) library with
>'-lmalloc' on the cc command line. I did this and the problem has gone
>away!!! "Thank you Antie Em, I'm not CRAZY!!!"

I hate to burst your bubble, Tom, but this doesn't really show that the
standard libc malloc() is broken.

It turns out that malloc(3X) is slightly different. For example, if you
malloc() 0 bytes, one of the mallocs returns NULL and one returns a
pointer.

I have also seen cases (some compiler product) where using -lmalloc fixed
a bug (similar to your case), and it turned out that there was actually a
bug in the code.

Sure, it may be that you've triggered a rare bug in malloc(), but malloc()
is used so much that it really should be bomb-proof.
--
David Elliott    dce@mips.com or {ames,prls,pyramid,decwrl}!mips!dce
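The zero-byte difference mentioned above is one way a program can behave differently under the two libraries without either malloc being broken. A minimal sketch of a wrapper that sidesteps it, assuming the caller never wants a NULL result for an empty request (alloc_nonzero() is a hypothetical name):

```c
#include <stdlib.h>

/* Never pass a zero size to malloc(): whether malloc(0) yields NULL or a
   unique pointer depends on the implementation, so ask for one byte. */
void *alloc_nonzero(size_t n)
{
    return malloc(n != 0 ? n : 1);
}
```

With this in place a NULL return always means the allocation failed, whichever malloc is linked in.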
jfh@rpp386.UUCP (John F. Haugh II) (05/10/88)
In article <2149@quacky.mips.COM> dce@mips.COM (David Elliott) writes:
>In article <4016@killer.UUCP> toma@killer.UUCP (Tom Armistead) writes:
>>He told me to use the special malloc(3X) library with
>>'-lmalloc' on the cc command line. I did this and the problem has gone
>>away!!! "Thank you Antie Em, I'm not CRAZY!!!"
>
>I hate to burst your bubble, Tom, but this doesn't really show that the
>standard libc malloc() is broken.

[ and later he goes to say it also doesn't prove you don't have a bug
  in your code. ]

below is some code i use to check the consistency of mallocs in a large
database i am working on. it checks the number of malloc/free pairs, and
the leading edge for consistency. this code is copyright john f. haugh ii,
1987, 1988, all rights reserved (by the way ;-)

[ and for the `but your code has X problem' people - it works on
  everything it has been run on. trouble is getting it to run on
  a 9370 and a PC/XT without major changes. ]

        extern char *malloc ();

        static int d_malcnt;            /* blocks currently allocated */

        char *x_malloc (size)
        int size;
        {
                char *cp;
                char **tp;

                if (! (cp = malloc (size + sizeof (char *))))
                        abort ();

                d_malcnt++;
                tp = (char **) cp;
                *tp = &cp[sizeof (char *)];     /* header points at user area */
                return (*tp);
        }

        x_free (cp)
        char *cp;
        {
                char **tp;

                if (cp == (char *) 0)
                        abort ();

                /* plain pointer subtraction; negating the unsigned
                   result of sizeof would not give a negative index */
                tp = (char **) (cp - sizeof (char *));
                if (*tp != cp)                  /* leading edge trashed */
                        abort ();

                *tp = (char *) 0;               /* catch a second free */
                free (tp);

                if (! d_malcnt--)               /* more frees than mallocs */
                        abort ();
        }

using this particular code (with the comments still present no less ;->
has helped me locate a countless number of bugs in the code. adding code
to check for the upper edge would help some too.

- john.
--
John F. Haugh II                 | "You see, I want a lot. Perhaps I want every
River Parishes Programming       | -thing. The darkness that comes with every
UUCP:   ihnp4!killer!rpp386!jfh  | infinite fall and the shivering blaze of
DOMAIN: jfh@rpp386               | every step up ..." -- Rainer Maria Rilke
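The upper-edge check mentioned at the end of the post above could be added along these lines. This is a sketch under stated assumptions rather than an extension of the copyrighted code: the names edge_malloc(), edge_check() and edge_free(), the four-byte tail pattern, and the choice to return NULL on allocation failure instead of aborting are all made up for illustration. The header records the requested size so the checker can find the trailing guard.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Arbitrary pattern written just past the end of every block. */
static const unsigned char TAIL[4] = { 0xDE, 0xAD, 0xBE, 0xEF };

/* Padded to the strictest alignment so the user area stays aligned. */
typedef union edge_hdr {
    struct { void *self; size_t size; } h;
    max_align_t align;
} edge_hdr;

static long edge_live;              /* blocks currently allocated */

void *edge_malloc(size_t size)
{
    edge_hdr *hp;

    if (size > (size_t)-1 - sizeof *hp - sizeof TAIL)
        return NULL;                /* request would overflow size_t */
    hp = malloc(sizeof *hp + size + sizeof TAIL);
    if (hp == NULL)
        return NULL;
    hp->h.self = hp + 1;            /* leading edge: points at user area */
    hp->h.size = size;
    memcpy((unsigned char *)(hp + 1) + size, TAIL, sizeof TAIL);
    edge_live++;
    return hp + 1;
}

/* 0 = intact, 1 = leading edge damaged, 2 = trailing edge damaged. */
int edge_check(const void *p)
{
    const edge_hdr *hp;

    if (p == NULL)
        return 1;
    hp = (const edge_hdr *)p - 1;
    if (hp->h.self != p)
        return 1;
    if (memcmp((const unsigned char *)p + hp->h.size, TAIL, sizeof TAIL) != 0)
        return 2;
    return 0;
}

void edge_free(void *p)
{
    edge_hdr *hp;

    if (edge_check(p) != 0 || edge_live <= 0)
        abort();
    hp = (edge_hdr *)p - 1;
    hp->h.self = NULL;              /* a second free now fails the check */
    edge_live--;
    free(hp);
}
```

The tail is written and compared with memcpy()/memcmp() because the end of the user area is generally not aligned for a word-sized store.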
fox@alice.marlow.reuters.co.uk (Paul Fox) (05/16/88)
In article <1620@rpp386.UUCP> jfh@rpp386.UUCP (The Beach Bum) writes:
>In article <2149@quacky.mips.COM> dce@mips.COM (David Elliott) writes:
>
>below is some code i use to check the consistency of mallocs in a large
>database i am working on.

Oh well, I may as well post some code ... the following is my front end
to malloc/free/realloc. I use this to ensure that I do not corrupt my
memory areas, or try to free something that's never been allocated.

This code is portable - but requires you to call chk_alloc/chk_free and
chk_realloc, although #defines could be used to avoid changing existing
code.

If this code is used, then any memory freed by a normal free() must have
been allocated by a malloc() (not chk_alloc()). It's usually best to
recompile everything and link in the library. If a section 3 function
calls malloc() it will bypass chk_alloc, and so the freeing of this memory
must be done by free() (not chk_free()).

-----cut here------
# include <stdio.h>

extern char *malloc();

# define MAGIC 0x464f5859L /* FOXY */
# define FREED 0x46524545L /* FREE */

int cnt_alloc = 0;              /* blocks currently allocated */

char *
chk_alloc(n)
{
        /* room for one long in front of the user area */
        register char *cp = malloc(n + sizeof (long));
        register long *lp = (long *) cp;

        if (lp) {
                *lp++ = MAGIC;
                cnt_alloc++;
        }
        return (char *) lp;
}

char *
chk_realloc(ptr, n)
char *ptr;
{
        char *realloc();
        long *lp = (long *) ptr;

        if (*--lp != MAGIC)
                chk_failed("Realloc non-alloced memory.");
        lp = (long *) realloc((char *) lp, n + sizeof (long));
        if (lp == 0)            /* don't hand back a pointer built on NULL */
                return (char *) 0;
        return (char *) (lp + 1);
}

chk_free(ptr)
char *ptr;
{
        long *lp = (long *) ptr;

        if (*--lp == FREED)
                chk_failed("Trying to free already freed memory.");
        if (*lp != MAGIC)
                chk_failed("Freeing non-alloced memory.");
        cnt_alloc--;
        *lp = FREED;
        free((char *) lp);
}

chk_failed(str)
char *str;
{
        fprintf(stderr, "CHK_ALLOC: %s\r\n", str);
        abort();
}
---- cut here ------

=====================
All opinions are my own.
The powers that be ...
UUCP: fox@alice.marlow.reuters.co.uk
chris@mimsy.UUCP (Chris Torek) (05/18/88)
In article <350@alice.marlow.reuters.co.uk> fox@alice.marlow.reuters.co.uk
(Paul Fox) provides a simple checking version of malloc. Note that
it assumes that one can store a single `long' in the address returned
by malloc, and increment the result by the size of that long, and that
the resulting pointer is still `well aligned'. As far as I can tell
there is no way to avoid some similar assumption. It might be nice
to have an include file with a macro or function that aligns a pointer:

        #include <align.h>
        ...
        void *p, *q;
        q = align(p);

        /* or possibly better */
        void *p; int off;
        off = align_off(p);

where <align.h> might read

        #define align(p) ((void *)(((long)(p) + 3) & ~3))

for a machine with four-byte alignment, or

        #define align_off(p) ((8 - ((long)(p) & 7)) & 7)

for a machine with eight-byte alignment, and maybe even

        int align_off(void *p);

for a machine with wacky alignment constraints.
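
No <align.h> of this kind exists as a standard header, but the same arithmetic can be written without hard-coding the alignment. The following is a minimal sketch; ptr_align_off() and ptr_align_up() are hypothetical names, the alignment is passed in and assumed to be a power of two, and the code relies on uintptr_t, which arrived in the C library well after this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes to add to p to reach the next multiple of a.
   Assumes a is a power of two; yields 0 if p is already aligned. */
size_t ptr_align_off(const void *p, size_t a)
{
    return (size_t)((a - ((uintptr_t)p & (a - 1))) & (a - 1));
}

/* Round p up to the next multiple of a (a power of two). */
void *ptr_align_up(void *p, size_t a)
{
    return (unsigned char *)p + ptr_align_off(p, a);
}
```

A checking allocator can use such a helper to size its header: rounding sizeof (long) up to the strictest alignment on the machine keeps the pointer handed back to the caller as well aligned as the one malloc() returned.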
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris