chuqui@nsc.UUCP (Chuq Von Rospach) (04/23/84)
I am having a h*ll of a time tracking down a problem in /bin/sh. It seems to be endemic to 4BSD, and incorporating the appropriate changes from SYS V.2 doesn't help either. On some occasions the shell goes into a hard loop in the allocation routines (blok.c, routine alloc()) on a line which says 'WHILE !busy(q = p->word) DO p->word = q->word OD'. (Yes, that is C code, folks... well, kinda...)

The allocation routines seem to assume that sbrk returns values aligned on word (possibly double-word) boundaries, and they use the lowest bit of the address to signify whether or not the block of memory contains anything of use. Evidently, in certain circumstances the allocator gets confused and loops through the list forever.

Has anyone else ever seen and/or fixed this bug before? It has been popping up on our Genix system (on the 16032 chip), and I don't know whether Vaxen or PDPs are prone to it. Suggestions/help?

chuq
--
From under the bar at Callahan's:
Chuq Von Rospach {amd70,fortune,hplabs,menlo70}!nsc!chuqui (408) 733-2600 x242
Never give your heart to a stranger, unless you are sure that you are dead.
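For readers who have not seen blok.c, the following is a minimal sketch of the low-bit tagging scheme and the merge loop described above. It is not the Bourne shell source: the names `is_busy`, `next_of`, `set_link`, and `coalesce`, the `limit` parameter, and the two demo functions are all hypothetical, and the self-linked free block is only one assumed way a garbled list could make the loop spin. The sketch relies on the same assumption the post mentions, namely that block addresses are at least word aligned so the low bit is free to use as a flag.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BUSY ((uintptr_t)1)   /* low bit of the link word marks a block in use */

/* Hypothetical block header: address of the next block, tagged with BUSY. */
struct blk { uintptr_t word; };

static int is_busy(const struct blk *b) { return (b->word & BUSY) != 0; }

static struct blk *next_of(const struct blk *b)
{
    return (struct blk *)(b->word & ~BUSY);   /* strip the tag bit */
}

static void set_link(struct blk *b, struct blk *next, int busy)
{
    b->word = (uintptr_t)next | (busy ? BUSY : 0);
}

/*
 * Same shape as 'WHILE !busy(q = p->word) DO p->word = q->word OD':
 * absorb free successors of p until a busy block is reached.
 * The limit is an addition for this demo; without it a corrupted
 * list would never return. Returns the merge count, or -1 if the
 * limit was hit.
 */
static int coalesce(struct blk *p, int limit)
{
    int merges = 0;
    assert(!is_busy(p));              /* only free blocks are merged */
    for (;;) {
        struct blk *q = next_of(p);
        if (is_busy(q))
            return merges;            /* normal exit: hit a busy block */
        if (merges == limit)
            return -1;                /* list never reaches a busy block */
        p->word = q->word;            /* q is free, so its word is untagged */
        merges++;
    }
}

/* Three free blocks followed by a busy sentinel: the loop terminates. */
int demo_healthy(void)
{
    struct blk b[4];
    set_link(&b[0], &b[1], 0);
    set_link(&b[1], &b[2], 0);
    set_link(&b[2], &b[3], 0);
    set_link(&b[3], &b[0], 1);
    return coalesce(&b[0], 100);
}

/* Assumed corruption: a free block linked to itself, so no busy block is ever seen. */
int demo_corrupt(void)
{
    struct blk b[2];
    set_link(&b[0], &b[1], 0);
    set_link(&b[1], &b[1], 0);
    return coalesce(&b[0], 100);
}
```

Calling `demo_healthy()` from a small `main` performs two merges and stops at the sentinel, while `demo_corrupt()` only returns because of the added limit, which is the hard loop in miniature.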
willcox@ccvaxa.UUCP (04/24/84)
This just came up recently in net.unix-wizards (I think it was), though the problem reported was a segmentation violation instead of a loop.

Your problem stems from the way that sh allocates memory. It assumes that it can use as much memory as it wants, and does so until it gets a memory fault (or segmentation violation, or whatever your machine calls it). It catches the resulting signal, and only then, in the signal handler, does it do an sbrk() to get more memory.

The trouble is in the assumption that the instruction that caused the trap will be restarted. This is true on the VAX and PDP-11, but not on some other machines, e.g. the Gould Concept series or the 68000. Since the instruction that usually gets the trap is setting up the free memory list, said list gets garbled. On some machines, this causes the behavior you saw. On others you see other strange results.

We circumvented the problem by putting code into the kernel to back up the PC on the appropriate trap, thus ensuring that the offending instruction would, in fact, be re-executed. A cleaner and more permanent solution would have been to fix sh, but we didn't want to have to deal with the pseudo-Algol code, and were worried (without justification, it turned out) that other utilities would make the same broken assumptions about faults.

Welcome to the world of Algol-C.

-------------
David Willcox (217) 384-8500
Compion Corp., The Software Subsidiary of Gould, Inc.
Unet: ...!uiucdcs!ccvaxa!willcox
Mail: 1101 E. University; Urbana, IL 61801
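The "fix sh" route mentioned above amounts to growing the data segment before touching new memory, so that nothing depends on a faulting instruction being restarted. The following is a rough sketch of that idea only, not the change any vendor actually made. To keep it self-contained, a static buffer simulates the data segment and the hypothetical `grow_arena()` stands in for sbrk(); `arena_alloc`, `ARENA_MAX`, and `GROW_STEP` are likewise invented for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_MAX 4096   /* hypothetical hard cap for the simulation */
#define GROW_STEP 256    /* hypothetical growth increment */

/* Union forces alignment suitable for pointers and wide scalars. */
static union {
    unsigned char bytes[ARENA_MAX];
    long double align_ld;
    void *align_p;
} arena;

static size_t mapped = 0;    /* bytes currently usable (the simulated break) */
static size_t used = 0;      /* bytes handed out so far */
static int grow_calls = 0;   /* how often the simulated sbrk() was invoked */

/* Stand-in for sbrk(): extend the usable region, or fail when exhausted. */
static int grow_arena(size_t incr)
{
    if (incr > ARENA_MAX - mapped)
        return -1;
    mapped += incr;
    grow_calls++;
    return 0;
}

/*
 * Check the limit and grow explicitly BEFORE writing, instead of
 * writing past the break and repairing things in a fault handler.
 * Returns NULL when the request cannot be satisfied.
 */
void *arena_alloc(size_t n)
{
    size_t need;
    void *p;

    if (n > ARENA_MAX)
        return NULL;                  /* also rules out rounding overflow */
    need = (n + sizeof(void *) - 1) & ~(sizeof(void *) - 1);

    while (need > mapped - used) {    /* not enough room yet */
        if (grow_arena(GROW_STEP) != 0)
            return NULL;
    }
    p = &arena.bytes[used];
    used += need;
    assert(used <= mapped);           /* never hand out unmapped memory */
    return p;
}
```

The cost is one comparison per allocation, in exchange for an allocator that behaves the same whether or not the hardware can restart a faulting instruction.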