[net.bugs] help with fixing a shell bug?

chuqui@nsc.UUCP (Chuq Von Rospach) (04/23/84)

I am having a h*ll of a time tracking down a problem in /bin/sh. It seems
to be endemic to 4BSD, and incorporating the appropriate changes from
SYS V.2 doesn't help either. On some occasions the shell goes into a hard
loop in the allocation routines (blok.c, routine alloc()) on a line which
says 'WHILE !busy(q = p->word) DO p->word = q->word OD'. (Yes, that is C
code, folks... well, kinda...). The allocation routines seem to assume that
sbrk returns values aligned on word (possibly double-word) boundaries, and
use the lowest bit of the address to signify whether or not the block of
memory contains anything of use. Evidently in certain circumstances it gets
confused and loops through the list forever.
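
For anyone who does not read Algol-C, here is a rough sketch in plain C of
what that line appears to do. It is not the real blok.c: the struct layout,
set_link(), the iteration cap and the two demo functions are all
illustrative assumptions. It only shows the low-bit trick and one way a
garbled list could keep the loop from ever finding a busy block.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical block header: a single link word whose low bit doubles
   as the "busy" flag. Names are illustrative, not the real blok.c ones. */
struct blk {
    struct blk *word;
};

#define BUSY ((uintptr_t)1)

/* A block is busy when the low bit of its link word is set. */
static int busy(const struct blk *p)
{
    return ((uintptr_t)p->word & BUSY) != 0;
}

/* Store a link to next, optionally tagged busy. This relies on next being
   at least 2-byte aligned, so that the low bit is free for the flag. */
static void set_link(struct blk *p, struct blk *next, int is_busy)
{
    assert(((uintptr_t)next & BUSY) == 0);
    p->word = (struct blk *)((uintptr_t)next | (is_busy ? BUSY : 0));
}

/* The loop from the post, with an iteration cap added for the demo.
   p must be a free block. Returns the number of merges performed,
   or -1 if the cap is reached before a busy block is found. */
static int coalesce(struct blk *p, int cap)
{
    struct blk *q;
    int merges = 0;

    while (!busy(q = p->word)) {
        if (merges == cap)
            return -1;
        p->word = q->word;   /* absorb the free neighbour */
        merges++;
    }
    return merges;
}

/* Three free blocks followed by a busy one: the loop stops there. */
static int demo_well_formed(void)
{
    struct blk b[4];

    set_link(&b[0], &b[1], 0);
    set_link(&b[1], &b[2], 0);
    set_link(&b[2], &b[3], 0);
    set_link(&b[3], &b[0], 1);
    return coalesce(&b[0], 1000);
}

/* A garbled list: a free block linked to itself, nothing busy in reach. */
static int demo_self_linked(void)
{
    struct blk b[1];

    set_link(&b[0], &b[0], 0);
    return coalesce(&b[0], 1000);
}
```

In this sketch demo_well_formed() performs two merges and stops, while
demo_self_linked() only returns because of the cap; without the cap it
would spin forever, which at least resembles the symptom described above.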

Has anyone else ever seen and/or fixed this bug before? This has been
popping up on our Genix system (on the 16032 chip), and I don't know
whether or not Vaxen or PDPs are prone to it. Suggestions/help?

chuq
-- 
From under the bar at Callahan's:		Chuq Von Rospach
{amd70,fortune,hplabs,menlo70}!nsc!chuqui	(408) 733-2600 x242

Never give your heart to a stranger, unless you are sure that you are dead.

willcox@ccvaxa.UUCP (04/24/84)

ccvaxa!willcox    Apr 24 16:29:00 1984

This just came up recently in net.unix-wizards (I think it was), though
the problem reported was a segmentation violation instead of a loop.
Your problem stems from the way that sh allocates memory.  It assumes
that it can use as much memory as it wants, and does so until it gets a
memory fault (or segmentation violation, or whatever your machine calls
it).  It catches the resulting signal, and only then, in the signal
handler, does it do an sbrk() to get more memory.  The trouble is in
the assumption that the instruction that caused the trap will be
restarted.  This is true on the VAX and PDP-11, but not on some other
machines, e.g. the Gould Concept series, or the 68000.  Since the
instruction that usually gets the trap is setting up the free memory
list, said list gets garbled.
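
To make the restart issue concrete, here is a toy simulation in C. It is
not sh code and uses no real signals or sbrk(); the machine struct, the
store() helper and the GARBAGE marker are all made up for illustration.
The point is only that when the faulting store is not re-executed, the
link word it was supposed to write keeps whatever was there before.

```c
#include <assert.h>
#include <stddef.h>

#define ARENA_WORDS 16
#define GARBAGE 0xDEADu

/* Toy machine: words below brk are "mapped"; a store at or above brk
   faults. All names here are hypothetical. */
struct machine {
    unsigned mem[ARENA_WORDS];
    size_t brk;
    int restarts_faulting_insn;  /* 1: VAX/PDP-11 style, 0: store is lost */
};

/* Stand-in for the signal handler that calls sbrk() to get more memory. */
static void grow_on_fault(struct machine *m)
{
    m->brk += 4;
    if (m->brk > ARENA_WORDS)
        m->brk = ARENA_WORDS;
}

/* A store instruction. On a fault the handler grows the arena; what
   happens next depends on whether the machine restarts the instruction. */
static void store(struct machine *m, size_t addr, unsigned value)
{
    assert(addr < ARENA_WORDS);
    if (addr >= m->brk) {
        grow_on_fault(m);
        if (!m->restarts_faulting_insn)
            return;              /* resume after the store: write is lost */
    }
    if (addr < m->brk)
        m->mem[addr] = value;    /* normal store, or the restarted one */
}

/* Write a free-list link word just past the break and report what
   ends up in memory. */
static unsigned link_word_after_fault(int restarts)
{
    struct machine m;
    size_t i;

    for (i = 0; i < ARENA_WORDS; i++)
        m.mem[i] = GARBAGE;
    m.brk = 4;
    m.restarts_faulting_insn = restarts;

    store(&m, 4, 8u);            /* link word: next block at index 8 */
    return m.mem[4];
}
```

With restarts set to 1 the link word comes back as 8, as intended; with
restarts set to 0 it is still GARBAGE, and anything that later walks the
list through that word is following junk.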

On some machines, this causes the behavior you saw.  On others
you see other strange results.

We circumvented the problem by putting code into the kernel to back up
the PC on the appropriate trap, thus ensuring that the offending
instruction would, in fact, be re-executed.  A cleaner and more
permanent solution would have been to fix sh, but we didn't want to
have to deal with the pseudo-Algol code, and were worried (without
justification, it turned out) that other utilities would make the same
broken assumptions about faults.

Welcome to the world of Algol-C.

-------------
David Willcox
(217) 384-8500
Compion Corp., The Software Subsidiary of Gould, Inc.

Unet:	...!uiucdcs!ccvaxa!willcox
Mail:	1101 E. University; Urbana, IL 61801