tom@enmasse.UUCP (Tom Murdock) (05/15/86)
Has anyone out there had a problem with the Motorola System V version 2 shell (/bin/sh) getting "Memory fault - core dumped" messages when trying to run the :mkcmd script to make certain unix utilities? It seems likely that other large recursive shell scripts might also run into the same problem (I think I have seen it with lorder). Running with a large environment may help reproduce this problem. I would be interested to hear if anyone else has seen this bug, because I am trying to determine whether the problem is a generic sh bug, as I suspect, or one specific to our hardware or System V implementation.

The symptoms seem to be that:

1) You are using your stack when you try to get more memory with the addblok() routine in blok.c.
2) The value used to set your brk() limit is a random value taken from the new memory.
3) If this value is some large garbage value (sometimes it is 0 or a small number), your sbrk will fail, but because there is no error checking the shell thinks it has successfully added a large block of memory.
4) The shell will then get a memory fault if it tries to access memory beyond its real limit. If it always stays within its real limit it will run fine and the error is not noticed.

Putting debugging statements in our version 1 shell also indicates that it occasionally uses the garbage value(s) and thinks its sbrk limit is super high, although it doesn't seem to exceed its real limit, and so hit the error, as often as the new version.

Another possible symptom of this problem is that we have occasionally seen runaway shells that have acquired huge amounts of memory. It seems feasible that they found a garbage value that was large but within the system's limit, and that they run for a while holding a huge chunk of memory, causing massive swapping activity, etc.

The related code is in blok.c. It seems that when the stack is in use, a pointer is set up beyond the stack.
This pointer is later used by all paths of the code, on the assumption that the value stored at its address is the address up to which the shell should own memory (i.e. the address it should sbrk() to). My theory is that this value is not set up when the pointer is obtained from beyond the stack, and that the correct fix is simply to make sure it is set up, or to skip the sbrk() in this case. This fix seems to correct the problem, although I understand too little of what this code is really trying to do to be sure that I have not broken something else. If someone has a good idea of how this routine is supposed to work, I would like some feedback on whether this is the correct fix.
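
To make symptoms 3 and 4 concrete, here is a minimal sketch in C of how an unchecked break request lets the caller's idea of its memory limit drift away from the real one. It is not the code from blok.c: sim_brk(), grow_unchecked(), grow_checked(), demo() and SIM_LIMIT are all hypothetical names, and the break is simulated with a plain counter rather than the real brk()/sbrk() calls, so it only illustrates the failure mode described above under those assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from blok.c. */
#define SIM_LIMIT 4096u          /* pretend system memory limit */

static size_t sim_break = 1024u; /* pretend current break address */

/* Simulated brk(): returns 0 on success, -1 if the request is over the limit. */
static int sim_brk(size_t new_break)
{
    if (new_break > SIM_LIMIT)
        return -1;
    sim_break = new_break;
    return 0;
}

/* Unchecked growth: the caller's idea of the limit is updated even on failure. */
static size_t grow_unchecked(size_t wanted)
{
    (void)sim_brk(wanted);       /* return value ignored, as in the symptom */
    return wanted;
}

/* Checked growth: on failure the believed limit is left alone. */
static int grow_checked(size_t wanted, size_t *believed)
{
    if (sim_brk(wanted) != 0)
        return -1;
    *believed = wanted;
    return 0;
}

/* Runs both variants with a garbage-sized request; returns 0 if all checks hold. */
static int demo(void)
{
    size_t garbage = 0x7fffffffu; /* stands in for an uninitialized value */
    size_t believed;

    sim_break = 1024u;
    believed = grow_unchecked(garbage);
    assert(believed == garbage);  /* the caller now believes a bogus limit */
    assert(sim_break == 1024u);   /* but the real break never moved */

    believed = sim_break;
    assert(grow_checked(garbage, &believed) == -1);
    assert(believed == 1024u);    /* belief still matches reality */

    assert(grow_checked(2048u, &believed) == 0);
    assert(believed == 2048u && sim_break == 2048u);
    return 0;
}
```

Calling demo() exercises both paths. With the unchecked variant, any later access between the real break and the believed limit is what would show up as a memory fault. Checking the return value only covers the missing error check; it does not address where the garbage value comes from, which is the separate question raised above.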