[comp.unix.aux] collapsing forks

pechner@ddtg.com (Michael Pechner) (05/25/91)

I am having a very strange problem with A/UX.  I can cause a fork to 
"sort of" fail by having a huge automatic variable.  Here are two 
programs to show my point.  The only difference between the two programs 
is that junk is an automatic in the version that fails, and a global static 
in the version that works.

The working program:

	#include <stdio.h>
	#include <signal.h>
	static char junk[256000];
	void catch()
	{
		printf("signal caught pid %d\n", getpid());
	}
	main(){
		int ret;
		int pip[2];
		char buf[3];

		printf("pipe call %d\n", pipe(pip));
		ret=fork();	/* first fork: its return value is only printed */
		printf("fork just occured  %d \n", ret);
		if((ret=fork()) == 0){ /* second fork, made by both processes; child */
			signal(SIGALRM, catch);
			alarm(10);

			printf("return from child write %d\n", write(pip[1], "a", 1));
			sleep(2);
			exit(1);
		}
		else if(ret > 0){ /* parent */
			signal(SIGALRM, catch);
			alarm(10);
			printf("return from parent read %d\n", read (pip[0], buf, 1));
			printf("parent read %c\n", buf[0]);
			sleep(2);
			exit(1);
		}
	}

The output:
	pipe call 0
	fork just occured  0 
	fork just occured  991 
	return from child write 1
	return from parent read 1
	parent read a
	return from child write 1
	return from parent read 1
	parent read a

The non-working program.  The only difference is that "junk" is 
an automatic variable.

	#include <stdio.h>
	#include <signal.h>
	void catch()
	{
		printf("signal caught pid %d\n", getpid());
	}
	main(){
		int ret;
		int pip[2];
		char buf[3];
		char junk[256000];

		printf("pipe call %d\n", pipe(pip));
		ret=fork();	/* first fork: its return value is only printed */
		printf("fork just occured  %d \n", ret);
		if((ret=fork()) == 0){ /* second fork, made by both processes; child */
			signal(SIGALRM, catch);
			alarm(10);

			printf("return from child write %d\n", write(pip[1], "a", 1));
			sleep(2);
			exit(1);
		}
		else if(ret > 0){ /* parent */
			signal(SIGALRM, catch);
			alarm(10);
			printf("return from parent read %d\n", read (pip[0], buf, 1));
			printf("parent read %c\n", buf[0]);
			sleep(2);
			exit(1);
		}
	}


The output:
	pipe call 0
	fork just occured  1013 
	signal caught pid 1011
	return from parent read -1
	parent read 


Notice that in the failing program, the fork returns a valid pid to the
parent, but the child never returns from the fork.
The child process collapses immediately.

Can anybody explain this to me?


-- 
pechner@mikey.ddtg.com (Michael Pechner)  | Pizza Probably The Worlds Most
DuPont Design Technologies Group          | Perfect Food.
Santa Clara, Ca                           | Carbo, Meat, Dairy, And Veggie 
                                          | All Food Groups In One.

mycroft@kropotki.gnu.ai.mit.edu (Charles Hannum) (05/30/91)

In article <1991May24.215945.13707@ddtg.com> pechner@ddtg.com (Michael Pechner) writes:

   I am having a very strange problem with A/UX.  I can cause a fork to 
   "sort of" fail by having a huge automatic variable.  Here are two 
   programs to show my point.  The only difference between the two programs 
   is that junk is an automatic in the version that fails, and a global static 
   in the version that works.

   The working program:

	   static char junk[256000];

   The non-working program.  The only difference is that "junk" is 
   an automatic variable.

	   main(){
		   /*...*/
		   char junk[256000];
	   }

   Notice that on the failed call, the fork returns a valid pid to the parent.
   The child does not return.
   The child process collapses immediately.

   Can anybody explain this to me?


I believe so.  B-)

In the first example, 'junk' is allocated at the top level, as a static
variable at file scope.  This means the space is set aside in the program's
data segment when the program is started, rather than on the stack.  Since
presumably your machine has enough RAM and/or virtual memory, this works.

In the second example, 'junk' is indeed being allocated as an automatic
variable.  Most C compilers cause this data to be placed on the stack.  For
some reason, tossing 250K on the stack at once gives A/UX indigestion, and
your program dies.  (As an aside:  On what systems, using what operating
systems, *will* this work, and on which ones will it crash the kernel?  B-) )

mouse@thunder.mcrcim.mcgill.edu (der Mouse) (06/02/91)

In article <MYCROFT.91May29194828@kropotki.gnu.ai.mit.edu>, mycroft@kropotki.gnu.ai.mit.edu (Charles Hannum) writes:
> In article <1991May24.215945.13707@ddtg.com> pechner@ddtg.com (Michael Pechner) writes:
>> I am having a very strange problem with A/UX.  I can cause a fork to
>> "sort of" fail by having a huge automatic variable.
>>	   main(){
>>		   char junk[256000];
>>		   /*...*/
>>	   }
> In the second example, 'junk' is indeed being allocated as an
> automatic variable.  Most C compilers cause this data to be placed on
> the stack.  For some reason, tossing 250K on the stack at once gives
> A/UX indigestion, and your program dies.

This is generally an interaction of wild-pointer detection and
automatic stack growth.  I can't speak about A/UX specifically, but I
have seen systems that do act as I'm about to describe.

The problem is that we want to grow the stack automatically as
necessary.  This is done by the memory-fault handler: it simply grows
the stack to encompass the out-of-bounds access.

But there's a conflict: an access through a random pointer shouldn't
result in the stack being grown by gigabytes to encompass the
reference.  So there's a (relatively) small window.  Stack accesses off
the end of the stack, but not too far off the end of the stack, cause
stack growth; accesses too far beyond the end of the stack produce
memory violation faults.

So that quarter-meg of stack growth in one jump is probably enough that
the next stack access oversteps this window.

One possible patch is

	char junk[256000];
	int i;
	for (i=0;i<256000;i+=1000) junk[i] = 0;

where you may need to declare i before junk, and you may need to access
junk backwards, depending on your compiler and your machine.  (The
declaration order depends on how the compiler arranges variables on the
stack; the access order depends on whether your stack grows up or
down.)  This generates accesses that extend the stack gradually instead
of all at once.  Of course, if the function entry prologue code does
stack accesses after allocating automatic variable space, you're sunk
without a trace no matter what you try.

					der Mouse

			old: mcgill-vision!mouse
			new: mouse@larry.mcrcim.mcgill.edu