[net.bugs.4bsd] Rogue looping

clives@sri-unix (08/19/82)

Rogue of any version can appear to be hung by anything which hogs I/O, like
more rogue games.  The same thing happens if there's a big job on and rogue's
running niced.

It tends to be a chain reaction; one runaway tty-less shell is sufficient to
lock up all rogue games, leading to punchouts.

The games don't go away with punchout because job control/tty driver/shell
bugs prevent the game from seeing the HUP.
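When the HUP never reaches the game, the line itself can still be tested. The program below does this in onchild() by checking for a tty speed of B0 once the game has exited. A rough modern POSIX equivalent is sketched here, assuming termios instead of the 4.1 sgttyb interface. The name tty_hung_up is hypothetical, and treating EIO or ENXIO from tcgetattr() as a hangup is an assumption about how current systems report a revoked terminal.

```c
#include <errno.h>
#include <termios.h>

/* Hypothetical helper: does the terminal on fd look hung up?
   Returns 1 if hung up, 0 if still connected, -1 if fd is not a usable tty. */
int tty_hung_up(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1) {
        /* Assumption: a revoked or vanished line reports EIO or ENXIO. */
        if (errno == EIO || errno == ENXIO)
            return 1;
        return -1;              /* ENOTTY, EBADF and so on */
    }
    /* POSIX treats an output speed of B0 as "hang up"; the original
       program tested the 4.1 sgttyb input speed for the same purpose. */
    return cfgetospeed(&t) == B0;
}
```

Returning -1 separately for "not a terminal" keeps a redirected or closed descriptor from being mistaken for a hangup.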

The following program will eliminate almost all the trouble. The reason
it isn't 100 percent effective is that it's always possible to prevent a
niced job from getting *any* cycles - the 4.1 priority scheduler does not work
as advertised. 
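For contrast, the notes at the end of the program's header comment say that, given shells that handle hangups cleanly, the wrapper could shrink to nice() followed by a plain execvp(). A minimal sketch of that simpler wrapper in modern POSIX C follows. The function name run_game_niced, the snprintf length check, and the error-return convention are illustrative assumptions, not part of the posted program.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

#define GAMEDIR "/usr/games/"   /* directory used by the original program */

/* Hypothetical simplified wrapper: lower our priority, then overlay the
   game on this process.  argv[0] is the game name, as after ++argv in the
   original.  Returns -1 (with errno set) only if the game could not start. */
int run_game_niced(int incr, char *argv[])
{
    char path[256];
    int n = snprintf(path, sizeof path, "%s%s", GAMEDIR, argv[0]);

    if (n < 0 || (size_t)n >= sizeof path) {
        errno = ENAMETOOLONG;   /* refuse rather than truncate the name */
        return -1;
    }

    /* nice() can legitimately return -1, so check errno instead. */
    errno = 0;
    if (nice(incr) == -1 && errno != 0)
        return -1;

    /* The new niceness survives exec; execvp() also runs shell scripts. */
    execvp(path, argv);
    return -1;                  /* exec failed; errno says why */
}
```

From main() this would be called as run_game_niced(16, argv + 1), followed by perror("games") if it returns. The cost is the one the header comment describes: nothing remains behind to clean up after a hangup that the game never sees.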

We have had no problems since implementing this except for the times
our Develcon data switch has become distracted and dropped terminals
in some cute manner which doesn't get the SIGHUP sent.

I hope 4.2 is better designed.


			Clive Steward
			Tektronix, Beaverton




/**************************************************************************

This program runs games niced at a given level, with dependable actions
upon hangup.

It begins an attempt to play a game by forking a new process, creating a 
new process control group for it (and any future friends), then passing
the control terminal to that group. It then attempts to overlay the game
into being using an exec.

In the event that the game can't be executed, the reason is reported, and 
the control terminal passed back to the waiting process. That process is 
killed by a SIGINT, which lets the shell be properly informed, but without
enticing it to write extraneous messages to the operator. The child dies of
its own accord. This subterfuge is used to avoid a multitude of ways in
which simple death of the child at this point can produce a hang.

If the exec is successful, the game runs with the control terminal, and
the parent process sleeps, waiting for chances to housekeep as follows:

It passes signals other than SIGHUP which affect the game back up
to the shell, thus making job control perform as it should.
If the shell tries to continue the process, the control terminal is 
repassed and the game and friends restarted.

If the game quits because of an uncaught SIGHUP, this program detects
that and clears the shell, then exits itself.

If the game exits, it checks to see if the terminal still exists.
If not, we're hung up, whatever the reason, and so leave cleanly 
just as above. If the terminal is still there, we simply exit,
passing the exit status of the game up to the parent shell.

Whether or not the shell has finished its job control process, if
this program is stopped, a hangup should hit either game or waiting process
depending on control terminal manipulation by the (c)shell. Either way,
clean exit will be arranged, which is something the shell doesn't manage itself 
even on simple jobs if ^Z has been used (probably in other situations as well).

Usage: games [gamename [gameargs]]

Compile: cc -O games.c -o games -ljobs

Installation: (Careful please!)

The games program operates setgid, leaving the uid set to the calling
user. It's done this way to maintain access control to the games, while
allowing the user to own all the running processes. This considerably simplifies
the use of signals, and means the user can see the processes with an ordinary
ps command, that w and ps -a* commands will show true runners of games,
and that if there is some screwup, the owner can externally kill his
own processes by sending signals to them.

There must be one consistent group owner for the games themselves, the
games/lib directory, and the lib contents.

The historic group owner seems to be daemon.

The games themselves must be group executable (710), the games/lib directory
must be 775, and the lib contents must permit the group to do whatever
the owner can. (It varies). All this is not presently nor typically consistent,
and will probably have to be checked on with each new release.

/usr/bin/games should be chown'ed and chgrp'ed to daemon. After its group
ownership is set, its permission should be set to 2711 (setgid).

Nota Bene:

This program uses V7 4.1 job control and process group functions heavily.
It creates and manages a new process group much in the manner of csh.
Thus it is likely to need recompilation with any new releases,
(it would have with 4.0 to 4.1), and may not work even then.

As mentioned above, the group ownerships of the games, directories, and lib
contents will probably have to be cleaned up with any new release as well.

If a new release should provide signals and shells which handle hangups cleanly
under all conditions (i.e. kill all outstanding jobs, then die without traces),
then the games program could revert to the nice() function followed by
a simple execvp overlaying the game on itself.

Clive Steward 23 April 1982

***************************************************************************/

#include <stdio.h>
#include <signal.h>
#include <wait.h>
#include <sys/ioctl.h>
#include <sgtty.h>

int childpgrp,waitpgrp;	/* external so onchild() can see */

main(argc,argv)

int argc;
char *argv[];

{
int onchild(),blowaway();

char commandloc[256]; /* might as well be as big as line buffer */

if(argc == 1 ) {
	printf("These games are available:\n\n");
	system("ls /usr/games");
	printf("\nusage: games [name]\n\n");
	exit(1);
}
++argv; /* start at argv[1], the name of the game */
nice(16); /* this nice level will be passed to game too */
strcpy(commandloc,"/usr/games/");
strncat(commandloc,*argv,sizeof(commandloc)-strlen(commandloc)-1); /* bounded, so a long name can't overrun */
sigsys(SIGTTIN,SIG_IGN); /* don't want applesauce from pgrp changes, ioctls */
sigsys(SIGTTOU,SIG_IGN); /* ditto */
sigsys(SIGCHLD,SIG_IGN); /* for the moment, just to ward away stutters */
waitpgrp = getpgrp(0); /* do it this way so prog will work with /bin/bsh too */
if((childpgrp = fork()) == 0) {
	childpgrp = getpid(); /* number for a new process group */
	ioctl(0,TIOCSPGRP,&childpgrp); /* tell tty driver it's the one first */
	setpgrp(0,childpgrp); /* then start using it */
	execvp(commandloc,argv); /* execvp also covers shell scripts */

	/* code below reached only if can't exec game. */

	perror("games"); /* tell the reason why not */

	ioctl(0,TIOCSPGRP,&waitpgrp); /* this first so bsh can inherit tty */

	/* next line necessary to prevent lockup due to csh picking off
	returns to wait3 if we are not sole job in the job control queue. */

	kill(getppid(),SIGINT); /* unannounced sure bump for waiting parent */

	_exit(-1);
	}

if (childpgrp < 1)  { /* fork didn't happen */
	fprintf(stderr,"games: couldn't fork to start game.\n");
	exit(-1);
	}

sigsys(SIGHUP,blowaway); /* HUP here probably only if shell does job control */

loop:
sigsys(SIGCHLD,onchild); /* this is the whole point right here */
pause(); /* sleep until we hear signal that something happened */
ioctl(0,TIOCSPGRP,&childpgrp); /* if we get back, give tty to game again, */
killpg(childpgrp,SIGCONT); /* then restart game and any friends */
goto loop; /* finally set up to rest again */
}

onchild()
{
union wait w;
struct sgttyb ttystuff;

ioctl(0,TIOCSPGRP,&waitpgrp); /* first so bsh can inherit tty if we leave */

if(wait3(&w.w_status,WUNTRACED|WNOHANG,0) == childpgrp) {

	if (WIFEXITED(w)) {

	/* The game has normally exited if we get here.
	We check tty speed; if it's zero the tty has been hung up
	(which can occur if game catches SIGHUP), so we blow away.
	Otherwise, we just exit, leaving a copy of the game exit status
	for anyone who cares. */

		ioctl(0,TIOCGETP,&ttystuff); /* get the tty description */
		if(ttystuff.sg_ispeed == B0) blowaway(); /* assume hup */
		else exit(w.w_retcode); /* the game's exit code, not the raw status word */
		}
	if (WIFSTOPPED(w)) {

	/* this is the stopped condition, as with ^Z, so the shell wants the
	reason passed back up to it. So we do it to ourselves */

		if(w.w_stopsig == SIGHUP) blowaway();
		else kill(getpid(),w.w_stopsig); /* pass sig on */
		}
	else if (WIFSIGNALED(w))  {

	/* here the job has been terminated.
	(WIFSIGNALED is a silly name for the condition)
	Shell wants to be told how it happened, so we do it to ourselves */
	
		if(w.w_termsig == SIGHUP) blowaway();
		else kill(getpid(),w.w_termsig); /* pass sig on */
		}
}
return; /* after any results from signal */
}

blowaway()
{
killpg(childpgrp,SIGKILL); /* make sure game and friends are gone */
killpg(getppid(),SIGKILL); /* this time make shell (& friends?) go away */
exit(0); /* then leave quietly if we aren't already gone */
}
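The decisions onchild() makes can be restated with the POSIX <sys/wait.h> macros. The sketch below is a modern rendering rather than the posted code, and the enum and classify_status names are hypothetical. It also shows why the status passed up should be the exit code alone (WEXITSTATUS(), or w_retcode in the 4.1 union wait) rather than the whole status word: exit() keeps only the low 8 bits, and for a normal exit those bits are zero, so the shell would always see success.

```c
#include <signal.h>
#include <sys/wait.h>

/* Hypothetical names for what onchild() does with each status. */
enum action { ACT_EXIT, ACT_BLOWAWAY, ACT_PASS_SIGNAL, ACT_IGNORE };

/* Map a waitpid()/wait3() status to the wrapper's response.
   *val receives the exit code or the signal number to pass on. */
enum action classify_status(int status, int *val)
{
    if (WIFEXITED(status)) {
        *val = WEXITSTATUS(status);   /* exit code, not the raw word */
        return ACT_EXIT;              /* caller still checks for a dead tty */
    }
    if (WIFSTOPPED(status)) {
        *val = WSTOPSIG(status);
        return *val == SIGHUP ? ACT_BLOWAWAY : ACT_PASS_SIGNAL;
    }
    if (WIFSIGNALED(status)) {
        *val = WTERMSIG(status);
        return *val == SIGHUP ? ACT_BLOWAWAY : ACT_PASS_SIGNAL;
    }
    *val = 0;
    return ACT_IGNORE;                /* e.g. WIFCONTINUED on newer systems */
}
```

In a wrapper like the one above, ACT_EXIT would first consult a dead-terminal check and then call exit(val), while ACT_PASS_SIGNAL would send val to itself, as onchild() does with kill(getpid(), ...).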

bstempleton (08/22/82)

The code published has some merit, but the following should be noted:

1) Does it setgid(getgid()) anywhere?  We don't want the game running
setgid to daemon or games if it doesn't know what it is doing.  For example,
it might create shells for people like rogue.

2) Our rogue hang problem had nothing to do with the process not getting
any cycles.  These hung guys were getting almost all the cycles in the machine.
One burned up around 7 hours of CPU until killed, and this on a 780.
They only responded to kill -9.
We don't run rogue niced anyway.