[net.games.rogue] Rogue looping

bstempleton (08/12/82)

At our site, a 780 running Rogue 5.2, we have run into an odd problem
with Rogue.  Some strange bug causes rogue to hang in such a way that
only a kill -9 will get rid of it.  As a result, naive users have
disconnected their terminals from the game and left it running, racking
up cpu.  A few of these racked up 12 hours of wasted cpu, so Rogue has
been taken off the system until a fix is released.

Does anybody else out there have any comment on this matter?  I think I
have a fix for the problem, but it is difficult to test.  There is some
bad interaction between rogue and the tty driver, which may be caused by
people trying to exploit the suspend/hup bug detailed on the net.
It is possible to get rogue hung by using this bug, but this may not be
the only way.

steve (08/12/82)

There is a way to stop people from disconnecting rogue and abusing scrolls. We
added 'signal(SIGTSTP, SIG_IGN)' to mach_dep.c so that the ctrl-z interrupt
is ignored. It may help keep people from leaving a disconnected rogue hanging
on the system, and for those who are not creative cheaters (read rascal-users
here) it will force them to play rogue the way it was intended to be played: as
a game, not as a piece of software to be broken.
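On later systems the same one-line change could be written with POSIX sigaction() instead of signal(). A minimal sketch, assuming a POSIX system; the helper name ignore_tstp() is hypothetical and not part of Rogue's mach_dep.c:

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: ignore SIGTSTP (the ^Z stop signal) so the
   game cannot be suspended from the keyboard.  Same effect as
   signal(SIGTSTP, SIG_IGN), written with POSIX sigaction().
   Returns 0 on success, -1 on failure. */
int ignore_tstp(void)
{
    struct sigaction sa;

    sa.sa_handler = SIG_IGN;    /* discard the signal */
    sigemptyset(&sa.sa_mask);   /* block nothing extra in a handler */
    sa.sa_flags = 0;
    return sigaction(SIGTSTP, &sa, NULL);
}
```

Calling ignore_tstp() once at startup matches the signal() call above. SIGSTOP itself can never be caught or ignored, so this only disables the keyboard-generated suspend.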

Steve Bramblett
decvax!ittvax!ittapp!steve

bstempleton (08/15/82)

Well, we have found one of the causes of the Rogue 5.2 cpu loop, and
it is a pretty nasty bug.  Rogue would just die if somebody tried it on
an Ann Arbor Ambassador terminal with the default aaa termcap we use
here.  The problem is that this termcap contains very long initialization
strings in its ti and te fields.  For some reason a long string there
causes rogue to die completely the first time you type anything.  We
fixed it by making sure that anybody who played rogue on an aaa got
their termtype switched to a special termcap for rogue only.

Ken or Curses people - you might want to look into this bug.

It was also possible to hang rogue with the old suspend/hup restore
trick, so ctrl-z was disabled.

By the way, does anybody know if the ctrl-z trick was the bug talked about
when 5.2 came out that let you get 100 hit points at level 1?  Clearly
it can be done this way: just wait for a potion of raise level and
drink it as many times as you like.  Do this enough and you can wipe away
umber hulks in a swipe or two.

clives@sri-unix (08/19/82)

Rogue of any version can apparently be hung by anything that hogs I/O,
such as more rogue games.  The same thing happens if there's a big job
running and rogue is running niced.

It tends to be a chain reaction; one runaway tty-less shell is
sufficient to lock up all rogue games, leading to punchouts.

The games don't go away with punchout because job control/tty
driver/shell bugs prevent the game from seeing the HUP.

The following program will eliminate almost all the trouble. The reason
it isn't 100 percent effective is that it's always possible to prevent
a niced job from getting *any* cycles - the 4.1 priority scheduler does
not work as advertised.

We have had no problems since implementing this except for the times
our Develcon data switch has become distracted and dropped terminals in
some cute manner which doesn't get the SIGHUP sent.

I hope 4.2 is better designed.


			Clive Steward
			Tektronix, Beaverton




/**************************************************************************

This program runs games niced at a given level, with dependable actions
upon hangup.

It begins an attempt to play a game by forking a new process, creating
a new process control group for it (and any future friends), then
passing the control terminal to that group. It then attempts to overlay
the game into being using an exec.

In the event that the game can't be executed, the reason is reported,
and the control terminal is passed back to the waiting process. That
process is killed by a SIGINT, which lets the shell be properly
informed without enticing it to write extraneous messages to the
operator. The child dies of its own accord. This subterfuge is used to
avoid the many ways in which simple death of the child at this
point can produce a hang.

If the exec is successful, the game runs with the control terminal, and
the parent process sleeps, waiting for chances to housekeep as follows:

It passes signals other than SIGHUP which affect the game back up to
the shell, thus making job control perform as it should.  If the shell
tries to continue the process, the control terminal is repassed and the
game and friends restarted.

If the game quits because of an uncaught SIGHUP, this program detects
that and clears the shell, then exits itself.

If the game exits, this program checks whether the terminal still
exists.  If not, we're hung up, whatever the reason, and so leave
cleanly just as above. If the terminal is still there, we simply exit,
passing the exit status of the game up to the parent shell.

Whether or not the shell has finished its job control processing, if this
program is stopped, a hangup should hit either the game or the waiting
process, depending on how the (c)shell manipulates the control terminal.
Either way, a clean exit will be arranged, which is something the shell
doesn't manage itself even on simple jobs if ^Z has been used (and
probably in other situations as well).

Usage: games [gamename [gameargs]]

Compile: cc -O games.c -o games -ljobs

Installation: (Careful please!)

The games program operates setgid, leaving the uid set to the calling
user. It's done this way to maintain access control to the games, while
allowing the user to own all the running processes. This considerably
simplifies the use of signals, and means the user can see the processes
with an ordinary ps command, that w and ps -a* commands will show true
runners of games, and that if there is some screwup, the owner can
externally kill his own processes by sending signals to them.

There must be one consistent group owner for the games themselves, the
games/lib directory, and the lib contents.

The historic group owner seems to be daemon.

The games themselves must be group executable (710), the games/lib
directory must be 775, and the lib contents must permit the group to do
whatever the owner can (it varies). None of this is presently or
typically consistent, and it will probably have to be checked with each
new release.

After the group ownership of the /usr/bin/games executable is set, its
permissions should be set to 2711 (setgid).

/usr/bin/games should have both its owner and its group set to daemon
(chown and chgrp).

Nota bene:

This program relies heavily on the V7 4.1 job control and process group
functions.  It creates and manages a new process group much in the
manner of csh.  Thus it is likely to need recompilation with any new
release (as it would have going from 4.0 to 4.1), and may not work even
then.

As mentioned above, the group ownerships of the games, directories, and
lib contents will probably have to be cleaned up with any new release
as well.

If a new release should provide signals and shells which handle hangups
cleanly under all conditions (i.e. kill all outstanding jobs, then die
without traces), then the games program could revert to the nice()
function followed by a simple execvp overlay of itself.

Clive Steward 23 April 1982

***************************************************************************/

#include <stdio.h>
#include <signal.h>
#include <wait.h>
#include <sys/ioctl.h>
#include <sgtty.h>

int childpgrp, waitpgrp;	/* external so onchild() can see them */

main(argc, argv)
int argc;
char *argv[];
{
	int onchild(), blowaway();
	char *index();
	char commandloc[256];	/* might as well be as big as line buffer */

	if (argc == 1) {
		printf("These games are available:\n\n");
		system("ls /usr/games");
		printf("\nusage: games [name]\n\n");
		exit(1);
	}
	++argv;			/* start at argv[1], the name of the game */

	/* we run setgid, so refuse names that would escape /usr/games
	   or overflow commandloc */
	if (index(*argv, '/') != NULL ||
	    strlen(*argv) > sizeof commandloc - sizeof "/usr/games/") {
		fprintf(stderr, "games: bad game name\n");
		exit(1);
	}

	nice(16);		/* this nice level will be passed to game too */
	strcpy(commandloc, "/usr/games/");
	strcat(commandloc, *argv);

	sigsys(SIGTTIN, SIG_IGN);	/* don't want applesauce from pgrp changes, ioctls */
	sigsys(SIGTTOU, SIG_IGN);	/* ditto */
	sigsys(SIGCHLD, SIG_IGN);	/* for the moment, just to ward away stutters */

	waitpgrp = getpgrp(0);	/* do it this way so prog will work with /bin/bsh too */

	if ((childpgrp = fork()) == 0) {
		childpgrp = getpid();		/* number for a new process group */
		ioctl(0, TIOCSPGRP, &childpgrp);	/* tell tty driver it's the one first */
		setpgrp(0, childpgrp);		/* then start using it */
		execvp(commandloc, argv);	/* execvp also covers shell scripts */

		/* code below reached only if can't exec game */

		perror("games");		/* tell the reason why not */
		ioctl(0, TIOCSPGRP, &waitpgrp);	/* this first so bsh can inherit tty */

		/* next line necessary to prevent lockup due to csh picking off
		   returns to wait3 if we are not sole job in the job control queue */

		kill(getppid(), SIGINT);	/* unannounced sure bump for waiting parent */
		_exit(-1);
	}

	if (childpgrp < 1) {	/* fork didn't happen */
		fprintf(stderr, "games: couldn't fork to start game.\n");
		exit(-1);
	}

	sigsys(SIGHUP, blowaway);	/* HUP here probably only if shell does job control */

loop:
	sigsys(SIGCHLD, onchild);	/* this is the whole point right here */
	pause();			/* sleep until we hear signal that something happened */
	ioctl(0, TIOCSPGRP, &childpgrp);	/* if we get back, give tty to game again, */
	killpg(childpgrp, SIGCONT);	/* then restart game and any friends */
	goto loop;			/* finally set up to rest again */
}

onchild()
{
	union wait w;
	struct sgttyb ttystuff;

	ioctl(0, TIOCSPGRP, &waitpgrp);	/* first so bsh can inherit tty if we leave */

	if (wait3(&w.w_status, WUNTRACED|WNOHANG, 0) == childpgrp) {
		if (WIFEXITED(w)) {
			/* The game has exited normally if we get here.
			   We check tty speed; if it's zero the tty has been hung up
			   (which can occur if game catches SIGHUP), so we blow away.
			   Otherwise, we just exit, passing on the game's exit status
			   for anyone who cares. */

			ioctl(0, TIOCGETP, &ttystuff);	/* get the tty description */
			if (ttystuff.sg_ispeed == B0)
				blowaway();		/* assume hup */
			else
				exit(w.w_retcode);	/* exit(w.w_status) would pass only the low byte, always 0 here */
		}
		if (WIFSTOPPED(w)) {
			/* this is a stopped condition, as in ^Z, so the shell
			   wants the reason passed back up to it; we do it to ourself */

			if (w.w_stopsig == SIGHUP)
				blowaway();
			else
				kill(getpid(), w.w_stopsig);	/* pass sig on */
		} else if (WIFSIGNALED(w)) {
			/* here the job has been terminated
			   (WIFSIGNALED is a silly name for the condition).
			   The shell wants to be told how it happened,
			   so we do it to ourself */

			if (w.w_termsig == SIGHUP)
				blowaway();
			else
				kill(getpid(), w.w_termsig);	/* pass sig on */
		}
	}
	return;		/* after any results from signal */
}

blowaway()
{
	killpg(childpgrp, SIGKILL);	/* make sure game and friends are gone */
	killpg(getppid(), SIGKILL);	/* this time make shell (& friends?) go away */
	exit(0);			/* then leave quietly if we aren't already gone */
}
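For readers on later systems, the decision logic in onchild() can be sketched with POSIX waitpid() status macros and termios in place of union wait and sgtty. This is a hedged sketch, not Clive's code: the names classify_child(), tty_hung_up(), pass_signal_up() and the enum values are hypothetical, and treating a descriptor whose terminal attributes cannot be read as hung up is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical names for what the waiting parent should do next. */
enum child_action { PASS_EXIT, BLOW_AWAY, PASS_SIGNAL, KEEP_WAITING };

/* Classify a status from waitpid(pid, &status, WUNTRACED), mirroring
   onchild(): SIGHUP (as a stop or a kill) means blow everything away,
   any other stop or termination signal is handed back up to the shell,
   and a normal exit passes on the exit code. */
enum child_action classify_child(int status, int *sig_out)
{
    *sig_out = 0;
    if (WIFEXITED(status))
        return PASS_EXIT;          /* caller should still check the tty */
    if (WIFSTOPPED(status)) {
        *sig_out = WSTOPSIG(status);
        return *sig_out == SIGHUP ? BLOW_AWAY : PASS_SIGNAL;
    }
    if (WIFSIGNALED(status)) {
        *sig_out = WTERMSIG(status);
        return *sig_out == SIGHUP ? BLOW_AWAY : PASS_SIGNAL;
    }
    return KEEP_WAITING;           /* e.g. a continued child */
}

/* termios counterpart of the sg_ispeed == B0 test: POSIX uses an
   output speed of B0 to mean "hang up".  Assumption: if the terminal
   attributes cannot be read at all, treat the line as gone too. */
int tty_hung_up(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return 1;
    return cfgetospeed(&t) == B0;
}

/* Re-raise a stop or termination signal on ourselves so the shell
   sees it, as onchild() does with kill(getpid(), sig). */
void pass_signal_up(int sig)
{
    kill(getpid(), sig);
}
```

In a parent built this way, a PASS_EXIT result would be followed by tty_hung_up(0) and then either the blow-away path or exit(WEXITSTATUS(status)), which is the same order onchild() uses.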