[comp.sys.mac.programmer] Getting Rid of Your Hangups

ephraim@think.COM (ephraim vishniac) (04/06/88)

While writing low-level SCSI software for the Mac (which I don't do
any more), I was continually caught by the conflicting needs for speed
and safety.  Fast software makes a minimal number of checks for
unusual conditions; safe software allows for the possibility of disaster
at every step.  The Apple SCSI software is a reasonably successful
compromise: it runs with passable speed, but it can get hung up under
unusual conditions.

It turns out not to be necessary to sacrifice speed for safety.  If
you don't need to know about peculiar conditions quickly, you can
easily make your code both quick and safe from hanging conditions.
Here's the scoop.

--------------------- Cut Here --------------------

			How Not To Hang
			Vishniac - 9/29/87

This note describes a method by which device drivers and other
low-level software can avoid "hanging" (looping indefinitely) without
the overhead of constantly checking for unusual exit conditions.  The
method has two actors: an endangered task (in danger of hanging, that
is); and a VBL task that guards it (the "watchdog").  The actors
communicate through a data structure which includes an absolute JSR to
the VBL code, the VBL task data, and two additional pointers: the
stored PC pointer and the timeout code pointer.  It looks like this:

	JSR	xxxx		; absolute address of the VBL code is
				; calculated when the VBL is installed
	VBL queue entry		; a standard item - see Inside Mac
	StoredPCPtr		; initially zero, use explained below
	TimeoutCodePtr		; initially zero, use explained below
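
For readers who want the byte offsets, here is a rough C model of that
block.  It is only a sketch under assumptions that are mine, not the
note's: the JSR is the absolute-long form (opcode $4EB9 followed by a
32-bit address), the queue entry is the 14-byte VBLTask record from
Inside Mac, and the entry's vblAddr field points back at the JSR so that
the VBL manager enters through it.  The names init_watchdog_block, put16
and put32 are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte offsets within the shared block (assumed layout, see above). */
enum {
    kJsrOpcode      = 0,               /* 2 bytes: $4EB9 = JSR abs.L       */
    kJsrTarget      = 2,               /* 4 bytes: address of the VBL code */
    kVBLEntry       = 6,               /* 14 bytes: the VBL queue entry    */
    kVblAddr        = kVBLEntry + 6,   /* vblAddr field inside the entry   */
    kStoredPCPtr    = kVBLEntry + 14,  /* 4 bytes, initially zero          */
    kTimeoutCodePtr = kVBLEntry + 18,  /* 4 bytes, initially zero          */
    kBlockSize      = kVBLEntry + 22
};

/* The 68000 is big-endian, so store the high byte first. */
static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)v;
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)v);
}

/* Build the block image.  block_addr is the (hypothetical) 68000 address
   the block will live at.  Returns the value the VBL code would pop into
   A0: the address just past the JSR, i.e. the VBL queue entry. */
uint32_t init_watchdog_block(uint8_t *block, uint32_t block_addr,
                             uint32_t vbl_code_addr)
{
    assert(block != NULL);
    memset(block, 0, kBlockSize);              /* both pointers start zero */
    put16(block + kJsrOpcode, 0x4EB9);
    put32(block + kJsrTarget, vbl_code_addr);
    put32(block + kVblAddr, block_addr);       /* enter through the JSR    */
    return block_addr + kVBLEntry;
}
```

Under these assumptions the two pointers sit at 14(A0) and 18(A0) once
the VBL code has popped its data pointer.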

Before the watchdog is ever needed, the VBL task is installed with
some long timeout (e.g., 2^16-1 ticks).  The VBL code looks like this:

	MOVE.L	(SP)+,A0	; A0 = our data structure pointer: the
				; return address pushed by the JSR,
				; i.e. the address of the VBL queue
				; entry.  (This is a very handy method
				;  for VBLs to find their data, since
				;  the VBL manager doesn't pass any
				;  useful parameters to us.)
	set long VBL count	; keep ourselves in the VBL queue
	MOVE.L	TimeoutCodePtr(A0),D0
				; D0 = exit address, if any
	BEQ	VBLdone		; exit if we're not needed here

	; The endangered task is hung.  Help him out.
	CLR.L	TimeoutCodePtr(A0)
				; Make it clear that we've been here
	MOVE.L	StoredPCPtr(A0),A0
				; Pointer to stacked PC of hung task
	MOVE.L	D0,(A0)		; Modify stacked PC to free task
VBLdone
	RTS

The endangered task has the following form:

	PetDog			; make sure the watchdog won't go off soon
	UnleashDog		; arm the watchdog task
	dangerous stuff		; we might hang here
	LeashDog		; disarm the watchdog task

"PetDog" consists of setting an appropriate timeout in the VBL task.
"UnleashDog" means set StoredPCPtr to -4(SP) and set TimeoutCodePtr to
the address of the code to be executed in case of timeout.  "LeashDog"
is simply CLR.L TimeoutCodePtr.  If the dangerous section involves a
major loop (such as an unrolled loop of 8 or 16 SCSI bus transfers),
one can use a relatively short timeout by petting the dog at the end
of each iteration of the outer loop.  Be careful, however, not to pet
the dog while looping unproductively.  Within the loop, one can use
otherwise dangerous constructs such as waiting indefinitely for DRQ or
REQ to be active.
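
The petting discipline can be tried out on a host machine with a small
C model.  This is a sketch, not driver code: the Dog structure,
vbl_tick, transfer_blocks and kLongCount are invented for illustration,
and the "if (d->fired) return" test stands in for the watchdog yanking
control away, which the real wait loop never has to check for.

```c
#include <assert.h>

enum { kLongCount = 32767 };  /* stand-in for the "long timeout" */

typedef struct {
    int count;  /* ticks until the VBL task next runs            */
    int armed;  /* nonzero while TimeoutCodePtr would be nonzero */
    int fired;  /* set once the watchdog has rescued the task    */
} Dog;

/* PetDog: push the next watchdog run "ticks" ticks into the future. */
void pet_dog(Dog *d, int ticks)
{
    assert(ticks > 0);
    d->count = ticks;
}

/* One vertical retrace.  When the count runs out the VBL task runs:
   it re-queues itself with the long count and, if armed, fires. */
void vbl_tick(Dog *d)
{
    if (--d->count > 0)
        return;
    d->count = kLongCount;
    if (d->armed) {
        d->armed = 0;
        d->fired = 1;
    }
}

/* Transfer nblocks blocks; block i makes us wait wait_ticks[i] ticks
   for the device.  Returns how many blocks completed before a timeout. */
int transfer_blocks(Dog *d, const int *wait_ticks, int nblocks, int timeout)
{
    int i, t;

    d->fired = 0;
    pet_dog(d, timeout);                  /* PetDog                      */
    d->armed = 1;                         /* UnleashDog                  */
    for (i = 0; i < nblocks; i++) {
        for (t = 0; t < wait_ticks[i]; t++) {
            vbl_tick(d);                  /* "waiting for REQ"           */
            if (d->fired)
                return i;                 /* models the forced exit      */
        }
        pet_dog(d, timeout);              /* pet only after real progress */
    }
    d->armed = 0;                         /* LeashDog                    */
    return nblocks;
}
```

With a timeout of 5 ticks, four blocks that each wait 3 ticks all
complete even though the total wait is 12 ticks, while a single 9-tick
stall trips the watchdog.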

The "dangerous stuff" cannot use the stack freely, because StoredPCPtr
names a fixed stack location computed at UnleashDog time.  It cannot
push data on the stack, but it can call safe (non-hanging) subroutines
that don't take stacked parameters.  As the astute reader has figured
out by now, the watchdog does his stuff by replacing the return address
of the hung task with the address of the timeout exit when a timeout
occurs.  On return from the VBL interrupt, control goes to the timeout
exit instead of returning to the hung task.
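
The return-address trick itself can be modeled in a few lines of C.
Again this is only a sketch: the Machine structure and every function
name are invented, memory is a small byte array, and the interrupt frame
is assumed to be the one the note relies on (PC pushed first, then the
16-bit status register).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { kMemSize = 256 };

typedef struct {
    uint8_t  mem[kMemSize];  /* simulated RAM holding the stack          */
    uint32_t sp;             /* simulated stack pointer (index into mem) */
    uint32_t pc;             /* simulated program counter                */
    uint32_t stored_pc_ptr;  /* StoredPCPtr                              */
    uint32_t timeout_code;   /* TimeoutCodePtr; zero means disarmed      */
} Machine;

static void write32(Machine *m, uint32_t a, uint32_t v)
{
    assert(a <= kMemSize - 4);
    m->mem[a]     = (uint8_t)(v >> 24);
    m->mem[a + 1] = (uint8_t)(v >> 16);
    m->mem[a + 2] = (uint8_t)(v >> 8);
    m->mem[a + 3] = (uint8_t)v;
}

static uint32_t read32(const Machine *m, uint32_t a)
{
    assert(a <= kMemSize - 4);
    return ((uint32_t)m->mem[a] << 24) | ((uint32_t)m->mem[a + 1] << 16) |
           ((uint32_t)m->mem[a + 2] << 8) | (uint32_t)m->mem[a + 3];
}

void machine_init(Machine *m, uint32_t sp, uint32_t pc)
{
    memset(m, 0, sizeof *m);
    m->sp = sp;
    m->pc = pc;
}

/* UnleashDog: remember where an interrupt will stack the PC. */
void unleash_dog(Machine *m, uint32_t timeout_code)
{
    m->stored_pc_ptr = m->sp - 4;
    m->timeout_code  = timeout_code;
}

/* LeashDog: disarm the watchdog. */
void leash_dog(Machine *m) { m->timeout_code = 0; }

/* Interrupt entry: push the PC first, then a 16-bit SR (assumed frame). */
void take_interrupt(Machine *m)
{
    assert(m->sp >= 6);
    m->sp -= 4;
    write32(m, m->sp, m->pc);
    m->sp -= 2;
    m->mem[m->sp] = 0;
    m->mem[m->sp + 1] = 0;
}

/* The watchdog's VBL code: if armed, disarm and patch the stacked PC. */
void watchdog_vbl(Machine *m)
{
    uint32_t exit_addr = m->timeout_code;
    if (exit_addr == 0)
        return;                          /* not needed here */
    m->timeout_code = 0;                 /* show that we've been here */
    write32(m, m->stored_pc_ptr, exit_addr);
}

/* Return from interrupt: pop the SR, then the (possibly patched) PC. */
void return_from_interrupt(Machine *m)
{
    m->sp += 2;
    m->pc = read32(m, m->sp);
    m->sp += 4;
}
```

Run unleash_dog, take_interrupt, watchdog_vbl and return_from_interrupt
in that order and the simulated PC ends up at the timeout code with the
stack pointer back where it started; with the dog leashed, the same
sequence resumes at the interrupted PC.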

Some nice features of this method are its low overhead and freedom
from hardware dependencies.  The VBL task need be installed only once
in the life of the device handler.  After that, it keeps itself alive
with a very long timeout, imposing minimal cost.  For correct
operation, the PC of the interrupted code must be the first thing
pushed in case of an interrupt.  So far as I know, this is a safe
assumption across all existing and planned members of the 680xx
processor family.  There's no other dependency on stack formats.

------------------------ Stop cutting here ------------------------


Ephraim Vishniac					  ephraim@think.com
Thinking Machines Corporation / 245 First Street / Cambridge, MA 02142-1214

     On two occasions I have been asked, "Pray, Mr. Babbage, if you put
     into the machine wrong figures, will the right answers come out?"