ephraim@think.COM (ephraim vishniac) (04/06/88)
While writing low-level SCSI software for the Mac (which I don't do any more), I was continually caught by the conflicting needs for speed and safety. Fast software makes a minimal number of checks for unusual conditions; safe software allows for the possibility of disaster at every step. The Apple SCSI software is a reasonably successful compromise: it runs with passable speed, but it can get hung up under unusual conditions.

It turns out not to be necessary to sacrifice speed for safety. If you don't need to know about peculiar conditions quickly, you can easily make your code both quick and safe from hanging conditions. Here's the scoop.

--------------------- Cut Here --------------------

                        How Not To Hang
                        Vishniac - 9/29/87

This note describes a method by which device drivers and other low-level software can avoid "hanging" (looping indefinitely) without the overhead of constantly checking for unusual exit conditions.

The method has two actors: an endangered task (in danger of hanging, that is); and a VBL task that guards it (the "watchdog"). The actors communicate through a data structure which includes an absolute JSR to the VBL code, the VBL task data, and two additional pointers: the stored PC pointer and the timeout code pointer. It looks like this:

        JSR     xxxx            ; absolute address of the VBL code is
                                ; calculated when the VBL is installed
        VBL queue entry         ; a standard item - see Inside Mac
        StoredPCPtr             ; initially zero, use explained below
        TimeoutCodePtr          ; initially zero, use explained below

Before the watchdog is ever needed, the VBL task is installed with some long timeout (e.g., 2^16-1 ticks). The VBL code looks like this:

        POP.L   A0              ; A0 = our data structure pointer
                                ; (This is a very handy method for
                                ; VBLs to find their data, since
                                ; the VBL manager doesn't pass any
                                ; useful parameters to us.)
        set long VBL count      ; keep ourselves in the VBL queue
        MOVE.L  TimeoutCodePtr(A0),D0   ; D0 = exit address, if any
        BEQ     VBLdone         ; exit if we're not needed here
; The endangered task is hung.  Help him out.
        CLR.L   TimeoutCodePtr(A0)      ; Make it clear that we've been here
        MOVE.L  StoredPCPtr(A0),A0      ; Pointer to stacked PC of hung task
        MOVE.L  D0,(A0)         ; Modify stacked PC to free task
VBLdone RTS

The endangered task has the following form:

        PetDog                  ; make sure the watchdog won't go off soon
        UnleashDog              ; arm the watchdog task
        dangerous stuff         ; we might hang here
        LeashDog                ; disarm the watchdog task

"PetDog" consists of setting an appropriate timeout in the VBL task. "UnleashDog" means set StoredPCPtr to -4(SP) and set TimeoutCodePtr to the address of the code to be executed in case of timeout. "LeashDog" is simply CLR.L TimeoutCodePtr.

If the dangerous section involves a major loop (such as an unrolled loop of 8 or 16 SCSI bus transfers), one can use a relatively short timeout by petting the dog at the end of each iteration of the outer loop. Be careful, however, not to pet the dog while looping unproductively. Within the loop, one can use otherwise dangerous constructs such as waiting indefinitely for DRQ or REQ to be active.

The "dangerous stuff" cannot use the stack freely. It cannot push data on the stack, but it can call safe (non-hanging) subroutines that don't take stacked parameters.

As the astute reader has figured out by now, the watchdog does his stuff by replacing the return address of the hung task with the address of the timeout exit when a timeout occurs. On return from the timer interrupt, control goes to the timeout exit instead of returning to the hung task.

Some nice features of this method are its low overhead and freedom from hardware dependencies. The VBL task need be installed only once in the life of the device handler. After that, it keeps itself alive with a very long timeout, imposing minimal cost.
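
The handshake between the two actors can be modelled in a few lines of C. This is only a sketch of the bookkeeping, not driver code: the names (Watchdog, vbl_tick, run_scenario), the made-up addresses BUSY_PC and TIMEOUT_PC, and the tick counter standing in for the VBL queue entry are all assumptions for illustration, and the "stacked PC" here is an ordinary variable rather than a real interrupt frame.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { LONG_TIMEOUT = 65535 };            /* 2^16-1 ticks, the note's example */

/* Made-up addresses: where the task is stuck, and its timeout exit. */
static const uintptr_t BUSY_PC = 0x1000, TIMEOUT_PC = 0x2000;

/* Model of the shared structure. The JSR and the real VBL queue entry are
   reduced to a tick counter; the two pointers are the ones the note adds. */
typedef struct {
    long       ticks_left;        /* stands in for the VBL count             */
    uintptr_t *stored_pc_ptr;     /* StoredPCPtr: slot holding the task's PC */
    uintptr_t  timeout_code_ptr;  /* TimeoutCodePtr: 0 means leashed         */
} Watchdog;

void pet_dog(Watchdog *w, long ticks) { w->ticks_left = ticks; }

void unleash_dog(Watchdog *w, uintptr_t *pc_slot, uintptr_t timeout_exit)
{
    assert(pc_slot != NULL && timeout_exit != 0);
    w->stored_pc_ptr = pc_slot;           /* publish the slot first ...      */
    w->timeout_code_ptr = timeout_exit;   /* ... then arm the watchdog       */
}

void leash_dog(Watchdog *w) { w->timeout_code_ptr = 0; }

/* One simulated VBL interrupt. Returns 1 if the task's PC was redirected. */
int vbl_tick(Watchdog *w)
{
    if (--w->ticks_left > 0)
        return 0;                         /* count not yet expired           */
    w->ticks_left = LONG_TIMEOUT;         /* "set long VBL count"            */
    uintptr_t exit_addr = w->timeout_code_ptr;
    if (exit_addr == 0)
        return 0;                         /* not armed: nothing to do        */
    w->timeout_code_ptr = 0;              /* make it clear we've been here   */
    *w->stored_pc_ptr = exit_addr;        /* patch the stacked PC            */
    return 1;
}

/* The task arms the dog with `timeout` ticks, then total_ticks interrupts
   arrive. If work_ticks >= 0 the task finishes and leashes the dog after
   that many interrupts; if negative it never finishes (it is hung).
   Returns the PC the task would resume at. */
uintptr_t run_scenario(long timeout, long work_ticks, long total_ticks)
{
    Watchdog w = { LONG_TIMEOUT, NULL, 0 };
    uintptr_t stacked_pc = BUSY_PC;       /* models the stacked return PC    */

    pet_dog(&w, timeout);
    unleash_dog(&w, &stacked_pc, TIMEOUT_PC);
    for (long t = 0; t < total_ticks; t++) {
        if (work_ticks >= 0 && t == work_ticks)
            leash_dog(&w);                /* dangerous section completed     */
        vbl_tick(&w);
    }
    return stacked_pc;
}
```

In this model, run_scenario(3, -1, 10) describes a task that never finishes and so resumes at TIMEOUT_PC, while run_scenario(3, 2, 10) describes a task that leashes the dog in time and resumes where it was. Note that unleash_dog stores the PC slot before the timeout address: the watchdog treats a nonzero TimeoutCodePtr as "armed", so that field is written last.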

For correct operation, the PC of the interrupted code must be the first thing pushed in case of an interrupt. So far as I know, this is a safe assumption across all existing and planned members of the 680xx processor family. There's no other dependency on stack formats.

------------------------ Stop cutting here ------------------------

Ephraim Vishniac                                  ephraim@think.com
Thinking Machines Corporation / 245 First Street / Cambridge, MA 02142-1214

     On two occasions I have been asked, "Pray, Mr. Babbage, if you put
     into the machine wrong figures, will the right answers come out?"