rws@mit-bold.ARPA (04/28/84)
From: "Robert W. Scheifler" <rws@mit-bold.ARPA>

Description:
	We have been developing an in-process debugger that uses the VAX BPT instruction and the trace bit mechanism, with the signals handled in the same (user) process. This works fine by itself. We also use keyboard-generated SIGQUITs to interrupt the program and get the debugger's attention, so the user can poke around and then continue execution. This also works fine by itself. Combining the two mechanisms, however, got us into trouble.

	The basic problem is that SIGTRAP needs to be handled synchronously, but SIGQUIT (and others) can preempt it. Signals are not handled on a first-come-first-served basis but on a "find-first-set" basis, which means lowest signal number first (SIGQUIT is 3, SIGTRAP is 5).

	So the scenario is this. A BPT trap takes place, and trap() does a psignal(SIGTRAP). Just then the user types the quit character, and you eventually get to ttyinput(), which does a gsignal(SIGQUIT). Now we continue on inside trap(), doing "if (ISSIG(p)) psig()", and the signal chosen is, surprise, SIGQUIT. So we hack the stack for SIGQUIT and go off to the first instruction of the signal trampoline code. However, the earlier psignal() did an aston(), so at this point we take the AST and are back in trap() doing another "if (ISSIG(p)) psig()". This time we hack the stack for SIGTRAP, only now, lo and behold, the PC is no longer at the BPT instruction but at the start of the signal trampoline code, which is mighty confusing.

	But, you say, the solution is of course to mask out SIGTRAP inside the SIGQUIT handler. But, I say, there are two problems with this. The first, which I can live with, is that the SIGQUIT handler then can't be debugged. The bigger problem is that it still doesn't work. There is an "extraneous" REI in the signal trampoline code (which I have complained about before for a different reason).
	This REI is executed on the way out of a handler. It is a one-instruction bridge back to user code that gets executed WITHOUT the signal mask defined by the handler. So even if you mask SIGTRAP inside SIGQUIT, you simply move the PC at the time of the SIGTRAP to the REI rather than the CALLS in the trampoline code.

	Our solution was to notice that, if the SIGTRAP handler does nothing, the BPT instruction will be executed again and we will get another trap. So we don't mask SIGTRAP inside SIGQUIT. Instead, the SIGTRAP handler checks the PC, and if it is in the trampoline code, it just returns and lets the BPT execute again.

	Having taken a BPT, we need to reinstall the actual instruction, execute it using the T-bit, and then reinsert the BPT instruction. Here again, the PC you get in the SIGTRAP handler can be bogus. Just returning won't work this time, because the T-bit has been cleared and you won't get another trap. Fortunately, the debugger can know it is expecting a T-bit trap, so it can save away the correct PC and ignore the PC reported by the kernel.

	Actually, as it turns out, you CAN get multiple SIGTRAPs from setting the T-bit. I don't think this was intended. A fix is provided below.

Repeat-By:
	See above.

Fix:
	In trap(), in trap.c, change

		case T_TRCTRAP+USER:	/* trace trap */
			locr0[PS] &= ~PSL_T;

	to

		case T_TRCTRAP+USER:	/* trace trap */
			locr0[PS] &= ~(PSL_T|PSL_TP);

	In sendsig(), in machdep.c, change

		regs[PS] &= ~(PSL_CM|PSL_FPD);

	to

		regs[PS] &= ~(PSL_CM|PSL_FPD|PSL_T|PSL_TP);

	I didn't bother to figure out whether both changes are necessary, but they can't hurt.