Leisner.Henr@xerox.com (Marty) (05/04/88)
I was just working on putting in Andy's deadlock fixes (posted 3/10/88) and I wondered about something: In kernel/system/cause_sig() the following code is executed: rp = proc_addr(proc_nr); if (rp->p_pending == 0) sig_procs++; /* incr if a new proc is now pending */ rp->p_pending |= 1 << (sig_nr - 1); inform(MM_PROC_NR); /* see if MM is free */ Now fs does a sys_kill (which envokes the system task). I typically use a callgate to just trap into the kernel, as long as synchronization isn't a problem. However, it seems to me a number of tasks/processes can get into cause_sig from different places (clock, tty and I'm calling it for SIGFPE, SIGSEGV and SIGILL). The signals are unraveled via inform() which contains: if (rp->p_pending != 0) { m.m_type = KSIG; m.PROC1 = rp - proc - NR_TASKS; m.SIG_MAP = rp->p_pending; sig_procs--; if (mini_send(HARDWARE, proc_nr, &m) != OK) panic("can't inform MM", NO_NUM); rp->p_pending = 0; /* the ball is now in MM's court */ return; } It seems possible for signal's to get lost here. It seems like manipulated p_pending and sig_procs should be treated like critical sections, so I'm thinking of: lock(); if (rp->p_pending == 0) sig_procs++; /* incr if a new proc is now pending */ rp->p_pending |= 1 << (sig_nr - 1); unlock(); and lock(); if (rp->p_pending != 0) { m.m_type = KSIG; m.PROC1 = rp - proc - NR_TASKS; m.SIG_MAP = rp->p_pending; sig_procs--; rp->p_pending = 0; /* the ball is now in MM's court */ unlock(); if (mini_send(HARDWARE, proc_nr, &m) != OK) panic("can't inform MM", NO_NUM); return; } unlock(); I haven't done an exhaustive analysis on the consequences, but something in there just don't seem right. The other alternative would be to have a cause_sig entry in the system task and let the message passing allow only one at a time to be processed. Comments? marty ARPA: leisner.henr@xerox.com GV: leisner.henr NS: martin leisner:henr801c:xerox UUCP: nsc!nscimg!amps!marty
brucee@runx.ips.oz (Bruce Evans) (05/08/88)
Leisner.Henr@xerox.com (Marty) writes: > However, it seems to me a number of tasks/processes can get into cause_sig > from > different places (clock, tty and I'm calling it for SIGFPE, SIGSEGV and > SIGILL). > It seems like manipulated p_pending and sig_procs should be treated like > critical sections, so I'm thinking of: > [locks inside kernel/system/cause_sig() and kernel/system/inform()] In the standard kernel, there is no problem with the call to cause_sig() *itself* because (1) It is only called by tasks (just clock and tty). (2) The scheduler lets tasks run to completion. (3) The scheduler does not use the critical variables (sig_procs and proc->p_pending). For the same reasons, cause_sig() may be safely called from a hardware signal handler which goes through interrupt() to a task (though a hardware signal from inside the kernel is probably a bug). Direct calls from mm and fs are not safe since these processes have lower priorities than tasks and may be interrupted. Similarly, it is not safe to introduce new task priorities (e.g., associated with multiple interrupt priorites) without fixing the call to cause_sig(), and possibly every other PUBLIC procedure in the kernel ... The "itself" hedge above is because there seems to be a bug with the call to mini_send() from inside inform(). In Marty's locking scheme (and the original kernel), this seems to run with interrupts ENABLED. It is the only call to mini_send() from outside proc.c. The calls from inside proc.c are properly protected since they are inside interrupt routines which run entirely with interrupts disabled. I locked this call to mini_send() some time ago for completely different reasons! (To allow most of the other explicit locks in proc.c to be removed, where there was already an implicit lock.) I had been troubled by poor response to the DEL key and fixed it without finding the exact cause. Maybe this was it? To summarise, there should be a lock()/restore() around the call to mini_send() inside inform(). I don't like this as a final solution, because interrupts may stay off too long. My kernel runs syscall() (and so many calls to mini_send()) with interrupts enabled. Interrupts which occur while syscall() is busy are treated specially. This does wonders for the serial driver. In a perfect world, everything would run with interrupts enabled and be protected by the tasking mechanism. Bruce Evans Internet: brucee@runx.ips.oz.au UUCP: uunet!runx.ips.oz.au!brucee