[comp.os.minix] cause_sig problems?

Leisner.Henr@xerox.com (Marty) (05/04/88)

I was just working on putting in Andy's deadlock fixes (posted 3/10/88) and I
wondered about something:

In kernel/system/cause_sig() the following code is executed:

  rp = proc_addr(proc_nr);
  if (rp->p_pending == 0) sig_procs++;	/* incr if a new proc is now pending */
  rp->p_pending |= 1 << (sig_nr - 1);
  inform(MM_PROC_NR);		/* see if MM is free */

Now fs does a sys_kill (which envokes the system task).  I typically use a
callgate to just trap into the kernel, as long as synchronization isn't a
problem.

However, it seems to me a number of tasks/processes can get into cause_sig from
different places (clock, tty and I'm calling it for SIGFPE, SIGSEGV and SIGILL).

The signals are unraveled via inform() which contains:
	if (rp->p_pending != 0) {
		m.m_type = KSIG;
		m.PROC1 = rp - proc - NR_TASKS;
		m.SIG_MAP = rp->p_pending;
		sig_procs--;
		if (mini_send(HARDWARE, proc_nr, &m) != OK) 
			panic("can't inform MM", NO_NUM);
		rp->p_pending = 0;	/* the ball is now in MM's court */
		return;
	}

It seems possible for signal's to get lost here.

It seems like manipulated p_pending and sig_procs should be treated like
critical sections, so I'm thinking of:

	lock();
	if (rp->p_pending == 0) sig_procs++;	/* incr if a new proc is now pending */
 	 rp->p_pending |= 1 << (sig_nr - 1);
	 unlock();

and
	lock();
	if (rp->p_pending != 0) {
		m.m_type = KSIG;
		m.PROC1 = rp - proc - NR_TASKS;
		m.SIG_MAP = rp->p_pending;
		sig_procs--;
		rp->p_pending = 0;	/* the ball is now in MM's court */
		unlock();
		if (mini_send(HARDWARE, proc_nr, &m) != OK) 
			panic("can't inform MM", NO_NUM);
		
		return;
	}
	unlock();
	
	
I haven't done an exhaustive analysis on the consequences, but something in
there just don't seem right.

The other alternative would be to have a cause_sig entry in the system task and
let the message passing allow only one at a time to be processed.  
Comments?

marty
ARPA:	leisner.henr@xerox.com
GV:  leisner.henr
NS:  martin leisner:henr801c:xerox
UUCP:	nsc!nscimg!amps!marty

brucee@runx.ips.oz (Bruce Evans) (05/08/88)

Leisner.Henr@xerox.com (Marty) writes:

> However, it seems to me a number of tasks/processes can get into cause_sig
> from
> different places (clock, tty and I'm calling it for SIGFPE, SIGSEGV and
> SIGILL).

> It seems like manipulated p_pending and sig_procs should be treated like
> critical sections, so I'm thinking of:

> [locks inside kernel/system/cause_sig() and kernel/system/inform()]

In the standard kernel, there is no problem with the call to cause_sig()
*itself* because
(1) It is only called by tasks (just clock and tty).
(2) The scheduler lets tasks run to completion.
(3) The scheduler does not use the critical variables (sig_procs and
	proc->p_pending).
For the same reasons, cause_sig() may be safely called from a hardware signal
handler which goes through interrupt() to a task (though a hardware signal
from inside the kernel is probably a bug).

Direct calls from mm and fs are not safe since these processes have lower
priorities than tasks and may be interrupted. Similarly, it is not safe
to introduce new task priorities (e.g., associated with multiple interrupt
priorites) without fixing the call to cause_sig(), and possibly every other
PUBLIC procedure in the kernel ...

The "itself" hedge above is because there seems to be a bug with the call to
mini_send() from inside inform(). In Marty's locking scheme (and the original
kernel), this seems to run with interrupts ENABLED. It is the only call to
mini_send() from outside proc.c. The calls from inside proc.c are properly
protected since they are inside interrupt routines which run entirely with
interrupts disabled.

I locked this call to mini_send() some time ago for completely different
reasons! (To allow most of the other explicit locks in proc.c to be removed,
where there was already an implicit lock.) I had been troubled by poor
response to the DEL key and fixed it without finding the exact cause. Maybe
this was it?

To summarise, there should be a lock()/restore() around the call to mini_send()
inside inform(). I don't like this as a final solution, because interrupts
may stay off too long. My kernel runs syscall() (and so many calls to
mini_send()) with interrupts enabled. Interrupts which occur while syscall()
is busy are treated specially. This does wonders for the serial driver. In a
perfect world, everything would run with interrupts enabled and be protected
by the tasking mechanism.

Bruce Evans
Internet: brucee@runx.ips.oz.au    UUCP: uunet!runx.ips.oz.au!brucee