[comp.unix.wizards] wakeup

tron@mrecvax.UUCP (Carlos Mendioroz) (11/25/88)

A few days ago, I posted an article describing a problem
with a driver that didn't return from a sleep(), though
the call to wakeup was performed.

The process remained in a curious state. It was reported by
ps as runnable (no sleep chan), but the run list pointer
was 0. It also didn't respond to any signal.

A point to note is that the driver's interrupt routine 
priority is 7, and that this routine is calling wakeup()
to awaken the sleeping process.

Theory:
Sleep & wakeup both call spl6() to ensure secure access to
the process queues, (Well, this is not theory, the calls are
there...) and it is possible for the driver's device to 
interrupt as a wakeup() is running, isn't it ?

As this driver (as SCO xenix serial driver) is running with
prio 7, it's not blocked by spl6() and then it may interfere
with the running wakeup by, say, running another wakeup.

Solution: (?)
Not to call any wakeup at prio 7, that is, put every driver
interrupt routine that calls wakeup at prio 6 or below.

I would be glad to hear some guru opinion on the topic.
-- 
Carlos G. Mendioroz  <tron@mrecvax.mrec.ar>  
Work: +54 (1) 313-8082  Fax: +54 (1) 311-1249
Home: +54 (1) 71-3473 ; Malabia 2659 11 B, Buenos Aires, 1425 ARGENTINA

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (11/26/88)

In article <455@mrecvax.UUCP> tron@mrecvax.UUCP (Carlos Mendioroz) writes:
>Theory:
>Sleep & wakeup both call spl6() to ensure secure access to
>the process queues, (Well, this is not theory, the calls are
>there...) and it is possible for the driver's device to 
>interrupt as a wakeup() is running, isn't it ?
>
>As this driver (as SCO xenix serial driver) is running with
>prio 7, it's not blocked by spl6() and then it may interfere
>with the running wakeup by, say, running another wakeup.
>
>Solution: (?)
>Not to call any wakeup at prio 7, that is, put every driver
>interrupt routine that calls wakeup at prio 6 or below.
>
>I would be glad to hear some guru opinion on the topic.

This problem does exist when writing drivers under SCO Xenix. Specifically
things like:

	somewhere in driver, task side, eg read

		x = spl(5);
		state |= IAmAsleep;
		sleep(...)
		splx(x);

	somewhere in interrupt routine, ie received interrupt

		if (state&IAmAsleep) {
			state &= ~IAmAsleep;
			wakeup(...)
		}

The problem arises when the interrupt has a priority higher than the
protection level. There is a small window between when the task routine sets
the flag and actually performs the sleep call and is placed on the queue. If
the interrupt occurs there it sees the flag set and calls wakeup *before*
the process actually goes to sleep, and because the flag gets reset nobody
ever tries to wake him up again.
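
The window described above can be reproduced in a small user-space model.
This is only a sketch under stated assumptions: `struct model`,
`rx_interrupt()` and `task_read()` are hypothetical names, and the kernel's
sleep()/wakeup() are reduced to a single `on_sleep_queue` flag. Nothing here
is Xenix source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; these names do not come from any real kernel. */
enum { I_AM_ASLEEP = 0x01 };

struct model {
    unsigned state;       /* driver flag word */
    bool on_sleep_queue;  /* true once sleep() has queued the process */
};

/* Interrupt side, as in the post: wake the sleeper only if the flag is set. */
static void rx_interrupt(struct model *m)
{
    assert(m != NULL);
    if (m->state & I_AM_ASLEEP) {
        m->state &= ~(unsigned)I_AM_ASLEEP;
        m->on_sleep_queue = false;  /* wakeup(): a no-op if nobody is queued */
    }
}

/*
 * Task side. When irq_in_window is true the interrupt fires between setting
 * the flag and sleep(), which can happen when the interrupt level is above
 * the spl protection. Returns true if the process is left asleep.
 */
static bool task_read(struct model *m, bool irq_in_window)
{
    assert(m != NULL);
    m->state |= I_AM_ASLEEP;
    if (irq_in_window)
        rx_interrupt(m);        /* wakeup happens before we are queued */
    m->on_sleep_queue = true;   /* sleep() queues the process */
    if (!irq_in_window)
        rx_interrupt(m);        /* normal case: interrupt finds us queued */
    return m->on_sleep_queue;
}
```

With a zeroed `struct model`, `task_read(&m, false)` returns false (the
sleeper is woken), while `task_read(&m, true)` returns true with the flag
already cleared, so no later interrupt will try the wakeup again.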

SCO's line discipline routines protect things at spl5, while serial interrupts
are handled at spl7 and the clock (and all poll routines) run at spl6.

The fix for this is to periodically (from a poll routine for example)
perform the wakeup call. This typically works because the task side is
typically coded like:

		while ( CheckForCondition()) {
			state &=...
			wakeup(...)
		}

So periodically waking them up does no harm; it is just slightly inefficient.
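
A rough model of why the periodic wakeup is harmless, assuming the task side
re-checks its condition after every return from sleep(). `struct port`,
`poll_tick()` and `run_read()` are hypothetical names for this sketch, and
time is reduced to a loop counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct port {
    bool data_ready;  /* set by the interrupt side */
    bool asleep;      /* task is blocked in sleep() */
    int  wakeups;     /* how many times the task has been woken */
};

/* Poll routine: unconditional wakeup on every tick. */
static void poll_tick(struct port *p)
{
    if (p->asleep) {
        p->asleep = false;
        p->wakeups++;
    }
}

/*
 * Run `ticks` poll ticks. Data arrives at tick `ready_at`, but the wakeup
 * that should have accompanied it is assumed lost. Returns the tick at
 * which the read completes, or -1 if it never does.
 */
static int run_read(struct port *p, int ready_at, int ticks)
{
    assert(p != NULL);
    p->asleep = true;               /* task found no data and slept */
    for (int t = 0; t < ticks; t++) {
        if (t == ready_at)
            p->data_ready = true;   /* data arrives, wakeup was lost */
        poll_tick(p);               /* periodic, unconditional wakeup */
        if (p->data_ready)
            return t;               /* re-check succeeds, read completes */
        p->asleep = true;           /* spurious wakeup: sleep again */
    }
    return -1;
}
```

With data arriving at tick 2, the read completes at tick 2 after three
wakeups, two of them spurious. That is the "slightly inefficient" cost.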

Don't know if this is the problem that you are encountering, and if it is it
would be hard to fix without access to source. It would be possible to add a
special driver which just had a poll routine which did a wakeup() on the 
appropriate variable address if you can figure out what it is.

-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

chris@mimsy.UUCP (Chris Torek) (11/26/88)

In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>This problem does exist when writing drivers under SCO Xenix. ...
>SCO's line discipline routines protect things at spl5. While serial interrupts 
>are handled at spl7 and the clock (and all poll routines) run at spl6.

This is grotesque.  There are three `good' ways to fix it:

1. Change the hardware.

2. Have the polling code test at IPL7.

3. Use the hardware interrupt to schedule a software interrupt, where
   the software interrupt is at a priority not more than IPL5.  If
   appropriate (for serial port efficiency reasons), use a multi-level
   scheme a la DZ `pseudo-dma' on VAXen.  Also (again for efficiency),
   end the hardware interrupt with the sequence

	if (upon_return_ipl_will_be() > IPL5) {
		/* must be running off some higher priority code */
		schedule_software_interrupt(params);
	} else {
		/* just lower priority and go */
		(void) spl5();
		do_tty_interrupt(params);
	}
	return;

   where the software interrupt handler is

	(void) spl5();
	do_tty_interrupt(params);
	(void) splsoftint();

   The spl5()/splsoftint() is necessary iff software interrupts run
   below ipl5 (preferable) rather than at ipl5.  If software interrupts
   always run higher than ipl5, this scheme is pointless.
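
Option 3 can be modelled in user space along these lines. This is a sketch
only: `struct cpu`, `hard_interrupt()`, `lower_ipl()` and the `IPL_SOFT`
level are assumptions made for illustration, and no real interrupt
controller is involved. The sketch defers whenever the interrupted level is
IPL5 or above, a conservative choice of its own (the fragment above tests
`> IPL5`).

```c
#include <assert.h>
#include <stddef.h>

enum { IPL_SOFT = 1, IPL5 = 5, IPL7 = 7 };  /* IPL_SOFT is an assumed level */

struct cpu {
    int ipl;           /* current interrupt priority level */
    int pending;       /* software interrupts scheduled but not yet run */
    int runs;          /* times do_tty_interrupt() has run */
    int last_run_ipl;  /* level it last ran at */
};

static void do_tty_interrupt(struct cpu *c)
{
    c->runs++;
    c->last_run_ipl = c->ipl;
}

/* Hardware interrupt arriving at IPL7. */
static void hard_interrupt(struct cpu *c)
{
    int prev;

    assert(c != NULL);
    prev = c->ipl;              /* level we will return to */
    c->ipl = IPL7;
    if (prev >= IPL5) {
        c->pending++;           /* interrupted protected code: defer */
    } else {
        c->ipl = IPL5;          /* just lower priority and go */
        do_tty_interrupt(c);
    }
    c->ipl = prev;
}

/* Lower the level, as splx() would, and run any deferred work. */
static void lower_ipl(struct cpu *c, int level)
{
    assert(c != NULL);
    c->ipl = level;
    while (c->pending > 0 && c->ipl < IPL_SOFT) {
        c->pending--;
        c->ipl = IPL5;          /* spl5() */
        do_tty_interrupt(c);
        c->ipl = IPL_SOFT;      /* splsoftint() */
        c->ipl = level;         /* return from the software interrupt */
    }
}
```

Starting at level 6, `hard_interrupt()` only bumps `pending`; the tty work
runs at level 5 once `lower_ipl(&c, 0)` is called. Starting at level 0, the
work runs immediately at level 5.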
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (11/26/88)

In article <14722@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>>This problem does exist when writing drivers under SCO Xenix. ...
>>SCO's line discipline routines protect things at spl5. While serial interrupts 
>>are handled at spl7 and the clock (and all poll routines) run at spl6.
>
>This is grotesque.  There are three `good' ways to fix it:

I agree. I didn't say I liked it. That's the way that it works on the (almost
current) version of SCO 286/386 that I've been working with.

Unfortunately you're stuck: you can't change the spl protection in
the line discipline routines or safely change the interrupt level of the
clock (and therefore poll), and if you want to ensure that you don't lose
serial interrupts, you have to run them at spl7.

By rights SCO's scheme as designed would work well if they simply raised the
spl level in the line discipline routines to spl6, provided you make it a
design criterion that the interrupt routines don't access any
shared data structures, i.e. serial drivers only fill/empty a
private ring buffer.

In point of fact, even as is, the end result is pretty bomb-proof, if inelegant
and slightly inefficient.


-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

lars@myab.se (Lars Pensjö) (11/29/88)

In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>		x = spl(5);
>		state |= IAmAsleep;
>		sleep(...)
>		splx(x);

Just a note about this. It should be

		x = spl(5);
		state |= IAmAsleep;
!		while (state & IAmAsleep)
!		    sleep(...);
		splx(x);

because sleep() may return because of some other, unrelated wakeup().

-- 
    Lars Pensjö
    lars@myab.se

der@sfmag.UUCP (D.Rorke) (12/01/88)

> 
> A few days ago, I posted an article describing a problem
> with a driver that didn't return from a sleep(), though
> the call to wakeup was performed.
> 
> The process remained in a curious state. It was reported by
> ps as runnable (no sleep chan), but the run list pointer
> was 0. It also didn't respond to any signal.
> 
> A point to note is that the driver's interrupt routine 
> priority is 7, and that this routine is calling wakeup()
> to awaken the sleeping process.
> 
> Theory:
> Sleep & wakeup both call spl6() to ensure secure access to
> the process queues, (Well, this is not theory, the calls are
> there...) and it is possible for the driver's device to 
> interrupt as a wakeup() is running, isn't it ?
> 
> As this driver (as SCO xenix serial driver) is running with
> prio 7, it's not blocked by spl6() and then it may interfere
> with the running wakeup by, say, running another wakeup.
> 
> Solution: (?)
> Not to call any wakeup at prio 7, that is, put every driver
> interrupt routine that calls wakeup at prio 6 or below.
> 
> I would be glad to hear some guru opinion on the topic.


Arghh.

Most current implementations of wakeup() are not reentrant.  I assume
yours is not if it's bothering to do an spl6().  Non reentrant
wakeup() implementations should set the interrupt level to the highest
possible level while executing in a critical section.  As you noted,
a wakeup() that sets some intermediate interrupt level can be 
interrupted by an interrupt handler that could potentially call wakeup().
This could cause the problem you observed.  It could also panic your
system if the sleep queues are implemented as linked lists.

The solution you propose above is OK but a better solution (if you have
source) is to fix wakeup() to set the highest interrupt level supported
by the hardware.
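
To make the linked-list hazard concrete, here is a user-space model of a
wakeup() that can be interrupted halfway through unlinking a process. It is
a sketch under assumptions: the queue layout, `struct kernel`, the `masked`
switch and `take_interrupt()` are all hypothetical and are not taken from
any real kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum pstat { SSLEEP, SRUN };

struct proc {
    struct proc *link;  /* next on the sleep queue or the run queue */
    int wchan;          /* sleep channel, 0 when not sleeping */
    enum pstat stat;
};

struct kernel {
    struct proc *slpq;  /* sleep queue head */
    struct proc *runq;  /* run queue head */
    int pending_chan;   /* channel a high-level interrupt wants to wake */
    bool masked;        /* true: wakeup() runs at the highest level */
};

static void wakeup(struct kernel *k, int chan);

/* Deliver the pending interrupt, if any; its handler calls wakeup(). */
static void take_interrupt(struct kernel *k)
{
    int chan = k->pending_chan;

    if (chan != 0) {
        k->pending_chan = 0;
        wakeup(k, chan);
    }
}

static void wakeup(struct kernel *k, int chan)
{
    struct proc **pp = &k->slpq, *p, *next;

    assert(chan != 0);
    while ((p = *pp) != NULL) {
        if (p->wchan != chan) {
            pp = &p->link;
            continue;
        }
        next = p->link;          /* step 1: remember the successor */
        if (!k->masked)
            take_interrupt(k);   /* interrupt lands in mid-unlink */
        *pp = next;              /* step 2: unlink using the stale value */
        p->wchan = 0;
        p->stat = SRUN;
        p->link = k->runq;       /* put the process on the run queue */
        k->runq = p;
    }
    if (k->masked)
        take_interrupt(k);       /* delivered once the level drops */
}

static bool reachable(const struct proc *head, const struct proc *target)
{
    int steps = 0;

    for (const struct proc *p = head; p != NULL && steps < 16;
         p = p->link, steps++)
        if (p == target)
            return true;
    return false;
}

/*
 * Sleep queue A(chan 1) -> B(chan 2) -> C(chan 3). An interrupt wants to
 * wake chan 2 while wakeup(1) is running. Returns true if C, which nobody
 * woke, can still be found on the sleep queue afterwards.
 */
static bool sleeper_still_queued(bool masked)
{
    struct proc c = { NULL, 3, SSLEEP };
    struct proc b = { &c, 2, SSLEEP };
    struct proc a = { &b, 1, SSLEEP };
    struct kernel k = { &a, NULL, 2, masked };

    wakeup(&k, 1);
    return reachable(k.slpq, &c);
}
```

`sleeper_still_queued(true)` returns true. With `masked` false it returns
false: C is still marked asleep but has dropped off the sleep queue, so no
later wakeup() can find it, and the sleep queue head now points at a
process that sits on the run queue.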


Dave Rorke
attunix!der


> -- 
> Carlos G. Mendioroz  <tron@mrecvax.mrec.ar>  
> Work: +54 (1) 313-8082  Fax: +54 (1) 311-1249
> Home: +54 (1) 71-3473 ; Malabia 2659 11 B, Buenos Aires, 1425 ARGENTINA

der@sfmag.UUCP (D.Rorke) (12/01/88)

> > Theory:
> > Sleep & wakeup both call spl6() to ensure secure access to
> > the process queues, (Well, this is not theory, the calls are
> > there...) and it is possible for the driver's device to 
> > interrupt as a wakeup() is running, isn't it ?
> > 
> > As this driver (as SCO xenix serial driver) is running with
> > prio 7, it's not blocked by spl6() and then it may interfere
> > with the running wakeup by, say, running another wakeup.
> > 
> > Solution: (?)
> > Not to call any wakeup at prio 7, that is, put every driver
> > interrupt routine that calls wakeup at prio 6 or below.
> > 
> > I would be glad to hear some guru opinion on the topic.
> 
> 
> Arghh.
> 
> Most current implementations of wakeup() are not reentrant.  I assume
> yours is not if it's bothering to do an spl6().  Non reentrant
> wakeup() implementations should set the interrupt level to the highest
> possible level while executing in a critical section.  As you noted,
> a wakeup() that sets some intermediate interrupt level can be 
> interrupted by an interrupt handler that could potentially call wakeup().
> This could cause the problem you observed.  It could also panic your
> system if the sleep queues are implemented as linked lists.
> 
> The solution you propose above is OK but a better solution (if you have
> source) is to fix wakeup() to set the highest interrupt level supported
> by the hardware.
> 
> 
> Dave Rorke
> attunix!der
> 
> 
> > -- 
> > Carlos G. Mendioroz  <tron@mrecvax.mrec.ar>  
> > Work: +54 (1) 313-8082  Fax: +54 (1) 311-1249
> > Home: +54 (1) 71-3473 ; Malabia 2659 11 B, Buenos Aires, 1425 ARGENTINA
> 


A clarification of my response above.  I said that the solution
that Carlos proposed was OK.  It is not, however, sufficient to simply
set the interrupt priority level low before calling wakeup in the
interrupt routine, if the device interrupts at a level greater than
the level set in wakeup().  For example, in the case he cites above
you couldn't just set the interrupt level down to 6 at the beginning
of the serial driver interrupt routine if the hardware interrupt
actually comes in at level 7.  This won't solve the problem and
can create additional problems because the interrupt routine itself
may not be reentrant, and setting the level of the interrupt routine
lower than the level of the corresponding hardware interrupt could
cause the interrupt routine to be re-entered.

What you can do (assuming you can't fix wakeup) is make sure you
don't have any devices configured on your system which:


a) interrupt at a level greater than the level set in wakeup()

and
 
b) invoke interrupt handlers that call wakeup()
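
Conditions a) and b) amount to a simple check over the configured devices.
A minimal sketch, assuming a hand-built table: `struct device`, its fields
and `count_unsafe()` are hypothetical, and a real system would need some
other way to learn which handlers call wakeup().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct device {
    const char *name;   /* label for reporting only */
    int intr_level;     /* level the hardware interrupts at */
    bool calls_wakeup;  /* does its interrupt handler call wakeup()? */
};

/* Count devices that meet both a) and b) for a given wakeup() level. */
static size_t count_unsafe(const struct device *devs, size_t n,
                           int wakeup_level)
{
    size_t unsafe = 0;

    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (devs[i].intr_level > wakeup_level && devs[i].calls_wakeup)
            unsafe++;
    return unsafe;
}
```

A device at level 7 whose handler calls wakeup() is counted when wakeup()
sets level 6, and is no longer counted once wakeup() sets level 7.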


Of course similar problems can exist with any kernel function
that manipulates global data without blocking all interrupts
that could result in interrupt handlers manipulating the
same global data.


Dave Rorke
attunix!der

chapman@sco.COM (Brian Chapman) (12/01/88)

In article <455@mrecvax.UUCP> tron@mrecvax.UUCP (Carlos Mendioroz) writes:
< 
< Sleep & wakeup both call spl6() to ensure secure access to
< the process queues, (Well, this is not theory, the calls are
< there...) and it is possible for the driver's device to 
< interrupt as a wakeup() is running, isn't it ?
< 
< As this driver (as SCO xenix serial driver) is running with
< prio 7, it's not blocked by spl6() and then it may interfere
< with the running wakeup by, say, running another wakeup.
< 
< Solution: (?)
< Not to call any wakeup at prio 7, that is, put every driver
< interrupt routine that calls wakeup at prio 6 or below.

BTW just in case anyone is wondering, SCO's serial driver
does *NOT* call wakeup from its level 7 interrupt routine.
-- 
	Pay no attention to the man behind the curtain.
Brian Chapman		uunet!sco!chapman	SCO UNIX 3.2 Development

karl@ddsw1.MCS.COM (Karl Denninger) (12/03/88)

In article <602@sysco> chapman@sco.COM (Brian Chapman) writes:
>
>BTW just in case anyone is wondering, SCO's serial driver
>does *NOT* call wakeup from its level 7 interrupt routine.

BTW: Just in case anyone cares, the reason people are investigating
     alternate drivers for SCO Xenix and non-intelligent boards is that the
     "stock" driver provided, while quite adaptable (and possessed with good
     overall performance): 

	1) Has never managed to get bi-directional ports w/internal locking
	   working (ala: open "non modem" port, modem port is blocked, etc.
	   It's been described here and is done by many other firms; email
	   me for full details).  Thus we have to use brain-dead schemes which 
	   work (some of the time) like ungetty, and I have to deal with
	   customers (and our gear) that have deadlocked lines once in a while.

	2) Has a broken RTS flow control capability.

(No flames for the "non-standard" in RS232.  Broken in the above means the 
 Telebit won't talk to it when RTSFLOW is selected; you get low RTS and 
 that's all folks!)

This (and the inclusion of '286-compiled utilities!) is really the only 
"nasty" complaint I have had with your software in the year or so I've been 
running it.  Other minor complaints include locking discrepancies with the 
SVID, but we've worked around those (and they're addressed in 2.3, from what 
I hear).

If SCO would fix the serial driver problems, and offer a reasonably-
inexpensive (ie: not a $600 update-the-entire-system fix) way to get the 
fix (aka: the "fix disks" you guys do now) then the stock driver would be 
completely adequate.  If you additionally enabled FIFOs if present on the 
UARTs you'd have a bombshell.

At present, the 2.2.x /386 serial support out of the box is only passable
without external augmentation (ie: smart board w/driver).  It appears that
some internal design decisions in the kernel make it more difficult to get
reasonable operation from drivers that must operate at high priority as
well; note though that I'm only a decent driver-hacker (as opposed to a
guru).  We have had some interesting problems playing with replacements for the
stock stuff - _nothing_ is reliable enough for me to release yet (it's not
our code; but we're hacking on it).

We've not seen 2.3 yet; the last time I checked an update from 2.2.x was not
available, nor was cost of same, although I could obtain a "new" 2.3 (at
full cost, of course).

--
Karl Denninger (karl@ddsw1.MCS.COM, ddsw1!karl)
Data: [+1 312 566-8912], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc.    	"Quality solutions at a fair price"