tron@mrecvax.UUCP (Carlos Mendioroz) (11/25/88)
A few days ago, I posted an article describing a problem with a driver that didn't return from a sleep(), though the call to wakeup() was performed.

The process remained in a curious state. It was reported by ps as runnable (no sleep chan), but the run list pointer was 0. It also didn't respond to any signal.

A point to note is that the driver's interrupt routine priority is 7, and that this routine calls wakeup() to awaken the sleeping process.

Theory:
Sleep & wakeup both call spl6() to ensure secure access to the process queues (well, this is not theory, the calls are there...), and it is possible for the driver's device to interrupt while a wakeup() is running, isn't it?

As this driver (like the SCO Xenix serial driver) runs at priority 7, it is not blocked by spl6(), so it may interfere with the running wakeup() by, say, running another wakeup().

Solution (?):
Don't call wakeup() at priority 7; that is, put every driver interrupt routine that calls wakeup() at priority 6 or below.

I would be glad to hear some guru opinion on the topic.
--
Carlos G. Mendioroz <tron@mrecvax.mrec.ar>
Work: +54 (1) 313-8082  Fax: +54 (1) 311-1249
Home: +54 (1) 71-3473 ; Malabia 2659 11 B, Buenos Aires, 1425 ARGENTINA
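To make the theory concrete, here is a minimal user-space model, not Xenix's actual code. A hypothetical setrun_sim() links a process onto the head of a singly linked run queue with two stores, and a simulated level-7 interrupt is allowed to run between them, as it could when the interrupted code is only at spl6. The names struct proc, runq, setrun_sim, pending_intr, intr_target and serial_intr are all assumptions for illustration.

```c
#include <stddef.h>

/* Hypothetical, simplified process table entry. */
struct proc {
    const char  *name;
    void        *wchan;   /* sleep channel; NULL means runnable */
    struct proc *link;    /* next process on the run queue */
};

static struct proc *runq = NULL;          /* head of the run queue */

/* Simulated level-7 interrupt: if non-NULL it runs once, inside the
 * window between the two stores in setrun_sim(). */
static void (*pending_intr)(void) = NULL;
static struct proc *intr_target = NULL;   /* process the interrupt wakes */

/* Mark p runnable and push it on the head of the run queue.
 * The two stores are not protected against the level-7 interrupt. */
static void setrun_sim(struct proc *p)
{
    p->wchan = NULL;
    p->link = runq;                       /* store 1: remember old head */
    if (pending_intr != NULL) {           /* the interrupt arrives here */
        void (*intr)(void) = pending_intr;
        pending_intr = NULL;
        intr();
    }
    runq = p;                             /* store 2: clobbers any nested insert */
}

/* The driver's interrupt routine: wakeup() ends up in setrun_sim(). */
static void serial_intr(void)
{
    setrun_sim(intr_target);
}

/* Returns 1 if p is reachable from the run queue head. */
static int on_runq(const struct proc *p)
{
    for (const struct proc *q = runq; q != NULL; q = q->link)
        if (q == p)
            return 1;
    return 0;
}
```

Starting from an empty run queue, waking process a while the interrupt wakes process b leaves b with wchan NULL and link 0 but unreachable from runq, which looks much like the "runnable, run list pointer 0" state described above.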
sl@van-bc.UUCP (pri=-10 Stuart Lynne) (11/26/88)
In article <455@mrecvax.UUCP> tron@mrecvax.UUCP (Carlos Mendioroz) writes:
>Theory:
>Sleep & wakeup both call spl6() to ensure secure access to
>the process queues, (Well, this is not theory, the calls are
>there...) and it is possible for the driver's device to
>interrupt as a wakeup() is running, isn't it ?
>
>As this driver (as SCO xenix serial driver) is running with
>prio 7, it's not blocked by spl6() and then it may interfere
>with the running wakeup by, say, running another wakeup.
>
>Solution: (?)
>Not to call any wakeup at prio 7, that is, put every driver
>interrupt routine that calls wakeup at prio 6 or below.
>
>I would be glad to hear some guru opinion on the topic.

This problem does exist when writing drivers under SCO Xenix. Specifically, things like:

somewhere in the driver, task side (e.g. read):

    x = spl(5);
    state |= IAmAsleep;
    sleep(...);
    splx(x);

somewhere in the interrupt routine (i.e. receive interrupt):

    if (state & IAmAsleep) {
        state &= ~IAmAsleep;
        wakeup(...);
    }

The problem arises when the interrupt has a priority higher than the protection level. There is a small window between when the task routine sets the flag and when it actually performs the sleep call and is placed on the queue. If the interrupt occurs there, it sees the flag set and calls wakeup() *before* the process actually goes to sleep, and because the flag gets reset, nobody ever tries to wake him up again.

SCO's line discipline routines protect things at spl5, while serial interrupts are handled at spl7 and the clock (and all poll routines) run at spl6.

The fix for this is to perform the wakeup call periodically (from a poll routine, for example). This works because the task side is usually coded like:

    while (CheckForCondition()) {   /* true while we must keep waiting */
        state |= IAmAsleep;
        sleep(...);
    }

So periodically waking them up does no harm; it is just slightly inefficient.

I don't know if this is the problem that you are encountering, and if it is, it would be hard to fix without access to source.
It would be possible to add a special driver consisting of just a poll routine that calls wakeup() on the appropriate variable address, if you can figure out what it is.
--
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl Vancouver,BC,604-937-7532
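The race Stuart describes can be replayed deterministically in user space. The sketch below is a simplified model, assuming a single reader process and a wakeup() that only has an effect on a process that is already asleep. sim_wakeup(), rx_interrupt(), reader_blocks() and poll_routine() are hypothetical names; IAmAsleep comes from the post. Passing 1 to reader_blocks() delivers the receive interrupt inside the window between setting the flag and calling sleep().

```c
#define IAmAsleep 0x01      /* flag name taken from the post */

static int state;           /* driver state flags */
static int proc_sleeping;   /* 1 while the reader is blocked in sleep() */
static int wakeups_lost;    /* wakeup() calls that found nobody asleep */

/* Simplified wakeup(): only affects a process already on the sleep queue. */
static void sim_wakeup(void)
{
    if (proc_sleeping)
        proc_sleeping = 0;
    else
        wakeups_lost++;
}

/* Receive interrupt, as in the post. It runs at spl7, so the
 * spl(5) on the task side does not hold it off. */
static void rx_interrupt(void)
{
    if (state & IAmAsleep) {
        state &= ~IAmAsleep;
        sim_wakeup();
    }
}

/* Task side. If intr_in_window is nonzero, the receive interrupt is
 * delivered after the flag is set but before sleep() queues us. */
static void reader_blocks(int intr_in_window)
{
    state |= IAmAsleep;
    if (intr_in_window)
        rx_interrupt();
    proc_sleeping = 1;      /* sleep(): now on the sleep queue */
}

/* The workaround: a poll routine that calls wakeup() unconditionally. */
static void poll_routine(void)
{
    sim_wakeup();
}
```

With the interrupt in the window, the wakeup is counted as lost and later interrupts find the flag already clear, so the reader stays asleep until the unconditional wakeup() from poll_routine(). Because the real task side re-checks its condition in a loop, that extra wakeup only costs a spurious trip round the loop.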
chris@mimsy.UUCP (Chris Torek) (11/26/88)
In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>This problem does exist when writing drivers under SCO Xenix.
...
>SCO's line discipline routines protect things at spl5. While serial interrupts
>are handled at spl7 and the clock (and all poll routines) run at spl6.

This is grotesque. There are three `good' ways to fix it:

 1. Change the hardware.
 2. Have the polling code test at IPL7.
 3. Use the hardware interrupt to schedule a software interrupt, where
    the software interrupt is at a priority not more than IPL5.

If appropriate (for serial port efficiency reasons), use a multi-level scheme à la DZ `pseudo-dma' on VAXen. Also (again for efficiency), end the hardware interrupt with the sequence

    if (upon_return_ipl_will_be() > IPL5) {
        /* must be running off some higher priority code */
        schedule_software_interrupt(params);
    } else {
        /* just lower priority and go */
        (void) spl5();
        do_tty_interrupt(params);
    }
    return;

where the software interrupt handler is

    (void) spl5();
    do_tty_interrupt(params);
    (void) splsoftint();

The spl5()/splsoftint() is necessary iff software interrupts run below IPL5 (preferable) rather than at IPL5. If software interrupts always run higher than IPL5, this scheme is pointless.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu  Path: uunet!mimsy!chris
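The third option can be sketched as a small simulation. Everything here is an assumption for illustration: the IPL is a plain variable, software interrupts are assumed to run at a hypothetical SOFTINT_IPL of 1 (below IPL5, the preferred case), and splx_sim() delivers a pending software interrupt once the level drops below that. The point is that do_tty_interrupt() always runs at IPL5, never at 7, whichever path is taken.

```c
#define IPL5        5
#define SOFTINT_IPL 1           /* assumption: softints run below IPL5 */

static int ipl;                 /* current interrupt priority level (simulated) */
static int softint_pending;     /* set by schedule_software_interrupt() */
static int tty_work_done;       /* number of do_tty_interrupt() calls */
static int max_ipl_in_tty;      /* highest IPL seen inside do_tty_interrupt() */

static void do_tty_interrupt(void)
{
    tty_work_done++;
    if (ipl > max_ipl_in_tty)
        max_ipl_in_tty = ipl;
}

static void schedule_software_interrupt(void)
{
    softint_pending = 1;
}

/* Level-7 hardware interrupt. The level we will return to plays the
 * role of upon_return_ipl_will_be(). */
static void hw_serial_interrupt(void)
{
    int return_ipl = ipl;
    ipl = 7;                    /* hardware entered at level 7 */
    /* ... drain the UART into a private buffer here, at IPL7 ... */
    if (return_ipl > IPL5) {
        /* must be running off some higher priority code */
        schedule_software_interrupt();
    } else {
        /* just lower priority and go */
        ipl = IPL5;
        do_tty_interrupt();
    }
    ipl = return_ipl;           /* return from interrupt */
}

/* Software interrupt handler: spl5(), tty work, splsoftint(). */
static void softint_handler(void)
{
    ipl = IPL5;
    do_tty_interrupt();
    ipl = SOFTINT_IPL;
}

/* Lower the IPL; take a pending software interrupt once allowed. */
static void splx_sim(int level)
{
    ipl = level;
    if (softint_pending && ipl < SOFTINT_IPL) {
        softint_pending = 0;
        ipl = SOFTINT_IPL;
        softint_handler();
        ipl = level;
    }
}
```

Interrupting base-level code (IPL 0) runs the tty work immediately at IPL5; interrupting the spl6 clock or poll code only sets softint_pending, and the work happens when splx_sim(0) later lowers the level.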
sl@van-bc.UUCP (pri=-10 Stuart Lynne) (11/26/88)
In article <14722@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>>This problem does exist when writing drivers under SCO Xenix.
...
>>SCO's line discipline routines protect things at spl5. While serial interrupts
>>are handled at spl7 and the clock (and all poll routines) run at spl6.
>
>This is grotesque. There are three `good' ways to fix it:

I agree. I didn't say I liked it. That's the way it works on the (almost current) version of SCO 286/386 that I've been working with.

Unfortunately you're stuck: you can't change the spl protection in the line discipline routines, you can't safely change the interrupt level of the clock (and therefore of the poll routines), and if you want to ensure that you don't lose serial interrupts, you have to run them at spl7.

By rights, SCO's scheme as designed would work well if they simply raised the spl level in the line discipline routines to spl6, and made it an assumption/design criterion that the interrupt routines don't access any shared data structures, i.e. serial drivers only fill and empty a private ring buffer.

In point of fact, even as is, the end result is pretty bombproof, if inelegant and slightly inefficient.
--
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl Vancouver,BC,604-937-7532
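The "private ring buffer" criterion can be met with a single-producer, single-consumer buffer in which each index has exactly one writer: the interrupt routine advances head, and the spl5 task side advances tail. A minimal sketch follows, assuming a uniprocessor (as on these 286/386 machines); the names are hypothetical. volatile keeps the compiler from caching or reordering accesses to the shared fields, but it is not a substitute for memory barriers on a multiprocessor.

```c
#define RB_SIZE 16   /* must be a power of two */

struct ringbuf {
    volatile unsigned char buf[RB_SIZE];
    volatile unsigned int  head;   /* written only by the interrupt routine */
    volatile unsigned int  tail;   /* written only by the task side */
};

/* Interrupt side: store one received character.
 * Returns 0 if the buffer was full and the character was dropped. */
static int rb_put(struct ringbuf *rb, unsigned char c)
{
    unsigned int next = (rb->head + 1) & (RB_SIZE - 1);
    if (next == rb->tail)
        return 0;                  /* full */
    rb->buf[rb->head] = c;
    rb->head = next;               /* publish only after the byte is stored */
    return 1;
}

/* Task side (e.g. a poll routine or read at spl5): remove one character.
 * Returns -1 if the buffer is empty. */
static int rb_get(struct ringbuf *rb)
{
    if (rb->tail == rb->head)
        return -1;                 /* empty */
    unsigned char c = rb->buf[rb->tail];
    rb->tail = (rb->tail + 1) & (RB_SIZE - 1);
    return c;
}
```

One slot is left unused so that head == tail always means empty, which gives a capacity of RB_SIZE - 1 characters; when the buffer is full, rb_put() drops the new character rather than overwriting unread data.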
lars@myab.se (Lars Pensjö) (11/29/88)
In article <1974@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>    x = spl(5);
>    state |= IAmAsleep;
>    sleep(...)
>    splx(x);

Just a note about this. It should be (changed lines marked with !):

      x = spl(5);
      state |= IAmAsleep;
    ! while (state & IAmAsleep)
    !     sleep(...);
      splx(x);

because sleep() may return as the result of some other wakeup() on the same address, not only the one you are waiting for.
--
Lars Pensjö   lars@myab.se
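The difference can be shown with a scripted sleep(). In this hypothetical model, sim_sleep() returns on the next scripted event, and only RX_INTERRUPT (standing in for the driver's interrupt routine) clears IAmAsleep; UNRELATED_WAKEUP stands for any other wakeup() on the same address. The spl(5)/splx() pair is left out because nothing in the model runs concurrently.

```c
#define IAmAsleep 0x01

/* Hypothetical events that can end a sleep(). */
enum event { UNRELATED_WAKEUP, RX_INTERRUPT };

static int state;                   /* driver state flags */
static int sleep_calls;             /* how many times sim_sleep() was entered */
static const enum event *script;    /* one event delivered per sleep() */

/* Simulated sleep(): returns after the next scripted event. Only the
 * driver's own interrupt clears the flag. */
static void sim_sleep(void)
{
    sleep_calls++;
    enum event e = *script++;
    if (e == RX_INTERRUPT)
        state &= ~IAmAsleep;
}

/* Original form: a single sleep() call. */
static void wait_once(void)
{
    state |= IAmAsleep;
    sim_sleep();
}

/* Corrected form: sleep again until the interrupt routine clears the flag. */
static void wait_loop(void)
{
    state |= IAmAsleep;
    while (state & IAmAsleep)
        sim_sleep();
}
```

Given an unrelated wakeup followed by the real one, wait_once() returns after the first event with IAmAsleep still set, while wait_loop() sleeps a second time and returns only after the interrupt routine has cleared the flag.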
der@sfmag.UUCP (D.Rorke) (12/01/88)
> A few days ago, I posted an article describing a problem
> with a driver that didn't return from a sleep(), though
> the call to wakeup was performed.
>
> The process remained in a curious state. It was reported by
> ps as runnable (no sleep chan), but the run list pointer
> was 0. It also didn't respond to any signal.
>
> A point to note is that the driver's interrupt routine
> priority is 7, and that this routine is calling wakeup()
> to awaken the sleeping process.
>
> Theory:
> Sleep & wakeup both call spl6() to ensure secure access to
> the process queues, (Well, this is not theory, the calls are
> there...) and it is possible for the driver's device to
> interrupt as a wakeup() is running, isn't it ?
>
> As this driver (as SCO xenix serial driver) is running with
> prio 7, it's not blocked by spl6() and then it may interfere
> with the running wakeup by, say, running another wakeup.
>
> Solution: (?)
> Not to call any wakeup at prio 7, that is, put every driver
> interrupt routine that calls wakeup at prio 6 or below.
>
> I would be glad to hear some guru opinion on the topic.

Arghh.

Most current implementations of wakeup() are not reentrant. I assume yours is not if it's bothering to do an spl6(). Non-reentrant wakeup() implementations should set the interrupt level to the highest possible level while executing in a critical section. As you noted, a wakeup() that sets some intermediate interrupt level can be interrupted by an interrupt handler that could potentially call wakeup(). This could cause the problem you observed. It could also panic your system if the sleep queues are implemented as linked lists.

The solution you propose above is OK, but a better solution (if you have source) is to fix wakeup() to set the highest interrupt level supported by the hardware.

Dave Rorke
attunix!der
der@sfmag.UUCP (D.Rorke) (12/01/88)
> > Solution: (?)
> > Not to call any wakeup at prio 7, that is, put every driver
> > interrupt routine that calls wakeup at prio 6 or below.
...
> The solution you propose above is OK but a better solution (if you have
> source) is to fix wakeup() to set the highest interrupt level supported
> by the hardware.

A clarification of my response above. I said that the solution that Carlos proposed was OK. It is not, however, sufficient simply to set the interrupt priority level low before calling wakeup() in the interrupt routine if the device interrupts at a level greater than the level set in wakeup().
For example, in the case he cites above, you couldn't just set the interrupt level down to 6 at the beginning of the serial driver interrupt routine if the hardware interrupt actually comes in at level 7. This won't solve the problem, because the level-7 interrupt can still arrive while wakeup() is running at spl6; lowering the level inside the handler afterwards does not undo that nesting. It can also create additional problems: the interrupt routine itself may not be reentrant, and setting the level of the interrupt routine lower than the level of the corresponding hardware interrupt could cause the interrupt routine to be re-entered.

What you can do (assuming you can't fix wakeup) is make sure you don't have any devices configured on your system which:

 a) interrupt at a level greater than the level set in wakeup(), and
 b) invoke interrupt handlers that call wakeup().

Of course, similar problems can exist with any kernel function that manipulates global data without blocking all interrupts that could result in interrupt handlers manipulating the same global data.

Dave Rorke
attunix!der
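The two conditions a) and b) amount to a simple configuration check. The sketch below encodes them over a hypothetical device table (struct devconf and its fields are illustrative, not a real Xenix configuration format): a device is flagged only if it interrupts above the level wakeup() raises to and its handler calls wakeup().

```c
#include <stddef.h>

/* Hypothetical description of a configured device. */
struct devconf {
    const char *name;
    int         intr_level;     /* hardware interrupt priority level */
    int         calls_wakeup;   /* nonzero if its interrupt handler calls wakeup() */
};

/* A device is unsafe only if both conditions hold:
 * a) it interrupts above the level wakeup() raises to, and
 * b) its interrupt handler calls wakeup(). */
static int dev_is_unsafe(const struct devconf *d, int wakeup_level)
{
    return d->intr_level > wakeup_level && d->calls_wakeup;
}

/* Count the unsafe devices in a configuration table of n entries. */
static size_t count_unsafe(const struct devconf *tab, size_t n, int wakeup_level)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (dev_is_unsafe(&tab[i], wakeup_level))
            bad++;
    return bad;
}
```

With wakeup() at spl6, a level-7 device whose handler never calls wakeup() passes, as does a level-6 device that does; only a level-7 device whose handler calls wakeup() is flagged. Fixing wakeup() to run at 7 clears every entry.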
chapman@sco.COM (Brian Chapman) (12/01/88)
In article <455@mrecvax.UUCP> tron@mrecvax.UUCP (Carlos Mendioroz) writes:
<
< Sleep & wakeup both call spl6() to ensure secure access to
< the process queues, (Well, this is not theory, the calls are
< there...) and it is possible for the driver's device to
< interrupt as a wakeup() is running, isn't it ?
<
< As this driver (as SCO xenix serial driver) is running with
< prio 7, it's not blocked by spl6() and then it may interfere
< with the running wakeup by, say, running another wakeup.
<
< Solution: (?)
< Not to call any wakeup at prio 7, that is, put every driver
< interrupt routine that calls wakeup at prio 6 or below.
BTW just in case anyone is wondering, SCO's serial driver
does *NOT* call wakeup from its level 7 interrupt routine.
--
Pay no attention to the man behind the curtain.
Brian Chapman uunet!sco!chapman SCO UNIX 3.2 Development
karl@ddsw1.MCS.COM (Karl Denninger) (12/03/88)
In article <602@sysco> chapman@sco.COM (Brian Chapman) writes:
>
>BTW just in case anyone is wondering, SCO's serial driver
>does *NOT* call wakeup from its level 7 interrupt routine.

BTW: Just in case anyone cares, the reason people are investigating alternate drivers for SCO Xenix and non-intelligent boards is that the "stock" driver provided, while quite adaptable (and possessed of good overall performance):

 1) Has never managed to get bidirectional ports with internal locking working (à la: open the "non-modem" port and the modem port is blocked, etc. It's been described here and is done by many other firms; email me for full details). Thus we have to use brain-dead schemes like ungetty, which work (some of the time), and I have to deal with customers (and our gear) that have deadlocked lines once in a while.

 2) Has a broken RTS flow control capability. (No flames for the "non-standard" in RS232. Broken in the above means the Telebit won't talk to it when RTSFLOW is selected; you get low RTS and that's all folks!)

This (and the inclusion of '286-compiled utilities!) is really the only "nasty" complaint I have had with your software in the year or so I've been running it. Other minor complaints include locking discrepancies with the SVID, but we've worked around those (and they're addressed in 2.3, from what I hear).

If SCO would fix the serial driver problems, and offer a reasonably inexpensive (i.e. not a $600 update-the-entire-system fix) way to get the fix (aka the "fix disks" you guys do now), then the stock driver would be completely adequate. If you additionally enabled the FIFOs on UARTs that have them, you'd have a bombshell. At present, the 2.2.x /386 serial support out of the box is only passable without external augmentation (i.e. a smart board with its own driver).
It appears that some internal design decisions in the kernel also make it more difficult to get reasonable operation from drivers that must operate at high priority; note, though, that I'm only a decent driver-hacker (as opposed to a guru). We have had some interesting problems playing with replacements for the stock stuff; _nothing_ is reliable enough for me to release yet (it's not our code, but we're hacking on it).

We've not seen 2.3 yet; the last time I checked, an update from 2.2.x was not available, nor was its cost, although I could obtain a "new" 2.3 (at full cost, of course).
--
Karl Denninger (karl@ddsw1.MCS.COM, ddsw1!karl)
Data: [+1 312 566-8912], Voice: [+1 312 566-8910]
Macro Computer Solutions, Inc. "Quality solutions at a fair price"