[comp.unix.wizards] Surprising fact about STREAMS service modules

larry@hcr.UUCP (Larry Philps) (12/23/88)

Well, I just learned something about the STREAMS mechanism today.  Since I
found it quite surprising, I thought I should mention it to the world before
some other poor sucker gets surprised also.

As anyone who has perused the STREAMS code at all knows, all the
non-interruptable parts of it run at "splstr", which on most systems is
spl5.  This is lower than the buffer cache, but high enough to block all
devices, most importantly the terminals and the network.  After looking
carefully at the STREAMS programming manuals I can't find any reference to
the spl at which the driver service modules are called.  Silly me, I just
assumed that it would be at splstr.  However, check out the following code
fragment from io/stream.c.

queuerun()
{
	....

	s = splstr();
	...
	if (q->q_qinfo->qi_srvp) {
		spl1();
		(*q->q_qinfo->qi_srvp)(q);
		splstr();
	}
	...
	splx(s);
}

Note the spl1()!  Anybody else out there surprised?  We were seeing some
really bizarre behaviour out of RFS when under heavy load the server would
reverse the order of a DUCOPYOUT and a DUREAD packet.  RFS produces no
diagnostics whatsoever when the out of sequence DUCOPYOUT arrives and the
only visible effect was that part of the read returned all zeros.

We tracked this down using a network analyzer to capture a packet trace and
then analyzed the RFS headers.  After much work I decided that "This can
never happen, but it did - well ... at least it can never happen if the code
is never reentered".  After that it took every little time to discover the
preceeding fragment.

So, the moral of this story is, if you have a STREAMS driver that talks to
any hardware, be sure to protect its code with an splstr() or else write that
code so that it can be reentered.

---
Larry Philps                             HCR Corporation
130 Bloor St. West, 10th floor           Toronto, Ontario.  M5S 1N5
(416) 922-1937                           {utzoo,utcsri,ihnp4}!hcr!larry

paul@taniwha.UUCP (Paul Campbell) (12/25/88)

In article <4385@hcr.UUCP> larry@hcr.UUCP (Larry Philps) writes:
>Well, I just learned something about the STREAMS mechanism today.  Since I
>found it quite surprising, I thought I should mention it to the world before
>some other poor sucker gets surprised also.
>...
>the spl at which the driver service modules are called.  Silly me, I just
>assumed that it would be at splstr.  However, check out the following code
>fragment from io/stream.c.

I've always known about this but then I taught myself from the sources
(no manuals aroud then ...). I've always assumed that there are only
two things one can assume:

	- that no other service routine can be called while one is running
	  (this should not apply in MP systems .... I always thought that
	  multiple service routines could/would/should be running but none
	  in the same stream [or its stream head code]).
	
	- that timeouts will not go off while service routines are being run
	  (again this will probably not apply to MP systems).

What you CAN'T assume is that put routines will not be called while a service
routine is being run, one might be called from an interrupt service routine.

Something else that's always assumed is that ISRs are all called at IPLs
<= splstr (most systems I've worked with had splstr == 6 which is >= spltty).
I have worked on one system where the device we were working with interrupted 
at > splstr (it had to - it had horrible real-time constraints). The games
we had to play to make putq() safe were pretty bad!

	Paul

PS:	has anyone done a 'real' MP streams implementation? What sort of
	scheduling policies did you use? How did code have to change?
	Which of the above assumptions still apply? (This would probably
	make a good Usenix paper).

-- 
Paul Campbell			..!{unisoft|mtxinu}!taniwha!paul (415)420-8179
Taniwha Systems Design, Oakland CA

 	"Read my lips .... no GNU taxes"

smb@ulysses.homer.nj.att.com (Steven M. Bellovin) (12/26/88)

A persistent bug source, in many versions on the UNIX system, is that
splN() calls, other than splx(), unconditionally set the interrupt
level to the indicated value, rather than *raising* it to that value.
It is almost never correct to lower an interrupt level, except via
splx() or spl0().  Sun has gotten some of those right; the symbolic
spl() calls (splimp(), etc.) check the current level before raising it.
4.3bsd is wrong; I haven't been able to check 4.3 Tahoe yet.

chris@mimsy.UUCP (Chris Torek) (12/26/88)

In article <11047@ulysses.homer.nj.att.com> smb@ulysses.homer.nj.att.com
(Steven M. Bellovin) writes:
>A persistent bug source, in many versions on the UNIX system, is that
>splN() calls, other than splx(), unconditionally set the interrupt
>level to the indicated value, rather than *raising* it to that value.
>It is almost never correct to lower an interrupt level, except via
>splx() or spl0().  Sun has gotten some of those right; the symbolic
>spl() calls (splimp(), etc.) check the current level before raising it.
>4.3bsd is wrong; I haven't been able to check 4.3 Tahoe yet.

Alas, some VAX hardware (and some other hardware) requires this sort of
foolery.  The Qbus interrupt system is such that a device that is
supposed to operate at IPL15 (such as a DHV11) may in fact interrupt at
IPL17, if there is a device behind it on the bus that requests at BR7.
It is therefore good for performance to explicitly lower the interrupt
priority in the service routines for Qbus devices.  the UDA50 driver,
for instance, works on the RQDX controllers as well, so udaintr()
begins with

	#ifdef QBA	/* was #ifdef VAX630 */
		(void) spl5();
	#endif

A more flexible approach might be to define `raisepl' inlines (perhaps
`rpl'?), to be used in appropriate places.  Or simply use the current
system and declare that all kernel hackers must keep their priorities
straight :-) .
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

gwyn@smoke.BRL.MIL (Doug Gwyn ) (12/27/88)

In article <15167@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>Alas, some VAX hardware (and some other hardware) requires this sort of
>foolery.  The Qbus interrupt system is such that a device that is
>supposed to operate at IPL15 (such as a DHV11) may in fact interrupt at
>IPL17, if there is a device behind it on the bus that requests at BR7.

I thought the interrupt vector was supposed to establish the actual
kernel service IPL.  (Or was I thinking about some other system?)
Anyway, it would seem to be a good idea, given that the device drivers
seem to think they know what the "proper" IPL is supposed to be.

kre@cs.mu.oz.au (Robert Elz) (12/27/88)

In article <11047@ulysses.homer.nj.att.com> smb@ulysses.homer.nj.att.com
(Steven M. Bellovin) writes:
>A persistent bug source, in many versions on the UNIX system, is that
>splN() calls, other than splx(), unconditionally set the interrupt
>level to the indicated value, rather than *raising* it to that value.

In article <15167@mimsy.UUCP>, chris@mimsy.UUCP (Chris Torek) describes
a use for deliberately lowering the IPL, then continues:
> A more flexible approach might be to define `raisepl' inlines (perhaps
> `rpl'?), to be used in appropriate places.  Or simply use the current
> system and declare that all kernel hackers must keep their priorities
> straight :-) .

Chris's apparently tounge-in-cheek remark/pun is probably really the
way that it has to be.  Preventing splN() from lowering the IPL only
solves half of the problem.

Eg: if you're in a disc driver intr routine at IPL 6, and you want to
protect a call of some single use routine, and you call spl5()
(incorrectly lowering the IPL, and perhaps permitting another level 6
interrupt, maybe even from the same controller, and generally weraking
havoc), having spl5() notice that you were at IPL 6, and thus leaving
you at level 6 will help a little.

But what of some other routine that had (correctly) raised its IPL to
level 5, and called the same routine, and was half way through when
your controller interrupted at level 6 .. the interrupt routine's call
of the that single-use routine just scrambled the system, at whatever IPL
that call actually runs.

Using interrupt lockout to implement monitors (the way that single
processor unix always has) simply requires care and planning by the
programmers, there's no way to allow sloppy programming and keep this
scheme working.

With the move to multi-processor kernels, all of these problems should
vanish, there should be no reason for any uses of splN() routines at
all, something much more powerful is required.

If you want to protect yourself against sloppy programming, the proper
technique is to make the splN() routines panic if there's an attempt to
lower priority, and perhaps add some loweripl() routines for the odd
cases where lowering the IPL is a desired result (to compensate for
broken hardware, or whatever).

kre