[comp.unix.microport] Losing interrupts?

nelson@sun.soe.clarkson.edu (Russ Nelson) (09/30/88)

I notice that the section of the Runtime System manual that deals with
Writing Device Drivers and interrupts says that interrupts can be
lost.  Is this true?  If so, does Microport consider it a bug (i.e.
will it be fixed?)
--
--russ (nelson@clutx [.bitnet | .clarkson.edu])
To surrender is to remain in the hands of barbarians for the rest of my life.
To fight is to leave my bones exposed in the desert waste.

dave@micropen (David F. Carlson) (10/01/88)

In article <NELSON.88Sep29160014@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> I notice that the section of the Runtime System manual that deals with
> Writing Device Drivers and interrupts says that interrupts can be
> lost.  Is this true?  If so, does Microport consider it a bug (i.e.
> will it be fixed?)
> --russ (nelson@clutx [.bitnet | .clarkson.edu])

The problem is not Microport's: it's the d*mn IBM PC/AT interrupt
controller (aka the Intel 8259).  The problem is not solvable in software
alone, thus Microport is not to blame.  It was nice of them to tell
you that it is a problem, though, so you won't pull your hair out trying
to figure out why.  It is good device driver design to *assume* you
will lose a critical interrupt so your design can cover its ass with
polling.  If the "next" interrupt time is known, a callout can be scheduled
to "simulate" the missing interrupt.  The rule for device drivers anywhere
is that there is no such thing as reliable interrupts.
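
A minimal user-space sketch of that callout idea, under stated assumptions:
the names struct dev, dev_intr and dev_watchdog, and the tick-based deadline,
are hypothetical and not taken from any Microport or System V source. In a
real driver the watchdog would presumably be re-armed from a kernel callout;
that part is not shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state; none of these names come from a real kernel. */
struct dev {
    bool hw_ready;          /* hardware has work waiting to be serviced */
    unsigned long deadline; /* tick by which the next interrupt is expected */
    unsigned long interval; /* expected ticks between interrupts */
    unsigned serviced;      /* times the handler did real work */
    unsigned recovered;     /* times the watchdog had to step in */
};

/* Normal interrupt handler: service the device and re-arm the deadline. */
static void dev_intr(struct dev *d, unsigned long now)
{
    assert(d->interval > 0);
    if (!d->hw_ready)
        return;             /* spurious call, nothing to do */
    d->hw_ready = false;
    d->serviced++;
    d->deadline = now + d->interval;
}

/* Periodic callout: if the interrupt is overdue and the hardware really
 * has work pending, run the handler by hand to "simulate" the lost one. */
static void dev_watchdog(struct dev *d, unsigned long now)
{
    if (now >= d->deadline && d->hw_ready) {
        d->recovered++;
        dev_intr(d, now);
    }
}
```

The handler is written so that calling it with nothing pending is harmless,
which is what lets the watchdog and a late real interrupt coexist.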

-- 
David F. Carlson, Micropen, Inc.
micropen!dave@ee.rochester.edu

"The faster I go, the behinder I get." --Lewis Carroll

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (10/01/88)

In article <553@micropen> dave@micropen (David F. Carlson) writes:
>In article <NELSON.88Sep29160014@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
>> I notice that the section of the Runtime System manual that deals with
>> Writing Device Drivers and interrupts says that interrupts can be
>> lost.  Is this true?  If so, does Microport consider it a bug (i.e.
>> will it be fixed?)
>> --russ (nelson@clutx [.bitnet | .clarkson.edu])
>
>The problem is not Microport's: it's the d*mn IBM PC/AT interrupt
>controller (aka the Intel 8259).  The problem is not solvable in software
>alone, thus Microport is not to blame.  It was nice of them to tell
>you that it is a problem, though, so you won't pull your hair out trying
>to figure out why.  It is good device driver design to *assume* you
>will lose a critical interrupt so your design can cover its ass with
>polling.  If the "next" interrupt time is known, a callout can be done
>to "simulate" the missing interrupt.  The rule for device drivers anywhere
>is that there is no such thing as reliable interrupts.
>

You're right, that problem is not Microport's or, on the 386 more generally,
the System V port's.

However, we note that SCO mysteriously loses far fewer interrupts than
Microport. The reason, of course, is that Microport spends a *lot* more time
at spl7 than SCO does. This exacerbates the problem that you mention.

In my experience, system interrupt overhead and time lost through use of spl7
are the primary causes of lost interrupts, at least when they are lost due to
a large influx of them.

SCO apparently has spent much time and effort finding all the places where
spl7 is needed and *not* needed, and has reduced the amount of time that
interrupts are locked out.

For example with the tty drivers, try the following:

	stty 19200 -ixoff -echo; cat > /tmp/test

then from your terminal emulator program dump about 100 KB of data to the
system.

Even on an idle 386 system with System V you will see very few lines in the
destination file which are correct. On the same 386 running SCO you will
lose very few characters.

SCO has also cleaned up interrupt servicing a bit. My rough guesstimate for
servicing a serial tx interrupt on a 386 is 300-350 microseconds for SCO
versus 400-450 microseconds for System V.

Of those figures, the actual overhead for the interrupt servicing is probably
about 150 vs 200 microseconds, with the balance being spent in the actual
serial driver interrupt routine.

Let's take a poll. Will anyone using a Trailblazer on a dumb serial port
under any type of Unix system send me a message on how successfully it runs
uucp at 9600 or 19200?  What are your normal operating parameters?

I'll summarize if enough people reply.

As a further example: here on van-bc I can run two Trailblazers fairly
successfully at 9600 on dumb ports (van-bc is a 10 MHz 68010-based system). I
cannot, however, run one at 19200. Unfortunately, somewhere in the kernel
someone is raising spl7 at odd intervals, causing the uucp connection to drop
a character and time out.

Running two at 9600 works because, even though the net throughput is the same
as one at 19200 and the system overhead is actually higher, the time to
fill the three-character buffer in the 8274 is twice as long at 9600:
about three milliseconds versus one and a half. I can pull the stuff out
very quickly, but someone is holding spl7 for something right around the
one-and-a-half-millisecond range, and sometimes the driver just can't
quite get the data out before that next character arrives. At 9600 there is
plenty of time to get it out; the 8274's buffers haven't even filled up yet.
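
The three-versus-one-and-a-half millisecond figures can be checked with a
little arithmetic. This is a sketch that assumes 10 bits on the wire per
character (8N1 framing: start, 8 data, stop), which the post does not state;
fill_time_us is a hypothetical helper name.

```c
#include <assert.h>

/* Microseconds until a receive buffer `depth` characters deep fills,
 * given `bits_per_char` bits on the wire per character (10 for 8N1,
 * an assumption) and the line speed in bits per second.
 * Integer division truncates any fraction of a microsecond. */
static unsigned long fill_time_us(unsigned depth, unsigned bits_per_char,
                                  unsigned long baud)
{
    assert(baud > 0);
    return (1000000UL * depth * bits_per_char) / baud;
}
```

Under that assumption, fill_time_us(3, 10, 9600) gives 3125 and
fill_time_us(3, 10, 19200) gives 1562, so any stretch at spl7 longer than
roughly 1.5 ms can cost a character at 19200 but not at 9600.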

It's real close, too: uucp will generally run for about five to fifteen
seconds before losing that character.


-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

hedrick@athos.rutgers.edu (Charles Hedrick) (10/04/88)

But SCO is based on Xenix.  I don't know how much traditional Xenix
code is present now compared with code from ATT's System V, but at
least the developers had Xenix available to steal from.  So they may
not have had to go through the System V kernel from scratch, "cleaning
up" interrupt handling.  Rather, they may simply have adopted Xenix
methodology for dealing with them.  Xenix was designed from the
beginning for relatively slow machines and support of funny devices,
so it is reasonable to think that it might have better interrupt
latency than SV.  If the problems are in the base SV/286 or 386
kernel, i.e. the part from ATT/Intel, it may not be practical for
Microport to fix it.  I've recently been working on Minix a lot to get
serial I/O to work there.  The fixes were in general not to the RS232
driver, but throughout the kernel to keep down the size of locked code
segments.  Also some adjustment of buffer sizes, again not at the
driver level.  I would expect something similar in Unix.  Microport
may be unable/unwilling to make changes throughout the ATT-maintained
portion of the kernel.  Everybody keeps yelling about the serial
device drivers as if the problem could be fixed there.  I really doubt
it.

mike@cimcor.mn.org (Michael Grenier) (10/04/88)

From article <Oct.3.14.16.01.1988.28689@athos.rutgers.edu>, by hedrick@athos.rutgers.edu (Charles Hedrick):
> driver level.  I would expect something similar in Unix.  Microport
> may be unable/unwilling to make changes throughout the ATT-maintained
> portion of the kernel.  Everybody keeps yelling about the serial
> device drivers as if the problem could be fixed there.  I really doubt
> it.

Actually, I believe the problems with the lost interrupts could be
fixed in the serial device drivers. Right now, as each character forces
an interrupt, the interrupt routine looks at the Interrupt Identification
Register and decides whether a character is available to be read. If
it is, the character is NOT simply put into a buffer but is also
passed through the line discipline routines (clists and stuff) while
running at spl7 (all other interrupts are turned off). This takes
a considerable amount of time and should not be done within an interrupt
routine. You can read a better discussion of clists in real-time
device drivers in the book "Writing UNIX Device Drivers".

A better solution would be to simply put the character into a buffer
and return from the interrupt routine. Then the trick becomes "How
do we get the characters through the line discipline routines?". One
method might be to steal an idea that was presented in the book
"The UNIX Papers", where polling was used in the device drivers. A
working example of this is Bell Tech's ICC card or Digiboard's
intelligent card, where a separate process runs to handle the
details of the card. The idea I have is to have a separate process
waiting on a wait() in the kernel, where it would wake up every
1/60th or 1/30th of a second to read the characters out of the
buffer and pass them through the line discipline routines. In this
way, most of the character processing would be done
with interrupts turned on. To improve processing time, one could
allow reads and writes in raw mode to be passed directly to the buffers,
bypassing the line discipline routines altogether. (This assumes
programs like UUCP and ZMODEM run with the serial line in RAW mode.)
I'm no device driver expert, but I think the process can be made to
wait at a priority less than PZERO so it will be the next process to run
every tick or so...we don't want an undue latency time for people
running terminals on the serial lines.
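
To size the buffer for such a scheme, one can estimate how many characters
arrive between two wakeups. This is a rough sketch, again assuming 10 bits
per character on the wire; chars_per_poll is a hypothetical helper, not a
kernel routine.

```c
#include <assert.h>

/* Characters that can arrive on one line between two polls, rounded up.
 * bits_per_char is 10 for 8N1 framing (an assumption). */
static unsigned chars_per_poll(unsigned long baud, unsigned bits_per_char,
                               unsigned polls_per_sec)
{
    unsigned long divisor = (unsigned long)bits_per_char * polls_per_sec;
    assert(divisor > 0);
    return (unsigned)((baud + divisor - 1) / divisor);
}
```

At 19200 baud this works out to 32 characters per port when polling 60 times
a second and 64 at 30 times a second, so the per-port buffer needs at least
that much room plus some slack for a late wakeup.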

I would be happy to write the above-mentioned driver (including
support for the 16550 UARTs sitting here) if someone could explain
to me what all of the fields in the linesw structure (in sys/conf.h)
and tty structure (in sys/tty.h) are.

    -Mike Grenier
     mike@cimcor.mn.org
     ...uunet!rosevax!cimcor!mike
     ...amdahl!bungia!cimcor!mike

herder@myab.se (Jan Herder) (10/04/88)

In article <1900@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
<However we take note that SCO mysteriously loses far less than Microport.
<The reason of course is that Microport spends a *lot* more time at spl7 than
<SCO does. This exacerbates the problem that you mention.  
<
<In my experience system interrupt overhead and time lost through use of spl7
<is the primary cause of lost interrupts. At least when they are lost due to
<a large influx of them.
<
<SCO apparently has spent much time and effort in finding all places where
<spl7 is needed and *not* needed and has reduced the amount of time when they
<lock them out. 
<
<For example with the tty drivers, try the following:
<
<	stty 19200 -ixoff -echo; cat > /tmp/test
<
<then from your terminal emulator program dump about 100kb of data to the
<system. 

There is a way of dealing with this problem; it's called pseudo-DMA.
If you have a dumb serial port, you write a very small interrupt routine
which reads the UART and puts the characters in a big circular list, which
can be read at a later time. If you make sure never to lock out this small
interrupt routine, you don't lose any characters.
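
A minimal sketch of such a circular list in portable C, assuming one
producer (the interrupt routine) and one consumer (whatever drains it
later). The names struct ring, ring_put and ring_drain and the 256-byte size
are illustrative choices, not taken from any DZ or Sun driver.

```c
#include <assert.h>
#include <stddef.h>

#define RB_SIZE 256u  /* power of two so indices can be masked; an assumption */

struct ring {
    unsigned char buf[RB_SIZE];
    volatile unsigned head;   /* advanced only by the interrupt side */
    volatile unsigned tail;   /* advanced only by the draining side */
    unsigned long overruns;   /* characters dropped because the ring was full */
};

/* Interrupt side: store one character and return immediately. */
static void ring_put(struct ring *r, unsigned char c)
{
    if (r->head - r->tail == RB_SIZE) {   /* ring is full */
        r->overruns++;
        return;
    }
    r->buf[r->head & (RB_SIZE - 1)] = c;
    r->head++;
}

/* Draining side: copy out up to `max` characters, oldest first.
 * Returns the number of characters copied. */
static size_t ring_drain(struct ring *r, unsigned char *out, size_t max)
{
    size_t n = 0;
    assert(out != NULL);
    while (n < max && r->tail != r->head) {
        out[n++] = r->buf[r->tail & (RB_SIZE - 1)];
        r->tail++;
    }
    return n;
}
```

Because each index is written by only one side, the drain never has to mask
the interrupt routine on a single processor; counting overruns makes it
visible when the ring is too small for the drain interval.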

This technique has been used with DZ ports on VAXen and serial ports on Suns.
The *RIGHT* way to do it is of course to get a better serial card.

-- 
Jan Herder, MYAB Sweden                    |  Phone: +46 31 18 75 12
Internet: herder@myab.se                   |  Fax:   +46 31 18 28 42
UUCP: 	  uunet!enea!chalmers!myab!herder  |  Address: Dr. Forseliusg 21
ARPA:	  herder%myab.se@uunet.uu.net      |           413 26 Gothenburg

vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) (10/04/88)

/ hpcupt1:comp.unix.microport / hedrick@athos.rutgers.edu (Charles Hedrick) / 11:16 am  Oct  3, 1988 /
>But SCO is based on Xenix.  I don't know how much traditional Xenix
>code is present now compared with code from ATT's System V, but at
>least the developers had Xenix available to steal from.

	God, here we go again.  Listen *very carefully*:

1. Old SCO XENIX was weird

2. Current SCO XENIX is a port of System V

3. Current SCO XENIX is still somewhat weird in the name of compatibility

4. Current SCO XENIX will become less weird when the merged port comes out

			'Nuff said.
			Andy

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (10/05/88)

In article <Oct.3.14.16.01.1988.28689@athos.rutgers.edu> hedrick@athos.rutgers.edu (Charles Hedrick) writes:
>Microport to fix it.  I've recently been working on Minix a lot to get
>serial I/O to work there.  The fixes were in general not to the RS232
>driver, but throughout the kernel to keep down the size of locked code
>segments.  Also some adjustment of buffer sizes, again not at the
>driver level.  I would expect something similar in Unix.  Microport
>may be unable/unwilling to make changes throughout the ATT-maintained
>portion of the kernel.  Everybody keeps yelling about the serial
>device drivers as if the problem could be fixed there.  I really doubt
>it.

I apologize for not making my original comments a little clearer. Yes,
this is exactly the problem. You can't just stick a better serial driver in
without changing other things in the kernel as well.

For example one of the basic differences between SCO 386 and the SysV 386 
products is the priority of the interrupts.

        SCO                     SysV
  SPL7    Serial          SPL7    Clock
  SPL6    Clock           SPLTTY  Serial

SysV allows the clock interrupt to take over the machine at a higher
priority level than (for example) the serial interrupts.

SCO places the Serial interrupts at the top allowing them to take priority
over virtually everything else in the system.

Which one do you think will lose more serial interrupts (i.e. they both do,
but the numbers vary greatly)?

SCO also has some other tricks in the serial driver interrupt handler, such
as not doing the standard input processing there but doing it from a poll
routine at the clock interrupt priority level, again allowing receiving
chars to take precedence over processing them.
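
A toy model of why the ordering matters, as a sketch only: a request is
treated as held off while the processor is already at its level or above.
The level numbers are ordinal ranks based on the ordering
SPL6 < SPLTTY < SPL7, and the handler times are made-up illustrative values,
not measurements. struct irq_src and worst_delay_us are hypothetical names.
The model ignores explicit spl7() critical sections elsewhere in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Ordinal ranks only: SPL6 < SPLTTY < SPL7. */
enum { LVL_SPL6 = 6, LVL_SPLTTY = 7, LVL_SPL7 = 8 };

struct irq_src {
    const char *name;
    int level;            /* priority level the handler runs at */
    unsigned handler_us;  /* illustrative handler run time, not measured */
};

/* A request is held off while the CPU is already at its level or above. */
static int is_blocked(int cpu_level, int irq_level)
{
    return cpu_level >= irq_level;
}

/* Worst-case delay the other handlers can add before srcs[victim] runs. */
static unsigned worst_delay_us(const struct irq_src *srcs, size_t n,
                               size_t victim)
{
    unsigned total = 0;
    assert(victim < n);
    for (size_t i = 0; i < n; i++)
        if (i != victim && is_blocked(srcs[i].level, srcs[victim].level))
            total += srcs[i].handler_us;
    return total;
}
```

With the clock at SPL7 and serial at SPLTTY, the serial handler can wait for
the whole clock handler; with serial at SPL7 and the clock at SPL6, the
clock handler adds nothing to serial latency in this model.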


-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

dyer@spdcc.COM (Steve Dyer) (10/05/88)

In article <10770002@hpcupt1.HP.COM> vandys@hpcupt1.HP.COM (Andrew Valencia(Seattle)) writes:
>>But SCO is based on Xenix.  I don't know how much traditional Xenix
>>code is present now compared with code from ATT's System V, but at
>>least the developers had Xenix available to steal from.
>	God, here we go again.  Listen *very carefully*:
>1. Old SCO XENIX was weird
>2. Current SCO XENIX is a port of System V
>3. Current SCO XENIX is still somewhat weird in the name of compatibility
>4. Current SCO XENIX will become less weird when the merged port comes out

Well, you're both right.  I think Chuck's point is well taken that Microsoft
had had a lot more experience of what NOT to do to get decent performance
on a PC-type machine.  A lot of this is just plain old kernel expertise.
If you've ever looked at Sys V.3 kernel sources, you will find spln()'s all
over the place, in places where there's no possibility that an interrupt
could affect a particular flag or data structure.  This is not inherently
bad, but is a clue that some of the people who worked on it weren't quite
on the ball (now, there's a lot of code which is correct, too!)  Add to
that the need for an OEM like Microport to provide its own device drivers
and this has a greater possibility of occurring (calling the line discipline
input routine at spl7(), if it's true, is a good example of this.)

-- 
Steve Dyer
dyer@harvard.harvard.edu
dyer@spdcc.COM aka {harvard,husc6,linus,ima,bbn,m2c,mipseast}!spdcc!dyer

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (10/06/88)

In article <591@cimcor.mn.org> mike@cimcor.mn.org (Michael Grenier) writes:
>From article <Oct.3.14.16.01.1988.28689@athos.rutgers.edu>, by hedrick@athos.rutgers.edu (Charles Hedrick):

>A better solution would be to simply put the character into a buffer 
>and return out of the interrupt routine. Then the trick becomes "How
>do we get the characters through the line discipline routines?". One
>method might be to steal an idea that was presented in the book
>"The UNIX Papers" where polling was used in the device drivers. A

>with interrupts turned on. To improve processing time, one could
>allow reads and writes in raw mode to be passed directly to the buffers
>bypassing the line discipline routines altogether. (This assumes

Not really required; the line disciplines are not too bad when it comes to
raw I/O.

>programs like UUCP and ZMODEM run with the serial line in RAW mode).

Uucp does, zmodem doesn't.


This *is* essentially what SCO is already doing. They have built in support
for a poll routine in a driver which is called every clock tick. Their
interrupt routines for the serial driver are at SPL7 and the clock tick is
SPL6. The serial interrupts operate out of buffers which are filled/emptied
by the poll routine.

>I'm no device driver expert but I think the process can be made to 
>wait at a priority less than PZERO so it will be the next process to run
>every tick or so...we don't want an undue latency time for people
>running terminals on the serial lines.

By using the poll routine (or the equivalent using timeout() with other
Unixes) you don't have to worry about running as a user process with the
other attendant issues you mention. You are effectively still running as an
interrupt routine; the trick is to get the clock running at a lower spl
level than the serial interrupt. We are not as worried about general overhead
as we are about leaving lots of CPU cycles available at
SPL7. In other words, when a serial interrupt arrives, there is never a
period of more than one or two hundred microseconds before we run the
interrupt service routine.

Of course there are some bugs to be worked out, but it does work fairly well.
One problem under SCO 386 is that all of the line discipline routines (e.g.
canon()) protect the tty structure at SPL5. Unfortunately the poll routines
come in at SPL6! There are a couple of small windows where some important
information can get lost and the port will stop functioning until closed and
re-opened. SCO has extra code in their poll routines to compensate for this
problem.

>I would be happy to write the above mentioned driver (to include
>support for the 16550 UARTS sitting here) if someone could explain
>to me what all of the fields in the linesw structure (in sys/conf.h)
>and tty structure (in sys/tty.h) are.

Already done. I'm finishing up the non-polling version today for both SCO
and SysV on the 386. 

Hope to have polling versions tested by next week...  it's working, but the
SPL5 problem is a bitch. I've got to ensure that I've found all of the problem
areas.

Actually, with the 16550s you don't quite need to go to a polling scheme,
but with the 16450s it's the only way to guarantee you don't lose
interrupts.

-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

mike@cimcor.mn.org (Michael Grenier) (10/06/88)

From article <1905@van-bc.UUCP>, by sl@van-bc.UUCP (pri=-10 Stuart Lynne):
! For example one of the basic differences between SCO 386 and the SysV 386 
! products is the priority of the interrupts.
! 
! 	SCO			SysV
!   SPL7	Serial		SPL7	Clock
!   SPL6	Clock		SPLTTY	Serial
! 
! SysV allows the clock interrupt to take over the machine at a higher
! priority level than (for example) the serial interrupts.

I don't think so. Microport has the serial interrupts at SPL7 (the
highest) and the clock at the lowest (which is probably why the
clock loses time!). In fact, I doubt Microport is losing that
many interrupts on the serial lines until the entire system
gets too loaded which doesn't take that much  with the overhead 
being incurred.

    -Mike Grenier
    mike@cimcor.mn.org
    uunet!rosevax!cimcor!mike

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (10/07/88)

In article <592@cimcor.mn.org> mike@cimcor.mn.org (Michael Grenier) writes:
>From article <1905@van-bc.UUCP>, by sl@van-bc.UUCP (pri=-10 Stuart Lynne):
>! For example one of the basic differences between SCO 386 and the SysV 386 
>! products is the priority of the interrupts.
 
>! 	SCO			SysV
>!   SPL7	Serial		SPL7	Clock
>!   SPL6	Clock		SPLTTY	Serial
 
>! SysV allows the clock interrupt to take over the machine at a higher
>! priority level than (for example) the serial interrupts.

>I don't think so. Microport has the serial interrupts at SPL7 (the
>highest) and the clock at the lowest (which is probably why the

Can't speak to Microport 286, but I just spent an hour and a half pulling in
Microport's 386 atconf directories off tape, and they match the standard
System V / 386 stuff pretty closely.

The clock is at SPL7 and serial is at SPLTTY. 

For inquiring minds, SPL6 < SPLTTY < SPL7. In other words SPL7 is actually
priority level 8!

In any event, I'm not sure it will be possible to distribute a polling serial
driver which needs the clock to be at a lower SPL level; the standard release
checks what SPL level the clock handler is running at and panics with a polite
message if it is not at SPL7.

-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532