[comp.sys.ibm.pc] NMI is SUPPOSED to be disabled!

james@bigtex.cactus.org (James Van Artsdalen) (11/03/88)

Since this isn't really a NeXT issue, I've forwarded to c.s.i.p.

In <7465@nsc.nsc.com>, grenley@nsc.nsc.com.UUCP (George Grenley) wrote:

> A person who shall remain nameless,

Many years ago my parents gave me a name I am now quite fond of, and I
think I'll keep it.

> while commenting on the lack of parity on NeXTs, mentioned that the
> 80X86 family had a "design flaw" in that one could not disable NMI
> without external hardware.

That's not quite what I said.  For reference:

	For those not aware: the Intel 80x88 family has a design flaw
	that requires external hardware to disable NMI.  Without such
	hardware it is not possible to prevent the system from
	randomly crashing when NMIs are used.

What I meant was that it was necessary to have such hardware, not that
Intel should have built it in.  Indeed, Intel should have prevented
the problem by better design in the first place.

> I am no fan of Intel or its machines, but this is a flat-out dumb remark.
> NMI is not SUPPOSED to be maskable.  That's why they call it non-maskable.
> I know of no uP which has a maskable NMI.

Well...  It is necessary to mask even NMI in any situation where the
interrupt table or stack is unstable.  On the 386 this can happen
because (1) the stack pointer has two halves, SS and SP, and (2) there
is no atomic way to switch modes and change interrupt tables.

In case (1), the original 8088 just died, and I don't believe there
was any way to prevent it.  In the 386 and 286 (and later 8088s I think),
the "MOV SS,reg/mem" instruction implicitly "disables" NMI checking after
the instruction, so that the sequence

	mov	ss,stack_ss
	mov	sp,stack_sp

was atomic with regard to interrupts.  Case (2) still affects the 386
as far as I know.  Here the problem is a result of the fact that the
interrupt table has different formats in different modes.  There is
always a window for an NMI between when the table pointer is changed
and when the mode is switched.
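
The case (1) window can be made concrete with a minimal sketch in Python.
This is a toy model under stated assumptions, not anything taken from Intel
documentation: the function names, the example SS:SP values, and the
`inhibit_after_mov_ss` flag are all hypothetical. It only tracks where the
first word of the NMI return frame would be pushed if the NMI is recognized
before, between, or after the two MOVs, using real-mode addressing
(segment * 16 + offset).

```python
def phys(seg, off):
    """Real-mode physical address: segment * 16 + 16-bit offset (20-bit wrap)."""
    return ((seg << 4) + (off & 0xFFFF)) & 0xFFFFF


def nmi_frame_address(old, new, nmi_after, inhibit_after_mov_ss):
    """Return the physical address of the first word an NMI would push.

    old, new             -- (SS, SP) pairs for the old and new stacks (assumed values)
    nmi_after            -- NMI is asserted once this many of the two MOVs
                            have completed (0, 1 or 2)
    inhibit_after_mov_ss -- True models a CPU that holds off interrupt
                            recognition for one instruction after MOV SS
    """
    ss, sp = old
    steps = [("ss", new[0]), ("sp", new[1])]  # mov ss,stack_ss ; mov sp,stack_sp
    for done in range(len(steps) + 1):
        # Boundary right after MOV SS is the only one that can be blocked.
        blocked = inhibit_after_mov_ss and done == 1
        if done >= nmi_after and not blocked:
            # Interrupt recognized here: the first push writes at SS:(SP-2).
            return phys(ss, sp - 2)
        reg, value = steps[done]
        if reg == "ss":
            ss = value
        else:
            sp = value


OLD = (0x1000, 0x0100)  # hypothetical old stack
NEW = (0x2000, 0x0800)  # hypothetical new stack

for inhibit in (False, True):
    for n in (0, 1, 2):
        addr = nmi_frame_address(OLD, NEW, n, inhibit)
        print(f"inhibit={inhibit} nmi_after={n} frame at {addr:05X}")
```

With the inhibit turned off, an NMI arriving after the first MOV pushes its
frame at new SS with old SP, an address that belongs to neither stack and
may hold live data.  With the inhibit on, the NMI is deferred until after
the second MOV and the frame lands on the new stack.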

The 030 obviously suffers from none of this: the stack pointer is all in
one piece, and there aren't any "modes" to switch between.

> Intel did it right.  It's IBM, with their incredibly kludgy, screwed up,
> committe'd to death AT design that is in error.

Not true.  Intel got it wrong.  IBM did the only thing that could make
the machine work.

> Buy a Mac!

Buy a NeXT!
-- 
James R. Van Artsdalen      james@bigtex.cactus.org      "Live Free or Die"
Home: 512-346-2444 Work: 338-8789       9505 Arboretum Blvd Austin TX 78759

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (11/04/88)

In article <10175@bigtex.cactus.org> james@bigtex.cactus.org (James Van Artsdalen) writes:

| 	For those not aware: the Intel 80x88 family has a design flaw
| 	that requires external hardware to disable NMI.  Without such
| 	hardware it is not possible to prevent the system from
| 	randomly crashing when NMIs are used.

  I think that IBM was totally out to lunch on many parts of the PC
design, and this is one of them. The way NMI is useful is for situations
requiring panic stop of the current process. This does not imply that in
all cases it will be possible to restart what was going on at the time
of the NMI. Large computers use NMI-like features to allow saving a few
things when power fails, on parity errors, etc.

  Designing a system to use NMI for common events seems to me to be a
case of trying to use the wrong tool. The Intel chip has a complete
interrupt structure available, using the 8259 or other chips. There
seems to be no reason for NMI on any recoverable condition. People who
regard parity as a recoverable error don't care if the results are
valid, and I can't agree with people who talk about "recovering from a
parity error" in any way except fixing the machine. In good condition a
PC should have <1 parity error/year.

  To blame Intel for providing this feature is to not understand the
correct usage of the chip. I guess IBM screwed up in a number of other
ways, too, since the Intel CPU manual clearly states (for the 8086 and
all later versions) that "interrupts less than 32 are reserved for
future hardware enhancements". A system doesn't "randomly crash" with
NMI unless the hardware is broken or ill-designed such that it generates
a panic interrupt for trivial conditions.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

james@bigtex.cactus.org (James Van Artsdalen) (11/04/88)

In <12520@steinmetz.ge.com>, davidsen@crdos1.UUCP (bill davidsen) wrote:

> The way NMI is useful is for situations requiring panic stop of the
> current process.  [...]  There seems to be no reason for NMI on any
> recoverable condition.

Unfortunately, I've never seen Intel document anything like this.  If
Intel hadn't intended NMI to be recoverable, Intel wouldn't have
bothered saving the return address, nor bothered with blocking NMI after
certain instructions (mov ss,_).

> People who regard parity as a recoverable error don't care if the
> results are valid, and I can't agree with people who talk about
> "recovering from a parity error" in any way except fixing the machine.

I doubt Intel designed NMI with parity error handling as the only
usage in mind.  Certainly debuggers must have been an intended usage.
Currently several video cards use NMI, as do many debuggers.

> A system doesn't "randomly crash" with NMI unless the hardware is
> broken or ill-designed such that it generates a panic interrupt for
> trivial conditions.

Actually, they did crash at random.  Back in the earliest days of the
PC with the earliest 8088s, before MOV SS,_ blocked interrupts, PCs
crashed somewhat randomly with applications that failed to do a CLI
before changing the stack segment.  This was *very* infrequent because
interrupts are so infrequent, and the window was only a couple of
clocks wide, but it did happen.  Anyone with hardware that used NMI in
a recoverable situation certainly would have seen spurious crashes.
It went away with later 8088s and was never an issue with the 286 or
386.
-- 
James R. Van Artsdalen      james@bigtex.cactus.org      "Live Free or Die"
Home: 512-346-2444 Work: 338-8789       9505 Arboretum Blvd Austin TX 78759

johne@hpvcla.HP.COM (John Eaton) (11/05/88)

<<<
< > Intel did it right.  It's IBM, with their incredibly kludgy, screwed up,
< > committe'd to death AT design that is in error.
<
< Not true.  Intel got it wrong.  IBM did the only thing that could make
< the machine work.
----------
Actually they both blew it. Intel's NMI design problems are non-issues if
NMI is used for its stated purpose of "catastrophic failures like power
failure or timeout of a system watchdog timer". Their only problem was
placing the vector in low memory instead of up near the Reset vector where
it could be carved in ROM. It should not be used as a general purpose
interrupt. The problem on the PC bus is that sometimes it is the only one
left to grab, so you yank on the IO Check line and chain your service routine
into the parity check routine.


John Eaton
!hpvcla!johne