[comp.os.minix] MINIX 1.5.10 ignores NMI on Intel machines?

ghelmer@dsuvax.uucp (Guy Helmer) (08/16/90)

I've recently discovered that my main MINIX machine is suffering memory
lossage.  I suspected it for some time, but didn't have proof since
I saw no panics while MINIX was running and never received the dreaded
PARITY ERROR while running MS-DOS junk.  Finally, MINIX became extremely
unstable, so I dusted off the diagnostic software for this machine and
located the faulty bank of RAM.

My beef is: I thought MINIX would have panic'ed as fast as possible on
receipt of NMI, which at least on XT-class machines means parity trouble.
There is no way an operating system should continue if this condition
occurs.

Assuming MINIX doesn't reset the D-type flip-flop at i/o addr 0xa0
to stop NMI's from reaching the processor, I've been researching the
kernel sources to find out what happens when an NMI occurs.  So far,
I think I see code in mpx.x accepting the nmi and calling exception
in exception.c; exception is supposed to either call cause_sig() in
system.c or panic.  I lose track of SIGBUS at that point, and it doesn't
seem to matter anyway since I haven't found any code to deal with
a SIGBUS (and signal.h indicates that SIGBUS is "obsolete", which
I have trouble believing right now).
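
For reference, a minimal sketch of the NMI mask mentioned above, assuming
the usual PC/XT layout in which bit 7 of the byte written to port 0xa0
enables NMI and a zero byte masks it.  AT-class machines gate NMI
differently, so this applies to XT-class hardware only.  The names
xt_nmi_mask_byte, xt_set_nmi and port_writer are hypothetical and are not
taken from the MINIX sources; the port write itself is passed in as a
callback so that the sketch compiles outside a kernel.

```c
/* Sketch only: XT-class NMI mask register, under the assumptions above. */
#define XT_NMI_MASK_PORT 0xA0u   /* i/o address of the NMI mask flip-flop */
#define XT_NMI_ENABLE    0x80u   /* bit 7 set: NMI reaches the processor  */

/* Hypothetical callback type standing in for the kernel's port-output routine. */
typedef void (*port_writer)(unsigned port, unsigned char value);

/* Return the byte to write to the mask port: 0x80 enables NMI, 0x00 masks it. */
unsigned char xt_nmi_mask_byte(int enable)
{
    return (unsigned char)(enable ? XT_NMI_ENABLE : 0x00u);
}

/* Enable or mask NMI by writing the computed byte through the supplied writer. */
void xt_set_nmi(port_writer out, int enable)
{
    out(XT_NMI_MASK_PORT, xt_nmi_mask_byte(enable));
}
```

A kernel that never writes zero to this port leaves NMI enabled, which is
the assumption made in the paragraph above.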

Can the kernel gurus please help?  I'll keep searching and try to
find a place to stick a handler, but I think this kind of problem
should panic ASAP to lessen the chance of multiple NMI's and peripheral
corruption.

Thanks in advance.
-- 
Guy Helmer
work: DSU Computing Services, Business & Education Institute    (605) 256-5315
play: MidIX System Support Services                             (605) 256-2788
dsuvax!ghelmer@cs.utexas.edu, ...!bigtex!loft386!dsuvax!ghelmer

brucee@runxtsa.runx.oz.au (Bruce Evans) (08/18/90)

In article <1990Aug16.021356.24004@dsuvax.uucp> ghelmer@dsuvax.uucp (Guy Helmer) writes:
>My beef is: I thought MINIX would have panic'ed as fast as possible on
>receipt of NMI, which at least on XT-class machines means parity trouble.
>There is no way an operating system should continue if this condition
>occurs.

An NMI in kernel+mm+fs causes a panic.

An NMI in a user program just causes a core dump. Unless it is INIT or a shell
that gets killed, the shell normally prints a message and you have the system
alive to help find the error.
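
The policy described above can be sketched roughly as follows.  This is an
illustration only, not the code in exception.c: the enum names and
nmi_policy are hypothetical, and the real kernel identifies the interrupted
process by other means.

```c
/* Sketch only: who was running when the NMI arrived decides the outcome. */

/* Hypothetical classification of the interrupted process. */
enum proc_class { PROC_KERNEL_TASK, PROC_MM, PROC_FS, PROC_USER };

/* Possible outcomes: halt the whole system, or signal just the one process. */
enum nmi_action { NMI_PANIC, NMI_SIGNAL_USER };

/* Kernel, MM and FS cannot be trusted after a parity error, so panic;
 * a user process is signalled instead (and normally dumps core). */
enum nmi_action nmi_policy(enum proc_class interrupted)
{
    switch (interrupted) {
    case PROC_USER:
        return NMI_SIGNAL_USER;
    default:
        return NMI_PANIC;
    }
}
```

With this split, a parity error that lands in a user program costs only
that program, while one that lands in the kernel, MM or FS stops the system.
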
-- 
Bruce Evans
Internet: brucee@runxtsa.runx.oz.au    UUCP: uunet!runxtsa.runx.oz.au!brucee
(My other address (evans@ditsyda.oz.au) no longer works)