[comp.unix.sysv386] Kernel mode trap. Type 0x0000000E

beauchem@terre (Denis Beauchemin) (03/01/91)

Hi there,

We often see the following error message on different 386/33 MHz systems (some
are SCSI and some aren't).  UNIX SVR3.2.2 is installed:

6 lines of register information, then

PANIC: Kernel mode trap. Type 0x0000000E

Could someone tell me what is the cause of this PANIC?  We've been told that
it's supposed to be related to memory, but is it hardware or software?

Thanks for your help!

Denis Beauchemin
-- 
=== Denis Beauchemin,  === beauchem@DMI.USherb.CA ===
=== Dir. R&D           ==============================
=== Sisca Informatique === I speak for myself...  ===
=== (819) 564-4003.    ==============================

support@bomber.ism.isc.com (Support Account) (03/06/91)

In article <1991Feb28.185352.22561@DMI.USherb.CA> beauchem@terre (Denis Beauchemin) writes:
>We often see the following error message on different 386/33 MHz systems (some
>are SCSI and some aren't).  UNIX SVR3.2.2 is installed:
>6 lines of register information, then
>PANIC: Kernel mode trap. Type 0x0000000E
>Could someone tell me what is the cause of this PANIC?  We've been told that
>it's supposed to be related to memory, but is it hardware or software?

According to Intel's 386 chip documentation, interrupt 14 (0x0E) is the
page-fault exception. It occurs when paging is enabled and the processor
cannot translate a linear address to a physical address. This can happen
if the procedure lacks the privilege to access the page, or if the
page-directory or page-table entry used for the translation has a zero
in its present bit.

A misbehaving driver can cause this condition. It can be analyzed
by building the kernel debugger into the kernel. When the fault
occurs and the system panics, the OS should drop into the debugger,
which can then be used to display registers and do a stack dump.
Under ISC Unix, the debugger is documented in section 8 of the
reference manual.


....

tmh@prosun.first.gmd.de (Thomas Hoberg) (03/16/91)

In article <1991Mar05.164607.1179@ism.isc.com>, support@bomber.ism.isc.com (Support Account) writes:
|> In article <1991Feb28.185352.22561@DMI.USherb.CA> beauchem@terre (Denis Beauchemin) writes:
|> >We often see the following error message on different 386/33 MHz systems (some
|> >are SCSI and some aren't).  UNIX SVR3.2.2 is installed:
|> >6 lines of register information, then
|> >PANIC: Kernel mode trap. Type 0x0000000E
|> >Could someone tell me what is the cause of this PANIC?  We've been told that
|> >it's supposed to be related to memory, but is it hardware or software?
|> 
|> According to Intel's 386 chip documentation, interrupt 14 (="e")
|> is a page fault exception occurring when paging is enabled and an
|> error occurs translating a linear address to a physical address.
|> This error can be caused if the procedure doesn't have privileges
|> to access the page, or if the page-directory or page-table entry
|> used for address translation has a zero in its present bit.
|> 
|> A driver going awry can cause this condition. It can be analyzed
|> by building the kernel debugger into the kernel. When the fault
|> occurs and the system panics the os should drop into the debugger
|> which can then be used to display registers and to do a stack dump.
|> Under ISC Unix, the debugger is documented in section 8 of the
|> reference manual.
|> 
Well, it seems there are plenty of ways to hit that wall. I'd like to share one:
We have an Armas 486/25 board here that uses an OPTI chipset and has a
128k second-level cache. We have 16MB in it and have disabled relocation and
caching on the 256k left over from the first meg. After running for about eight
hours the machine starts to get flaky: programs dump core at random, etc., and
sooner or later there is a page fault within the kernel. Any reboot then
reproduces the error almost immediately after startup (a heat problem? Everything
seems rather cool when I take the machine apart, though). Disabling the
second-level cache made the problem go away. I guess some of the second-level
cache's static RAM is faulty. Since there is no parity checking on that RAM, the
corruption is never detected. If the kernel has to fetch a pointer from the
second-level cache and that data is mangled, dereferencing the pointer might
trigger the page fault. Too bad there is no easy way to find out which of the
18 static RAMs has gone bad... or is there?

:-> tom

-- 
----
Thomas M. Hoberg   | UUCP: tmh@bigfoot.first.gmd.de  or  tmh%gmdtub@tub.UUCP
c/o GMD Berlin     |       ...!unido!tub!gmdtub!tmh (Europe) or
D-1000 Berlin 12   |       ...!unido!tub!tmh
Hardenbergplatz 2  |       ...!pyramid!tub!tmh (World)
Germany            | BITNET: tmh%DB0TUI6.BITNET@DB0TUI11 or
+49-30-254 99 160  |         tmh@tub.BITNET