beauchem@terre (Denis Beauchemin) (03/01/91)
Hi there,

We often see the following error message on different 386/33 MHz systems (some are SCSI and some aren't). UNIX SVR3.2.2 is installed:

6 lines of register information, then
PANIC: Kernel mode trap. Type 0x0000000E

Could someone tell me what is the cause of this PANIC? We've been told that it's supposed to be related to memory, but is it hardware or software?

Thanks for your help!

Denis Beauchemin
--
=== Denis Beauchemin, === beauchem@DMI.USherb.CA ===
=== Dir. R&D ==============================
=== Sisca Informatique === I speak for myself... ===
=== (819) 564-4003. ==============================
support@bomber.ism.isc.com (Support Account) (03/06/91)
In article <1991Feb28.185352.22561@DMI.USherb.CA> beauchem@terre (Denis Beauchemin) writes:
>We often see the following error message on different 386/33 MHz systems (some
>are SCSI and some aren't). UNIX SVR3.2.2 is installed:
>6 lines of register information, then
>PANIC: Kernel mode trap. Type 0x0000000E
>Could someone tell me what is the cause of this PANIC? We've been told that
>it's supposed to be related to memory, but is it hardware or software?

According to Intel's 386 chip documentation, interrupt 14 (= 0x0E) is a page fault exception. It occurs when paging is enabled and an error occurs translating a linear address to a physical address. This error can be caused if the procedure doesn't have privileges to access the page, or if the page-directory or page-table entry used for address translation has a zero in its present bit.

A driver going awry can cause this condition. It can be analyzed by building the kernel debugger into the kernel. When the fault occurs and the system panics, the OS should drop into the debugger, which can then be used to display registers and to do a stack dump. Under ISC Unix, the debugger is documented in section 8 of the reference manual.

....
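To make the two causes above concrete: when the 386 raises exception 14 it pushes an error code whose low three bits say whether the page was not present or the access violated protection, whether it was a read or a write, and whether the processor was in user or supervisor mode. The faulting linear address is left in CR2. The sketch below decodes those bits in C. The function name `describe_page_fault` and the macro names are made up for illustration, and whether the panic's register dump on a given SVR3.2 system actually shows the error code and CR2 is an assumption to check against that dump.

```c
#include <stddef.h>
#include <stdio.h>

/* Low bits of the error code the 386 pushes for exception 14. */
#define PF_PROT  0x1u  /* 0 = page not present, 1 = protection violation */
#define PF_WRITE 0x2u  /* 0 = read access,      1 = write access         */
#define PF_USER  0x4u  /* 0 = supervisor mode,  1 = user mode            */

/*
 * Hypothetical helper: format a one-line description of a page fault
 * from its error code and the faulting linear address (from CR2).
 * Returns whatever snprintf returns.
 */
int describe_page_fault(unsigned err, unsigned long cr2,
                        char *buf, size_t len)
{
    return snprintf(buf, len,
                    "%s on %s in %s mode at linear address 0x%08lx",
                    (err & PF_PROT)  ? "protection violation"
                                     : "page not present",
                    (err & PF_WRITE) ? "write" : "read",
                    (err & PF_USER)  ? "user"  : "supervisor",
                    cr2);
}
```

Calling `describe_page_fault(0x2, 0x10, buf, sizeof buf)` fills `buf` with "page not present on write in supervisor mode at linear address 0x00000010". For a "Kernel mode trap" the user/supervisor bit would be expected to be clear, and a very small faulting address like this one usually points at a null or corrupted pointer rather than a genuine privilege problem.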
tmh@prosun.first.gmd.de (Thomas Hoberg) (03/16/91)
In article <1991Mar05.164607.1179@ism.isc.com>, support@bomber.ism.isc.com (Support Account) writes:
|> In article <1991Feb28.185352.22561@DMI.USherb.CA> beauchem@terre (Denis Beauchemin) writes:
|> >We often see the following error message on different 386/33 MHz systems (some
|> >are SCSI and some aren't). UNIX SVR3.2.2 is installed:
|> >6 lines of register information, then
|> >PANIC: Kernel mode trap. Type 0x0000000E
|> >Could someone tell me what is the cause of this PANIC? We've been told that
|> >it's supposed to be related to memory, but is it hardware or software?
|>
|> According to Intel's 386 chip documentation, interrupt 14 (="e")
|> is a page fault exception occurring when paging is enabled and an
|> error occurs translating a linear address to a physical address.
|> This error can be caused if the procedure doesn't have privileges
|> to access the page, or if the page-directory or page-table entry
|> used for address translation has a zero in its present bit.
|>
|> A driver going awry can cause this condition. It can be analyzed
|> by building the kernel debugger into the kernel. When the fault
|> occurs and the system panics the os should drop into the debugger
|> which can then be used to display registers and to do a stack dump.
|> Under ISC Unix, the debugger is documented in section 8 of the
|> reference manual.
|>

Well, it seems there are plenty of ways to hit that wall. I'd like to share one:

We have an Armas 486/25 board here that employs a chip set by OPTI and has a 128k secondary level cache. We have 16 MB in it and disabled relocation and caching on the 256k left over from the first Meg. After running for about eight hours the machine will start to get flaky: programs dump core at will, etc. And sooner or later there will be a page fault within the kernel. Any reboot will reproduce the error almost immediately after start up (a heat problem? Everything seems rather cool when I take the machine apart, though).
Disabling the second level cache made the problem go away. I guess some of the secondary level cache's static RAM is faulty. Since there is no parity checking on that RAM, the fault is never detected. If the kernel has to fetch some pointer from the 2nd-level cache and that data is mangled, dereferencing that pointer might trigger the page fault. Too bad there is no easy way to find out which one of the 18 static RAMs has gone bad... or is there?

:-> tom
--
----
Thomas M. Hoberg   | UUCP: tmh@bigfoot.first.gmd.de or tmh%gmdtub@tub.UUCP
c/o GMD Berlin     |       ...!unido!tub!gmdtub!tmh (Europe) or
D-1000 Berlin 12   |       ...!unido!tub!tmh
Hardenbergplatz 2  |       ...!pyramid!tub!tmh (World)
Germany            | BITNET: tmh%DB0TUI6.BITNET@DB0TUI11 or
+49-30-254 99 160  |         tmh@tub.BITNET
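On the closing question, one possible way to narrow things down is a read-back pattern test that records which data bits come back wrong, run with the second-level cache enabled over a buffer small enough to stay cached. The C sketch below is only an illustration of that idea. The function names `failing_bits` and `bad_byte_lanes` are invented here, and the mapping from a failing byte lane to a particular SRAM chip is an assumption (that each chip supplies a fixed group of data bits) which would have to be checked against the board's schematic. A user program also cannot force its accesses to hit a particular cache line, so a clean run proves nothing.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Write a pattern and then its complement over buf, read each back,
 * and OR together every bit position that ever came back wrong.
 * A result of 0 means no mismatch was seen on this pass.
 */
uint32_t failing_bits(volatile uint32_t *buf, size_t words, uint32_t pattern)
{
    uint32_t bad = 0;
    uint32_t pats[2] = { pattern, ~pattern };

    for (int p = 0; p < 2; p++) {
        for (size_t i = 0; i < words; i++)
            buf[i] = pats[p];
        for (size_t i = 0; i < words; i++)
            bad |= buf[i] ^ pats[p];
    }
    return bad;
}

/*
 * Collapse a 32-bit failure mask into a 4-bit mask of byte lanes:
 * bit 0 = data bits 0-7, bit 1 = data bits 8-15, and so on.
 */
unsigned bad_byte_lanes(uint32_t bad)
{
    unsigned lanes = 0;

    for (unsigned lane = 0; lane < 4; lane++)
        if (bad & (0xFFu << (8 * lane)))
            lanes |= 1u << lane;
    return lanes;
}
```

If repeated runs with patterns such as 0xAAAAAAAA keep reporting the same byte lane, that at least suggests which group of chips to swap first; alternating a pattern with its complement makes sure every bit is tested both as 0 and as 1.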