jdg0@GTE.COM (Jose Diaz-Gonzalez) (07/17/90)
Hi there,

My machine has been crashing about twice daily for the last week or so.
The msg in the subject line shows up with all the register contents just
before it crashes. I have contacted my vendor, they contacted ISC, and
all they were able to tell me was that the problem is a hardware
error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
everything appears to be OK. Does anyone have any idea of what a type
0x0000000E error means? This might help me to narrow down the
alternatives. Any pointers will be appreciated. Thanks,

-- Jose

Jose Pedro Diaz-Gonzalez, SrMTS
GTE Laboratories, Inc., MS-46
40 Sylvan Rd., Waltham, MA 02254
Tel: (617) 466-2584
email: jdiaz@gte.com
peter@ncsbv.UUCP (Peter Jannesen) (07/18/90)
In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
>Hi there,
>
>My machine has been crashing about twice daily for the last week or so.
>The msg in the subject line shows up with all the register contents just
>before it crashes. I have contacted my vendor, they contacted ISC, and
>all they were able to tell me was that the problem is a hardware
>error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
>everything appears to be OK. Does anyone have any idea of what a type
>0x0000000E error means? This might help me to narrow down the
>alternatives. Any pointers will be appreciated. Thanks,
>

Panic 0x00000E is a memory violation in the kernel. This can be a
hardware problem. Another possibility is a bug in the TCP/IP modules in
the kernel: there is a bug in the TCP/IP streams module which generates
a memory violation. This is a very old problem in 386/ix version 2.0.2.
Our hope is that the new version is better.

===============================================================================
Peter Jannesen
Network Communication Systems (N.C.S), The Netherlands
Phone: +31104130093   Fax: +31104146452
Address: Westbaak 96, 3012 KM Rotterdam, The Netherlands
Email: peter@ncsbv
===============================================================================
aland@infmx.UUCP (Colonel Panic) (07/18/90)
In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
>Hi there,
>
>My machine has been crashing about twice daily for the last week or so.
>The msg in the subject line shows up with all the register contents just
>before it crashes. I have contacted my vendor, they contacted ISC, and
>all they were able to tell me was that the problem is a hardware
>error.

On what basis? If they can confidently make such a statement, they
should have a clue as to WHICH piece of hardware is the culprit...

> Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
>everything appears to be OK.

Don't count on it -- those diags aren't very complete. (At least you
GOT some -- I'm STILL waiting for the diag disks for the machines we
bought in December! I have one set from our sales rep to tide me
over...) For example, they don't provide diags for all peripherals, the
POST memory diags are hosed, etc.

> Does anyone have any idea of what a type
>0x0000000E error means? This might help me to narrow down the
>alternatives. Any pointers will be appreciated. Thanks,

I'm no kernel hacker, but I have seen this error when DMA was attempted
on two channels simultaneously when DMAEXCL was set at default. Do you
use >1 device that uses DMA? (Cartridge drives, caching controllers,
etc.) Otherwise, it may help to list all of the hardware in use.

>+ Jose Pedro Diaz-Gonzalez

Cross-posted to comp.sys.att in case the helpful folks there have add'l
info. Followups to comp.unix.i386.

--
Alan Denney @ Informix Software, Inc.
aland@informix.com {pyramid|uunet}!infmx!aland
-----------------------------------------------
Disclaimer: These opinions are mine alone.
If I am caught or killed, the secretary
will disavow any knowledge of my actions.

"We're homeward bound
 ('tis a damn fine sound!)
 with a good ship, taut & free
 We don't give a damn,
 when we drink our rum
 with the girls of old Maui."
darryl@ism780c.isc.com (Darryl Richman) (07/18/90)
In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
"My machine has been crashing about twice daily for the last week or so.
"The msg in the subject line shows up with all the register contents just
"before it crashes. I have contacted my vendor, they contacted ISC, and
"all they were able to tell me was that the problem is a hardware
"error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
"everything appears to be OK. Does anyone have any idea of what a type
"0x0000000E error means? This might help me to narrow down the
"alternatives. Any pointers will be appreciated. Thanks,
You can do a bit of tracing yourself to see what is going on. A trap E
is a page fault--which usually means that there is a bad pointer being
followed in the kernel. You can discover what routine within the kernel
is causing the problem by noting the EIP value in the register dump,
and after rebooting, do "nm -vexp /unix | sort >/tmp/foo". Then edit
/tmp/foo and look for the first 5 digits or so of the EIP value...the
greatest address less than or equal to your EIP value is the routine
that was executing.
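
The lookup described above (greatest symbol address less than or equal
to the EIP value) can also be done mechanically rather than by eye. The
following is only a rough sketch in Python, not anything supplied by ISC.
It assumes each line of /tmp/foo has the form "<hex address> <type> <name>";
check the actual nm output on your system, since the layout may differ.
The function names, the sample addresses, and the routine names below are
all hypothetical and exist only for illustration.

```python
import bisect


def parse_symbols(lines):
    """Parse symbol lines assumed to look like '<hex address> <type> <name>'.

    Lines that do not start with a hex number are skipped. The exact
    nm output format is an assumption; adjust the parsing if yours differs.
    """
    symbols = []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        try:
            # int(..., 16) accepts the address with or without a 0x prefix
            address = int(fields[0], 16)
        except ValueError:
            continue
        symbols.append((address, fields[-1]))
    symbols.sort()
    return symbols


def routine_for(eip, symbols):
    """Return (name, offset) for the greatest symbol address <= eip.

    Returns None when eip lies below every known symbol.
    """
    addresses = [address for address, _ in symbols]
    index = bisect.bisect_right(addresses, eip) - 1
    if index < 0:
        return None
    address, name = symbols[index]
    return name, eip - address


# Hypothetical sample data, standing in for the contents of /tmp/foo.
SAMPLE = [
    "0xd0010000 T routine_a",
    "0xd0010400 T routine_b",
    "0xd0010a00 T routine_c",
]

if __name__ == "__main__":
    found = routine_for(0xD00104C8, parse_symbols(SAMPLE))
    if found is not None:
        name, offset = found
        print("%s+0x%x" % (name, offset))
```

To use it on a real system, read /tmp/foo into a list of lines, pass the
list to parse_symbols(), and call routine_for() with the EIP value from
the register dump. The offset within the routine is worth reporting along
with the routine name.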
An even easier way to do this is to configure your kernel
with the kernel debugger. When the panic occurs, you will drop into the
debugger. Type "stack" to see a stack backtrace. You will also see the
instruction that caused the fault. This will give you much more information
to use in getting an answer out of your reseller, and ultimately,
ISC.
A "hardware error" means nothing. Either your vendor misunderstood the
reply or hasn't pushed very hard on your behalf. Unix tends to be a
much harder test of the hardware than the vendor's diagnostics; we had
a case where a certain vendor was shipping cards that worked fine under
DOS and passed all of their tests just fine, but would never send an
interrupt; needless to say, Unix found this out quickly. When discussing
a problem like this, it is extremely important to pass along as much
information about your configuration as possible--all of the boards,
their interrupt and DMA numbers, how much memory, the make, model, and
geometry of the disks (if they are involved), whose motherboard, any
coprocessors, and so on. All of these things tend to interact.
--Darryl Richman
--
Copyright (c) 1990 Darryl Richman The views expressed are the author's alone
darryl@ism780c.isc.com INTERACTIVE Systems Corp.-A Kodak Company
"For every problem, there is a solution that is simple, elegant, and wrong."
-- H. L. Mencken
thssdwv@iitmax.IIT.EDU (David William Vrona) (07/19/90)
In article <45326@ism780c.isc.com> darryl@ism780c.UUCP (Darryl Richman) writes:
>In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
>"My machine has been crashing about twice daily for the last week or so.
>"The msg in the subject line shows up with all the register contents just
>"before it crashes. I have contacted my vendor, they contacted ISC, and
>"all they were able to tell me was that the problem is a hardware
>"error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
>"everything appears to be OK. Does anyone have any idea of what a type
>"0x0000000E error means? This might help me to narrow down the
>"alternatives. Any pointers will be appreciated. Thanks,
>
>You can do a bit of tracing yourself to see what is going on. A trap E
>is a page fault--which usually means that there is a bad pointer being
>followed in the kernel. You can discover what routine within the kernel
>is causing the problem by noting the EIP value in the register dump,
>and after rebooting, do "nm -vexp /unix | sort >/tmp/foo". Then edit
>/tmp/foo and look for the first 5 digits or so of the EIP value...the
>greatest address less than or equal to your EIP value is the routine
>that was executing.
>
>An even easier way to do this is to configure your kernel
>with the kernel debugger. When the panic occurs, you will drop into the
>debugger. Type "stack" to see a stack backtrace. You will also see the
>instruction that caused the fault. This will give you much more information
>to use in getting an answer out of your reseller, and ultimately,
>ISC.
>
>A "hardware error" means nothing. Either your vendor misunderstood the
>reply or hasn't pushed very hard on your behalf. Unix tends to be a
>much harder test of the hardware than the vendor's diagnostics; we had
>a case where a certain vendor was shipping cards that worked fine under
>DOS and passed all of their tests just fine, but would never send an
>interrupt; needless to say, Unix found this out quickly. When discussing
>a problem like this, it is extremely important to pass along as much
>information about your configuration as possible--all of the boards,
>their interrupt and DMA numbers, how much memory, the make, model, and
>geometry of the disks (if they are involved), whose motherboard, any
>coprocessors, and so on. All of these things tend to interact.

It's a hardware problem. The exact same thing happened to me. It took me
two months to realize it was a noisy (electrically, that is) power
supply. Borrow a supply from a friend before you knock yourself out with
all the other stuff.
tyager@maxx.UUCP (Tom Yager) (07/19/90)
In article <9480@bunny.GTE.COM>, jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
> Hi there,
>
> My machine has been crashing about twice daily for the last week or so.
> ...Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
> everything appears to be OK. Does anyone have any idea of what a type
> 0x0000000E error means? This might help me to narrow down the
> alternatives. Any pointers will be appreciated. Thanks,

I just went through a round of problems revolving around this error
message. I wish I could tell you what the problem is, but neither the
vendor nor I were able to put our fingers on it. The only thing I can
tell you about my own experience with this error is that it seems to
flag a fundamental problem with the way the system talks to memory. The
system in question had 16MB of memory, 8 of which was on a 32-bit
expansion card. With the card installed, the system would panic,
sometimes immediately, sometimes after running OK for hours. With the
extra memory removed, it would hum along and run forever.

There are probably a thousand conditions that could trigger this error,
but what you've been told so far jibes with my own experience: it is a
hardware problem. See if your dealer/distributor/whoever is willing to
swap your machine for you. My problem wasn't solved until my vendor sent
me a system based on a completely different motherboard design, so you
might see how an older (25 or 20 MHz) 6386 behaves. If, that is, you can
live with the decrease in speed.

(ty)
--
+--Tom Yager, Technical Editor, BYTE----Reviewer, UNIX World---------------+
|  UUCP: decvax!maxx!tyager   NET: maxx!tyager@bytepb.byte.com             |
|  Always looking for qualified UNIX,Mac,DOS and OS/2 software reviewers-- |
+--mail to "reviews" instead of "tyager" above.---I speak only for myself.-+
beser@tron.UUCP (Eric Beser) (07/19/90)
I had the same problem. It turned out to be a memory chip that was not
seated properly. What a way to find the problem!

Eric Beser
beser@tron.bwi.wec.com
(301)-765-1010
"Captain, I think we can do it!" "Make it so, number one!"
overby@plains.UUCP (Glen Overby) (07/19/90)
In article <9480@bunny.GTE.COM>, jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
> My machine has been crashing about twice daily for the last week or so.
> ...Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
> everything appears to be OK. Does anyone have any idea of what a type
> 0x0000000E error means? This might help me to narrow down the
> alternatives. Any pointers will be appreciated. Thanks,

We have an older Zenith 386 running 2.0.2 that also panics with the same
obscure error message, but only when we yank serial port plugs on our
DigiBoard COM/8. I looked up the error code in a 386 data sheet, but
that didn't help much :-)

Our DigiBoard, one of the "dumb" ones with no CPU, is set up to use the
COM2 interrupt, since COM1 is on the main CPU board.

We have talked to the people at DigiBoard about the problem, and their
solution has always been to send us a new version of the driver (they
release a new one about every two or three months).

The panic occurs not only when we yank a port plug off, but also when
our Gandalf StarMaster gets a headache and drops carrier on all of our
ports (such as when they power it down for service, or when the power
circuit the Gandalf is on goes down but ours doesn't). It only happens
once a month at the most, so we've been living with it.

We are, obviously, using the modem control minor devices
(/dev/ttyd1[A-H] rather than /dev/ttyd1[a-h]). The problem does NOT
occur on lines without modem control.

I suspect we're fighting different fires, but this might be something to
look at. Good Luck!

--
Glen Overby <overby@plains.nodak.edu>
uunet!plains!overby (UUCP)
overby@plains (Bitnet)
erik@westworld.esd.sgi.com (Erik Fortune) (07/20/90)
*My* "Kernel mode trap, type 0x00000E" under SCO ODT turned out to be a
motherboard problem. Never can tell...

-- Erik (erik@sgi.com)
dpi@loft386.uucp (Doug Ingraham) (07/20/90)
In article <63@maxx.UUCP>, tyager@maxx.UUCP (Tom Yager) writes:
> In article <9480@bunny.GTE.COM>, jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
> > Hi there,
> >
> > My machine has been crashing about twice daily for the last week or so.
> > ...Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
> > everything appears to be OK. Does anyone have any idea of what a type

Nobody has diagnostics that are any good. In our case Unix uses lots
more of the hardware than any diagnostic. Diagnostics only seem to be
able to find really broken hardware.

> > 0x0000000E error means? This might help me to narrow down the
> > alternatives. Any pointers will be appreciated. Thanks,
>

This is a kernel panic because a parity error occurred while executing
in the kernel.

> I just went through a round of problems revolving around this error message.
> I wish I could tell you what the problem is, but neither the vendor nor I
> were able to put our fingers on it. The only thing I can tell you about my
> own experience with this error is that it seems to flag a fundamental
> problem with the way the system talks to memory. The system in question had
> 16MB of memory, 8 of which was on a 32-bit expansion card. With the card
> installed, the system would panic. Sometimes immediately, sometimes after
> running OK for hours. With the extra memory removed, it would hum along and
> run forever.

The Unix kernel lives up at the top of memory. You could probably have
moved SIMMs or DIPs around until you found the problem, if it was an
offending component. It might also be a slow bus transceiver on the
motherboard or memory card.

Find vendors you can work with unless you are very hardware and software
savvy. You will save time, money and graying of the hair.

--
Doug Ingraham (SysAdmin)
Lofty Pursuits (Public Access for Rapid City SD USA)
uunet!loft386!dpi
jon@savant.UUCP (Jon Gefaell) (07/22/90)
In article <9480@bunny.GTE.COM> jdg0@GTE.COM (Jose Diaz-Gonzalez) writes:
>Hi there,
>
>My machine has been crashing about twice daily for the last week or so.
>The msg in the subject line shows up with all the register contents just
>before it crashes. I have contacted my vendor, they contacted ISC, and
>all they were able to tell me was that the problem is a hardware
>error. Now, I've run my diagnostics (I'm using an AT&T 6386E/33) and
>everything appears to be OK. Does anyone have any idea of what a type
>0x0000000E error means? This might help me to narrow down the
>alternatives. Any pointers will be appreciated. Thanks,
>

Well, I get the same panic when I boot the bootable install disk from my
ESIX (rev C) distribution on my AGI 33MHz 386 box. It happens EVERY
time, and there has only been one way to alleviate the problem...

I set my bus speed to 8.33MHz instead of 11MHz, boot the disk, and all
works fine. Then, at the point when it reboots and you start back up
with the second disk of the base package, I set the bus speed back to
11MHz and never see the error again... Strange, eh?

recap: I get the 0000000000000000000E :) error when my bus speed is
11MHz and I'm booting the base package installation disk. Lowering the
bus speed to 8.33MHz 'cures' the problem.

Oh, I also got the E panic when I used to have a motherboard (Orchid)
that simply WOULD NOT run ESIX... (returned it for a full refund from
Orchid direct, it didn't work, but they made it right!)

Hope this has helped a tad... I don't think it's gonna work for you, but
it does seem to point out that:

1.) This error seems to be a catch all, and/or
2.) These folks (vendors, manufacturers) can't problem-solve themselves
    out of a paper bag from the inside even with all the error messages
    and register contents in the world... :(

--
+----------- Domain? DOMAIN? We Don't Need No Steeeenkin' Domain! -----------+
|                                  __/\                                      |
|                                  \/~~                                      |
+-savant!jon@virginia.edu {...}!uunet!virginia!savant!jon jeg7e@virginia.edu-+