fortin@zap.UUCP (Denis Fortin) (03/14/88)
In article <829@ddsw1.UUCP> karl@ddsw1.UUCP (Karl Denninger) writes:
>In article <115@hawkmoon.UUCP> det@hawkmoon.UUCP (Derek E. Terveer) writes:
>>[running SysV/386] The problem is that
>>i seem to be having lots of (relatively) unexplainable panics and i was
>>wondering if anyone else with the 386 version was also having numbers of these
>>panics, like "kernel mode traps" (type e), "user mode traps" (type 2 and 8),
>>and "iupdat - iaddress >2^24" panics. Plus i keep getting a number of "NMI in
>>system mode" messages.
>
>The key one is the NMI message.
>
>This can only be generated one way -- if your memory board(s) generate a
>parity fault.

Hmmm. I have been having similar problems, so I guess this is a good time
to post about them...

I'm running Microport System V/AT 2.3 on a 6/10MHz AT-class machine.
I can run the system with no problems at 6MHz, and I can run it at 10MHz
without any problems under both DOS and IBM XENIX 1.0 (which I don't use
anymore since I have uPort). Unfortunately, whenever I attempt to run
SV/AT at 10MHz, it crashes after a few minutes (sometimes even before that).

In a few cases, I've noticed the message "NMI in system mode" at boot time,
*but* my 2MB of RAM are all 100ns chips, which *should* be more than
sufficient for 10MHz operation!!! (Question: my video board is an original
IBM EGA board. Could it be the culprit? Can it handle 10MHz?)

In general, the system will boot without any problems, and after a few
minutes, the response time slows down a lot and ultimately I get the
following message on the console:

    user=0xC7E
    cs=0x208 ds=0x220 es=0x220 ss=0x213 di=0x400 si=0x5BE0
    bp=0x2C0 bx=0x7 dx=0xA1 cx=0x0 ax=0x7 ip=0x5807 flags=0x202
    trap type 0xD
    err=0x210
    stack frame address = E830270
    Double panic: Software detects double fault

I have also seen "user=0x10 ... err=0x8173".

I know that this is a bit cryptic, but none of my requests for help from
Microport on this issue (even when my SysV/AT was still under warranty)
have yielded any result. (In most cases, I was told that the info was
transferred to someone else ... who never got back to me.)

I currently have an update contract (I still think that uPort is a pretty
good product), but I have not purchased a technical support contract because
from what I have seen during my warranty period, their technical service
won't help me much with this problem (note: this was about 1.5 years ago).

I guess my biggest problem is that I have really no way of knowing what
the register dump really means... Also, I'm very puzzled by the fact
that IBM Xenix 1.0 will run on my machine at 10MHz (I can understand why
DOS works: it's not as demanding interrupt-wise on the machine, but XENIX
*does* work and that annoys me!)

Anyway, if anybody has an idea about things I could try, I would appreciate
it *** A LOT ***. Running my AT at 6MHz is definitely not the same as
running it at 10MHz (SIGH!).

(Also, could anybody post a description of what those "user=n" messages
mean? It's not anywhere in the documentation. (I understand that you
get vanilla-flavored SysV documentation, but I still feel that it's quite
annoying to run software that generates hexadecimal error messages with
no explanations!))
--
Denis Fortin                            | fortin@zap.UUCP
CAE Electronics Ltd                     | philabs!micomvax!zap!fortin
The opinions expressed above are my own | fortin%zap.uucp@uunet.uu.net
det@hawkmoon.MN.ORG (Derek E. Terveer) (03/17/88)
In article <421@zap.UUCP>, fortin@zap.UUCP (Denis Fortin) writes:
> In article <829@ddsw1.UUCP> karl@ddsw1.UUCP (Karl Denninger) writes:
> >In article <115@hawkmoon.UUCP> det@hawkmoon.UUCP (Derek E. Terveer) writes:
> >>[running SysV/386] The problem is that
> >>i seem to be having lots of (relatively) unexplainable panics and i was
> >>wondering if anyone else with the 386 version was also having numbers of these
> >>panics, like "kernel mode traps" (type e), "user mode traps" (type 2 and 8),
> >>and "iupdat - iaddress >2^24" panics. Plus i keep getting a number of "NMI in
> >>system mode" messages.
> >
> >The key one is the NMI message.
> >
> >This can only be generated one way -- if your memory board(s) generate a
> >parity fault.

When i <det@hawkmoon> posted the original message, above, my machine was not
only in panic mode --> so was i. I had just spent a not inconsiderable amount
of money on a '386 machine plus the uport software to run on it. It installed
and then pretty much from day 2 started crashing from 1 to 5 times a day.
Imagine my horror at witnessing these events! So of course i was worried and
hoped that someone on the net would be able to help.

Karl <karl@ddsw1> was the one that tipped me off. He stated that the "NMI in
system mode" messages only happened in this plane of existence when parity
errors occurred. I thought about it for a little bit -- there should not have
been *ANY* parity errors from the chips themselves, it was a brand new board
and i tested the board and all the chips some 70+ times. I was confident that
it couldn't be the board.

Therefore, drawing on my experience with running some other unix machines
(vax 11/780s), i inspected my environment. Power? I had my pc plugged into
an outlet with one of those little-itsy-bitsy noise filters that plug into a
two outlet wall receptacle and provide three somewhat filtered outlets.
I also had my stereo and a lamp plugged into the same outlet - considering
the house is 40+ years old and they didn't use electricity back then (:-)
i only have two outlets in my room! I decided to test whether or not the
stereo and lamp were somehow dirtying the power to my pc and moved them to
another outlet, courtesy of a long extension cord. I ran for a couple of
days -- no problems. I have now run for TWO weeks with not a *single*
problem!!!! Obviously, i was getting some sort of substandard power with the
other stuff plugged in. Especially in retrospect, i realized that my pc only
seemed to crash when my stereo was on.

So an apology is in order here from me to microport. They may or may not
have problems, but my PC crashing all the time is now not one of them. I am
very pleased with the stability of the system now (now if only i had more
memory it would be a little faster (:-)).

Moral of the story: When you start getting errors like the ones i described
(esp. NMI errors), check the environment and isolate your pc if you can.

Now all i have to do is figure out where to put my damn extension cord
connecting my stereo and lamp.....

> [..]
> I can run the system with no problems at 6MHz, and I can run it at 10MHz
> without any problems under both DOS and IBM XENIX 1.0 (which I don't use
> anymore since I have uPort). Unfortunately, whenever I attempt to run
> SV/AT at 10MHz, it crashes after a few minutes (sometimes even before that).
> [..]
> In a few cases, I've noticed the message "NMI in system mode" at boot time,
> *but* my 2MB of RAM are all 100ns chips, which *should* be more than
> sufficient for 10MHz operation!!!

I got the same message within about 5 minutes or less of boot time, but i
don't think it has to do with the speed of the chips. Karl pointed out that
these are parity errors. You could run your board diagnostics, if you have
any, and see if there are any problems there.
Also, a lot of times the speed of the BUS is different than the speed of the
cpu/motherboard. For example, i have a 16MHz cpu, but my bus speed, to which
i have attached 2Mb of memory on an intel above board, is *only* ~8MHz.

You didn't state your config as far as memory goes, so if you only need the
barest minimum of memory, like 640K, to run uport, perhaps you could take out
the extra memory if it's on a separate board and see if you can then run
uport at 10MHz. If you can, then there's obviously a problem with your extra
memory.

Your suggestion about the graphics board is a valid one, but i don't know
enough about that to comment further.

Finally, check the power.

Hope this helps...!

> (Also, could anybody post a description of what those "user=n" messages
> mean?

Yes, yes, please!

derek
--
Derek Terveer   det@hawkmoon.MN.ORG   uunet!rosevax!elric!hawkmoon!det
karl@ddsw1.UUCP (Karl Denninger) (03/17/88)
In article <421@zap.UUCP> fortin@zap.UUCP (Denis Fortin) writes:
>In article <829@ddsw1.UUCP> karl@ddsw1.UUCP (Karl Denninger) writes:
>>
>>The key one is the NMI message.
>>
>>This can only be generated one way -- if your memory board(s) generate a
>>parity fault.
>
>Hmmm. I have been having similar problems, so I guess this is a good time
>to post about them...

[Some detail deleted]

>In general, the system will boot without any problems, and after a few
>minutes, the response time slows down a lot and ultimately I get the
>following message on the console:
>
>    user=0xC7E
>    cs=0x208 ds=0x220 es=0x220 ss=0x213 di=0x400 si=0x5BE0
>    bp=0x2C0 bx=0x7 dx=0xA1 cx=0x0 ax=0x7 ip=0x5807 flags=0x202
>    trap type 0xD
>    err=0x210
>    stack frame address = E830270
>    Double panic: Software detects double fault

Aha! Now we're talking. A crash dump (well, sorta)!

To find the routine in the kernel which caused the panic, you do this:

    nm -x /system5 >/tmp/xxxxx      (dump list of kernel to file)

Now, go looking for the address you panic'd at. You put the 'cs' and 'ip'
values together to get this number (code segment & instruction pointer).

In this case, you get 0x02085807.

Find the routine (use 'vi' or another editor; looking by eye will take ALL
DAY; this is a huge file!) which has the largest address LESS THAN the panic
address. This is the routine which was executing when the system crashed.

From the numbers above, I'll guess that the routine you'll find will be
'rmsd'.

IF SO - get on the phone and yell loudly -- you have a manifestation of the
very-common SIO crash which has plagued us poor '286 Microport owners for
over a year!

If it's NOT 'rmsd' then please post the name of the routine (or mail it to
me), as it's probably a new one... and might give all us net.gurus some ideas!

>I have also seen "user=0x10 ... err=0x8173".
>
>I know that this is a bit cryptic, but none of my requests for help from
>Microport on this issue (even when my SysV/AT was still under warranty)
>have yielded any result. (In most cases, I was told that the info was
>transferred to someone else ... who never got back to me.)

This is interesting -- they didn't even tell you how to get the address of
the routine where you panicked? Sheesh!

A master list at Uport doesn't help anyone with this, as the addresses move
if you use the link kit.

>I currently have an update contract (I still think that uPort is a pretty
>good product), but I have not purchased a technical support contract because
>from what I have seen during my warranty period, their technical service
>won't help me much with this problem (note: this was about 1.5 years ago).
>
>I guess my biggest problem is that I have really no way of knowing what
>the register dump really means... Also, I'm very puzzled by the fact
>that IBM Xenix 1.0 will run on my machine at 10MHz (I can understand why
>DOS works: it's not as demanding interrupt-wise on the machine, but XENIX
>*does* work and that annoys me!)

It ought to annoy you... it does us as well.

-----
Karl Denninger                 |  Data: +1 312 566-8912
Macro Computer Solutions, Inc. | Voice: +1 312 566-8910
...ihnp4!ddsw1!karl            | "Quality solutions for work or play"
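The cs/ip step described in the post above can be sketched in a few lines of Python. This is an illustration only, not part of any Microport tool. The helper names `parse_registers` and `panic_address` are made up for this sketch, and the layout (segment selector in the upper 16 bits, offset in the lower 16) is an assumption based on the 8-digit form 02085807 that appears elsewhere in this thread.

```python
import re


def parse_registers(dump):
    """Collect every name=0xVALUE pair from a console register dump."""
    pairs = re.findall(r"(\w+)=0x([0-9A-Fa-f]+)", dump)
    return {name: int(value, 16) for name, value in pairs}


def panic_address(cs, ip):
    # Assumption: selector in the high 16 bits, instruction pointer in the
    # low 16 bits, giving one 32-bit number to look up in the symbol list.
    return (cs << 16) | (ip & 0xFFFF)


dump = """user=0xC7E
cs=0x208 ds=0x220 es=0x220 ss=0x213 di=0x400 si=0x5BE0
bp=0x2C0 bx=0x7 dx=0xA1 cx=0x0 ax=0x7 ip=0x5807 flags=0x202
trap type 0xD
err=0x210
"""

regs = parse_registers(dump)
address = panic_address(regs["cs"], regs["ip"])
print("%08X" % address)  # prints 02085807
```

Lines of the dump that carry no name=0x... pair, such as "trap type 0xD", are simply skipped by the pattern.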
root@uwspan.UUCP (John Plocher) (03/17/88)
+---- fortin@zap.UUCP (Denis Fortin) writes in <421@zap.UUCP> ----
|     user=0xC7E
|     cs=0x208 ds=0x220 es=0x220 ss=0x213 di=0x400 si=0x5BE0
|     bp=0x2C0 bx=0x7 dx=0xA1 cx=0x0 ax=0x7 ip=0x5807 flags=0x202
|     trap type 0xD
|     err=0x210
|     stack frame address = E830270
|     Double panic: Software detects double fault
|
| I have also seen "user=0x10 ... err=0x8173".
|
| I guess my biggest problem is that I have really no way of knowing what
| the register dump really means
+----

The user= and the err= don't really tell you anything; the ones you are
interested in are cs= and ip=.

First you need a copy of the symbol table from the kernel. You get this
by executing the following command:

    nm /system5 > system5.nm

system5.nm is LARGE - several hundred K - so be sure you don't put it in
(your small) root filesystem. It looks like this:

Symbols from /system5:

Name                Value    Class   Type      Size  Line  Section

gdt.s             |        | file |          |     |     |
sludge            |35656025|static|          |     |     |.data
tfsbot            |35658780|static|          |     |     |.data
tfstack           |35659804|static|          |     |     |.data
wnsbot            |35659892|static|          |     |     |.data
wnstack           |35660916|static|          |     |     |.data
conf.c            |        | file |          |     |     |
prfintr           |33554432|extern|    int( )|    6|     |.text
emul_present      |33554438|extern|    int( )|   11|     |.text
sioscan           |33554449|extern|   long( )|   12|     |.text
lomem.c           |        | file |          |     |     |
linesw.c          |        | file |          |     |     |
buffers.c         |        | file |          |     |     |
...

(but your numbers WILL be different than these :-)

Then you get the CS:IP address from the above register dump:

|     cs=0x208
|     ip=0x5807

and form them into a full address:

    02085807

At this point you need to find out where the panic happened:

    vi system5.nm   - start the editor
    /020858         - search for the MSBs of the address
                      -- NOTE that the LSBs are not looked for
                      (Most Significant Bits, Least Significant Bits)
                    - There may be a few places where this search
                      succeeds; look for the place that is the closest
                      one with a value SMALLER than the address
                      calculated above. e.g. in choosing between
                      02085800 and 02085810, you would choose the
                      first one.

This is because you want to find the name of the routine which was
executing when the panic happened, not the name of the one just after it
in memory.

At this point you should be able to find a class text address near the one
calculated from CS and IP above. NOTE that the variable "sioscan" is
located near 33554440, as is "emul_present" and "prfintr". We can also
tell from this symbol table that these routines can be found in the file
conf.c.

  - Hope this isn't too confusing, John

--
Comp.Unix.Microport is now unmoderated! Use at your own risk :-)
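The "closest smaller value" search described above can also be done numerically rather than by eye in an editor. What follows is a rough Python sketch under stated assumptions: the listing has the pipe-separated layout of the sample (name, value, class, type, size, line, section), and values are decimal unless written with a 0x prefix, which sidesteps the mismatch between the decimal sample and a hex text search. `parse_nm` and `find_routine` are hypothetical helper names, and the listing is only the fragment quoted in the post, so the lookup address is picked to fall inside it.

```python
import bisect

SAMPLE = """\
Symbols from /system5:
Name Value Class Type Size Line Section
conf.c       |        | file |        |      |     |
prfintr      |33554432|extern|  int( )|     6|     |.text
emul_present |33554438|extern|  int( )|    11|     |.text
sioscan      |33554449|extern| long( )|    12|     |.text
sludge       |35656025|static|        |      |     |.data
"""


def parse_nm(text):
    """Return a sorted list of (address, name) for .text symbols."""
    symbols = []
    for line in text.splitlines():
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 7:
            continue  # header lines carry no pipe-separated columns
        name, value, section = fields[0], fields[1], fields[6]
        if section != ".text" or not value:
            continue  # skip file entries and data symbols
        base = 16 if value.lower().startswith("0x") else 10
        symbols.append((int(value, base), name))
    symbols.sort()
    return symbols


def find_routine(symbols, address):
    """Pick the symbol with the largest address not above `address`."""
    addresses = [addr for addr, _ in symbols]
    index = bisect.bisect_right(addresses, address) - 1
    return symbols[index] if index >= 0 else None


symbols = parse_nm(SAMPLE)
print(find_routine(symbols, 33554440))  # prints (33554438, 'emul_present')
```

An address below the first text symbol returns None, which is a hint that the address was formed incorrectly or that the listing is incomplete.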
david@bdt.UUCP (David Beckemeyer) (03/18/88)
In article <421@zap.UUCP> fortin@zap.UUCP (Denis Fortin) writes:
>I know that this is a bit cryptic, but none of my requests for help from
>Microport on this issue (even when my SysV/AT was still under warranty)
>have yielded any result. (In most cases, I was told that the info was
>transferred to someone else ... who never got back to me.)

Typical of my experiences with uport too. I've tried calling. I've
tried "official" bug-reports. I've tried everything short of driving
down there and pounding on the front desk until somebody reacts.

It is very frustrating to try to deal with uport. They seem very
unorganized. I have the two drive bug, among others.

We will be developing our products for SCO Xenix, me thinks.
--
David Beckemeyer                       | "To understand ranch lingo all yuh
Beckemeyer Development Tools           | have to do is to know in advance what
478 Santa Clara Ave, Oakland, CA 94610 | the other feller means an' then pay
UUCP: ...!ihnp4!hoptoad!bdt!david      | no attention to what he says"
caf@omen.UUCP (Chuck Forsberg WA7KGX) (03/19/88)
In article <126@hawkmoon.MN.ORG> det@hawkmoon.MN.ORG (Derek E. Terveer) writes:
: Power? I had my pc plugged into an outlet with one
:of those little-itsy-bitsy noise filters that plug into a two outlet wall
:receptacle and provide three somewhat filtered outlets. I also had my stereo
:and a lamp plugged into the same outlet - considering the house is 40+ years
:old and they didn't use electricity back then (:-) i only have two outlets in
:my room! I decided to test whether or not the stereo and lamp were somehow
:dirtying the power to my pc and moved them to another outlet, courtesy of a
:long extension cord. I ran for a couple of days -- no problems.
Unless your voltage is rather low, a good computer supply should be able
to operate properly even when the light(s) visibly dim. Furthermore,
the power supply should stop processing and reset the computer if it
does run out of stored energy.
So, I should suspect either a marginal power supply or a marginal EMC
(ElectroMagnetic Compatibility) problem that was fixed by rearranging
the grounds et al.
(EMC is the reverse of Radio Frequency Interference (RFI). EMC relates
to the ability of external noise to "talk" to the computer.) Poor EMC
is what sells anti static treatments for DP rooms.
Chuck Forsberg WA7KGX ...!tektronix!reed!omen!caf
Author of YMODEM, ZMODEM, Professional-YAM, ZCOMM, and DSZ
Omen Technology Inc "The High Reliability Software"
17505-V NW Sauvie IS RD Portland OR 97231 503-621-3406
TeleGodzilla BBS: 621-3746 CIS: 70007,2304 Genie: CAF