root@blender.UUCP (Herb Peyerl) (09/10/89)
Several weeks ago I asked the net about a "General Protection Trap" kernel panic I had been getting. I had intended to follow the majority advice the next time the problem cropped up. However, the problem seems to have disappeared: it hasn't happened for over two weeks now, and the system's been busier than ever. It's a strange one, but here's a summary of the help I received. A big thanx to Chip Rosenthal, Jim Morton, and Ragnar Paulson for offering help. It is very much appreciated.

**************************************
From: chip@vector.dallas.tx.us (Chip Rosenthal)

The key thing is the IP value, because it lets you find out (via adb) where the kernel was when it crashed. Hopefully it is within a device driver, so you can then easily go off and do some finger pointing. Unfortunately, if it's outside a device driver, it's pretty tough to run this one down...

If you get any good suggestions, you might want to summarize to the net.

**************************************
From: ubc-cs!uunet!applix!jim (Jim Morton)

Sounds like a memory problem (is this machine a Wyse?). Try turning the machine's speed down or switching to 1 wait state if possible. Note: the diagnostics from most vendors will NOT catch these problems. Wyse machines, and those from other vendors that use banks of 8 memory chips instead of 9, have "non-parity memory", which means that when you get a memory failure, the machine just crashes.

***************************************
From: Ragnar Paulson <wilma!ragnar@uunet.UU.NET>

Herb,

In your posting you ask:
>
> [Inclusion of my posting removed for brevity]
>

Yep. In particular, you should record the cs:ip values. A general protection trap is caused by a kernel write to an invalid location; often this location is 0:0. It is my contention that user software, no matter how badly written, should not cause a kernel panic, so this problem is a bug in some kernel routine. Almost always the bug is in the software for one of your add-on peripherals, such as the AT-Vantage.
A panic can also be caused by a hardware failure in such a peripheral: the hardware may cause the controlling software to get an invalid pointer. I suppose really robust software could even handle this, but that often proves to be very difficult. Since you have not recently added new software, you may have a hardware problem.

If you have a development system, you can use adb to find out where the panic is occurring. For example, if the cs:ip value is 18:3567, use adb as follows:

    adb /xenix
    * $x
    * 18:3567?ia
    siointr+45      mov [bx], 0
    * $q

Adb prompts with "*". After you type in the address followed by ?ia, adb prints the routine and the instruction that the address corresponds to. In this example, it is the interrupt service routine of the "sio" driver. Of course, my kernel is different from yours, so the same address will not point to the same routine on your system.

If you don't have the development system, or the above exercise yields something useless like copyseg(), then you will just have to remove peripherals one at a time until the problem goes away. Once you have identified the source of the problem, call the manufacturer and get them to fix it.

--
UUCP: herb@blender.UUCP || ...calgary!xenlink!blender!{herb||root}
ICBM: 51 03 N / 114 05 W
"The other day, I...... No wait... That wasn't me!" <Steven Wright>
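Ragnar's adb step boils down to a nearest-symbol lookup: given the crash address, find the kernel symbol with the largest address at or below it and report the distance as an offset. The Python sketch below illustrates that idea only; it is not how adb itself is implemented. It ignores the segment part of cs:ip and assumes every symbol lives in code segment 0x18. The symbol table is hypothetical: only siointr's address is derived from Ragnar's example (0x3567 - 0x45 = 0x3522, assuming adb is working in hexadecimal after the $x command), and the other names and addresses are made up.

```python
import bisect

# Hypothetical symbol table for one code segment (selector 0x18), sorted by address.
# Only siointr's address is derived from Ragnar's adb example
# (0x3567 - 0x45 = 0x3522, assuming hex offsets); the rest are invented.
SYMBOLS = [
    (0x3400, "sioopen"),
    (0x3522, "siointr"),
    (0x3600, "siowrite"),
]


def lookup(symbols, ip):
    """Return 'name+offset' (offset in hex) for the nearest symbol at or below ip.

    symbols must be sorted by address. Returns None if ip lies below
    the first symbol in the table.
    """
    addrs = [addr for addr, _ in symbols]
    # bisect_right returns the insertion point after any equal address,
    # so index - 1 is the last symbol whose address is <= ip.
    i = bisect.bisect_right(addrs, ip) - 1
    if i < 0:
        return None
    addr, name = symbols[i]
    offset = ip - addr
    return name if offset == 0 else f"{name}+{offset:x}"


print(lookup(SYMBOLS, 0x3567))  # siointr+45
```

Because the lookup simply picks the nearest preceding symbol, an address in code with no symbol of its own gets attributed to whatever routine precedes it, so treat the result as a place to start looking rather than proof of the culprit.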