[comp.unix.i386] need help with FATAL error in unix 3.2u

rk@bigbroth.UUCP (rohan kelley) (08/15/90)

Problems with unix 3.2u BellTech (Interactive) installation in
Gateway2000-25 cache system.

Error message:
FATAL:Parity error on the motherboard
PANIC:Parity error address unknown
     Trying to dump 1024 pages (etc)

The kernel debugger automatically loads.  The message is different each
time the system crashes.  The following is an example of one message:

NMI debugger entered from df_dstack +300048f7
   EAX        EBX      ECX       EDX       ESI       EDI       EBP      ESP 
00000060 0000033a  00000130  000084ff  00000000  0000e83d  0000e7f6 e0000e68

   CS       SS        DS        ES        FS        GS       EIP      EFL
000002bc 000202bc  e0000000  00000000  00000000  00000000  000048f7 00020246

(Unfortunately, I don't know enough to use the debugger to go in and
examine the kmem file to see what actually happened.)
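
For readers in the same position, the EFL value can at least be decoded
by hand without the debugger.  The following is a minimal sketch in Python
(not part of the original report), assuming the dump above was transcribed
accurately and using the flag bit positions defined for the 80386 EFLAGS
register.  The names `decode_eflags` and `iopl` are hypothetical helpers
chosen for illustration.

```python
# Flag bit positions as defined for the 80386 EFLAGS register.
EFLAGS_BITS = {
    0: "CF", 2: "PF", 4: "AF", 6: "ZF", 7: "SF", 8: "TF",
    9: "IF", 10: "DF", 11: "OF", 14: "NT", 16: "RF", 17: "VM",
}


def decode_eflags(value):
    """Return the names of the flags set in `value`, lowest bit first."""
    return [name for bit, name in sorted(EFLAGS_BITS.items())
            if (value >> bit) & 1]


def iopl(value):
    """Return the two-bit I/O privilege level field (bits 12-13)."""
    return (value >> 12) & 3


efl = 0x00020246  # EFL value from the dump above
print(decode_eflags(efl), "IOPL =", iopl(efl))
# prints: ['PF', 'ZF', 'IF', 'VM'] IOPL = 0
```

VM is the virtual-8086 mode flag.  Whether that is significant here (for
example because of Merge 386) cannot be told from this one dump.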

During other crashes, the debugger typically enters from a much lower stack
number, for example, 00000006, although the number is not consistent
from crash to crash.

HELP:

I'm stumped and the tech at Gateway2000 is stumped, although they claim
to have unix running on 3 of their boxes in house, presumably running
their network.

I'm trying to send 2 kids off to college with these systems up and
running.  If I can't solve this problem pretty fast I'll have to return
the systems to Gateway (and pay the freight) and my 30 day return
window is fast closing.

Any help would be sincerely appreciated.  Please Email or call collect
if you have any solutions.

Comment: 

The system software is version 3.2u BellTech (now intel) which is a
vanilla interactive port.  The "u" upgrade among other things repaired
the ESDI driver so it now works consistently.

This same software is running happily on my intel 302 25 MHz cached Phoenix 
bios machine with a large ESDI drive and on a noname motherboard with an AMI 
bios, cached, and an MFM small drive. Locus merge 386 is also installed on
all machines.
	
Inducing condition:

Crash occurs when accessing the floppy drive (either 0 or 1), but only
intermittently. The commands running at the time have been cpio and format.
If the command begins to function normally, it will terminate
normally.  For example, running the "installpkg" command on the C 
development set of 4 disks completed normally, but immediately afterward,
trying to format a high density floppy failed. 

Hardware configuration:

Micronics motherboard with intel 80386DX-25 and 80385 cache controller
64K cache on motherboard
4Mb memory in 4 1-MB simms on motherboard
Phoenix Bios
Microscience 5100 110 Mb ESDI drive with Ultra 12(F) cached controller. 
    (for 2 floppy and 1 hard disk)
ATI SVGA video board with CrystalScan monitor
absent 80387
no network or LAN installed. Currently running as stand-alone.

System configuration:

Disk formatted, partition 1 27 Mb dos, partition 2 (balance) unix.
   (dos partition empty - no system or files loaded)
Disk controller jumpered to set Bios address at C800:0
System board switch set NOT to relocate video bios into ram
System board switch set NOT to relocate system bios into ram

(Unsuccessful) attempts to correct problem:

1. Disable, alternatively, and then collectively, the disk controller
   cache and the motherboard cache.

2. Load up an identically configured system (I ordered 2) to determine
   if there is a hardware malfunction.  No change in the problem.

3. Jumper the motherboard to reset the floppy I/O port to its secondary
   address (370-377) from primary at (3f0-3f7).  Bios advised of 
   incorrect setup on boot.

I note in my intel 302 manual, at section 3.7.9, it reads:

"3.7.9 UNIX MODE

Differences between a UNIX operating system and a non-UNIX operating
system require a corresponding change in extended memory mapping.
Non-UNIX operating systems, such as DOS or OS/2, require the BIOS to be
mapped to the upper part of the 16M address space.  Even if the system
memory exceeds 16M, the memory addresses from 15.5M to 16M will be
reserved for the BIOS. 

A UNIX operating system has no such requirement and so all extended
memory is available.  As shown on Table 3-13, jumper pins E37 through
E39 determine which operating system is enabled."

Any help or suggestions would be sincerely appreciated

Thanx much

=======================================================================
Rohan Kelley -- UNIleX Systems, Inc. (Systems and software for lawyers)
UUCP:  ...{gatech!uflorida,ucf-cs}!novavax!bigbroth!rk (office)
                                   novavax!mdlbrotr!rk (home)
ATTmail:  attmail!bigbroth!rk
3365 Galt Ocean Drive, Ft. Lauderdale, FL 33308 Phone: (305) 563-1504

"Go first class or your heirs will" -somebodyelse
=======================================================================

bogatko@lzga.ATT.COM (George Bogatko) (08/15/90)

Sounds to me like memory chip problems.  If you have memory diagnostics,
run them.  If you don't, then if you can remove the chips, swap the
chips in the low addresses with those in the high addresses, and
see if your problems go away.  If they do, then memory chips were indeed
your problem. Now you'll have to determine which ones. 

If not chips, then ?????

GB

aab@cichlid.com (Andrew A. Burgess) (08/16/90)

In article <601@bigbroth.UUCP> rk@bigbroth.UUCP (rohan kelley) writes:
>Problems with unix 3.2u BellTech (Interactive) installation in
>Gateway2000-25 cache system.
>
>Error message:
>FATAL:Parity error on the motherboard
>PANIC:Parity error address unknown
>     Trying to dump 1024 pages (etc)
...
>Crash occurs when accessing the floppy drive (either 0 or 1) but only
>at intermittent times. Commands current have been cpio and format.  If
...

I had a similar problem with an old AMI motherboard once. I first noticed that
it would drop about one byte per million when reading from the tape drive. 
Repeating the read was successful. Note that there were no error messages
from the tape drive -- it thought the tape had read correctly. I only became
aware of a problem when doing a compare of disk to tape after a backup.

Then I noticed that reading floppies had a similar problem. The only common
denominator I could think of was that both subsystems used the motherboard
DMA controller to transfer data. So I created a little test under DOS.
I made a 500Kb file of random data and put two copies on a 1.2MB floppy.
I then ran the DOS comp (file compare) program endlessly. The files would
miscompare in a few minutes. Eventually the program would crash. My guess was
that this one byte in a million was not just vanishing but instead being
written to a 'random' location.

This could be your problem. If so then a replacement motherboard would
solve it (assuming you have a marginal component somewhere rather than
a bad motherboard design).

If the dealer is willing to swap and maybe give you another week or so to
test, you might get lucky. Then again this is a WILD GUESS!

You might also try writing a test program like mine.
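
A minimal sketch of such a test, written in Python rather than as the DOS
procedure described above, so it is an illustration and not the original
program.  It writes two copies of the same random data into a directory
(assumed here to be on a mounted floppy) and compares them repeatedly.
The function and file names (`make_copies`, `count_mismatches`, `run`,
`copy_a.bin`, `copy_b.bin`) are hypothetical.

```python
import os


def make_copies(directory, size=500 * 1024):
    """Write two copies of the same random data; return their paths."""
    data = os.urandom(size)
    paths = [os.path.join(directory, name)
             for name in ("copy_a.bin", "copy_b.bin")]
    for path in paths:
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # ask the OS to push the data to the device
    return paths


def count_mismatches(path_a, path_b):
    """Read both files back and return the number of differing bytes."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        a, b = fa.read(), fb.read()
    # A length difference counts as that many differing bytes.
    return abs(len(a) - len(b)) + sum(x != y for x, y in zip(a, b))


def run(directory, passes):
    """Create the copies once, then compare them `passes` times."""
    path_a, path_b = make_copies(directory)
    results = []
    for n in range(1, passes + 1):
        bad = count_mismatches(path_a, path_b)
        results.append(bad)
        print("pass %d: %d differing bytes" % (n, bad))
    return results


# Example (the mount point is an assumption): run("/mnt/floppy", 100)
```

One caveat: on a system that caches file data, later passes may be served
from memory instead of being read from the floppy again, so the disk may
need to be remounted between passes for the reads to exercise the hardware.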

Good luck
Andy

-- 
Andy Burgess
Independent Consultant
uunet!silma!cichlid!aab
aab@cichlid.com