mark@bruce.oz (Mark Goodwin) (08/30/88)
Installing SR9.7 on our DN580-TURBO bombed out with a message indicating the CPU board needs an ECO (Engineering Change Order). According to our local Apollo reps (who kindly removed the FPA and all was fine) we need to send our CPU board to Scotland - has anyone suffered similar problems?

The Apollo rep reckons (that's Australian for "says") it is possible to run with the FPA installed under SR9.7, but it's risky and may cause disk crashes. So I'd like to know a few things before we send off our CPU.

a) Can I bring up SR9.7 with the FPA installed? Will it work if I load the FPA ucode manually from the Mnemonic Debugger before booting Aegis?
b) Has this problem been fixed in SR10, or will we still need the ECO?
c) Is the Apollo software cartridge labelled "FPX-FIX" a software solution?

Any info very much appreciated ...

--------------
Mark Goodwin                | ACSnet: mark@bruce.oz
CSO, Dept. Computer Science | UUCP:   seismo!munnari!bruce.oz!mark
Monash University, Clayton  | ARPA:   mark%bruce.oz@seismo.css.gov
AUSTRALIA 3168.

Please note: our machine has recently changed from moncsbruce to bruce.
krowitz@RICHTER.MIT.EDU (David Krowitz) (09/01/88)
From the SR9.7 release notes:
Software Release 9.7
1.2.1 Disked DN5xx-T with Floating Point Accelerator Compatibility
With SR9.7, we provide a floating point compiler option (the
-cpu fpx switch) that makes use of a memory mapped interface
to the Floating Point Accelerator (FPX). Running in this
manner on a DN5xx-T with ST506 86-megabyte or 190-megabyte
disks has uncovered a latent hardware bug which can result
in user data corruption.
Specifically, the CPU-FPX communications via the memory
mapped interface coinciding with disk DMA activity which
needs to invalidate specific entries in the cache, can hold
the VMEbus in an incomplete bus acquisition cycle for up to
64 microseconds. The bus arbiter on the CPU will time out in
less than 64 microseconds, causing a re-arbitration to occur.
If the disk had been granted the bus prior to the
re-arbitration, and the ring comes along with a bus request,
the re-arbitration will grant the bus to the ring. Thus,
when the bus is finally released from the cache invalidation,
both the ring and disk can clash on the bus, causing
data corruption.
To avoid data corruption, we have created a trap in the
Aegis software that prevents the possibility of bringing up
systems with the corrupting problem. If you install SR9.7
software in the damaged configuration, you will be unable to
boot the system. The Aegis operating system will give you a
crash status of 1b0005 and issue the following message:
Unable to boot this software release with the current
CPU board revision level. Running with an FPX
without the required CPU board ECO can result in data
corruption.
Please place a service call through the normal channels
and request ECO # 14078 for your CPU PCB.
If you encounter this crash on your node, you can temporarily
work around the problem, until Apollo Customer Service
installs the ECO, by booting diskless from another node
running SR9.7 or another release. There will be no danger
of data corruption while booted diskless. DO NOT subsequently
mount the local volume once you have booted the node
diskless.
Once the ECO is installed, its installation can be checked
by inspecting the hardware revision returned by netstat
-config, which should have a value of "1.06". In this case,
the number "1" to the left of the decimal point indicates
that the ECO is installed.
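
For anyone who wants to script the check across several nodes, the rule in the paragraph above (a "1" to the left of the decimal point means the ECO is installed) could be sketched roughly as follows. This is only an illustration: the function name eco_installed is made up here, it takes just the revision string itself (the full layout of the netstat -config output is not shown in the release notes, so no attempt is made to parse it), and it assumes that any value other than "1" before the decimal point means the ECO is absent.

```python
def eco_installed(hardware_revision: str) -> bool:
    """Return True if the part left of the decimal point is exactly "1".

    hardware_revision is the revision value only, e.g. "1.06".
    Assumption: any other leading value means the ECO is not installed.
    """
    # Tolerate surrounding whitespace and quotation marks around the value.
    cleaned = hardware_revision.strip().strip('"')
    # Split once at the first decimal point; keep the left-hand side.
    major, _, _ = cleaned.partition(".")
    return major == "1"


if __name__ == "__main__":
    # "1.06" is the value the release notes give for a board with the ECO.
    print(eco_installed("1.06"))
```

The revision string still has to be copied out of the netstat -config output by hand or by a separate step suited to that output.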
-- David Krowitz
krowitz@richter.mit.edu (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)