[comp.sys.apollo] SR9.7 on a DN580-T with a floating point accelerator

mark@bruce.oz (Mark Goodwin) (08/30/88)

Installing SR9.7 on our DN580-TURBO bombed out with a message indicating
the CPU board needs an ECO (Engineering Change Order). Our local Apollo
reps kindly removed the FPA, after which all was fine, but they say we need
to send our CPU board to Scotland. Has anyone suffered similar problems?

The Apollo rep reckons (that's Australian for "says") it is possible to
run with the FPA installed under SR9.7, but it's risky and may cause
disk crashes. So I'd like to know a few things before we send off our CPU.

a) Can I bring up SR9.7 with the FPA installed? Will it work if I load the
FPA ucode manually from the Mnemonic Debugger before booting Aegis?

b) Has this problem been fixed in SR10, or will we still need the ECO?

c) Is the Apollo software cartridge labelled "FPX-FIX" a software solution?

Any info very much appreciated ...

--------------
Mark Goodwin               |ACSnet:    mark@bruce.oz
CSO, Dept. Computer Science|  UUCP:    seismo!munnari!bruce.oz!mark
Monash University, Clayton |  ARPA:    mark%bruce.oz@seismo.css.gov
AUSTRALIA 3168.

Please note: our machine has recently changed from moncsbruce to bruce.

krowitz@RICHTER.MIT.EDU (David Krowitz) (09/01/88)

From the SR9.7 release notes:



                                    Software Release 9.7



                1.2.1  Disked DN5xx-T with Floating Point Accelerator
                       Compatibility

                With SR9.7, we provide a floating point compiler option (the
                -cpu fpx switch) that makes use of a memory mapped interface
                to the Floating Point Accelerator (FPX). Running in this
                manner on a DN5xx-T with ST506 86-megabyte or 190-megabyte
                disks has uncovered a latent hardware bug which can result
                in user data corruption.

                Specifically, CPU-FPX communication via the memory mapped
                interface, when it coincides with disk DMA activity that
                needs to invalidate specific entries in the cache, can hold
                the VMEbus in an incomplete bus acquisition cycle for up to
                64 microseconds.  The bus arbiter on the CPU will time out
                in less than 64 microseconds, causing a re-arbitration to
                occur.  If the disk had been granted the bus prior to the
                re-arbitration, and the ring comes along with a bus request,
                the re-arbitration will grant the bus to the ring.  Thus,
                when the bus is finally released from the cache
                invalidation, both the ring and disk can clash on the bus,
                causing data corruption.
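
The sequence in the paragraph above can be pictured with a toy model. This is only a sketch of the described race, not the actual VMEbus arbitration protocol: the function name, its parameters, and the 50-microsecond timeout in the usage line are assumptions made for illustration (the release note says only that the timeout is "less than 64 microseconds").

```python
def bus_owners_after_hold(hold_us, arbiter_timeout_us, disk_granted, ring_requesting):
    """Toy model: which devices believe they own the bus once the hold ends.

    hold_us            -- how long the cache invalidation holds the bus
    arbiter_timeout_us -- hypothetical arbiter timeout (not given in the note)
    disk_granted       -- True if the disk was granted the bus before the hold
    ring_requesting    -- True if the ring raises a bus request during the hold
    """
    owners = set()
    if disk_granted:
        # The disk's original grant is still outstanding.
        owners.add("disk")
    # The arbiter gives up and re-arbitrates if the hold outlasts its timeout.
    timed_out = hold_us > arbiter_timeout_us
    if timed_out and ring_requesting:
        # Re-arbitration hands the bus to the ring as well.
        owners.add("ring")
    return owners


# Worst case from the note: a 64 us hold with an assumed 50 us timeout.
clash = bus_owners_after_hold(64, 50, disk_granted=True, ring_requesting=True)
```

In this model a clash corresponds to the returned set holding more than one device, which only happens when the hold outlasts the timeout while both the disk grant and a ring request are in play.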

                To avoid data corruption, we have created a trap in the
                Aegis software that prevents the possibility of bringing up
                systems with the corrupting problem.  If you install SR9.7
                software in the damaged configuration, you will be unable to
                boot the system. The Aegis operating system will give you a
                crash status of 1b0005 and issue the following message:

                     Unable to boot this software release with the current
                     CPU board revision level.  Running with an FPX without
                     the required CPU board ECO can result in data
                     corruption.
                     Please place a service call through the normal channels
                     and request ECO # 14078 for your CPU PCB.

                If you encounter this crash on your node, you can
                temporarily work around the problem, until Apollo Customer
                Service installs the ECO, by booting diskless from another
                node running SR9.7 or another release.  There will be no
                danger of data corruption while booted diskless.  DO NOT
                subsequently mount the local volume once you have booted
                the node diskless.

                Once the ECO is installed, its installation can be checked
                by inspecting the hardware revision returned by netstat
                -config, which should have a value of "1.06".  In this case,
                the number "1" to the left of the decimal point indicates
                that the ECO is installed.
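
The revision check described in the release note can be sketched as follows. This assumes you have already copied the hardware revision string (for example "1.06") out of the netstat -config output by hand; the layout of that output is not shown in the note, so nothing here tries to parse it, and the function name is hypothetical.

```python
def eco_installed(hardware_revision: str) -> bool:
    """Return True if the number left of the decimal point is 1.

    Per the release note, a "1" to the left of the decimal point in the
    hardware revision indicates that the ECO is installed.
    """
    # Tolerate surrounding whitespace or quote marks around the value.
    cleaned = hardware_revision.strip().strip('"')
    # Split at the first decimal point; only the left part matters here.
    major, _, _ = cleaned.partition(".")
    return major.isdigit() and int(major) == 1


print(eco_installed("1.06"))
```

Any revision whose left-hand number is not 1 is reported as lacking the ECO; what an unmodified board actually reports is not stated in the note.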

 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)