mark@bruce.oz (Mark Goodwin) (08/30/88)
Installing SR9.7 on our DN580-TURBO bombed out with a message indicating that the CPU board needs an ECO (Engineering Change Order). According to our local Apollo reps (who kindly removed the FPA, after which all was fine), we need to send our CPU board to Scotland. Has anyone suffered similar problems?

The Apollo rep reckons (that's Australian for "says") it is possible to run with the FPA installed under SR9.7, but it's risky and may cause disk crashes. So I'd like to know a few things before we send off our CPU:

a) Can I bring up SR9.7 with the FPA installed? Will it work if I load the FPA ucode manually from the Mnemonic Debugger before booting Aegis?

b) Has this problem been fixed in SR10, or will we still need the ECO?

c) Is the Apollo software cartridge labelled "FPX-FIX" a software solution?

Any info very much appreciated ...

--------------
Mark Goodwin                  | ACSnet: mark@bruce.oz
CSO, Dept. Computer Science   | UUCP:   seismo!munnari!bruce.oz!mark
Monash University, Clayton    | ARPA:   mark%bruce.oz@seismo.css.gov
AUSTRALIA 3168.
Please note: our machine has recently changed from moncsbruce to bruce.
krowitz@RICHTER.MIT.EDU (David Krowitz) (09/01/88)
From the SR9.7 release notes:

Software Release 9.7

1.2.1 Disked DN5xx-T with Floating Point Accelerator Compatibility

With SR9.7, we provide a floating point compiler option (the -cpu fpx switch) that makes use of a memory mapped interface to the Floating Point Accelerator (FPX). Running in this manner on a DN5xx-T with ST506 86-megabyte or 190-megabyte disks has uncovered a latent hardware bug which can result in user data corruption.

Specifically, CPU-FPX communication via the memory mapped interface, coinciding with disk DMA activity which needs to invalidate specific entries in the cache, can hold the VMEbus in an incomplete bus acquisition cycle for up to 64 microseconds. The bus arbiter on the CPU will time out in less than 64 microseconds, causing a rearbitration to occur. If the disk had been granted the bus prior to the rearbitration, and the ring comes along with a bus request, the rearbitration will grant the bus to the ring. Thus, when the bus is finally released from the cache invalidation, both the ring and disk can clash on the bus, causing data corruption.

To avoid data corruption, we have created a trap in the Aegis software that prevents the possibility of bringing up systems with the corrupting problem. If you install SR9.7 software in the damaged configuration, you will be unable to boot the system. The Aegis operating system will give you a crash status of 1b0005 and issue the following message:

    Unable to boot this software release with the current CPU board
    revision level. Running with an FPX without the required CPU board
    ECO can result in data corruption. Please place a service call
    through the normal channels and request ECO # 14078 for your CPU PCB.

If you encounter this crash on your node, you can temporarily work around the problem, until Apollo Customer Service installs the ECO, by booting diskless from another node running SR9.7 or another release. There will be no danger of data corruption while booted diskless.
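To make the failure sequence in the release notes easier to follow, here is a toy Python model of the arbitration outcome. It is a sketch under stated assumptions, not Apollo's actual logic: the names BusState, bus_owners_after_release and corrupts are hypothetical, and because the notes give the arbiter timeout only as "less than 64 microseconds", the timeout is a parameter (the 50 us value in the usage lines is arbitrary). "Corruption" is modelled simply as two devices both believing they own the bus when the stalled cache-invalidation cycle releases it.

```python
from __future__ import annotations

from dataclasses import dataclass

# From the release notes: a cache-invalidation cycle can hold the VMEbus
# "for up to 64 microseconds". The arbiter timeout is only described as
# "less than 64 microseconds", so callers must supply a value.
MAX_INVALIDATION_HOLD_US = 64.0


@dataclass
class BusState:
    """Hypothetical snapshot of bus activity during the stalled cycle."""
    disk_granted: bool     # disk was granted the bus before rearbitration
    ring_requesting: bool  # ring raises a bus request during the stall


def bus_owners_after_release(state: BusState, arbiter_timeout_us: float,
                             hold_us: float = MAX_INVALIDATION_HOLD_US) -> set[str]:
    """Return the devices that think they own the bus when the stalled
    cache-invalidation cycle finally releases it (toy model only)."""
    owners: set[str] = set()
    if state.disk_granted:
        owners.add("disk")
    # Rearbitration happens only if the stall outlasts the arbiter timeout;
    # it then grants the bus to a requesting ring even though the disk
    # still holds its earlier grant.
    if hold_us > arbiter_timeout_us and state.ring_requesting:
        owners.add("ring")
    return owners


def corrupts(state: BusState, arbiter_timeout_us: float,
             hold_us: float = MAX_INVALIDATION_HOLD_US) -> bool:
    """In this model, corruption means more than one device drives the bus."""
    return len(bus_owners_after_release(state, arbiter_timeout_us, hold_us)) > 1


if __name__ == "__main__":
    busy = BusState(disk_granted=True, ring_requesting=True)
    print(corrupts(busy, arbiter_timeout_us=50))              # True: 64 us stall > 50 us timeout
    print(corrupts(busy, arbiter_timeout_us=50, hold_us=30))  # False: no rearbitration
```

Since the notes put every possible timeout below the 64 us worst-case stall, the model's result at the worst case depends only on whether disk and ring traffic coincide, which is why a timing tweak alone would not help.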
DO NOT subsequently mount the local volume once you have booted the node diskless.

Once the ECO is installed, its installation can be checked by inspecting the hardware revision returned by netstat -config, which should have a value of "1.06". In this case, the number "1" to the left of the decimal point indicates that the ECO is installed.

--
David Krowitz

krowitz@richter.mit.edu  (18.83.0.109)
krowitz%richter@eddie.mit.edu
krowitz%richter@athena.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)
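If you are checking several nodes, the revision rule quoted from the release notes (the number left of the decimal point in the netstat -config hardware revision shows whether the ECO is in) fits in a few lines of Python. This is a hedged sketch: the function name eco_14078_installed is hypothetical, it assumes you have already pulled the revision string (e.g. "1.06") out of the netstat -config output by hand, since that output's exact format isn't shown here, and treating any major number of 1 or higher as "installed" is an assumption, because the notes only mention "1".

```python
def eco_14078_installed(hw_revision: str) -> bool:
    """Interpret a CPU hardware revision string such as "1.06".

    Per the SR9.7 release notes, the number to the left of the decimal
    point indicates whether ECO # 14078 is installed ("1" means it is).
    Treating any major number >= 1 as installed is an assumption; the
    notes mention only "1".
    """
    major, sep, minor = hw_revision.strip().partition(".")
    if not sep or not major.isdigit() or not minor.isdigit():
        raise ValueError(f"unexpected revision format: {hw_revision!r}")
    return int(major) >= 1


if __name__ == "__main__":
    print(eco_14078_installed("1.06"))  # True
```

Raising on anything that isn't "digits.digits" is deliberate: a malformed string is better reported than silently read as "no ECO".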