NED@YMIR.BITNET (Ned Freed) (01/17/88)
Recently I found a microcode bug on our VAX 8600. When we reported it to DEC, they provided us with a microcode update that fixed the problem immediately. This bug is not especially esoteric, as the following example MACRO program shows:

        .entry  start,^m<r2,r3>

        movl    first,-(sp)
        movl    second,r0
        mulf2   r0,(sp)
        movl    (sp)+,r0
        ret

first:  .float  1.0e-14
second: .float  1.0e-27

        .end    start

The operation performed by the program is quite simple. One small floating point value is pushed onto the stack and another is loaded into R0. These two values are then multiplied, with the result stored on the stack. This operation will underflow, so the result should be 0. This result is then popped off the stack into R0 and the program returns. The net result should then be a "NONAME-W-NOMSG, Message number 000000" message reported as the program exits.

And yes indeed, the program does just this on the VAX-11/750, the uVAX-II and the VAX 8700. But not on our 8600. On our 8600 the program returns a status value of 1, not 0! If you carefully single-step the program in the debugger, the reason for this becomes clear: for some reason the "mulf2 r0,(sp)" instruction DECREMENTS the stack pointer by 4. Thus you end up picking up some random value off the stack, which turns out to be a 1.

This problem has been verified on two different 8600s at different sites, both under DEC maintenance, so don't assume that YOUR microcode is up to date. Our 8600 recently had a whole slew of hardware problems and almost every part of it was replaced and checked, but the microcode was not updated until I found this problem.

Here are a few additional technical points:

(1) Almost ANY change to the program will cause the problem to go away. For example, everything works fine if you add a "nop" just after the "mulf2", or remove the "movl (sp)+,r0", or do almost anything else.

(2) The problem appears not to be sensitive to the floating point values involved; anything that causes an underflow will cause the problem. The program works properly if the multiply does not underflow.

(3) The problem does not appear to exist when using floating point types other than F_floating.

(4) Despite the clear indications that the error has something to do with the handling of floating underflow in the 8600 pipeline, the error still manifests itself even when single-stepping in the debugger. I find this especially strange.

(5) I have not tried this program on an 8650, and I would be very interested to find out if this problem exists on that CPU. In fact, I would appreciate receiving reports of the results people get when they run this program on their systems, regardless of what type of CPU they have.

This hardware error has been plaguing our local software for more than two years, causing a whole series of access violations and divide-by-zero errors. I have been looking for the cause off and on for quite a while, but it just didn't occur to me that a microcode bug could be to blame!

I am somewhat upset that DEC knew about this problem and didn't see fit to distribute a fix for it. It is quite conceivable that this problem could manifest itself in such a way that a program would report no obvious errors but would return erroneous results.

Ned Freed
ned@ymir.bitnet
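The following short sketch checks the arithmetic behind the expected result. It is written in Python rather than MACRO so that it can run anywhere, and it models only the values involved, not the pipeline bug itself. It makes several assumptions. Python doubles stand in for F_floating values, and 24-bit rounding is ignored. Floating underflow faults are assumed to be disabled (the PSW FU bit clear), so an underflowing MULF2 stores an exact zero. The names `mulf2`, `severity`, `is_success` and `F_MIN` are hypothetical helpers, not VMS routines.

```python
# Hypothetical model of the test program's arithmetic; not VAX code.
# Assumptions: Python floats stand in for F_floating values, rounding to
# 24 bits is ignored, and floating underflow faults are disabled (PSW FU
# bit clear), so an underflowing MULF2 stores an exact zero.

# Smallest positive normalized F_floating value: 0.5 * 2**-127 (~2.94e-39).
F_MIN = 2.0 ** -128

# VMS condition-value severity codes held in the low three bits of a status.
SEVERITY = {0: "W", 1: "S", 2: "E", 3: "I", 4: "F"}


def mulf2(a: float, b: float) -> float:
    """Value MULF2 would store, with underflow modelled as exact zero."""
    product = a * b
    if product != 0.0 and abs(product) < F_MIN:
        return 0.0
    return product


def severity(status: int) -> str:
    """Severity letter DCL would show for an exit status ('?' if reserved)."""
    return SEVERITY.get(status & 0x7, "?")


def is_success(status: int) -> bool:
    """VMS convention: a status with its low bit set means success."""
    return (status & 1) == 1


# 1.0e-14 * 1.0e-27 = 1.0e-41, well below F_MIN, so the multiply underflows.
product = mulf2(1.0e-14, 1.0e-27)

# A true F_floating zero is the all-zero longword, so a correct run pops 0
# into R0; the 8600 in the post popped a stray longword that happened to be 1.
r0_correct = 0
r0_on_8600 = 1

print(product, severity(r0_correct), is_success(r0_correct))  # 0.0 W False
print(severity(r0_on_8600), is_success(r0_on_8600))           # S True
```

A status of 0 has warning severity and is not a success, so DCL prints the NONAME-W-NOMSG message. A status of 1 is a silent success, which is why the faulty run on the 8600 looked clean.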
gkn@SDS.SDSC.EDU (Gerard K. Newman) (01/17/88)
> From: Ned Freed <NED%YMIR.BITNET@CUNYVM.CUNY.EDU>
> Subject: Microcode problem on 8600 processors
> Date: Sat, 16 Jan 88 21:35 PST
>
> [long and lucid description of problem omitted ... gkn]
>
> (5) I have not tried this program on an 8650, and I would be very interested
> to find out if this problem exists on that CPU. In fact, I would appreciate
> receiving reports of the results people get when they run this program on
> their systems, regardless of what type of CPU they have.

I just ran it on an 8650 and it produced the correct results.

Regards,
gkn

----------------------------------------
Internet: GKN@SDS.SDSC.EDU
Bitnet:   GKN@SDSC
Span:     SDSC::GKN (27.1)
USPS:     Gerard K. Newman
          San Diego Supercomputer Center
          P.O. Box 85608
          San Diego, CA 92138-5608
AT&T:     619.534.5076
levy@ttrdc.UUCP (Daniel R. Levy) (01/19/88)
In article <8801170735.AA23357@ucbvax.Berkeley.EDU>, NED@YMIR.BITNET (Ned Freed) writes:
#>         .entry  start,^m<r2,r3>
#>
#>         movl    first,-(sp)
#>         movl    second,r0
#>         mulf2   r0,(sp)
#>         movl    (sp)+,r0
#>         ret
#>
#> first:  .float  1.0e-14
#> second: .float  1.0e-27
#>
#>         .end    start
#>
#> The net result should then be a
#> "NONAME-W-NOMSG, Message number 000000" message reported as the program exits.
#>
#> (5) I have not tried this program on an 8650, and I would be very interested
#> to find out if this problem exists on that CPU. In fact, I would appreciate
#> receiving reports of the results people get when they run this program on
#> their systems, regardless of what type of CPU they have.

This works fine on our 8650 under VMS 4.5.
--
|------------Dan Levy------------|  Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
|        an Engihacker @         |   <most AT&T machines>}!ttrdc!ttrda!levy
| AT&T Computer Systems Division |  Disclaimer?  Huh?  What disclaimer???
|--------Skokie, Illinois--------|
ZWARTS@HGRRUG51.BITNET (01/19/88)
> (5) I have not tried this program on an 8650, and I would be very interested
> to find out if this problem exists on that CPU. In fact, I would
> appreciate receiving reports of the results people get when they run
> this program on their systems, regardless of what type of CPU they have.

I have tried it on our Vax-8300, VAXstation 2000 and MicroVax I. On all these machines it runs correctly.

F. Zwarts                          Phone:       (+31)50-633619
Kernfysisch Versneller Instituut   Bitnet/Earn: ZWARTS@HGRRUG51
Zernikelaan 25                     Surfnet:     KVIANA::ZWARTS
9747 AA Groningen                  Telefax:     (+31)50-634003
The Netherlands                    Telex:       53410 rugro nl
SYSTEM@CRNLNS.BITNET (01/20/88)
Ned,

Both of our 8600s run your microcode bug detector correctly, returning a 0. We have been running our current microcode for quite a while (many months).

I don't want to stir up any trouble, but I think you need to talk to your DEC field service management about bringing your systems' FCOs up to date in a timely fashion, including installing new console software when it becomes available.

The file "NOTICE.NEW" on our console packs claims that it is for both 8600s and 8650s and that it includes CI microcode rev 7.0. This last may have a bearing. Since our systems are clustered, I made a point of telling our Field Service rep that VMS v4's release notes mentioned the desirability of using CI rev 7.

I hope this helps.

Selden E. Ball, Jr.
(Wilson Lab's network and system manager)

Cornell University                 NYNEX:       +1-607-255-0688
Laboratory of Nuclear Studies      BITNET:      SYSTEM@CRNLNS
Wilson Synchrotron Lab             Internet:    SYSTEM%CRNLNS.BITNET@CUNYVM.CUNY.EDU
Judd Falls & Dryden Road           HEPnet/SPAN: LNS61::SYSTEM = 44283::SYSTEM
Ithaca, NY, USA 14853