vrs@uoregon.UUCP (08/24/83)
#R:edcaad:-56600:uoregon:400001:000:213
uoregon!vrs    Aug 22 17:32:00 1983

Is the value of MC750_TBPAR supposed to be two?  Where is the value of
this symbol defined?  Our 4.1 system doesn't have any such symbol in
any of the files in sys/h.

...!{tektronix,hp-pcd,hp-cvd}!uoregon!vrs
dave@edcaad.UUCP (08/26/83)
Thanks to extensive discussions with our helpful local DEC people, and
several flaky /750s, I can add some details to the work of Peter
Collinson (ukc!pc), Tucker Withington (vaxine!ptw), and Dennis Ritchie
(research!dmr) on the /750's machine check 2 handler in 4.?BSD:

1. If the TB PARITY ERROR bit in the stored Error Summary Register is
   set (mcf->mc5_mcesr&4), recovery may be attempted, irrespective of
   the state of the other bits in this register.  We have seen these
   errors with bits 0 and 3 set.

2. It appears that the TB must be invalidated, by mtpr(TBIA, 0), as
   soon as possible, and in any case before the Error Summary Register
   is cleared by mtpr(MCESR, 0xf).

3. It is NOT always possible to recover from these errors.  An
   instruction may be resumed only if both of the following hold:

   a) It has not affected the processor mode.  This can be determined
      by comparing the processor mode in the machine check frame with
      the mode in the interrupt frame.  A panic must be issued if they
      differ.

   b) The instruction is single-byte, and its op-code has a one bit in
      the following table.  Each row covers 16 op-codes; the rightmost
      bit of a row is the lowest op-code in that row.

	op-codes   bit 15 ... bit 0
	00-0F      0000111101101011    REI,RET,etc.
	10-1F      1111111110111111    JSB
	20-2F      1111111111111111
	30-3F      1111111111111111
	40-4F      1111111111111111
	50-5F      0000000000101111    EMODF,CVTFD,etc.
	60-6F      0000111100000000    Double Prec. FP
	70-7F      1100000101001010    EMUL,EDIV,etc.
	80-8F      1111111111111111
	90-9F      1111111111111111
	A0-AF      1111111111111111
	B0-BF      0000001111111111    PUSHR,POPR,etc.
	C0-CF      1111111111111111
	D0-DF      1111111111111111
	E0-EF      1111111111111111
	F0-FF      0000000111111111    CALLG,CALLS,etc.

Further, VMS disables the cache if cache errors happen less than 100ms
apart, and, if it detects Translation Buffer failures less than 100ms
apart, disables half of the Translation Buffer and uses the other half.

Code to implement all these features for 4.1c BSD has been written;
when it has been tested it will be posted.  Unfortunately, testing is a
matter of sitting and waiting for the hardware.  In the meantime, here
are the fixes to /usr/sys/vax/machdep.c to improve machine check
handling.

1. The error messages for the different machine check types for the
   /750 should read as follows:

	char *mc750[] = {
		0,		"ctrl str par",	"cp tbuf",	0,
		0,		0,		"ucode lost",	"bad ird"
	};

2. The /750's case in the first switch in machinecheck() should look
   like:

	#if VAX750
	case VAX_750:
		printf("%s fault\n", mc750[type&0x7]);
		break;
	#endif

3. The /750's case in the second switch in machinecheck() should be:

	#if VAX750
	case VAX_750: {
		register struct mc750frame *mcf = (struct mc750frame *)cmcf;

		mtpr(TBIA, 0);		/* Assume bad, a la VMS */
		printf("\tva %x errpc %x mdr %x smr %x rdtimo %x tbgpar %x cacherr %x\n",
		    mcf->mc5_va, mcf->mc5_errpc, mcf->mc5_mdr, mcf->mc5_svmode,
		    mcf->mc5_rdtimo, mcf->mc5_tbgpar, mcf->mc5_cacherr);
		printf("\tbuserr %x mcesr %x pc %x psl %x mcsr %x\n",
		    mcf->mc5_buserr, mcf->mc5_mcesr, mcf->mc5_pc, mcf->mc5_psl,
		    mfpr(MCSR));
		mtpr(MCESR, 0xf);
		if ((type&0xf) == MC750_TBPAR && (mcf->mc5_mcesr&0x4) &&
		    ResumeableInstr(mcf)) {
			printf("tbuf par!?!: flushing and returning\n");
			return;
		}
		break;
		}
	#endif

4. The following routine should be added to machdep.c:

	#if VAX750
	/*
	 * One 16-bit word per row of 16 op-codes: bit n of word m is set
	 * if op-code (m<<4)|n may be resumed after a TB parity error.
	 */
	static u_short InstrBitMap[] = {
		0x0f6b, 0xffbf, 0xffff, 0xffff, 0xffff, 0x002f, 0x0f00, 0xc18a,
		0xffff, 0xffff, 0xffff, 0x03ff, 0xffff, 0xffff, 0xffff, 0x01ff
	};

	static int
	ResumeableInstr(mcf)
		register struct mc750frame *mcf;
	{
		register u_int OpCode;
		register u_int ret;

		/*
		 * If instruction changed mode cannot resume
		 * (this part untested).  Parenthesized because
		 * != binds more tightly than &.
		 */
		if (((mcf->mc5_svmode)&03) != ((mcf->mc5_psl&PSL_CURMOD)>>24)) {
			printf("CP mode changed\n");
			return (0);
		}
		/*
		 * VMS has the process mapped in to the system's
		 * address space.  Don't think UNIX does.
		 * (this part tested)
		 */
		OpCode = (mcf->mc5_errpc&0x80000000 ?
		    *((u_char *) mcf->mc5_errpc) : fubyte(mcf->mc5_errpc));
		ret = ((InstrBitMap[(OpCode&0xf0)>>4])>>(OpCode&0xf))&1;
		printf("Instruction %x %sresumable\n", OpCode, (ret ? "" : "not "));
		return (ret);
	}
	#endif /* VAX750 */

David Rosenthal
{vax135|mcvax}!edcaad!dave
dave@edcaad.UUCP (08/26/83)
Whoops, I posted the wrong version of the article.  The test for
matching processor modes should read:

	if (((mcf->mc5_svmode)&03) != ((mcf->mc5_psl&PSL_CURMOD)>>24)) {

Many apologies if you got the wrong one.
		- David Rosenthal