tom@vcvax1.UUCP (04/18/86)
While experimenting with an asynchronous communication driver for VENIX (in protected mode) on the IBM PC/AT, I encountered some rather strange behavior that I now attribute to a bug in the Intel 80286 processor. In brief, I suspect that the "popf" instruction enables interrupts under certain circumstances even though the IF flag is 0 before the instruction is executed and is set to 0 by the "popf" instruction itself.

Because "popf" is so often used successfully, I would like to hear from others about whether they have encountered the same or a similar problem. If so, I would like to know how they programmed around it. My concern is that either I'm mistaken and there is no hardware bug, or I'm correct and the fix that I found is not applicable to all processor lots.

The Evidence

The problem expresses itself in the asynchronous driver by a loss of characters on OUTPUT. Only single characters are lost every now and then. The problem can express itself when kernel printf's indicate that no interrupts other than transmit interrupts are occurring. I have tried two different ATs with the same results.

The character loss turns out to be caused by a completion interrupt occurring while the tty startup routine is running. The startup routine disables interrupts and stuffs a character into the transmit buffer. But for some reason, the 80286 allows the COM port to interrupt, causing the transmit interrupt routine to overwrite the character just put in the buffer. Kernel printf's triggered by sanity checks in the driver indicate that the instruction being interrupted is a "popf" and that the IF flag before and after the "popf" is 0.

A number of obvious checks were done to rule out programmer error. For example, after a kernel was demonstrated to show the bug, I dumped the code segment of the running kernel and compared it to the object file. No difference.

The Fix

Since the source of the interrupt was always a particular "popf" (in the splx() routine), I concentrated on recoding the kernel where the "popf" occurs. To convince myself that the symptoms were not due to a bug in the loader, I recoded the kernel using "adb" on the object file and then booted the modified object file. Therefore, changes in behavior are directly correlated with code changes (and not with changes in linking the kernel, compilation, size of kernel, etc.).

The following code sequences for the routine splx() were tried and FAILED. The first is the original:

1)      pop     cx              | return address
        popf                    | new flags
        pushf                   | dummy arg
        jmp     cx              | return

2)      pop     cx
        nop, nop, nop, nop      | Padding for timing.
        popf
        pushf
        jmp     cx

3)      pop     cx
        popf
        push    cx              | Could "pushf" be messing up "popf"?
        jmp     cx

4)      pop     cx
        popf
        push    cx
        push    cx              | Could "jmp cx" be messing up "popf"?
        ret

5)      pop     cx
        xor     ax,ax           | Dummy register access.
        popf
        push    cx
        jmp     cx

6)      pop     cx
        mov     ax,#0           | Dummy memory access.
        popf
        pushf
        jmp     cx

Perhaps the reasons for testing the above code will be made clear by the following codings of splx() that FIX the problem:

7)      pop     cx
        pop     ax
        push    ax
        test    ax,#0x200       | Don't use popf!
        bne     Lsplx
        cli
        jmp     cx
Lsplx:  sti
        jmp     cx

8)      pop     cx
        pop     ax
        and     ax,#0x7FD5      | Mask off "don't care" bits.
        push    ax
        popf
        pushf
        jmp     cx

9)      pop     cx
        pop     ax
        and     ax,#0xFFFF      | Does masking really work?
        push    ax
        popf
        pushf
        jmp     cx

10)     pop     cx
        pop     ax              | Is it the "and" that does it?
        push    ax
        popf
        pushf
        jmp     cx

11)     mov     bx,sp
        push    *2(bx)
        popf
        ret

Remember that the kernel was patched with each of the above codings of splx(). Those that worked, worked for as long as I watched them (several minutes at 9600 baud). Those that failed, failed every few seconds or so. The interrupted instruction was always the "popf".

Tom Scott
VenturCom, Inc.
..!seismo!harvard!cybvax0!vcvax1!tom
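
The two constants in codings 7 and 8 can be read against the 80286 FLAGS layout. The following is a minimal C sketch, assuming the standard layout (IF in bit 9, IOPL in bits 12-13, NT in bit 14, with bits 1, 3, 5 and 15 reserved). The names FLAG_IF, FLAG_DEFINED, splx_wants_sti and mask_dont_care are illustrative only and do not come from the VENIX kernel.

```c
#include <stdint.h>

/* 80286 FLAGS bits; the macro names are illustrative. */
#define FLAG_IF      0x0200u  /* interrupt enable: the bit tested in coding 7 */
#define FLAG_DEFINED 0x7FD5u  /* CF PF AF ZF SF TF IF DF OF IOPL NT */

/* Coding 7: choose between cli and sti from the saved flags word.
 * Returns 1 where the assembly would take the sti branch, 0 for cli. */
int splx_wants_sti(uint16_t saved_flags)
{
    return (saved_flags & FLAG_IF) != 0;
}

/* Coding 8: clear the reserved bits (1, 3, 5, 15) before the popf. */
uint16_t mask_dont_care(uint16_t saved_flags)
{
    return (uint16_t)(saved_flags & FLAG_DEFINED);
}
```

Seen this way, coding 7 restores only the IF bit and leaves every other flag alone, which is why it can avoid "popf" entirely.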
cck@cucca.UUCP (Charlie C. Kim) (04/21/86)
In article <179@vcvax1.UUCP> tom@vcvax1.UUCP (tom) writes:
> ...
>encountered some rather strange behavior that I now
>attribute to a bug in the Intel 80286 processor. In
>brief, I suspect that the "popf" instruction enables
>interrupts under certain circumstances even though the
>IF flag is 0 before the instruction is executed and set
>to 0 by the "popf" instruction itself.
> ...
>
>Tom Scott
>VenturCom, Inc.
>..!seismo!harvard!cybvax0!vcvax1!tom

Believe it or not, this is a documented "feature" or misfeature of the 286 processor (in both protected and real modes). See page 9-6 of the IBM PC/AT Technical Reference or (I remember seeing it here) the appropriate Personal Computer Seminar Proceedings to find out the exact conditions under which it can occur, but it all has something to do with the condition "CPL <= IOPL" holding true (I honestly have no idea what they are talking about here. I suppose it is some internal chip condition).

The documented workaround is:

        jmp     L1      ; jump around iret
L2:     iret            ; pop cs,ip, flags
L1:     push    cs      ; push cs
        call    L2      ; call near
        ....            ; program to continue here

which would work for any 80* cpu. They suggest coding this as a macro, which makes sense. (I guess you could replace the call near with a call far and drop the push cs.)

I would probably have programmed it as:

        push    cs      ; save fake cs
        push    L2      ; push return point (286 only)
        iret            ; really just jump ahead one instr with popped flags
L2:     ...             ; continuation point

if it wasn't going to have to run on anything but a 286 or 186 (push immediate was new with these), so things at least look linear!

I'm sure you can come up with a dozen different ways to accomplish the same thing now that you know the problem exists.

Charlie C. Kim
User Services
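
Why the IRET sequence can stand in for "popf" may be easier to see in a toy model. The following C sketch assumes the flags word is already on the stack (as it is where a "popf" would have been) and that IRET pops IP, then CS, then FLAGS. It ignores protected-mode privilege checks and the nested-task (NT) case. The struct cpu type and all function names are hypothetical, made up for illustration.

```c
#include <stdint.h>

/* Toy CPU state: just enough to follow the stack traffic. */
struct cpu {
    uint16_t ip, cs, flags;
    uint16_t sp;            /* index into stack[]; 16 means empty */
    uint16_t stack[16];
};

void push16(struct cpu *c, uint16_t v) { c->stack[--c->sp] = v; }
uint16_t pop16(struct cpu *c) { return c->stack[c->sp++]; }

/* IRET pops IP, then CS, then FLAGS (same-privilege return). */
void iret(struct cpu *c)
{
    c->ip = pop16(c);
    c->cs = pop16(c);
    c->flags = pop16(c);
}

/* Models "push cs / call L2 ... L2: iret" with the saved flags
 * word already on top of the stack. return_ip stands for the
 * address the near call pushes (the instruction after the call). */
void popf_via_iret(struct cpu *c, uint16_t return_ip)
{
    push16(c, c->cs);       /* push cs */
    push16(c, return_ip);   /* pushed by the near call */
    iret(c);                /* loads ip, cs, and the saved flags */
}
```

In this model the stack ends up exactly where a plain "popf" would have left it, CS is unchanged, and execution resumes at the instruction after the call.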
maddox@renoir.berkeley.edu (William Maddox) (04/22/86)
In article <179@vcvax1.UUCP> tom@vcvax1.UUCP (tom) writes:
>
>While experimenting with an asynchronous communication driver
>for VENIX (in protected mode) on the IBM PC/AT, I
>encountered some rather strange behavior that I now
>attribute to a bug in the Intel 80286 processor. In
>brief, I suspect that the "popf" instruction enables
>interrupts under certain circumstances even though the
>IF flag is 0 before the instruction is executed and set
>to 0 by the "popf" instruction itself.

This is a known incompatibility between the 80286 and other members of the 8086 family. See pages 9-6 and 9-7 of the AT Technical Reference for details. I quote:

    If the system microprocessor executes a POPF instruction in either the real or the virtual address mode with CPL <= IOPL, then a pending maskable interrupt (the INTR pin active) may be improperly recognized after executing the POPF instruction even if maskable interrupts were disabled before the POPF instruction and the value popped had IF = 0.

IBM suggests the following code sequence as a compatible replacement:

        JMP     $+3
        IRET
        PUSH    CS
        CALL    $-2

----------------------------------------------------------
Bill Maddox
ucbvax!renoir!maddox

"Lisp programmers know the value of everything but the cost of nothing." - Alan Perlis
john@inthap.UUCP (John Casey) (04/23/86)
>
> While experimenting with an asynchronous communication driver
> for VENIX (in protected mode) on the IBM PC/AT, I
> encountered some rather strange behavior that I now
> attribute to a bug in the Intel 80286 processor. In
> brief, I suspect that the "popf" instruction enables
> interrupts under certain circumstances even though the
> IF flag is 0 before the instruction is executed and set
> to 0 by the "popf" instruction itself.

Early 80286 chips (B step) did contain a problem like the one described here. These chips can be identified by the copyright notice marked on the chip. Chips bearing the notice:

        (C) INTEL '83

may suffer from the popf problem. Chips bearing a later copyright date (84 or beyond) do not suffer from this problem. While the B step 80286 has not been produced by Intel for some time, it may still exist in some older 80286-based machines such as early ATs.

Since there are obviously machines out there with these older chips, let me further describe both the problem and the workarounds.

If a B step 80286 executes a POPF instruction while interrupts are disabled, a pending maskable interrupt (INTR pin active) may be improperly recognized after executing the POPF instruction even if maskable interrupts were disabled before the POPF instruction and the value popped had IF=0. If the interrupt is improperly recognized, it is processed correctly.

The problem occurs in B step 80286s executing a POPF instruction while interrupts are disabled in either Real Address Mode, or in Protected Mode with CPL <= IOPL. Executing in Protected Mode with CPL <= IOPL implies that the processor's Current Privilege Level (CPL) is numerically less than or equal to the IO Privilege Level (IOPL), that is, the currently executing code is privileged enough to enable/disable interrupts and do IO.

The occurrence of this erratum may be affected by the number of wait states during the data read bus cycle of the POPF, and by even or odd address alignment of the stack.
Two wait states added to the memory read bus cycle will eliminate the problem.

The problem can be avoided by replacing POPF with an alternate code sequence in code that may be susceptible to the problem. The original posting stated that the following code failed:

>
> 1)    pop     cx              | return address
>       popf                    | new flags
>       pushf                   | dummy arg
>       jmp     cx              | return

Recoding without using POPF produced the following working code:

>
> 7)    pop     cx
>       pop     ax
>       push    ax
>       test    ax,#0x200       | Don't use popf!
>       bne     Lsplx
>       cli
>       jmp     cx
> Lsplx: sti
>       jmp     cx

This illustrates one possible alternate code sequence for POPF, which will work if AX need not be saved and if IF is the only flag of interest. Other, more general alternatives to POPF are as follows:

        push    cs
        push    #popflags
        iret
popflags:

OR

        call    popflags        | must be a far call

where the routine popflags is defined as follows:

popflags: iret

NOTE: This problem exists only in older 80286s, and occurs only when a POPF is executed with interrupts disabled and results in interrupts remaining disabled.

John Casey
Intel Corporation
(516) 231-3300
...!philabs!ron1!polyof!inthap!john
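
The conditions described above can be collected into one predicate. The following C sketch assumes the standard 80286 encodings: IF is bit 9 of FLAGS, IOPL is bits 12-13 of FLAGS, and in protected mode CPL is the low two bits of the CS selector. It says only whether a POPF falls into the susceptible case. Whether the erratum actually fires also depends on the chip step, wait states and stack alignment, none of which are modeled. All names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define FLAG_IF    0x0200u   /* interrupt enable flag, bit 9 */
#define IOPL_MASK  0x3000u   /* IO privilege level, bits 12-13 */
#define IOPL_SHIFT 12

/* In protected mode, CPL is the low two bits of the CS selector. */
unsigned cpl_from_cs(uint16_t cs)
{
    return cs & 3u;
}

unsigned iopl_from_flags(uint16_t flags)
{
    return (flags & IOPL_MASK) >> IOPL_SHIFT;
}

/* True when a POPF matches the susceptible case: interrupts are
 * disabled before the POPF, the popped value keeps IF = 0, and the
 * CPU is in real address mode or in protected mode with CPL <= IOPL. */
bool popf_at_risk(bool protected_mode, uint16_t cs,
                  uint16_t flags_before, uint16_t popped_flags)
{
    if (flags_before & FLAG_IF)
        return false;            /* interrupts were already enabled */
    if (popped_flags & FLAG_IF)
        return false;            /* POPF legitimately enables them */
    if (!protected_mode)
        return true;             /* real address mode */
    return cpl_from_cs(cs) <= iopl_from_flags(flags_before);
}
```

For kernel code running at CPL 0, the CPL <= IOPL test holds for every IOPL value, so a kernel-mode POPF with interrupts disabled always lands in the susceptible case.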
jrc@hpcnof.UUCP (04/24/86)
You're right, the POPF fails. IBM publishes the fix in their "PC/AT Technical Reference Manual", Part Number 1502243.

Jim Conrad
ucbvax!hplabs!hpcnof!j_conrad