cbush@RAND-UNIX.ARPA (08/09/85)
"WARNING probe instructions may be hazardous to the health of your system." >>>THANKS: This is a thanks to all on quickly solving are problem. No new crashes YET!! Got 6 replies which all hit it on the nose or very close to it. Restating the problem and solution may fill in the story for some. I'm somewhat suprised that at the very least a warning comment in not included in locore.s if the bug has been know so so long. >>>PROBLEM: Need help with persistent, at least one a day, system crashes on our new VAX 11/785 running 4.2BSD UNIX. Always get the same panic messages; trap type 9, code = 80001400, pc = 80001400 panic: Protection fault My reading of the above and from looking at the kernal stack frames says, in summary, the system was attempting to execute the instruction at location 80001400 while in user mode!! >>>SOLUTION: ( Courtesy of Chris Torek <chris@maryland> ) Sounds like you've managed to invoke the 780/785 CPU bug with prefetch. The probe instruction works by changing the CPU microstate to user mode. If a prefetch crosses a page boundary you can get a trap. The "fix" is to insert enough noops to push the probe ... in locore.s ... down past the start of a new page. If that works, let us all know . . I'm still puzzeled by the difference in rate of occurence on the 785 (~once/day) as compared to 780 (~once a month) but am leaving it as an exercise for now.
ggs@ulysses.UUCP (Griff Smith) (08/12/85)
> The "fix" is to insert enough noops to push the probe ... in locore.s ...
> down past the start of a new page.  If that works, let us all know . .

No! The fix is to insert a ".space n" directive immediately before the
function in locore.s that is causing the problem. Noops take time! The
probe is in the middle of a commonly called system function, don't give
it lead shoes.
--
Griff Smith     AT&T Bell Laboratories, Murray Hill
Phone:          (201) 582-7736
Internet:       ggs@ulysses.uucp
UUCP:           ulysses!ggs  ( {allegra|ihnp4}!ulysses!ggs )
chris@umcp-cs.UUCP (Chris Torek) (08/13/85)
>No! The fix is to insert a ".space n" directive immediately before
>the function in locore.s that is causing the problem.

Yes, this is better, but is also more difficult to apply. It's much
harder to push down, e.g., the probew in vax/trap.c this way.
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:   seismo!umcp-cs!chris
CSNet:  chris@umcp-cs           ARPA:   chris@maryland
ggs@ulysses.UUCP (Griff Smith) (08/15/85)
> >No! The fix is to insert a ".space n" directive immediately before
> >the function in locore.s that is causing the problem.
>
> Yes, this is better, but is also more difficult to apply.  It's
                                   ^^^^^^^^^^^^^^^^^^^^^^^?
> much harder to push down, e.g., the probew in vax/trap.c this way.
>

Huh? The code in question is

        .
        .
        clrl    12(r2)
        ret
_Copyin:        .globl  _Copyin         # <<<massaged for jsb by asm.sed>>>
        movl    12(sp),r0               # copy length
        blss    ersb
        movl    4(sp),r1                # copy user address
        cmpl    $NBPG,r0                # probing one page or less ?
        bgeq    cishort                 # yes
ciloop:
        prober  $3,$NBPG,(r1)           # bytes accessible ?
        beql    ersb                    # no
        addl2   $NBPG,r1                # incr user address ptr
        acbl    $NBPG+1,$-NBPG,r0,ciloop        # reduce count and loop
cishort:
        prober  $3,r0,(r1)              # bytes accessible ?
        beql    ersb                    # no
        .
        .

Change it to the following:

        .
        .
        clrl    12(r2)
        ret
        .space  50      # kludge to move probers to next page
_Copyin:        .globl  _Copyin         # <<<massaged for jsb by asm.sed>>>
        movl    12(sp),r0               # copy length
        blss    ersb
        movl    4(sp),r1                # copy user address
        cmpl    $NBPG,r0                # probing one page or less ?
        bgeq    cishort                 # yes
ciloop:
        prober  $3,$NBPG,(r1)           # bytes accessible ?
        beql    ersb                    # no
        addl2   $NBPG,r1                # incr user address ptr
        acbl    $NBPG+1,$-NBPG,r0,ciloop        # reduce count and loop
cishort:
        prober  $3,r0,(r1)              # bytes accessible ?
        beql    ersb                    # no
        .
        .

There is nothing difficult about this. References to probes in C code
are another problem; you probably WILL need to pad with nop's. So what?
Nothing says you have to use the same solution in both languages. When
casting spells in assembly, use the tools that are available.
--
Griff Smith     AT&T Bell Laboratories, Murray Hill
Phone:          (201) 582-7736
Internet:       ggs@ulysses.uucp
UUCP:           ulysses!ggs  ( {allegra|ihnp4}!ulysses!ggs )
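The arithmetic behind choosing n for ".space n" can be sketched as follows. This is only an illustration: the probe address is hypothetical (a real one would come from the kernel's symbol table or a disassembly of the built kernel), the function name is made up, and 512 is the VAX page size, NBPG.

```python
NBPG = 512  # VAX page size in bytes (NBPG in the 4.2BSD headers)


def space_needed(probe_addr):
    """Smallest pad, in bytes, that moves probe_addr up to the start of
    the next page. Returns 0 if probe_addr is already page-aligned."""
    return -probe_addr % NBPG


# Hypothetical address of the first prober, 48 bytes short of a page start.
probe_addr = 0x800013D0
print(".space %d" % space_needed(probe_addr))
```

Any value at least this large moves the probe onto the new page. The padding shifts everything after it in locore.s, so the addresses of the remaining probes would need to be rechecked after the kernel is rebuilt.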
chris@umcp-cs.UUCP (Chris Torek) (08/15/85)
>>>No! The fix is to insert a ".space n" directive immediately before
>>>the function in locore.s that is causing the problem.

>> Yes, this is better, but is also more difficult to apply.  It's
>                                   ^^^^^^^^^^^^^^^^^^^^^^^?

>There is nothing difficult about this.  References to probes in C code
>are another problem; you probably WILL need to pad with nop's.  So what?
>Nothing says you have to use the same solution in both languages.

That is what I meant. I considered different phrasing, but thought that
was the shortest that conveyed what I meant. I guess it did, but just
barely. So here's the long version:

Yes, it's better in this case to use a .space directive to push the
probe down, as that saves CPU time (as you pointed out). However, if you
encounter the same problem later with one of the probe instructions
which is embedded within the C code, it will be much more difficult
(though not impossible) to use .space or other magic to move the probe
instruction. Using nop's may be inefficient, but it is easy to implement
in every case in which a solution to the probe bug must be applied;
therefore I present it as the general solution.

How's that? :-)
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:   seismo!umcp-cs!chris
CSNet:  chris@umcp-cs           ARPA:   chris@maryland