[net.unix-wizards] Pre-fetch

reno@bunker.UUCP (Jim Reno) (07/09/85)

We ran into an interesting problem here that I haven't seen mentioned
anywhere previously.

The symptoms were that under certain circumstances the system would
hang due to a memory fault while in the kernel. The system we use
is 68000 based with a 4-segment memory management unit. While in
supervisor state the MMU is essentially disabled and the processor
has access to all of physical memory.

The kernel happens to relocate some parts of itself during initialization.
It turned out that a routine for a driver was being relocated to the
absolute end of physical memory, such that the last two bytes of
memory contained an RTS (return from subroutine). When this
instruction was executed the fault occurred because the 68K prefetches
2 to 4 bytes ahead of where it's executing. The prefetch was into
nonexistent memory, hence the external logic produced the fault.
This is a classic problem with pipelined systems.

The fix for the kernel was simple - just ensure a few bytes of unused
padding at the end of physical memory.

However, the problem exists for user-mode programs as well. Suppose
you have a shared text program where the code is exactly some multiple
of the basic block size used by the MMU (1k on our system). Further
suppose that your kernel allocates exactly that amount of memory.
If the processor prefetches, the MMU will fault (and the
program dump core) when the very last instruction is executed.
The MMU, of course, has no way of knowing that the processor would
never have actually used those bytes.

Non-shared text programs don't have the problem, since there is usually data
and/or stack above the code.

There are a number of solutions. The loader could always pad shared text
images by a few bytes. Perhaps a better solution is to have the exec
code in the kernel check to see if the shared text segment is exactly a
multiple of the MMU block size, and allocate an extra block.
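
A minimal C sketch of the exec-time check described above, offered only as an illustration. The 1k block size is the one quoted in the post; the function name text_blocks_to_allocate and the macro MMU_BLOCK_SIZE are hypothetical and not taken from any real kernel.

```c
#include <stddef.h>

/* Assumption: 1k MMU blocks, as on the system described in the post. */
#define MMU_BLOCK_SIZE 1024u

/*
 * Hypothetical helper: number of MMU blocks to allocate for a shared
 * text segment of text_bytes bytes. If the text exactly fills its last
 * block, one extra block is added so that a prefetch past the final
 * instruction still lands in mapped memory.
 */
size_t text_blocks_to_allocate(size_t text_bytes)
{
    /* Round up to a whole number of blocks. */
    size_t blocks = (text_bytes + MMU_BLOCK_SIZE - 1) / MMU_BLOCK_SIZE;

    /* Exact multiple of the block size: allocate an extra block. */
    if (text_bytes != 0 && text_bytes % MMU_BLOCK_SIZE == 0)
        blocks += 1;

    return blocks;
}
```

With these assumptions a 1024-byte text image gets 2 blocks instead of 1, while a 1000-byte image still gets 1.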

friesen@psivax.UUCP (Stanley Friesen) (07/11/85)

In article <891@bunker.UUCP> reno@bunker.UUCP (Jim Reno) writes:
>We ran into an interesting problem here that I haven't seen mentioned
>anywhere previously.
>
>The kernel happens to relocate some parts of itself during initialization.
>It turned out that a routine for a driver was being relocated to the
>absolute end of physical memory, such that the last two bytes of
>memory contained an RTS (return from subroutine). When this
>instruction was executed the fault occurred because the 68K prefetches
>2 to 4 bytes ahead of where it's executing. The prefetch was into
>nonexistent memory, hence the external logic produced the fault.
>This is a classic problem with pipelined systems.
>
>However, the problem exists for user-mode programs as well. Suppose
>you have a shared text program where the code is exactly some multiple
>of the basic block size used by the MMU (1k on our system). Further
>suppose that your kernel allocates exactly that amount of memory.
>If the processor prefetches, the MMU will fault (and the
>program dump core) when the very last instruction is executed.
>The MMU, of course, has no way of knowing that the processor would
>never have actually used those bytes.
>
>There are a number of solutions. The loader could always pad shared text
>images by a few bytes. Perhaps a better solution is to have the exec
>code in the kernel check to see if the shared text segment is exactly a
>multiple of the MMU block size, and allocate an extra block.

	Actually, I think a better solution would be to patch the
kernel to check if the fault occurred on the last byte of memory and
simply *ignore* it the first time! This would avoid having to do
strange things to user programs to make them work. After all it is
the kernel which translates the fault into a signal.
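
A rough C sketch of the decision such a patched fault handler might make. Everything here is an assumption for illustration: the struct, the function name, and the 4-byte window (taken from the "2 to 4 bytes" figure quoted above) are hypothetical. It also shows only the decision itself; whether the hardware lets the faulting instruction be resumed afterwards is a separate question.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: the processor prefetches at most 4 bytes past the text. */
#define PREFETCH_BYTES 4u

/* Hypothetical per-process state: has one such fault been ignored yet? */
struct proc_fault_state {
    bool ignored_once;
};

/*
 * Return true if the fault looks like a harmless instruction prefetch
 * just past the end of the text segment and none has been ignored yet.
 * text_end is the first address beyond the mapped text.
 */
bool should_ignore_fault(struct proc_fault_state *st,
                         uint32_t fault_addr,
                         uint32_t text_end,
                         bool is_instruction_fetch)
{
    bool in_window = is_instruction_fetch &&
                     fault_addr >= text_end &&
                     fault_addr - text_end < PREFETCH_BYTES;

    if (!in_window || st->ignored_once)
        return false;   /* genuine fault, or second occurrence */

    st->ignored_once = true;   /* ignore only the first time */
    return true;
}
```

A second fault at the same place returns false, so the kernel would still translate it into a signal as usual.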
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or {ttdica|quad1|bellcore|scgvaxd}!psivax!friesen

bruce@stride.UUCP (Bruce Robertson) (07/11/85)

In article <891@bunker.UUCP> reno@bunker.UUCP (Jim Reno) writes:
>
>Suppose
>you have a shared text program where the code is exactly some multiple
>of the basic block size used by the MMU (1k on our system). Further
>suppose that your kernel allocates exactly that amount of memory.
>If the processor prefetches, the MMU will fault (and the
>program dump core) when the very last instruction is executed.
>The MMU, of course, has no way of knowing that the processor would
>never have actually used those bytes.

Actually, this shouldn't cause a problem on most systems, because the data
segment is allocated on the next page immediately following the text
segment, so the prefetch will come from the first few bytes of the data
segment.  If you have separated I/D on your 68000, though, you do have a
problem.

A similar problem is if you have your stack located at the very top of
memory, and you pop your registers from the stack with a MOVEM instruction.
This instruction prefetches from the data space, and will cause the same
problem.  Under UNIX, however, the problem is nicely avoided by the fact
that you have your arguments and environment sitting between the very top of
memory and the top of the stack.
-- 

	Bruce Robertson
	UUCP: {ucbvax!menlo70,seismo}!unr70!unrvax!stride!bruce

chris@umcp-cs.UUCP (Chris Torek) (07/15/85)

Speaking of prefetch bugs, how about the 780 bug that causes a
page fault if you execute a probe instruction too close to a page
boundary so that the instruction buffer fill crosses the boundary
while the micro pc state is set to user mode?
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

crp@stcvax.UUCP (Charlie Price) (07/16/85)

Instruction pre-fetch causing problems when the last
instruction is at the end of memory isn't new.
Back in the dark ages when I was learning assembly language
programming on a CDC 6400 we were warned about this problem
(pre-fetch also forces you to be careful for *shudder* self-modifying code.)

The 6000 series machines have base-and-limit-register memory management
and 60-bit words with 15 or 30 bit instructions.
After the first instruction of a word is executed, the machine prefetches
the next word into the instruction buffer.
If you are in the last word:  Bang, you're dead.

With the 6000's memory management scheme I suppose that compilers
"naturally" never cause this problem for compiled languages.

Given that this is a bad gotcha for a compiled language,
I don't think it would be an outrageous "feature" to have the linker
(whoever builds finished executables) on any system normally eliminate this
problem by forcing some extra null words into the text if needed.

-- 
Charlie Price   {hao ihnp4 decvax}!stcvax!crp   (303) 673-5698
USnail:	Storage Technology Corp  -  MD 3T / Louisville, CO / 80028

jbn@wdl1.UUCP (11/20/85)

> Back in the dark ages when I was learning assembly language
> programming on a CDC 6400 we were warned about this problem
> (pre-fetch also forces you to be careful for *shudder* self-modifying code.)
> 
> The 6000 series machines have base-and-limit-register memory management
> and 60-bit words with 15 or 30 bit instructions.
> After the first instruction of a word is executed, the machine prefetches
> the next word into the instruction buffer.
> If you are in the last word:  Bang, you're dead.

      It was worse than that; the CDC 6400 was the economy model; the CDC
6600 had TEN instruction look-ahead, and, reasonably enough, one could
get a protection fault by executing code within ten words of the memory
lockout limits.  Worse, this effect was intermittent, because the machine
was asynchronous; sometimes the jump instruction would cut off the lookahead
before it reached the memory limits, and sometimes the lookahead would get
there first; results varied from CPU to CPU, depending on exact wire lengths
and gate delays.  Repeatability was never a strong point of the CDC 6600.
With three million discrete components (this was before ICs), the CDC 6600
goes down in history as the all-time winner on parts count.
      Seymour did learn from his experience with this monster; the Cray I
was a strictly synchronous machine, with parity checking (Seymour used to
say ``parity is for farmers'', but learned better.)
      No, I never used one of these things; I was in junior high when it
came out, but one should have some sense of history.  It's worth realizing
that some of the many-register RISC machines are likely to have the same 
problems with the lookahead logic.

					John Nagle