[net.micro.68k] Pre-fetch

reno@bunker.UUCP (Jim Reno) (07/09/85)

We ran into an interesting problem here that I haven't seen mentioned
anywhere previously.

The symptoms were that under certain circumstances the system would
hang due to a memory fault while in the kernel. The system we use
is 68000 based with a 4-segment memory management unit. While in
supervisor state the MMU is essentially disabled and the processor
has access to all of physical memory.

The kernel happens to relocate some parts of itself during initialization.
It turned out that a routine for a driver was being relocated to the
absolute end of physical memory, such that the last two bytes of
memory contained an RTS (return from subroutine). When this
instruction was executed the fault occurred because the 68K prefetches
2 to 4 bytes ahead of where it's executing. The prefetch was into
nonexistent memory, hence the external logic produced the fault.
This is a classic problem with pipelined systems.

The fix for the kernel was simple - just ensure a few bytes of unused
padding at the end of physical memory.

However, the problem exists for user-mode programs as well. Suppose
you have a shared text program where the code is exactly some multiple
of the basic block size used by the MMU (1k on our system). Further
suppose that your kernel allocates exactly that amount of memory.
If the processor prefetches, the MMU will fault (and the
program dump core) when the very last instruction is executed.
The MMU, of course, has no way of knowing that the processor would
never have actually used those bytes.
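For illustration only, here is a small Python model of the failure condition described above. The names `BLOCK_SIZE`, `mapped_bytes` and `prefetch_faults` are hypothetical. The prefetch distance is a parameter because the post gives it only as "2 to 4 bytes".

```python
BLOCK_SIZE = 1024  # MMU block size on the poster's system (1k)


def mapped_bytes(text_size: int, block_size: int = BLOCK_SIZE) -> int:
    """Bytes mapped if the kernel allocates just enough whole blocks for the text."""
    return -(-text_size // block_size) * block_size  # ceiling division


def prefetch_faults(text_size: int, prefetch_bytes: int = 2,
                    block_size: int = BLOCK_SIZE) -> bool:
    """True if prefetching past the last instruction touches an unmapped address.

    The last instruction ends just below offset text_size, and the CPU is
    assumed to fetch up to prefetch_bytes beyond it: offsets text_size through
    text_size + prefetch_bytes - 1.
    """
    if prefetch_bytes <= 0:
        return False
    last_fetched = text_size + prefetch_bytes - 1
    return last_fetched >= mapped_bytes(text_size, block_size)


if __name__ == "__main__":
    for size in (2000, 2046, 2048):
        print(size, prefetch_faults(size), prefetch_faults(size, prefetch_bytes=4))
```

With a 2-byte prefetch, only the exact multiple (2048) faults. If the CPU reads 4 bytes ahead, a text that ends 2 bytes short of the boundary (2046) faults as well.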

Non-shared text programs don't have the problem, since there is usually data
and/or stack above the code.

There are a number of solutions. The loader could always pad shared text
images by a few bytes. Perhaps a better solution is to have the exec
code in the kernel check to see if the shared text segment is exactly a
multiple of the MMU block size, and allocate an extra block.
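The exec-time check proposed above can be sketched roughly as follows. `text_blocks` is a hypothetical helper, not code from any real kernel. Instead of testing for an exact multiple, it rounds up the text size plus a few bytes of slack. For even text sizes (68000 instructions are word-aligned) and a slack of 2 bytes, it allocates the same number of blocks as the "extra block on an exact multiple" rule. The second function spells out that rule for comparison.

```python
def text_blocks(text_size: int, block_size: int = 1024, slack: int = 2) -> int:
    """Blocks to map for a shared text segment, leaving at least `slack`
    mapped bytes after the last instruction for the prefetcher."""
    return -(-(text_size + slack) // block_size)  # ceiling division


def text_blocks_exact_multiple_rule(text_size: int, block_size: int = 1024) -> int:
    """The rule as stated in the post: round up, then add one more block
    when the text is an exact multiple of the block size."""
    blocks = -(-text_size // block_size)
    if text_size % block_size == 0:
        blocks += 1
    return blocks


if __name__ == "__main__":
    for size in (1000, 1022, 1024, 2048):
        print(size, text_blocks(size), text_blocks_exact_multiple_rule(size))
```

A larger `slack` (say 4) would also cover a CPU that reads further ahead. The cost is occasionally mapping a block the program never executes.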

friesen@psivax.UUCP (Stanley Friesen) (07/11/85)

In article <891@bunker.UUCP> reno@bunker.UUCP (Jim Reno) writes:
>We ran into an interesting problem here that I haven't seen mentioned
>anywhere previously.
>
>The kernel happens to relocate some parts of itself during initialization.
>It turned out that a routine for a driver was being relocated to the
>absolute end of physical memory, such that the last two bytes of
>memory contained an RTS (return from subroutine). When this
>instruction was executed the fault occurred because the 68K prefetches
>2 to 4 bytes ahead of where it's executing. The prefetch was into
>nonexistent memory, hence the external logic produced the fault.
>This is a classic problem with pipelined systems.
>
>However, the problem exists for user-mode programs as well. Suppose
>you have a shared text program where the code is exactly some multiple
>of the basic block size used by the MMU (1k on our system). Further
>suppose that your kernel allocates exactly that amount of memory.
>If the processor prefetches, the MMU will fault (and the
>program dump core) when the very last instruction is executed.
>The MMU, of course, has no way of knowing that the processor would
>never have actually used those bytes.
>
>There are a number of solutions. The loader could always pad shared text
>images by a few bytes. Perhaps a better solution is to have the exec
>code in the kernel check to see if the shared text segment is exactly a
>multiple of the MMU block size, and allocate an extra block.

	Actually, I think a better solution would be to patch the
kernel to check if the fault occurred on the last byte of memory and
simply *ignore* it the first time! This would avoid having to do
strange things to user programs to make them work. After all it is
the kernel which translates the fault into a signal.
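The decision Friesen describes might look roughly like this inside the kernel's fault handler. This is a hypothetical sketch. `FaultFilter` and its fields are invented for illustration. "On the last byte" is taken to mean within a few bytes past the first unmapped address, and "the first time" is tracked per faulting PC. The sketch models only the choice between ignoring the fault and signalling. How the handler resumes the interrupted program depends on the hardware and is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class FaultFilter:
    """Per-process state for the "ignore it the first time" heuristic."""
    mapped_end: int            # first unmapped address above the text segment
    prefetch_bytes: int = 4    # assumed maximum prefetch distance
    ignored_pcs: set = field(default_factory=set)

    def should_signal(self, fault_addr: int, pc: int) -> bool:
        """Return True if the fault should be turned into a signal."""
        just_past_end = (self.mapped_end <= fault_addr
                         < self.mapped_end + self.prefetch_bytes)
        if just_past_end and pc not in self.ignored_pcs:
            self.ignored_pcs.add(pc)   # first time at this PC: assume a prefetch
            return False
        return True                    # repeated, or not near the end: real fault
```

Keying on the PC means a program that really does run off the end of its text still gets its signal on the second attempt. Without that, the handler could keep ignoring the same fault indefinitely.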
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or {ttdica|quad1|bellcore|scgvaxd}!psivax!friesen

bruce@stride.UUCP (Bruce Robertson) (07/11/85)

In article <891@bunker.UUCP> reno@bunker.UUCP (Jim Reno) writes:
>
>Suppose
>you have a shared text program where the code is exactly some multiple
>of the basic block size used by the MMU (1k on our system). Further
>suppose that your kernel allocates exactly that amount of memory.
>If the processor prefetches, the MMU will fault (and the
>program dump core) when the very last instruction is executed.
>The MMU, of course, has no way of knowing that the processor would
>never have actually used those bytes.

Actually, this shouldn't cause a problem on most systems, because the data
segment is allocated on the next page immediately following the text
segment, so the prefetch will come from the first few bytes of the data
segment.  If you have separated I/D on your 68000, though, you do have a
problem.

A similar problem is if you have your stack located at the very top of
memory, and you pop your registers from the stack with a MOVEM instruction.
This instruction prefetches from the data space, and will cause the same
problem.  Under UNIX, however, the problem is nicely avoided by the fact
that you have your arguments and environment sitting between the very top of
memory and the top of the stack.
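The stack case reduces to the same arithmetic. In this sketch, `movem_pop_faults` is a hypothetical helper that assumes 4-byte registers. It models the extra access described above as one extra word (2 bytes) read just above the last register popped. That size is an assumption, not a figure from the post.

```python
def movem_pop_faults(sp: int, nregs: int, top: int,
                     reg_size: int = 4, extra_read: int = 2) -> bool:
    """True if popping nregs registers from sp with MOVEM would touch
    memory at or above `top` (the first address past valid memory)."""
    end = sp + nregs * reg_size + extra_read   # one past the last byte read
    return end > top


if __name__ == "__main__":
    TOP = 0x40000                  # hypothetical top of a 256K address space
    saved = 4 * 4                  # four saved registers
    args_env = 64                  # hypothetical argv/envp area above the stack
    print(movem_pop_faults(TOP - saved, 4, TOP))             # stack at very top
    print(movem_pop_faults(TOP - args_env - saved, 4, TOP))  # UNIX-style layout
```

With the stack at the very top, the extra read lands 2 bytes past valid memory. Once the arguments and environment sit above the stack, the same pop stays well inside.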
-- 

	Bruce Robertson
	UUCP: {ucbvax!menlo70,seismo}!unr70!unrvax!stride!bruce

doug@terak.UUCP (Doug Pardee) (07/15/85)

According to the specs on the NS32016:

  The one exception to [the page fault interrupt] sequence occurs if
  the aborted bus cycle was on an instruction prefetch.  If so, it is
  not yet certain that the aborted prefetched code is to be executed.
  Instead of causing an interrupt, the CPU only aborts the bus cycle,
  and stops prefetching.  If the information in the Instruction Queue
  runs out, meaning that the instruction will actually be executed,
  the ABT interrupt will occur, in effect aborting the instruction
  which was being fetched.
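The quoted 32016 rule can be illustrated with a toy prefetch queue. A failed prefetch is remembered but not reported. The abort is raised only when execution actually needs bytes the queue does not hold. `PrefetchUnit`, `Abort` and the 8-byte queue depth are assumptions made for the sketch, and it models only straight-line execution (no branches).

```python
class Abort(Exception):
    """Stands in for the ABT interrupt."""


class PrefetchUnit:
    """Toy model of a prefetcher that defers faults, per the quoted spec."""

    def __init__(self, memory_size: int, queue_depth: int = 8):
        self.memory_size = memory_size   # addresses >= this are nonexistent
        self.queue_depth = queue_depth
        self.queue = []                  # addresses of bytes already fetched
        self.next_fetch = 0
        self.stopped = False             # set when a prefetch bus cycle aborts

    def _prefetch(self) -> None:
        while not self.stopped and len(self.queue) < self.queue_depth:
            if self.next_fetch >= self.memory_size:
                self.stopped = True      # abort the bus cycle quietly, stop prefetching
            else:
                self.queue.append(self.next_fetch)
                self.next_fetch += 1

    def fetch_instruction(self, length: int) -> list:
        """Take `length` bytes for the next instruction from the queue."""
        self._prefetch()
        if len(self.queue) < length:
            raise Abort(f"instruction needs byte at {self.next_fetch:#x}")
        taken = self.queue[:length]
        del self.queue[:length]
        return taken
```

In a 4-byte program, both 2-byte instructions execute normally even though the prefetcher tried to read past the end. Only a third fetch raises `Abort`.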

[Edited version of original posting -- for the folks on net.micro.16k]

> The symptoms were that under certain circumstances the system would
> hang due to a memory fault while in the kernel. The system we use
> is 68000 based with a 4-segment memory management unit. While in
> supervisor state the MMU is essentially disabled and the processor
> has access to all of physical memory.
> 
> It turned out that a routine for a driver was being relocated to the
> absolute end of physical memory, such that the last two bytes of
> memory contained an RTS (return from subroutine). When this
> instruction was executed the fault occurred because the 68K prefetches
> 2 to 4 bytes ahead of where it's executing. The prefetch was into
> nonexistent memory, hence the external logic produced the fault.
> This is a classic problem with pipelined systems.
-- 
Doug Pardee -- Terak Corp. -- !{ihnp4,seismo,decvax}!noao!terak!doug
               ^^^^^--- soon to be CalComp

crp@stcvax.UUCP (Charlie Price) (07/16/85)

Instruction pre-fetch causing problems when the last
instruction is at the end of memory isn't new.
Back in the dark ages when I was learning assembly language
programming on a CDC 6400 we were warned about this problem
(pre-fetch also forces you to be careful with *shudder* self-modifying code.)

The 6000 series machines have base-and-limit-register memory management
and 60-bit words with 15 or 30 bit instructions.
After the first instruction of a word is executed, the machine prefetches
the next word into the instruction buffer.
If you are in the last word:  Bang, you're dead.

With the 6000's memory management scheme I suppose that compilers
"naturally" never cause this problem for compiled languages.

Given that this is a bad gotcha for a compiled language,
I don't think it would be an outrageous "feature" to have the linker
(whoever builds finished executables) on any system normally eliminate this
problem by forcing some extra null words into the text if needed.
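Here is a linker-side version of this suggestion, applied to a byte-addressed image like the 68000 case rather than CDC 60-bit words. `pad_text_image` is hypothetical. It appends zero fill only when the image would otherwise end within `slack` bytes of an MMU block boundary. The 1k block size and 2-byte slack are assumptions taken from the earlier posts.

```python
def pad_text_image(text: bytes, block_size: int = 1024, slack: int = 2,
                   fill: int = 0) -> bytes:
    """Return the text image, padded with `slack` fill bytes if fewer than
    `slack` bytes would otherwise remain in its last block."""
    room = -len(text) % block_size   # bytes left in the last block
    if room >= slack:
        return text                  # already enough space for the prefetch
    return text + bytes([fill]) * slack
```

Always padding by a few bytes, as the original post's loader suggestion puts it, is simpler still. The check merely avoids growing images that don't need it.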

-- 
Charlie Price   {hao ihnp4 decvax}!stcvax!crp   (303) 673-5698
USnail:	Storage Technology Corp  -  MD 3T / Louisville, CO / 80028

thomson@uthub.UUCP (Brian Thomson) (07/18/85)

Yes, the 32016 does properly handle (i.e. ignore) page fault and
protection traps on prefetches, but it still isn't quite perfect.
We have seen a 32016 continue to prefetch after partially executing
an SVC instruction, such that it uses the old user-mode PC for
the prefetch address but does the accesses in system mode.
This can be a problem if the system-space address is, e.g., a device
register or if the page is valid but mapped to nonexistent memory.
-- 
		    Brian Thomson,	    CSRI Univ. of Toronto
		    {linus,ihnp4,uw-beaver,floyd,utzoo}!utcsrgv!uthub!thomson