[comp.arch] Prefetch instruction

gerry@zds-ux.UUCP (Gerry Gleason) (03/01/90)

In article <7488@pdn.paradyne.com> alan@oz.paradyne.com (Alan Lovejoy) writes:
<In article <BBC.90Feb25195510@legia.rice.edu> Benjamin Chase <bbc@rice.edu> writes:
<<alan@oz.paradyne.com writes:
<<>The point about
<<>vertical lines is that they always (unless they, or the pixels themselves,
<<>are very thick) fall within one or two words horizontally.

<<Yes, and ~1000 words vertically.  And these words are spaced at ~32-word
<<intervals.  Thus, when you draw a vertical line, you get a word of
<<screen memory, perform some operation on it to turn on or off ~1 bit
<<of that word, and then write it back to screen memory.  Then, you get
<<the next word, which is 32 words further along, and whoops, it's not
<<in the data cache, because your silly fetcher got the next 3
<<consecutive words of screen memory (which you won't be needing right
<<now because your vertical line is so skinny and all), rather than the
<<next 3 words spaced at 32 word offsets, which is what you really
<<wanted it to do if you were drawing a vertical line.

Actually, this case of vertical lines (or anything vertical) is one where
wider cache lines can hurt, but there are ways of effectively "prefetching"
cache lines, as Alan points out.
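
To put rough numbers on that, here is a minimal C sketch. It assumes the figures quoted above (rows about 32 words apart, and a fill that brings in 4 consecutive words per miss); the constants and function names are made up for illustration and describe no particular machine.

```c
#include <stddef.h>

/* Assumed geometry, taken from the rough figures quoted in this thread. */
enum {
    WORDS_PER_ROW  = 32,  /* words between vertically adjacent screen words */
    WORDS_PER_LINE = 4    /* words brought in by one cache-line fill */
};

/* Count the distinct cache lines touched when visiting `count` words,
 * starting at word index `first` and stepping by `stride` words. */
size_t lines_touched(size_t first, size_t stride, size_t count)
{
    size_t touched = 0;
    size_t last_line = (size_t)-1;  /* sentinel: no line seen yet */

    for (size_t i = 0; i < count; i++) {
        size_t line = (first + i * stride) / WORDS_PER_LINE;
        if (line != last_line) {    /* indices never decrease, so this suffices */
            touched++;
            last_line = line;
        }
    }
    return touched;
}

/* One word per row, straight down the screen. */
size_t vertical_line_fills(size_t rows, size_t column_word)
{
    return lines_touched(column_word, WORDS_PER_ROW, rows);
}

/* Consecutive words along one stretch of memory. */
size_t horizontal_run_fills(size_t words, size_t first_word)
{
    return lines_touched(first_word, 1, words);
}
```

Under these assumptions a 1000-row vertical line costs 1000 line fills and uses one word out of each, while 1000 consecutive words cost only 250 fills.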

<Yep. That's why one "loads" the word from screen memory before "load"-ing
<the word from the source bitmap, so that there are as many cycles as possible
<between the load instruction which fetches the destination word and the 
<first instruction which accesses that word.  This is where the 88k's multiple
<independent functional units, full pipelining and register scoreboarding
<really help.

Let me see if I understand this correctly.  For it to help to put the load
early, the instruction pipeline would have to be able to continue executing
as long as no instruction references the loaded register (this is what the
scoreboard does, indicates which registers are "valid").  Is this a correct
interpretation of what you said?

I can think of a simpler way to do this: an instruction just like load,
except that it doesn't wait for the memory operation to complete or write a
register; its only effect would be to do a read to fill the cache line.  The
compiler could put these in wherever a cache miss is likely (this could be
based on profile statistics).  Of course, if my interpretation is correct, the
88k can accomplish almost the same thing with its scoreboard; the compiler just
moves the load itself up.  The only difference is when the register gets
allocated.
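
A rough sketch of what the compiler (or a programmer) might emit for the vertical-line loop, in C. There is no such instruction on the 88k as far as this thread says; as a stand-in the sketch uses `__builtin_prefetch`, a builtin provided by present-day GCC and Clang, which is only a hint and may be ignored. The function name, its parameters, and the `ahead` distance are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the proposed instruction: touch a cache line without
 * writing a register. Falls back to a no-op on other compilers. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_FOR_WRITE(p) __builtin_prefetch((p), 1)
#else
#define PREFETCH_FOR_WRITE(p) ((void)0)
#endif

/* Set the bits in `mask` in one word of each of `rows` rows, starting at
 * row `y0`, in the word column `col_word`. `ahead` is how many rows in
 * advance the prefetch is issued (an assumed tuning knob). */
void draw_vline(uint32_t *screen, size_t words_per_row, size_t col_word,
                uint32_t mask, size_t y0, size_t rows, size_t ahead)
{
    for (size_t r = 0; r < rows; r++) {
        size_t i = (y0 + r) * words_per_row + col_word;

        /* Ask for the word we will need `ahead` rows from now, but only
         * while that row is still part of the line being drawn. */
        if (r + ahead < rows)
            PREFETCH_FOR_WRITE(&screen[i + ahead * words_per_row]);

        screen[i] |= mask;  /* read-modify-write of the current word */
    }
}
```

The prefetch never changes what gets drawn; it only tries to have the next few rows' cache lines on their way before the read-modify-write reaches them.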

Gerry Gleason

preston@titan.rice.edu (Preston Briggs) (03/02/90)

In article <203@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
in reply to alan@oz.paradyne.com (Alan Lovejoy) and
Benjamin Chase <bbc@rice.edu>

>I can think of a simpler way to do this, with an instruction just like load
>except it doesn't wait for the memory operation to complete or write a
>register, its only effect would be to do a read to fill the cache line.  

Yes indeed.  That's just what Ben suggested.  I call it cache prefetching
and he called it "whispering gently in the processor's ear", but it
seems like a good idea either way.

Of course, we want the cache miss handling to occur asynchronously with
normal processing.  That is, don't freeze the processor just because there's
a reload happening.

>Of course if my interpretation is correct, the 88k
>can accomplish almost the same thing with its scoreboard; the compiler just
>moves the load itself up.  The only difference is when the register gets
>allocated.

Right, assuming cache misses are handled like I want.
Of course, we suddenly see a need for lots of registers.
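
A back-of-the-envelope sketch of that register pressure, in C. It assumes each hoisted load ties up one destination register from issue until its use, and that the loop issues one such load per iteration; the function and parameter names are hypothetical, and real schedules will differ.

```c
/* Number of loads that must be in flight at once (and so, under the
 * assumption above, the number of registers tied up) to hide a miss of
 * `miss_latency` cycles when one load is issued every `cycles_per_iter`
 * cycles. Returns 0 if `cycles_per_iter` is 0, which has no meaning here. */
unsigned regs_in_flight(unsigned miss_latency, unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        return 0;
    /* Ceiling division: a partly covered iteration still needs a register. */
    return (miss_latency + cycles_per_iter - 1) / cycles_per_iter;
}
```

For example, a 20-cycle miss with a 4-cycle loop body would want 5 loads outstanding, hence 5 registers, where a prefetch that writes no register would want none.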
--
Preston Briggs				looking for the great leap forward
preston@titan.rice.edu

alan@oz.nm.paradyne.com (Alan Lovejoy) (03/02/90)

In article <203@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
<Let me see if I understand this correctly.  For it to help to put the load
<early, the instruction pipeline would have to be able to continue executing
<as long as no instruction references the loaded register (this is what the
<scoreboard does, indicates which registers are "valid").  Is this a correct
<interpretation of what you said?

Yes.  That's the main point of having a scoreboard, after all.
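
A toy model of that, in C, for anyone who wants to play with the numbers. It is a sketch of an in-order pipeline that stalls only when an instruction reads a register that is not yet valid; it is not a model of the 88k. The latencies, register numbers, and the five-instruction sequence are all assumptions chosen for illustration.

```c
#include <stddef.h>

enum { NUM_REGS = 8, NO_REG = -1 };
enum { ALU_LATENCY = 1, HIT_LATENCY = 2, MISS_LATENCY = 10 };  /* assumed */

struct insn {
    int dst;      /* register written, or NO_REG */
    int src1;     /* registers read, or NO_REG */
    int src2;
    int latency;  /* cycles from issue until dst is valid */
};

/* Issue at most one instruction per cycle, in order. An instruction waits
 * only for its own source registers. Returns the cycle at which the last
 * result becomes available. */
int run_in_order(const struct insn *prog, size_t n)
{
    int ready[NUM_REGS] = {0};  /* cycle at which each register is valid */
    int cycle = 0, done = 0;

    for (size_t i = 0; i < n; i++) {
        if (prog[i].src1 != NO_REG && ready[prog[i].src1] > cycle)
            cycle = ready[prog[i].src1];          /* stall on src1 */
        if (prog[i].src2 != NO_REG && ready[prog[i].src2] > cycle)
            cycle = ready[prog[i].src2];          /* stall on src2 */

        int finish = cycle + prog[i].latency;
        if (prog[i].dst != NO_REG)
            ready[prog[i].dst] = finish;          /* mark dst pending */
        if (finish > done)
            done = finish;
        cycle++;                                  /* next issue slot */
    }
    return done;
}

/* Destination word loaded just before it is needed. */
int dest_load_late(void)
{
    const struct insn prog[] = {
        { 1, NO_REG, NO_REG, HIT_LATENCY  },  /* load r1 <- source word */
        { 2, 1,      NO_REG, ALU_LATENCY  },  /* shift source           */
        { 3, 2,      NO_REG, ALU_LATENCY  },  /* mask source            */
        { 4, NO_REG, NO_REG, MISS_LATENCY },  /* load r4 <- dest (miss) */
        { 5, 3,      4,      ALU_LATENCY  },  /* combine                */
    };
    return run_in_order(prog, sizeof prog / sizeof prog[0]);
}

/* Same work, with the destination load moved to the front. */
int dest_load_early(void)
{
    const struct insn prog[] = {
        { 4, NO_REG, NO_REG, MISS_LATENCY },  /* load r4 <- dest (miss) */
        { 1, NO_REG, NO_REG, HIT_LATENCY  },  /* load r1 <- source word */
        { 2, 1,      NO_REG, ALU_LATENCY  },  /* shift source           */
        { 3, 2,      NO_REG, ALU_LATENCY  },  /* mask source            */
        { 5, 3,      4,      ALU_LATENCY  },  /* combine                */
    };
    return run_in_order(prog, sizeof prog / sizeof prog[0]);
}
```

With these made-up latencies the late placement finishes at cycle 15 and the early one at cycle 11; the difference is the source-side work that now overlaps the miss.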

Notice how the relative importance of things depends on the environment as
a whole, as well as the tasks you wish to accomplish?! "It is impossible to
do just one thing."

____"Congress shall have the power to prohibit speech offensive to Congress"____
Alan Lovejoy; alan@pdn; 813-530-2211; AT&T Paradyne: 8550 Ulmerton, Largo, FL.
Disclaimer: I do not speak for AT&T Paradyne.  They do not speak for me. 
Mottos:  << Many are cold, but few are frozen. >>     << Frigido, ergo sum. >>