david@daisy.UUCP (David Schachter) (02/25/89)
The claim has been made recently that I caches are good but D caches are hard to use, because there is little locality of reference in data access.

Um, how about constructing the D cache such that a program can give it a hint? "Hey, D cache, I intend to access data starting from location x, w bytes at a time, stride factor y, for z times."

The D cache could then intelligently look ahead. How about it?

Also, is there any use in splitting your D cache into, say, supervisor/user areas? I doubt it...

--
David "Oh yeah? Yeah. Oh." Schachter
                      / ...!ucbvax!imagen!atari--\
david@daisy.uucp  OR +  ...!uunet-----------------!daisy!david
                      \ ...!pyramid--------------/
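To make the proposed hint concrete, here is a minimal Python sketch of what a cache could derive from it. Everything named here is hypothetical: `StrideHint`, `hinted_addresses`, `lines_to_prefetch`, and the 16-byte line size are illustrative assumptions, not part of any real cache interface. The fields simply mirror the x, w, y, z of the post.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)
class StrideHint:
    """Hypothetical hint record; fields follow the post's x, w, y, z."""
    start: int   # x: address of the first access
    width: int   # w: bytes touched by each access
    stride: int  # y: distance in bytes between successive accesses
    count: int   # z: number of accesses


def hinted_addresses(hint: StrideHint) -> Iterator[int]:
    """Yield the starting address of each access the program announced."""
    for i in range(hint.count):
        yield hint.start + i * hint.stride


def lines_to_prefetch(hint: StrideHint, line_size: int = 16) -> List[int]:
    """Return the distinct cache-line base addresses the hint covers.

    Lines are listed in order of first use. line_size is an assumed value.
    """
    seen = set()
    order = []
    for addr in hinted_addresses(hint):
        first = addr // line_size
        last = (addr + hint.width - 1) // line_size  # an access may straddle lines
        for line in range(first, last + 1):
            if line not in seen:
                seen.add(line)
                order.append(line * line_size)
    return order
```

With a 4-byte access every 24 bytes, the sketch skips the lines that fall between accesses, which is the information a purely sequential look-ahead would lack.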
w-colinp@microsoft.UUCP (Colin Plumb) (02/27/89)
david@daisy.UUCP (David Schachter) wrote:
> Um, how about constructing the D cache such that a program
> can give it a hint? "Hey, D cache, I intend to access data
> starting from location x, w bytes at a time, stride factor
> y, for z times."
>
> The D cache could then intelligently look ahead. How about
> it?

See: "The WM Computer Architecture," Wm. A. Wulf, Computer Architecture News (SIGARCH newsletter) Vol. 16, No. 1, March 1988. From what I've heard of the CDC 6600, it uses a similar idea.

To load or store, an instruction computes the address, and an access to r0 uses or supplies the data. For a load, you must specify an address before reading r0, but for a store, you can do things in either order. FIFOs actually allow you to get a few words ahead on the address or data side. I suppose that filling one FIFO while the other is empty causes a trap.

The nifty thing is the streaming instructions, which allow you to specify a series of loads or stores as a (base, stride, number) triple. In the paper, two registers are provided for this purpose, but the concept can apply to any number. I think it's a great idea for a vector pipe, given the memory latency/CPU speed ratios we have these days. And it doesn't constrain the processor to produce or eat values at any fixed rate.

In general, it's a fun paper. I disagree with some of the ideas (there's no provision for things like division, which are popular but take multiple cycles), but there are a lot of good ones in there. He has two ALUs, and instructions are of the form dest = src3 op (src2 op src1). Any comments on that one?

--
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me. I never do."
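The streaming load and the two-ALU instruction form described above can be modelled in a few lines. This is a toy Python sketch under stated assumptions, not the WM design itself: `LoadStream`, `dual_op`, the word-addressed `memory` list, the FIFO depth of 4, and raising an exception on an exhausted stream are all hypothetical choices made for illustration.

```python
import operator
from collections import deque
from typing import Callable, Deque, Sequence


class LoadStream:
    """Toy streaming load: a (base, stride, number) triple feeding a FIFO."""

    def __init__(self, memory: Sequence[int], base: int, stride: int,
                 number: int, depth: int = 4) -> None:
        self.memory = memory      # word-addressed memory (assumption)
        self.next_addr = base
        self.stride = stride
        self.remaining = number   # loads not yet issued
        self.depth = depth        # assumed FIFO depth
        self.fifo: Deque[int] = deque()

    def _fill(self) -> None:
        # The memory side runs ahead of the consumer until the FIFO is full.
        while self.remaining > 0 and len(self.fifo) < self.depth:
            self.fifo.append(self.memory[self.next_addr])
            self.next_addr += self.stride
            self.remaining -= 1

    def read(self) -> int:
        """Consume one value, standing in for a read of the FIFO register."""
        self._fill()
        if not self.fifo:
            raise RuntimeError("read from an exhausted stream")
        return self.fifo.popleft()


def dual_op(outer: Callable[[int, int], int], inner: Callable[[int, int], int],
            src3: int, src2: int, src1: int) -> int:
    """Compute dest = src3 outer (src2 inner src1), one operator per ALU."""
    return outer(src3, inner(src2, src1))
```

The consumer calls `read()` whenever it wants the next value, so nothing ties it to a fixed rate. A scaled sum over a strided vector then takes one `dual_op` per element, for example `acc = dual_op(operator.add, operator.mul, acc, stream.read(), 2)`.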
jesup@cbmvax.UUCP (Randell Jesup) (02/28/89)
In article <2756@daisy.UUCP> david@daisy.UUCP (David Schachter) writes:
>Um, how about constructing the D cache such that a program
>can give it a hint? "Hey, D cache, I intend to access data
>starting from location x, w bytes at a time, stride factor
>y, for z times."
>
>The D cache could then intelligently look ahead. How about
>it?

	Hit the nail #1 on the head. Who's up for nail #2?

	(Actually, no need to tell it how many times you're going to look, just tell it starting here, offset so much, and maybe how many bytes accessed at each spot. Note that there are other optimizations that you can do IF the processor lets you get at some useful bits of internal state.)

--
Randell Jesup, Commodore Engineering {uunet|rutgers|allegra}!cbmvax!jesup
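A hint without a count can be pictured as a prefetcher that simply stays a fixed distance ahead of the program for as long as the program keeps accessing. The Python sketch below is one possible reading of that idea, not a description of any real hardware; `open_ended_prefetches` and the `lookahead` distance of 2 are hypothetical.

```python
from typing import List


def open_ended_prefetches(start: int, offset: int, accesses: int,
                          lookahead: int = 2) -> List[List[int]]:
    """Log the prefetch addresses issued at each demand access.

    The prefetcher knows only start and offset. It keeps `lookahead`
    elements ahead of the most recent demand access (assumed policy).
    """
    issued_upto = 0  # highest element index already fetched or prefetched
    log = []
    for i in range(accesses):
        target = i + lookahead
        # Issue whatever lies between the furthest fetch so far and the target.
        new = [start + k * offset for k in range(issued_upto + 1, target + 1)]
        issued_upto = max(issued_upto, target)
        log.append(new)
    return log
```

In this model the price of dropping the count is that the last `lookahead` prefetches fetch elements the program never touches.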