m5d@bobkat.UUCP (Mike McNally ) (11/20/86)
Perhaps I am giving away a valuable idea, but I won't flatter myself that this could be the case. Various architectures (the x86 architecture in particular) involve a mechanism whereby the CPU is able to fetch code during unused/ unusable bus cycles. In the x86 series, the prefetch is dumb (less dumb on the 2- and 386 than the original chips, but dumb nevertheless). The prefetch queue is kept small (although I was surprised at the size (16 bytes) of the queue in the 386) because it was found that the point of diminishing returns was encountered at about 6 bytes. The obvious problem is that branch instructions do not cooperate with the idea of prefetching. Attempts to follow different paths beyond a branch seem like Koyaanisqatsi solutions to me. A compiler with a minimal amount of smarts can detect runs of code which can be guaranteed to execute in sequence. Since the compiler is generating the code, why not provide information on the size of the upcoming run of certainly sequential instructions? This could easily be provided in the 86 architecture as a special tag byte, leaving maybe four or five bits to tell how many instructions can safely be prefetched. The advantages to such a system are even greater given a burst-mode bus cycle. Do any existing architectures incorporate such a scheme? Seems pretty obvious to me. Are there any problems that I can't see? Could this be as significant as the development of multitronics ("an entirely new approach...") (note my username)? -- **** **** **** At Digital Lynx, we're almost in Garland, but not quite **** **** **** Mike McNally Digital Lynx Inc. Software (not hardware) Person Dallas TX 75243 uucp: ...convex!ctvax!bobkat!m5 (214) 238-7474
dan@prairie.UUCP (Daniel M. Frank) (11/21/86)
In article <242@bobkat.UUCP> m5d@bobkat.UUCP (Mr Mike McNally) writes: >The obvious problem is that branch instructions do not cooperate >with the idea of prefetching. Attempts to follow different paths >beyond a branch seem like Koyaanisqatsi solutions to me. Many pipelined machines (for about the last 20 years) have had two prefetch queues, and fetched down both branch paths for a while until the branch was resolved. This is a good solution because there may be several instructions ahead of the branch in the pipeline, so you have no idea whether the branch is going to be taken or not. Also, prefetch queues are seldom that long, so the case where a branch is followed in the prefetch by another branch is infrequent enough not to be costly (usually it just stalls fetching for a while). The idea has been broached (and I think used in some experimental processors) of a "branch usually" or "branch seldom" instruction, in which the compiler indicates to the processor which branch is most likely to be taken. If branch predictions are relatively accurate, you only pay for the branch in the occasional case when it goes against the prediction. The question of how to make accurate predictions is a current research topic. Of course, this is all complicated when the branch target address is being computed when the branch instruction hits the decode pipeline step. This usually happens when the target address is in a register. Instruction fetch may have to come to a halt until the interlock can be resolved. This problem can be reduced somewhat by careful coding. One processor under development at the UW actually EXECUTES down both branch paths (I won't say any more about that, as a paper is in the works by the folks developing it). -- Dan Frank uucp: ... uwvax!prairie!dan arpa: dan%caseus@spool.wisc.edu
crowl@rochester.ARPA (Lawrence Crowl) (11/21/86)
In article <242@bobkat.UUCP> m5d@bobkat.UUCP (Mr Mike McNally) writes: >Various architectures involve a mechanism whereby the CPU is able to fetch >code during unused/unusable bus cycles. In the x86 series, the prefetch >is dumb. The prefetch queue is kept small because it was found that the >point of diminishing returns was encountered at about 6 bytes. The >obvious problem is that branch instructions do not cooperate with the >idea of prefetching. Attempts to follow different paths beyond a branch >seem like Koyaanisqatsi solutions to me. >A compiler with a minimal amount of smarts can detect runs of code which >can be guaranteed to execute in sequence. Since the compiler is >generating the code, why not provide information on the size of the >upcoming run of certainly sequential instructions? ... The advantages >to such a system are even greater given a burst-mode bus cycle. >Mike McNally The sequence approach still does not help in prefetching code past the branch point before actually finding out which way the branch will go. I suspect a variable length burst mode will be no more effective than a fixed burst mode which is always applied on instruction fetch. One possible solution to this problem is to have the branch instructions themselves indicate which branch is more likely. The machine could then prefetch the most likely sequence, and most often it will already have the instructions it needs. When the branch taken is the least common, the prefetch queue must be flushed and re-filled. The RISC approach is to have the instructions after the branch executed reguardless if the branch is taken or not. For a small pipe or prefetch queue, the compiler can usually fill this "delay slot" with a useful instruction. Though this represents a problem if the pipe or prefetch queue gets to large. -- Lawrence Crowl 716-275-5766 University of Rochester crowl@rochester.arpa Computer Science Department ...!{allegra,decvax,seismo}!rochester!crowl Rochester, New York, 14627