aglew@mcdurb.Urbana.Gould.COM (02/06/89)
I thought this was worth noting in this newsgroup. From Electronic Design, Dec 8, 1988 (paraphrased, without permission): Fujitsu is implementing a SPARC chip with a Harvard architecture - split I and D buses, 64 bits wide (1 or 2? I dunno).

--> Pipeline interlocks in the chip's circuits eliminate timing
--> problems that might result when software written for earlier
--> SPARC CPUs is run.

---

In case you haven't guessed, one of my pet peeves is people who say that interlocks are no longer necessary because the compiler will handle them. Until the advent of binary compilers for ABIs, interlocks will be necessary: no two implementations of the same device will have the same timings.

So, the question becomes: WHAT interlocks are appropriate? Delayed branches are a fairly common interlock-avoidance technique - but does the need to interlock on the second delay slot, in the latest MIPS machines, cost any more than the lack of need for an interlock in the first? (I suppose the timing is relaxed, but what about complexity?)

Machine designers: which interlocks are the most complex, the most likely to be timing-critical?

Compiler writers: which are the hardest interlocks to optimize around?

Performance analysts: which interlocks are exercised most frequently? What do they cost? (The MIPS pixie data has been useful here.)

Technology prophets: which way is the technology likely to go? Longer or shorter branch delays? Memory latencies?

---

You don't want to paint yourself into a corner by not having interlocks, only to find that this increases the complexity of a multiple-issue implementation of your architecture.

Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
Motorola Microcomputer Division, Champaign-Urbana Design Center
1101 E. University, Urbana, Illinois 61801, USA.

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
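As a back-of-the-envelope illustration of the second-delay-slot question above: below is a toy Python timing model (names and numbers are invented for illustration, not taken from the article). The ISA exposes one architectural delay slot, which the compiler fills for free; a deeper-pipelined implementation that resolves branches one slot later must interlock, so each branch costs a bubble.

    # Toy model: the ISA exposes `arch_delay_slots` branch delay slots,
    # which software fills.  An implementation needing more slots must
    # interlock (insert bubbles) to stay binary compatible.
    def run(instrs, arch_delay_slots, impl_branch_latency):
        cycles = 0
        for ins in instrs:
            cycles += 1                      # one cycle per issue
            if ins == "br":
                # Extra slots beyond what the ISA promised become
                # hardware-interlock bubbles.
                cycles += max(0, impl_branch_latency - arch_delay_slots)
        return cycles

    prog = ["op", "op", "br", "op", "op", "br", "op"]
    print(run(prog, 1, 1))   # 7 cycles: branch resolves in the one slot
    print(run(prog, 1, 2))   # 9 cycles: one interlock bubble per branch

The same binary runs on both models; only the cycle counts differ - which is exactly the compatibility argument for keeping the interlock in hardware.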
pohua@uicsgva.csg.uiuc.edu (02/10/89)
Is there a good algorithm for scheduling 'negative' dependencies without backtracking, when delayed branches are used? For example:

    A : r3 = r4 + r5
    B : br (r1 > 9)

Operation A can be moved into operation B's delay slot. Thus, the dependence graph constructed from the above code will indicate a 'negative' control-dependence distance from A to B.

Many interesting deterministic scheduling schemes that I have read about model a program by its dependence graph. The interlocking problem is then transformed into one of finding correct initiation times for all the operations. Because all types of dependence (data/control/...) can be expressed as inequality constraints, I don't think any one of them is more difficult to handle than the others. Of course, the scheduling problem is made more difficult by adding resource constraints and increasing pipeline length. But I still believe that software interlocking is a feasible approach.

pohua
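A minimal sketch of the constraint view pohua describes, in Python (all names are invented; resource constraints are deliberately ignored). Every dependence, with positive or negative distance d, becomes the inequality time(v) >= time(u) + d, and for an acyclic dependence graph the componentwise-minimal schedule is a single longest-path pass in topological order - no backtracking.

    from collections import defaultdict

    def schedule(nodes, edges):
        """nodes: topological order of the dependence DAG.
        edges: (u, v, d) meaning time(v) >= time(u) + d."""
        succ = defaultdict(list)
        for u, v, d in edges:
            succ[u].append((v, d))
        time = {n: 0 for n in nodes}
        for u in nodes:                  # topological order
            for v, d in succ[u]:
                time[v] = max(time[v], time[u] + d)
        return time

    # pohua's example: the edge (A, B, -1) says B may issue up to one
    # cycle before A, i.e. A may sit in B's single delay slot.
    print(schedule(["A", "B"], [("A", "B", -1)]))   # {'A': 0, 'B': 0}

With a single-issue resource constraint added back, B at cycle 0 and A at cycle 1 (the delay slot) still satisfies the inequality - which is the delay-slot move itself.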
aglew@mcdurb.Urbana.Gould.COM (02/11/89)
Don't get me wrong - I think that SW interlocking is a perfectly valid approach for compilers and architectures. Commercially, however, it suffers from the problem that two implementations of the same instruction set might have different needs for interlocking. So either software handles the worst-case interlocking for both implementations, or you have hardware interlocking in one of them, or you require recompilation or binary translation to move between the implementations.

Neither recompilation nor binary translation is commercially practical at the moment, and I'd guess not for the next few years, so I'd like to look at points in the spectrum lying between

    TOTALLY HARDWARE INTERLOCKED ... some HW I'L, some SW I'L ... TOTALLY SOFTWARE INTERLOCKED

On the mixed points you have options like Cydra's memory latency register, etc.
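For concreteness, here is a hedged sketch (one reading of the idea, not taken from Cydra documentation) of the kind of mixed point a memory-latency register gives you, in Python: software is scheduled assuming every load takes exactly the declared latency; hardware holds early results and interlocks only when memory is slower than declared.

    def run_loads(actual_latencies, declared_latency):
        """Total interlock stalls for a stream of loads on a machine
        whose memory-latency register is set to declared_latency."""
        stalls = 0
        for actual in actual_latencies:
            if actual > declared_latency:
                # Memory slower than software assumed: HW interlocks.
                stalls += actual - declared_latency
            # else: result arrives on time or early; hardware holds it
            # until the cycle software expects it, and no interlock fires.
        return stalls

    loads = [2, 2, 5, 2, 9]                      # mostly hits, two misses
    print(run_loads(loads, declared_latency=2))  # 10 stall cycles

Moving the same binary to an implementation with different memory timing just changes the stall count (or the value software writes into the register) - no recompilation, and no scheduling for the worst case everywhere.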