[comp.arch] Fujitsu SPARC Interlocks

aglew@mcdurb.Urbana.Gould.COM (02/06/89)

I thought this was worth noting in this newsgroup:

From Electronic Design, Dec 8, 1988 (paraphrased, without permission):

Fujitsu is implementing a SPARC chip with a Harvard architecture:
split I and D buses, a 64-bit-wide bus (1 or 2? I dunno)

--> Pipeline interlocks in the chip's circuits eliminate timing
--> problems that might result when software written for earlier
--> Sparc CPUs is running.

---

In case you haven't guessed, one of my pet peeves is people who say
that interlocks are no longer necessary because the compiler
will handle them.
    Until the advent of binary compilers for ABIs, interlocks
will be necessary. No two implementations of the same device
will have the same timings.

So, the question becomes: WHAT interlocks are appropriate?
Delayed branches are a fairly common interlock-avoidance
technique - but does the need to interlock on the second delay
slot, in the latest MIPS machines, cost any more than the absence
of an interlock on the first slot saves? (I suppose the timing
is relaxed, but what about complexity?)
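
To put the question in concrete terms, here is a toy stall model
(the function and its parameters are my own sketch, not taken from
any MIPS manual): the ISA exposes some number of delay slots that the
compiler promises to respect, a later implementation needs a longer
latency, and a hardware interlock pays for the difference.

    # Toy model only; the numbers are illustrative, not any real machine's.
    def hw_stall_cycles(impl_latency, code_gap):
        """Cycles a hardware interlock inserts before a dependent use.

        impl_latency -- producer-to-consumer distance this chip needs
        code_gap     -- distance the compiler actually achieved; correct
                        code guarantees only the architected minimum
        """
        return max(0, impl_latency - code_gap)

    # One exposed slot, and the implementation needs only that: no stall.
    assert hw_stall_cycles(impl_latency=2, code_gap=2) == 0
    # The implementation needs a second slot: stall a cycle whenever the
    # compiler couldn't schedule anything useful further away.
    assert hw_stall_cycles(impl_latency=3, code_gap=2) == 1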

Machine designers: which interlocks are the most complex, 
the most likely to be timing critical?

Compiler writers: which are the hardest interlocks to optimize around?

Performance Analysts: which interlocks are exercised most frequently?
What do they cost? (the MIPS pixie data has been useful here)

Technology prophets: what ways is the technology likely to go?
Longer or shorter branch delays? Memory latencies?

---

You don't want to paint yourself into a corner by leaving out interlocks,
only to find that this increases the complexity of a multiple-issue
implementation of your architecture.



Andy "Krazy" Glew   aglew@urbana.mcd.mot.com   uunet!uiucdcs!mcdurb!aglew
   Motorola Microcomputer Division, Champaign-Urbana Design Center
	   1101 E. University, Urbana, Illinois 61801, USA.
   
My opinions are my own, and are not the opinions of my employer, or
any other organisation. I indicate my company only so that the reader
may account for any possible bias I may have towards our products.

pohua@uicsgva.csg.uiuc.edu (02/10/89)

Is there a good algorithm for scheduling 'negative' dependencies
without backtracking when delayed branches are used?

For example,
	A : 	r3 = r4 + r5
	B :	br (r1>9)

Operation A can be moved into operation B's delay slot.
Thus, the dependence graph constructed from the above code will
record a 'negative' control-dependence distance from A to B.

Many interesting deterministic scheduling schemes that I have read
about model a program by its dependence graph. The interlocking
problem is then transformed into one of finding correct initiation times
for all the operations. Because all types of dependencies (data/control/...)
can be expressed as inequality constraints, I don't think any one
kind of dependence is harder to handle than the others.
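
Here is a minimal sketch of the dependencies-as-inequalities view (my
own code and edge notation, not from any particular scheduling paper):
each edge (i, j, d) encodes the constraint t[j] - t[i] >= d, and a
negative d is exactly how the delay-slot case above comes out - the
edge (A, B, -1) says t[B] >= t[A] - 1, i.e. A may issue as late as one
cycle after B (in B's delay slot), but no later.

    def satisfies(schedule, edges):
        """Check a candidate issue-time assignment against all constraints."""
        return all(schedule[j] - schedule[i] >= d for i, j, d in edges)

    # The example above: A: r3 = r4 + r5, B: br (r1>9), one delay slot.
    edges = [("A", "B", -1)]               # control dependence, distance -1

    assert satisfies({"B": 0, "A": 1}, edges)       # A in B's delay slot
    assert satisfies({"A": 0, "B": 1}, edges)       # original order
    assert not satisfies({"B": 0, "A": 2}, edges)   # A past the delay slot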

Of course, the scheduling problem is made more difficult by
adding resource constraints and increasing pipeline length.
But I still believe that software interlocking is a feasible approach.

pohua

aglew@mcdurb.Urbana.Gould.COM (02/11/89)

Don't get me wrong - I think that SW interlocking is a perfectly valid
approach for compilers and architectures.

Commercially, however, it suffers the problem that two implementations
of the same instruction set might have different needs for interlocking.
So either software handles the worst-case interlocking for both
implementations, or you have hardware interlocking in one of them,
or you require recompilation or binary translation to move between the
implementations.
    Neither recompilation nor binary translation is commercially practical
at the moment, and I'd guess it won't be for the next few years, so I'd like
to look at points in the spectrum lying between

    TOTALLY HARDWARE INTERLOCKED
    ...
    Some HW I'L, some SW I'L
    ...
    TOTALLY SOFTWARE INTERLOCKED

At the mixed points you have options like Cydra's
memory latency register, etc.
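
For concreteness, here is a sketch of the pure-software end of the
spectrum (the code and the names pad_for_latency/assumed_latency are
my own toy illustration, not Cydra's actual mechanism): pad
straight-line code with NOPs so that every use issues at least
assumed_latency instructions after the load that feeds it. With a
declared-latency scheme the assumed latency becomes a knob; scheduling
for the worst case across implementations just means turning the knob
up and eating the extra NOPs.

    def pad_for_latency(code, assumed_latency):
        """code: list of (op, defs, uses); returns code with NOPs inserted."""
        out, ready_at = [], {}   # reg -> position where its value is usable
        for op, defs, uses in code:
            # software interlock: stall with NOPs until every source is ready
            while any(ready_at.get(r, 0) > len(out) for r in uses):
                out.append(("nop", [], []))
            out.append((op, defs, uses))
            for r in defs:
                ready_at[r] = len(out) - 1 + assumed_latency
        return out

    prog = [("ld  r2,(r1)",  ["r2"], ["r1"]),
            ("add r3,r2,r4", ["r3"], ["r2", "r4"])]

    # Assuming a 2-cycle load costs one NOP here; assuming a 4-cycle
    # worst case across implementations costs three on the same code.
    assert sum(op == "nop" for op, _, _ in pad_for_latency(prog, 2)) == 1
    assert sum(op == "nop" for op, _, _ in pad_for_latency(prog, 4)) == 3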