[comp.arch] 386 pipelined addressing mode

goss@jetsun.WEITEK.COM (Richard Goss) (01/11/90)

I have heard in the past that running the 386 in pipelined addressing mode
degrades performance compared to running in non-pipelined mode with the 
appropriate wait states.  I heard it had something to do with the bus
interface unit starting a bus cycle when the execution unit needs to take a 
branch, jump, etc. and having to wait for this useless bus cycle to complete 
and then be flushed. Someone said the degradation was around 12% on average.
Could someone please confirm this?

Thank you.

dhinds@portia.Stanford.EDU (David Hinds) (01/20/90)

In article <858@jetsun.WEITEK.COM>, goss@jetsun.WEITEK.COM (Richard Goss) writes:
> I have heard in the past that running the 386 in pipelined addressing mode
> degrades performance compared to running in non-pipelined mode with the 
> appropriate wait states.  I heard it had something to do with the bus
> interface unit starting a bus cycle when the execution unit needs to take a 
> branch, jump, etc. and having to wait for this useless bus cycle to complete 
> and then be flushed. Someone said the degradation was around 12% on average.
> Could someone please confirm this.
> 
    The effect of pipelined addressing is that during the final cycle of a
memory access, the 80386 sets the address lines up for the next access.
Accessing memory takes several cycles; I think a read takes 2 cycles, and
a write takes 3 cycles, assuming that there are no wait states on memory.
If wait states are needed, they are present in both pipelined and non-
pipelined modes.  The difference is that in the pipelined mode, one cycle
of each memory access does "double duty", so that 1 wait state can be
effectively hidden, if the memory system can take advantage of the early
address information.  This is possible in interleaved memory systems,
where different banks of chips are being referenced by two overlapping
memory accesses.
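
    The cycle arithmetic above can be sketched roughly as follows.  This is
only a toy model under stated assumptions: the 2-cycle read figure is the
tentative one from the paragraph above, and the function name and the
"memory_uses_early_address" flag are made up for illustration, not taken
from any Intel documentation.

```python
def access_cycles(base_cycles, wait_states, pipelined=False,
                  memory_uses_early_address=True):
    """Cycles for one memory access under the simple model described above.

    Assumption: address pipelining hides at most one wait state, and only
    when the memory system can act on the early address (for example,
    interleaved banks).
    """
    hidden = 0
    if pipelined and memory_uses_early_address and wait_states > 0:
        hidden = 1
    return base_cycles + wait_states - hidden


# Compare both modes for a read assumed to take 2 cycles with no wait states.
for ws in (0, 1, 2):
    plain = access_cycles(2, ws)
    piped = access_cycles(2, ws, pipelined=True)
    print(f"wait states={ws}: non-pipelined={plain}, pipelined={piped}")
```

In this model the wait states are still present in both modes; pipelining
only overlaps one of them with the tail of the previous access, and gains
nothing if the memory system ignores the early address.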
    The question seems to be whether the code prefetch unit does a lot of
unnecessary memory accesses, which tie up the bus and occasionally block
operand accesses.  I THINK that this effect would be independent of the
status of address pipelining.  In any case, it should be very small,
nothing like a 12% performance loss.  If the loss were that large, why
would they go to the trouble of pipelining in the first place?  The prefetch unit
is fairly smart, and will not fetch dead instructions following a call or
unconditional jump.  It will fetch through a conditional jump, assuming
that these will usually not be taken, and will not fetch through loop
instructions, assuming that these jumps WILL usually be taken.  It should
rarely have to throw prefetched instructions away.
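
    To make that prefetch policy concrete, here is a minimal sketch that
counts how often prefetched instructions would be thrown away under the
rules as I described them.  The instruction labels ("jcc", "loop", and so
on) and the trace format are hypothetical, and the policy itself is my
understanding rather than a verified description of the hardware.

```python
# Instruction kinds after which sequential prefetch is assumed to stop,
# per the description above (call, unconditional jump, loop).
STOP_KINDS = {"call", "jmp", "loop"}


def prefetch_continues(kind):
    """True if prefetch keeps fetching sequentially past this instruction."""
    return kind not in STOP_KINDS


def count_flushes(trace):
    """Count the times sequentially prefetched code must be discarded.

    trace is a list of (kind, taken) pairs; "jcc" stands for a conditional
    jump.  A flush happens when prefetch continued past an instruction
    that then transferred control elsewhere.
    """
    flushes = 0
    for kind, taken in trace:
        if prefetch_continues(kind) and taken:
            flushes += 1
    return flushes


# Hypothetical trace: only the taken conditional jump causes a flush.
trace = [
    ("other", False),
    ("jcc", False),
    ("jcc", True),
    ("loop", True),
    ("call", True),
    ("jmp", True),
]
print(count_flushes(trace))
```

Under these assumptions only taken conditional jumps cost a flush, which
is why the waste should be small if most conditional jumps fall through.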

 - David Hinds
   dhinds@portia.stanford.edu