[comp.arch] Controller registers vs. Speculative Execution

hutch@fps.com (Jim Hutchison) (01/17/91)

I've been hearing a bit about new processors which do "speculative execution".
That is, executing past branches either by following both paths or by guessing at
which way the branch will go.  Without addressing the viability of speculative
execution, I am curious about how controller registers are addressed.

If it writes a value to a control register on a controller, something may happen.
If it reads a control register, something may happen.

What happens with these?  I could make guesses, but I might guess at something
covered by a "non-disclosure" that someone else at FPS already knows, so I won't.

--
-
Jim Hutchison		{dcdwest,ucbvax}!ucsd!fps!hutch
Disclaimer:  I am not an official spokesman for FPS computing

djohnson@beowulf.ucsd.edu (Darin Johnson) (01/17/91)

In article <14829@celit.fps.com> hutch@fps.com (Jim Hutchison) writes:
>I've been hearing a bit about new processors which do "speculative execution".
>...
>If it write a value to a control register on a controller, something may
> happen. If it read a control register, something may happen.

IO has a tendency to intrude into nice clean theories.
The major problem is that the semantics of memory change completely
when using memory-mapped IO.  For example, two reads of the same
location, without a write to that location in between, may return
different values.  The same sort of problem occurs when designing
caches (especially with programmable IO controllers such as
channels).  The problem isn't as great with RISC-style CPUs that can
place an instruction in the delay slot after a branch, since the
compiler can choose a 'safe' instruction.
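
To make that concrete, here is a minimal C sketch assuming a hypothetical
memory-mapped UART (the base address, register layout, and names are all
invented for illustration, not any real controller) of how two reads of
the same location can legitimately return different values:

#include <stdint.h>

/* Hypothetical memory-mapped UART.  The base address and register
 * layout are invented for illustration only. */
#define UART_BASE   0xFFFF0000u
#define UART_DATA   (*(volatile uint8_t *)(uintptr_t)(UART_BASE + 0x0))
#define UART_STATUS (*(volatile uint8_t *)(uintptr_t)(UART_BASE + 0x4))
#define RX_READY    0x01u

/* Reading UART_DATA pops one byte from the receive FIFO, so two reads
 * of the "same location" with no intervening write return different
 * values -- exactly the property that makes speculative (or merely
 * duplicated) reads of device registers unsafe. */
void read_two_bytes(uint8_t *a, uint8_t *b)
{
    while (!(UART_STATUS & RX_READY))
        ;                       /* wait for the first byte */
    *a = UART_DATA;             /* side effect: FIFO advances */

    while (!(UART_STATUS & RX_READY))
        ;                       /* wait for the second byte */
    *b = UART_DATA;             /* usually a different value than *a */
}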

There are several solutions or techniques that limit the problem; I
won't list them all (even if I knew them all).

First, use explicit IO instructions and stall speculative execution
whenever one is reached.  Alternatively, IO can be mapped into an
address range the CPU recognizes (high-order bit set, say), and loads
to that range can stall the same way (see the sketch after this list).

Second, many of these machines are designed to be attached processors,
with IO only to the frontend or IO ports (context switching can really
bog things down...).  So this isn't a major problem.

Third, devices could be controlled via an IO processor, which handles all
the gory details.

Other solutions possible...
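
As a sketch of the first technique (the names and 32-bit address width
are invented; the high-order-bit convention is just the one mentioned
above), the issue logic only needs an address-range test to decide
whether a load may run before its controlling branch resolves:

#include <stdbool.h>
#include <stdint.h>

/* "High-order bit set" marks the IO space, per the convention above. */
#define IO_SPACE_BIT 0x80000000u

static bool is_io_address(uint32_t addr)
{
    return (addr & IO_SPACE_BIT) != 0;
}

/* May this load be issued before its controlling branch resolves?
 * Ordinary cached memory is safe to read early (the result can simply
 * be thrown away); an IO-space read may have side effects that cannot
 * be undone, so it must wait for the branch outcome. */
static bool may_issue_load_speculatively(uint32_t addr, bool branch_resolved)
{
    if (is_io_address(addr))
        return branch_resolved;
    return true;
}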

>Jim Hutchison		{dcdwest,ucbvax}!ucsd!fps!hutch
>Disclaimer:  I am not an official spokesman for FPS computing

-- 
Darin Johnson
djohnson@ucsd.edu

prener@arnor.uucp (01/18/91)

Typically, such speculative execution stops when it reaches a point
where possibly visible side-effects might occur.
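
A toy version of that stopping rule (the instruction classes and the
policy function are invented for illustration; stores are treated as
unsafe here, though a write buffer could relax that, as discussed later
in the thread) might look like:

#include <stdbool.h>
#include <stddef.h>

enum op { OP_ALU, OP_LOAD_CACHED, OP_LOAD_IO, OP_STORE, OP_IO_INSN };

/* Anything whose effect could be visible outside the CPU, or could
 * not be undone, must wait for the branch to commit. */
static bool safe_to_speculate(enum op o)
{
    switch (o) {
    case OP_ALU:
    case OP_LOAD_CACHED:
        return true;          /* result can simply be discarded */
    default:
        return false;         /* possible visible side effect: stall */
    }
}

/* How many instructions past the branch may run speculatively? */
size_t speculate_until_side_effect(const enum op *stream, size_t n)
{
    size_t i = 0;
    while (i < n && safe_to_speculate(stream[i]))
        i++;
    return i;
}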

                               Dan Prener
                               (prener @ ibm.com)

mhjohn@aspen.IAG.HP.COM (Mark Johnson) (01/19/91)

In comp.arch, lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:

    As to what the CPU does about it .. the main requirement is that
    uncached speculative reads not be done. (There's no such thing as a
    speculative write, because in general you don't know the prior
    contents of the memory location, and hence, there is no reasonable
    way to undo the write.)

There certainly can be such a thing as speculative writes.  Any design
that includes a write buffer can also speculatively execute a write.
The buffer would not dump the information to memory until the
speculative operation was committed.  If the speculative operation is
not needed, the write is never done.
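
A toy model of that idea, with invented sizes and names (load
forwarding, ordering, and write merging are omitted), might look like
this:

#include <stdbool.h>
#include <stdint.h>

#define WB_ENTRIES 8          /* size chosen arbitrarily for the sketch */

struct wb_entry {
    uint32_t addr;
    uint32_t data;
    bool     valid;
    bool     speculative;     /* not yet allowed to reach memory */
};

static struct wb_entry wbuf[WB_ENTRIES];

/* Accept a store; it is marked speculative if the controlling branch
 * has not resolved yet. */
void wb_store(uint32_t addr, uint32_t data, bool spec)
{
    for (int i = 0; i < WB_ENTRIES; i++) {
        if (!wbuf[i].valid) {
            wbuf[i] = (struct wb_entry){ addr, data, true, spec };
            return;
        }
    }
    /* Buffer full: a real design would stall the store here. */
}

/* The branch went the predicted way: the stores may now drain to memory. */
void wb_commit(void)
{
    for (int i = 0; i < WB_ENTRIES; i++)
        wbuf[i].speculative = false;
}

/* Mispredicted: the speculative entries are discarded, so the write
 * is simply never done. */
void wb_squash(void)
{
    for (int i = 0; i < WB_ENTRIES; i++)
        if (wbuf[i].speculative)
            wbuf[i].valid = false;
}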

Speculative execution would not have to stall on writes, which occur
with about a 1 in 7 frequency on the architecture I last looked at.

Write buffers are a common way of coupling a very fast execution
unit to a slower main memory.  The ones I am familiar with were
designed to be completely software-transparent.  They are a handy
place to accommodate unaligned writes, gather sequential writes into
a wider write, merge direct I/O with programmed writes, etc.

wright@Stardent.COM (David Wright) (01/21/91)

In article <14829@celit.fps.com> hutch@fps.com (Jim Hutchison) writes:
>I've been hearing a bit about new processors which do "speculative execution".
>That is execution of branchs based on executing both paths or guessing at
>which way the branch will go.  Without addressing the viability of speculative
>execution...

Is this really a new idea?  I was under the impression that the IBM
360/91 did this.  Or is there some new wrinkle in the recent stuff?

  -- David Wright, not officially representing Stardent Computer Inc
     wright@stardent.com  or  uunet!stardent!wright

cet1@cl.cam.ac.uk (C.E. Thompson) (01/21/91)

In article <11625@pt.cs.cmu.edu> lindsay@gandalf.cs.cmu.edu (Donald Lindsay) writes:
>Yes, I was simplifying. For example, see "RISC System/6000 Hardware
>Overview":
>
>"...a five entry pending store queue and a four-entry store data
>queue in the FPU enable the FXU [integer unit] to execute floating-
>point store operations before the FPU produces the data. This allows
>the FXU to generate the address, initiate TLB or cache reload
>sequences, and check for data protection for a floating-point store
>instruction, and then continue executing the subsequent instructions
>without being held back by the FPU."
>
>Note that this machine isn't even advertised as a speculative-
>execution design - merely a parallel one. One wonders about the
>sequencing of FPU and MMU interrupts, and about how much more fun the
>design could be if some of those stores were conditional. And then
>there's the classical problem of matching the addresses of reads
>against the addresses with pending writes.

But this isn't speculative execution of writes at all! It is simply early
detection of exceptions: the FXU address calculation cycle happens first
and all possible (storage access) exceptions happen then. Thereafter
the store operation sits around in the PSQ until the FPU gets around
to delivering the data, but the store is absolutely guaranteed to complete.
(There are some details I haven't seen any documentation on, admittedly,
such as how the FXU makes sure that the required line is still in the  
cache later on, and hasn't been flushed by intermediate FXU storage accesses.)
In fact, the RS/6000 isn't advertised as a speculative-execution design
because it isn't one. Unless you count "conditional dispatching" as 
speculative execution, which I certainly wouldn't.
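
For concreteness, a toy model of the pending-store-queue behaviour
just described might look like the following (the entry count is from
the quoted overview; everything else is invented for illustration):

#include <stdbool.h>
#include <stdint.h>

#define PSQ_ENTRIES 5         /* "five entry pending store queue" above */

struct psq_entry {
    uint32_t phys_addr;       /* already translated and checked */
    double   data;
    bool     have_data;
    bool     valid;
};

static struct psq_entry psq[PSQ_ENTRIES];

/* FXU side: address generation, TLB reload and protection checks all
 * happen before the entry is queued, so any storage exception is
 * raised here.  Once enqueued, the store is guaranteed to complete. */
bool psq_enqueue(uint32_t phys_addr)
{
    for (int i = 0; i < PSQ_ENTRIES; i++) {
        if (!psq[i].valid) {
            psq[i] = (struct psq_entry){ phys_addr, 0.0, false, true };
            return true;
        }
    }
    return false;             /* queue full: the FXU must stall */
}

/* FPU side: the data arrives later; the store then drains to the
 * cache.  Nothing can cancel it -- it is pending, not speculative. */
void psq_deliver(int slot, double value)
{
    psq[slot].data = value;
    psq[slot].have_data = true;
}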

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (01/23/91)

In article <1991Jan21.142422.17655@cl.cam.ac.uk>
	cet1@cl.cam.ac.uk (C.E. Thompson) writes:
>>"...a five entry pending store queue and a four-entry store data
>>queue in the FPU enable the FXU [integer unit] to execute floating-
>>point store operations before the FPU produces the data. This allows
>>the FXU to generate the address, initiate TLB or cache reload
>>sequences, and check for data protection for a floating-point store
>>instruction, and then continue executing the subsequent instructions
>>without being held back by the FPU."
>>
>>Note that this machine isn't even advertised as a speculative-
>>execution design - merely a parallel one. One wonders about the
>>sequencing of FPU and MMU interrupts, and about how much more fun the
>>design could be if some of those stores were conditional. And then
>>there's the classical problem of matching the addresses of reads
>>against the addresses with pending writes.
>
>But this isn't speculative execution of writes at all! 

Yes, that's what I said. The design issues raised by this machine
would be that much more difficult if some of the stores were
initiated speculatively, and could not be committed until a third
execution unit (the branch unit) signalled permission. For one thing,
one normally tries to do writes in order (hence, the RIOS uses
queues). But, if some writes were stalled on conditions, I can quite
imagine IBM adding queue-jumping logic.

>It is simply early detection of exceptions:

There's still the ordering issue. The FXU and FPU execute from queues
of issued instructions, and either may be ahead. If a particular
instruction is capable of causing two exceptions, which one is
raised?

>(There are some details I haven't seen any documentation on, admittedly,
>such as how the FXU makes sure that the required line is still in the  
>cache later on, and hasn't been flushed by intermediate FXU storage accesses.)

Interesting issue.
-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics