[comp.arch] Update to word addressing

bcase@amdcad.UUCP (04/15/87)

Well, I just got a mail message from one of our circuit designers.  While
I don't consider his comments "officially binding" or anything (by that I
mean he might later want to change his mind or modify what he said), I
presume they are reasonably well considered.  He said that by and large
John Mashey's comments were correct:  It isn't very costly in terms of
time to have the byte alignment network on the chip.  One implementation
technique is to have the byte alignment network control decoded as the
address leaves the chip so that when the data comes back there is only
a propagation delay, not a decode plus prop delay.
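For anyone who wants that trick spelled out, here is a minimal C sketch of the split, assuming 32-bit big-endian words (byte lane 0 holds bits 31..24). The function names are hypothetical and not from any AMD design. decode_lane_select() stands for the decode done as the address leaves the chip. align_byte() stands for the mux in the data return path, which only has to steer an already-decoded select.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: 32-bit big-endian words, lane 0 = bits 31..24. */

/* Phase 1, as the address leaves the chip: decode the low two
   address bits into a one-hot byte-lane select. */
uint8_t decode_lane_select(uint32_t addr) {
    unsigned lane = addr & 3u;              /* byte offset within the word */
    return (uint8_t)(1u << lane);           /* bit n set => select lane n */
}

/* Phase 2, when the data comes back: the select is already decoded,
   so alignment is just a 4:1 mux that right-justifies one byte. */
uint32_t align_byte(uint32_t word, uint8_t sel) {
    assert(sel != 0 && (sel & (sel - 1)) == 0);   /* must be one-hot */
    uint32_t result = 0;
    for (unsigned lane = 0; lane < 4; lane++) {
        if (sel & (1u << lane)) {
            unsigned shift = 24u - 8u * lane;     /* big-endian lane position */
            result = (word >> shift) & 0xFFu;
        }
    }
    return result;
}
```

The point of the split is that decode_lane_select() can run in parallel with the memory access, so the returning data sees only the mux's propagation delay.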

I can think of some issues which would still need to be resolved, but
they are probably resolvable.  The effect on setup and hold times is my
biggest worry (strange words from someone considered to be a software
type!).  The point, though, is that I may have spoken a little strongly
and a little quickly.

I must say that I *still* feel comfortable with our design.  I must also
say that these net discussions have been quite enlightening and have
humbled me just a bit (any more humble and I'll only be able to speak
in a mumble).

    bcase

bcase@amdcad.UUCP (04/15/87)

Ok, here is an update to the update.  This information is from our "main
man" circuit designer Dave Witt (the boss of the other circuit designer
whose comments I earlier recounted).  There isn't major disagreement
here, just some clarification.

---------------------------
     Hi, brian

       Anyway, if what you are talking about is being able to mux
an arbitrary byte from any byte position to any byte position via a
load, my guess is that this would have an impact on performance
for us, but that the impact would be very dependent on the
architecture and its speed paths.  On the 29000, because we do direct
forwarding of loads, allowing arbitrary multiplexing of any byte
location to any byte location of a register would add an extra
1.5-2.0 nanoseconds of setup to the delay from address to data valid.
(This ignores the extra hardware needed to give each byte location in
the data input latch access to any byte, and the selective byte
drive/tristate on our internal buses in the data input latch and the
register file.)  The reason is that we use dynamic buses, which
require the data to be stable before the drive clock, and that during
the picket [that is a half clock cycle, ED] in which we transfer the
data input latch to the ALU/shifter, we currently use all of that time
for data transfer and for setting up the control signals for the
funnel shift/ALU/prioritizer.
     I'll say that there is obviously increased complexity
associated with allowing byte loads, that whether there is a net
effect on the performance of the processor is very dependent on the
internal pipe and the associated internal architecture/speed paths,
and that in the case of the 29000, if we had implemented this feature,
it would have affected our address-to-data-valid setup time.  This may
not be a major problem on other chips, but when you are trying for
25-40 MHz with the associated external memory systems and caches,
nothing is more critical to a processor than giving the channel as
much of the cycle time as possible.

                 David Witt
---------------------------------
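To make the feature Dave describes a little more concrete, here is a hypothetical C model (not the 29000's actual datapath) of moving a byte from any lane of a loaded word into any lane of a register, assuming big-endian lanes (lane 0 = bits 31..24). In hardware, each destination lane would need its own 4:1 mux over the source lanes, plus per-lane write enables so the other three lanes stay untouched; that roughly corresponds to the extra hardware and the selective byte drive he mentions. The function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model, big-endian lanes: lane 0 = bits 31..24. */
static unsigned lane_shift(unsigned lane) {
    assert(lane < 4);
    return 24u - 8u * lane;
}

/* Move the byte in src_lane of 'loaded' into dst_lane of 'reg'.
   Any source lane can reach any destination lane (a 4x4 crossbar),
   and only the destination lane of 'reg' is overwritten. */
uint32_t move_byte(uint32_t loaded, unsigned src_lane,
                   uint32_t reg, unsigned dst_lane) {
    uint32_t byte = (loaded >> lane_shift(src_lane)) & 0xFFu;
    uint32_t mask = 0xFFu << lane_shift(dst_lane);   /* write-enable for one lane */
    return (reg & ~mask) | (byte << lane_shift(dst_lane));
}
```

A load that only right-justifies a byte needs just the lane-3 column of that crossbar; allowing every destination lane is what multiplies the muxes.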
Well, just thought that the net might find this interesting.  I guess
the thing to realize is that it is difficult to consider the effects
of a single feature separately.  If we had specified the alignment
network from the beginning, maybe our circuit guys would have found
a zero-time solution (they are pretty clever).  Over the phone, Dave
also worried that this circuitry might not scale, timing-wise, as well
as the rest of the chip.  These are tough issues!
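As a back-of-the-envelope check on Dave's numbers (my arithmetic, not his): at 25 MHz a cycle is 40 ns and at 40 MHz it is 25 ns, so an extra 1.5-2.0 ns of setup costs somewhere between about 3.75% of a cycle (1.5 ns at 25 MHz) and 8% of a cycle (2.0 ns at 40 MHz). The helper names below are hypothetical.

```c
#include <assert.h>

/* Clock period in nanoseconds for a clock frequency given in MHz. */
double period_ns(double mhz) {
    assert(mhz > 0.0);
    return 1000.0 / mhz;
}

/* Fraction of one clock cycle consumed by an extra delay in ns. */
double cycle_fraction(double extra_ns, double mhz) {
    return extra_ns / period_ns(mhz);
}
```

For example, cycle_fraction(2.0, 40.0) is 0.08, i.e. 8% of the cycle that would otherwise go to the memory channel.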

    bcase