bcase@amdcad.UUCP (04/15/87)
Well, I just got a mail message from one of our circuit designers. While I don't consider his comments "officially binding" or anything (by that I mean he might later want to change his mind or modify what he said), I presume they are reasonably well considered.

He said that by and large John Mashey's comments were correct: it isn't very costly in terms of time to have the byte alignment network on the chip. One implementation technique is to decode the control for the byte alignment network as the address leaves the chip, so that when the data comes back there is only a propagation delay, not a decode plus propagation delay. I can think of some issues which would still need to be resolved, but they are probably resolvable. The effect on setup and hold times is my biggest worry (strange words from someone considered to be a software type!).

The point, though, is that I may have spoken a little strongly and a little quickly. I must say that I *still* feel comfortable with our design. I must also say that these net discussions have been quite enlightening and have humbled me just a bit (any more humble and I'll only be able to speak in a mumble).

bcase
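To make the "decode early, propagate late" idea concrete, here is a rough software model in C. It is only a sketch under stated assumptions: the names `align_ctl`, `decode_align`, and `align_byte` are made up for illustration, big-endian byte numbering is assumed (byte 0 is the most significant byte of the word), and nothing here describes the actual circuit on any chip.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical control word for the byte alignment network.
 * In the scheme described above it would be decoded while the
 * address is on its way off the chip. */
typedef struct {
    unsigned shift; /* right-shift amount, in bits, selecting one byte */
} align_ctl;

/* Step 1: decode the select from the two low address bits.
 * Assumption: big-endian numbering, so byte 0 sits in bits 31..24. */
align_ctl decode_align(uint32_t addr)
{
    align_ctl c;
    c.shift = (3u - (addr & 3u)) * 8u;
    return c;
}

/* Step 2: when the word comes back, only the mux itself is left.
 * The selected byte is placed, zero-extended, in the low byte. */
uint32_t align_byte(uint32_t word, align_ctl c)
{
    return (word >> c.shift) & 0xFFu;
}

/* Small self-check showing which byte each address selects. */
void align_demo(void)
{
    uint32_t word = 0x11223344u;
    assert(align_byte(word, decode_align(0)) == 0x11u);
    assert(align_byte(word, decode_align(1)) == 0x22u);
    assert(align_byte(word, decode_align(2)) == 0x33u);
    assert(align_byte(word, decode_align(3)) == 0x44u);
}
```

Splitting the work into `decode_align` and `align_byte` mirrors the point of the technique: the decode does not depend on the returning data, so it can overlap the memory access, and only the final select sits in the data return path.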
bcase@amdcad.UUCP (04/15/87)
Ok, here is an update to the update. This information is from our "main man" circuit designer Dave Witt (the boss of the other circuit designer whose comments I earlier recounted). There isn't major disagreement here, just some clarification.

---------------------------

Hi, Brian

Anyway, if what you are talking about is being able to mux an arbitrary byte to/from a byte position to/from a byte position via load, my guess is that this would have an impact on performance for us, but that this would be very architecturally and speed path dependent.

On the 29000, because we do direct forwarding of loads, the impact of allowing arbitrary multiplexing of a byte location to any byte location of a register would cause an extra 1.5-2.0 nanoseconds of setup on the delay from address to data valid. (This is ignoring the increased hardware associated with providing access to any byte at each byte location in the data input latch, and the selective byte drive/tristate on our internal buses in the data input latch and the register file.) This is because we use dynamic buses, which require the data to be stable before the drive clock, and also because, in the picket [that is, a half clock cycle, ED] in which we transfer the data input latch to the ALU/shifter, we currently use all of that time for data transfer and for setting up the control signals for the funnel shift/ALU/prioritizer.

I'll say that there is obviously increased complexity associated with allowing byte loads, that whether there is a net effect on the performance of the processor is very dependent on the internal pipe and associated internal architecture/speed paths, and that in the case of the 29000, if we were to have implemented this feature, it would have affected our address/data valid setup time.
This may not be a major problem on other chips, but when you are trying for 25-40 MHz with associated external memory systems and caches, there is no more critical item to a processor than giving the channel as much of the cycle time as possible.

David Witt

---------------------------------

Well, just thought that the net might find this interesting. I guess the thing to realize is that it is difficult to consider the effects of a single feature separately. If we had specified the alignment network from the beginning, maybe our circuit guys would have found a zero-time solution (they are pretty clever). Over the phone, Dave also worried that this circuitry might not scale, timewise, as well as other stuff. These are tough issues!

bcase