[net.micro.ns32k] 32332 faster than 32032?

johnl@hammer.UUCP (John Light) (06/16/86)
How can the 32332 be so much faster than the 32032?  Easy.
Let's take a brief look inside the 32032.  I did this work
while our 6200 workstation was a product, and I am telling it
from memory, so some details may be wrong.

When the 32032 came out, I couldn't understand why it was only
about 20% faster than the 32016.  I had waited in anticipation
because by examining the bus usage of the 32016, it was apparent
that the part was fully bus limited.  I did this by comparing the
instruction microcode times in the back of the manual with the
cycle-by-cycle activity on its bus.  It was clear from looking at
those numbers that the microcode spent a lot of time waiting for
bus bandwidth.  I predicted that the 32032 would be 80-100% faster
than the 32016 based on that analysis.

When the parts arrived, I was amazed and disappointed by the observed
performance.  The bus bandwidth was doubled, and the parts were running
only 20% faster.  I again looked at a cycle-by-cycle comparison on its
bus.  I used the "one million times" benchmark from the Byte (or was it
Aim) benchmarks.  It is a simple loop that does nothing but increment a
counter.  One pass through the loop took about 90 clocks on the 32016.

I first did a naive summation of the microcode execution times from the
manual (32 clocks).  Then I looked at the bus (~70 clocks).  I said to
myself "the bus must not be well synchronized to the cpu", so I did
another analysis of the microcode times on a cycle-by-cycle basis. 
There were conflicts with asynchronous activity in the bus unit.  This
made the theoretical time 38 clocks.  Then I assumed a typical asynchronous
rendezvous time of one clock whenever the cpu wanted the bus interface unit.
That brought the time to ~45 clocks.
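
As a rough tally of the estimates above, here is a small sketch using only
the clock counts quoted in this post (which are from memory).  The names
and the idea of an "unexplained" remainder are illustrative assumptions,
not something taken from the manual.

```python
# Clocks per loop iteration, as quoted in the post (recalled from memory).
estimates = [
    ("naive sum of microcode times", 32),
    ("plus conflicts with the bus unit", 38),
    ("plus one-clock rendezvous per bus request", 45),
]

# Approximate figure seen on the bus; an assumption to the extent that
# the post gives it only as "~70".
observed_bus_clocks = 70


def unexplained(estimate, observed=observed_bus_clocks):
    """Clocks per iteration that the given estimate fails to account for."""
    return observed - estimate


for label, clocks in estimates:
    gap = unexplained(clocks)
    print(f"{label:45s} {clocks:3d} clocks, {gap:3d} unexplained")
```

Even the most pessimistic estimate leaves roughly a third of the observed
clocks unaccounted for.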

The only remaining explanations were that 1) the microcode times were
wrong, or 2) the 32032 was badly designed.  I doubted 1) since they had
just updated the table and the microcode times were "reasonable".  The
remaining conclusion was inescapable.  With perspective I now say it was
inevitable.  However attractive the concept is, you can't just change
an asynchronous bus interface unit from 16 to 32 bits and expect it to
perform.  The 32032 was a 16-bit processor with a wide bus tacked on.

The 32332's performance is therefore explainable.  It is much faster than
its ill-behaved predecessor because it was designed from the start to
support a 32-bit bus effectively.  If the 32032 had been done right, it
would have been twice as fast as the 32016.  Then the 32332 would only be
25-50% faster than a 32032 of the same clock speed.  That's reasonable,
isn't it?
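
To put numbers on that, here is a back-of-the-envelope sketch.  It
assumes the ratios in this post simply multiply at equal clock speed, and
it takes "about 20% faster" and "twice as fast" at face value; the
function name is made up for illustration.

```python
# Speed relative to the 32016 (32016 = 1.0), from the figures in the post.
actual_32032 = 1.2  # observed: about 20% faster than the 32016
ideal_32032 = 2.0   # "twice as fast" had the 32-bit bus been done right


def implied_speedup(over_ideal):
    """Speedup of the 32332 over the real 32032, given its speedup
    over the hypothetical well-designed 32032."""
    return ideal_32032 * over_ideal / actual_32032


# The post's "25-50% faster" range, applied to the ideal part.
low = implied_speedup(1.25)
high = implied_speedup(1.50)
print(f"32332 vs. actual 32032: {low:.2f}x to {high:.2f}x")
```

Under those assumptions, a modest 25-50% gain over a proper 32032 shows
up as a factor of two or more over the part that actually shipped.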

John Light
tektronix!hammer!johnl