johnl@hammer.UUCP (John Light) (06/16/86)
How can the 32332 be so much faster than the 32032? Easy. Let's take a brief look inside the 32032. I did this work while our 6200 workstation was a product, and I am telling it from memory, so some details may be wrong.

When the 32032 came out, I couldn't understand why it was only about 20% faster than the 32016. I had waited in anticipation because, from examining the bus usage of the 32016, it was apparent that the part was fully bus limited. I determined this by comparing the instruction microcode times in the back of the manual with the cycle-by-cycle activity on its bus. It was clear from those numbers that the microcode spent a lot of time waiting for bus bandwidth. Based on that analysis, I predicted that the 32032 would be 80-100% faster than the 32016.

When the parts arrived, I was amazed and disappointed by the observed performance. The bus bandwidth was doubled, and the parts were running only 20% faster. I again did a cycle-by-cycle comparison on the bus. I used the "one million times" benchmark from the Byte (or was it Aim) benchmarks. It is a simple loop that increments a counter. It took about 90 clocks for one loop on the 32016.

I first did a naive summation of the microcode execution times from the manual (32 clocks). Then I looked at the bus (~70 clocks). I said to myself "the bus must not be well synchronized to the cpu", so I did another analysis of the microcode times on a cycle-by-cycle basis. There were conflicts with asynchronous activity in the bus unit, which raised the theoretical time to 38 clocks. Then I assumed a typical asynchronous rendezvous time of one clock whenever the cpu wanted the bus interface unit. That brought the time to ~45 clocks, still far short of what I measured.

The only remaining explanations were that 1) the microcode times were wrong, or 2) the 32032 was badly designed. I doubted 1) since they had just updated the table and the microcode times were "reasonable". The remaining conclusion was inescapable.
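The clock accounting above can be laid out as a small table. The Python below is only an illustrative sketch: the figures are the approximate ones quoted in the post (recalled from memory, so they may be off), and the names `OBSERVED_CLOCKS`, `ESTIMATES`, and `budget_rows` are made up for this example. It shows how much each refinement of the estimate adds and how many of the observed clocks still go unexplained.

```python
# Clock counts per loop iteration, as quoted in the post.
# These are approximate figures recalled from memory, not measured here.
OBSERVED_CLOCKS = 90

# Successive estimates of the loop time; each one refines the previous.
ESTIMATES = [
    ("naive sum of microcode times", 32),
    ("plus conflicts with bus-unit activity", 38),
    ("plus one-clock rendezvous per bus request", 45),
]


def budget_rows(observed, estimates):
    """Return (label, clocks, added, unexplained) for each refinement step."""
    rows = []
    previous = 0
    for label, clocks in estimates:
        added = clocks - previous        # clocks this refinement contributes
        unexplained = observed - clocks  # observed clocks still unaccounted for
        rows.append((label, clocks, added, unexplained))
        previous = clocks
    return rows


if __name__ == "__main__":
    for label, clocks, added, gap in budget_rows(OBSERVED_CLOCKS, ESTIMATES):
        print(f"{label:45s} {clocks:3d} clocks (+{added:2d}), unexplained: {gap:3d}")
```

With these figures, even the most generous estimate accounts for only about half of the observed loop time, which is the gap the argument rests on.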
With perspective I now say it was inevitable. However attractive the concept is, you can't just change an asynchronous bus interface unit from 16 to 32 bits and expect it to perform. The 32032 was a 16-bit processor with a wide bus tacked on. The 32332's performance is therefore explainable. It is much faster than its ill-behaved predecessor because it was designed from the start to support a 32-bit bus effectively. If the 32032 had been done right, it would have been twice as fast as the 32016. Then the 32332 would only be 25-50% faster than a 32032 of the same clock speed. That's reasonable, isn't it?

John Light tektronix!hammer!johnl