[comp.arch] How fast is fast?

butcher@arisia.Xerox.COM (Lawrence Butcher) (08/29/90)

While eating Chinese in downtown Asymptotia, we came up with this question:

What is the biggest SPEC mark a computer can have?

To try to make this more concrete, imagine a computer which fetches or stores
one data item each clock.  Let each data reference be to an aligned data item
of the proper size, so that, for instance, fetching 4 bytes one at a time
takes 4 clocks.  (Fetching a 64-bit quantity would take 2 clocks over a
32-bit bus.)  Let it have about 32 integer and 32 floating-point registers
to allow reuse of
data.  Assume that it never misses a clock due to data dependencies.  It just
does the data references the SPEC programs need, one reference per clock.

At 10 MHz and with a 32 bit bus, what would the SPEC rating be?

At 10 MHz and with a 64 bit bus, what would the SPEC rating be?

What if it had 2 data busses?  Specifically, if it could reference 32 bits of
data from the stack AND from main memory each clock, what SPEC rating would
it have?

What about the same 2 data busses, but with each data path 64 bits wide?
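
To pin the arithmetic down, here is a back-of-envelope sketch in C.  On such
a machine a benchmark's run time is just its data-reference clock count
divided by the clock rate, and the SPECmark is the geometric mean, over the
suite, of the VAX-11/780 reference time divided by that run time.  The
reference counts and VAX times below are invented placeholders; the real
numbers would have to be counted from the SPEC release 1 programs themselves.

#include <stdio.h>
#include <math.h>                       /* link with -lm */

struct bench {
    const char *name;
    double refs32;              /* 32-bit data references (placeholder) */
    double refs64;              /* 64-bit data references (placeholder) */
    double vax_secs;            /* VAX-11/780 run time (placeholder) */
};

/* Run time on the idealized machine: one bus transaction per clock, so
 * a 64-bit item costs 2 clocks on a 32-bit bus and 1 on a 64-bit bus;
 * n_busses transactions complete per clock. */
static double ideal_secs(const struct bench *b, double mhz,
                         int bus_bits, int n_busses)
{
    double clocks = b->refs32 + b->refs64 * (bus_bits >= 64 ? 1.0 : 2.0);
    return clocks / ((double)n_busses * mhz * 1.0e6);
}

int main(void)
{
    static const struct bench suite[] = {
        { "int-like", 4.0e9, 0.0,    500.0 },
        { "fp-like",  1.0e9, 5.0e9, 1500.0 }
    };
    int i, n = (int)(sizeof suite / sizeof suite[0]);
    double logsum = 0.0;

    for (i = 0; i < n; i++) {
        double t = ideal_secs(&suite[i], 10.0, 32, 1);
        printf("%-10s ideal %6.0f s, ratio %5.2f\n",
               suite[i].name, t, suite[i].vax_secs / t);
        logsum += log(suite[i].vax_secs / t);
    }
    /* SPECmark = geometric mean of the per-benchmark ratios. */
    printf("upper bound ~ %.2f SPECmarks\n", exp(logsum / n));
    return 0;
}

Changing mhz, bus_bits, and n_busses covers each of the four variants above;
the shape of the answer is clear even though the counts are made up.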

				Lawrence

patrick@convex.COM (Patrick F. McGehearty) (08/31/90)

In article <12035@arisia.Xerox.COM> butcher@arisia.Xerox.COM (Lawrence Butcher) writes:
>While eating Chinese in downtown Asymptotia, we came up with this question:
>
>What is the biggest SPEC mark a computer can have?
>
...<omitted limited hardware and software specification>
How many registers, how complex the instructions may be, whether superscalar
execution is allowed, whether vector operations are allowed -- all are
relevant to the specification.  All of these would need to be well defined
before the question could begin to be answered.  Even so, software issues
make the answer of limited value anyway.  Optimizing compiler technology is
not static; advances are being made all the time.
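
One concrete example of that movement (a made-up toy, not measured data):
interchanging loops so that the inner loop walks memory with stride 1 used
to be a hand transformation, and is now something compilers and source
preprocessors are learning to do automatically.

#define N 300

/* Stride-N inner loop: consecutive iterations touch a[0][j], a[1][j],
 * ..., hopping N doubles through memory -- poor locality on any cached
 * or paged memory system. */
double sum_slow(double a[N][N])
{
    int i, j;
    double s = 0.0;
    for (j = 0; j < N; j++)
        for (i = 0; i < N; i++)
            s += a[i][j];
    return s;
}

/* Interchanged: the inner loop now walks memory with stride 1.  Same
 * result, often several times faster on real hardware. */
double sum_fast(double a[N][N])
{
    int i, j;
    double s = 0.0;
    for (i = 0; i < N; i++)
        for (j = 0; j < N; j++)
            s += a[i][j];
    return s;
}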

All of which means: if someone were to claim that an ideal processor running
at x MHz could obtain no more than y SPECmarks, I claim that, given sufficient
time and incentive, I (or any tuning specialist) could find a way to modify
the processor internals or the software implementation to obtain y+delta
SPECmarks.  In the past, I have obtained deltas from 10% to 10000%, depending
on the code complexity and the situation.  That of course does not count
improvements found by optimizing away the work altogether (an infinite %
improvement).
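
For a flavor of what such a change can look like (a toy invented for
illustration, not one of the cases above): hoisting a loop-invariant call
turns a quadratic loop into a linear one, a delta that grows without bound
as the input grows.

#include <string.h>

/* Before: strlen() is re-evaluated on every iteration, making the loop
 * O(n^2) in the string length. */
void upcase_slow(char *s)
{
    size_t i;
    for (i = 0; i < strlen(s); i++)
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = s[i] - 'a' + 'A';
}

/* After: the invariant length is computed once; the loop is O(n). */
void upcase_fast(char *s)
{
    size_t i, len = strlen(s);
    for (i = 0; i < len; i++)
        if (s[i] >= 'a' && s[i] <= 'z')
            s[i] = s[i] - 'a' + 'A';
}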

Spending a large part of my time improving system/compiler performance,
I have come to believe that (almost) any software can be made to
run substantially faster if a tuning specialist is motivated to apply enough
effort.  I say a tuning specialist because I have discovered that not
everyone has the right mindset or training to tune code, as compared with
developing, documenting, or testing it.

To avoid another series of tuning flames: I am aware of the many tuning
tradeoffs (as discussed here before), which include:
1) Cost of tuning vs frequency of execution and relative speed improvement
2) Value of tuning vs likelihood of introducing bugs
3) Value of tuning vs cost of added complexity in maintaining tuned code
4) Value of tuning vs added cost of future porting of tuned code (especially
	when converting high level language to assembly) 
5) Value of tuning this code vs tuning some other code
6) Value of tuning vs value of similar effort in other feature enhancements
7) etc.

I see the real value of the SPECmark programs to be that system (hardware or
software) improvements which increase the SPECmark numbers are likely to
improve the performance of typical customer applications on workstations.
The same is true for the Perfect Club benchmark suite for supercomputer
applications.  Taking a mix of applications from the target market for a
system and developing a standard benchmark based on these programs is an
excellent (but expensive) way to reduce the noise level in comparing system
performance.  It also provides vendors with incentives and targets for
improvements.  Of course, all benchmarks become outdated with time.
In 10 to 20 years, when we are running 100+ GigaHz clocks and have Gbyte
systems on our desktops, the relevant performance issues will have changed
yet again.