[comp.arch] How to REALLY compute a SPECmark

rcg@lpi.liant.com (Rick Gorton) (04/16/91)

In article <8840027@hpfcso.FC.HP.COM> mjs@hpfcso.FC.HP.COM (Marc Sabatella) writes:
>>  2.  A number based on how many integer and floating point operations 
>>      the program *actually* performs when being run.  Instead of getting
>>      "credit" for the number of operations to be executed as defined by
>>      the source code, "credit" is given for the runtime frequency of ops
>>      in the executable.
>
>Two obvious flaws:
>
>a) How on earth would you measure that?  Have someone disassemble the compiled
>   code and hand trace its execution, counting operations?  Or perhaps supply
>   hand coded assembly versions of the program for each architecture?
>

Tools do exist that gather actual instruction execution
statistics, which would presumably permit an accurate count
of how many integer instructions and how many floating point
instructions are executed.
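
As a rough illustration of what such a tool reports, here is a minimal
Python sketch that tallies a dynamic instruction trace by class.  The
trace format and every mnemonic in it are made up for the example
(ADDint, ADDdouble and BR in particular are pure assumptions); a real
tracing tool will have its own output format.

```python
from collections import Counter

# Hypothetical mnemonics, for illustration only.
INT_OPS = {"LDint", "STint", "ADDint"}
FP_OPS = {"LDsingle", "STsingle", "LDdouble", "STdouble", "ADDdouble"}

def classify(trace):
    """Tally executed instructions as integer, floating point, or other."""
    counts = Counter()
    for op in trace:
        if op in INT_OPS:
            counts["int"] += 1
        elif op in FP_OPS:
            counts["fp"] += 1
        else:
            counts["other"] += 1
    return counts

# One entry per instruction actually executed (an assumed trace format).
trace = ["LDint", "ADDint", "STint", "LDdouble", "ADDdouble", "STdouble", "BR"]
counts = classify(trace)
print(counts["int"], counts["fp"], counts["other"])  # 3 3 1
```

Note that the loads and stores are filed under the data type named in
the mnemonic, which is exactly the accounting choice the rest of this
post is about.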

But there IS a catch to doing this.  If integer load/store
instructions are counted as integer operations, and floating
point load/store instructions as floating point operations, a
clever code generator can skew the results.

Suppose the [single-processor] CPU the compiler targets has the
following cycle timings (assume 32 bit ints, 32 bit single precision,
and 64 bit double precision):

	LDint		4 cycles	STint		4 cycles
	LDsingle	5 cycles	STsingle	5 cycles
	LDdouble	6 cycles	STdouble	6 cycles

It is then worth the effort to use LDint/STint on single
precision numbers when memory<-->memory data movement is being
performed, as in array data movement operations, or for that
matter whenever any 4 byte quantity is being moved around.
It is likewise worthwhile to use the LDdouble/STdouble instructions
on 8 byte items, such as double precision floating point numbers,
or large structures moved 8 bytes at a time.  The tough part is
detecting when we are really just doing an assignment of <A> to <B>,
where <A> and <B> are equivalent (same size, shape, datatypes).
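
To put numbers on this, the following Python sketch works out the cost
of moving 1000 single precision elements with each load/store pair
from the table above.  It is a deliberately naive model: it assumes no
overlap between instructions, suitably aligned data, and (for the
LDdouble/STdouble case) that two adjacent 4 byte elements can be moved
as one 8 byte item.  The function name and parameters are made up for
the example.

```python
# Cycle timings from the table above (hypothetical CPU).
CYCLES = {
    "LDint": 4, "STint": 4,
    "LDsingle": 5, "STsingle": 5,
    "LDdouble": 6, "STdouble": 6,
}

def copy_cycles(nbytes, load, store, width):
    """Cycles to copy nbytes using one load/store pair per `width` bytes.

    Assumes nbytes is a multiple of width and that instructions do not
    overlap in time.
    """
    if nbytes % width != 0:
        raise ValueError("nbytes must be a multiple of width")
    return (nbytes // width) * (CYCLES[load] + CYCLES[store])

n = 1000  # number of 4 byte single precision elements to move
as_single = copy_cycles(4 * n, "LDsingle", "STsingle", 4)
as_int = copy_cycles(4 * n, "LDint", "STint", 4)
as_double = copy_cycles(4 * n, "LDdouble", "STdouble", 8)
print(as_single, as_int, as_double)  # 10000 8000 6000
```

Under these assumptions the "natural" LDsingle/STsingle sequence is
the slowest of the three, which is why a code generator has a real
incentive to pick a different instruction type for plain data movement.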

Thus, the frequency of integer versus floating point operations
is going to be compiler dependent for the SAME program
on the SAME CPU, even to the point of depending on
the particular release of the compiler.
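
A small Python sketch of the effect: two hypothetical releases of a
code generator compile the same 100 element single precision copy, and
the measured operation mix flips completely.  The function names and
the rule "a load/store counts as an operation of its own data type"
are assumptions for the sake of the example.

```python
def emit_copy(n, use_int_moves):
    """Return the instruction list a hypothetical code generator emits
    for copying n single precision elements, one load/store pair each."""
    load, store = ("LDint", "STint") if use_int_moves else ("LDsingle", "STsingle")
    return [load, store] * n

def op_mix(instructions):
    """Count integer and floating point ops, treating loads and stores
    as operations of their own data type."""
    ints = sum(1 for op in instructions if op.endswith("int"))
    return {"int": ints, "fp": len(instructions) - ints}

# Same source program, same CPU, two compiler releases.
release_1 = op_mix(emit_copy(100, use_int_moves=False))
release_2 = op_mix(emit_copy(100, use_int_moves=True))
print(release_1)  # {'int': 0, 'fp': 200}
print(release_2)  # {'int': 200, 'fp': 0}
```

The program does the same work either way; only the bookkeeping
changes.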

When you add other spices to the brew, so to speak,
like multiple CPUs, vector hardware, superscalar behavior,
etc., the number of ways to solve the "Move <A> to <B>"
problem grows, and choosing among them becomes much more complex.

	rick

-- 
Richard Gorton               rcg@lpi.liant.com  (508) 626-0006
Language Processors, Inc.    Framingham, MA 01760
Hey!  This is MY opinion.  Opinions have little to do with corporate policy.