rcg@lpi.liant.com (Rick Gorton) (04/16/91)
In article <8840027@hpfcso.FC.HP.COM> mjs@hpfcso.FC.HP.COM (Marc Sabatella) writes:
>> 2. A number based on how many integer and floating point operations
>> the program *actually* performs when being run.  Instead of getting
>> "credit" for the number of operations to be executed as defined by
>> the source code, "credit" is given for the runtime frequency of ops
>> in the executable.
>
>Two obvious flaws:
>
>a) How on earth would you measure that?  Have someone disassemble the
>   compiled code and hand trace its execution, counting operations?  Or
>   perhaps supply hand coded assembly versions of the program for each
>   architecture?

There do exist tools which can be used to get actual instruction
execution statistics, which would presumably permit an accurate count of
how many integer instructions and how many floating point instructions
are used.  But there IS a catch to doing this.  If integer load/store
instructions count as integer operations, and floating point load/store
instructions count as floating point operations, it is then possible to
skew the results with a clever code generator.

Suppose the [single-processor] CPU the compiler is targeted for has the
following cycle timings (assume 32 bit ints, 32 bit single precision,
and 64 bit double precision):

	LDint		4 cycles
	STint		4 cycles
	LDsingle	5 cycles
	STsingle	5 cycles
	LDdouble	6 cycles
	STdouble	6 cycles

It is then worth the effort to use LDint/STint on single precision
numbers whenever memory<-->memory data movement is being performed --
array data movement operations, for instance, or for that matter any 4
byte quantity being moved around.  It is likewise worthwhile to use the
LDdouble/STdouble instructions on 8 byte items, like large structures
and double precision floating point numbers.  The tough part is
detecting when we are really just doing an assignment of <A> to <B>,
where <A> and <B> are equivalent (same size, shape, datatypes).
Thus, the frequency of integer/floating point operations is going to be
compiler dependent for the SAME program on the SAME CPU, even to the
point of being dependent upon the particular release of the compiler.
When you add other spices to the brew, so to speak, like multiple CPUs,
vector hardware, superscalar behavior, etc., the number of alternatives
for solving the "Move <A> to <B>" problem becomes much larger.

	rick
--
Richard Gorton               rcg@lpi.liant.com            (508) 626-0006
Language Processors, Inc.    Framingham, MA 01760
Hey!  This is MY opinion.  Opinions have little to do with corporate policy.