eugene@eos.arc.nasa.gov (Eugene Miya) (12/21/90)
Fortran? C? Makes no sense to compare them if you don't have a basis. (I had hoped to write an ICPP paper on this; tough luck.) I need a break among the sandstone. Remember my minimal post? Seeing Pat McGehearty's post got me motivated again (Hi Pat!). FYI: Pat had what I regard as an interesting PhD thesis (yes, I read the whole thing a few years back, blue CMU cover). But see, we need a basis for comparison. The fact that Pat works for Convex got me interested in this: they are one of the few companies with an integrated language system using a common code-generating back end.

If you have the same program written in two languages, are they equivalent? What do we expect of equivalence? It ain't easy. The LLNL Loops in Fortran and C are nearly the same thing. I think C-versus-Fortran comparisons are relatively uninteresting: same imperative style of language. LISP, now we start to get interesting. I would really like to see VAL (Jack Dennis) or SISAL (McGraw et al.).

Anyway, how do we compare languages? Minimally. To quote Kernighan and Plauger (The Elements of Programming Style): do nothing gracefully. So I took an empty program, one which does nothing, and compiled it in each language:

Fortran:
	      PROGRAM EMPTY
	      STOP
	      END

C:
	main () {}

Pascal:
	program empty;
	begin
	end.

Consider this, on a C-2:

	-rwxr-xr-x  1 eugene       31302 Dec 20 14:06 ec
	-rwxr-xr-x  1 eugene      155098 Dec 20 14:06 ef

Look at the sizes of the executables. Should they be the same? They have the same common back end. Well, they do have different models of storage. What about on a Cray (Y-MP)?

	-rwxr-xr-x  1 eugene  npo  108112 Dec 20 14:45 ec
	-rwxr-xr-x  1 eugene  npo  717064 Dec 20 14:47 ef
	-rwxr-xr-x  1 eugene  npo  183584 Dec 20 14:57 ep

And execution time? On the C-2:

	% time ec
	0.00u 0.01s 0:00 100% 0+0k 0+0io 6pf+0w
	% time ef
	0.00u 0.01s 0:00 100% 0+0k 0+0io 13pf+0w

Identical functionality, different performance. And time(1) is an inadequate tool for timing at this resolution. On the Y?
	% time ec
	0.0003u 0.0064s 0:00 0%
	% time ef
	STOP (called by EMPTY )
	CP: 0.001s, Wallclock: 0.002s, 4.8% of 8-CPU Machine
	HWM mem: 97663, HWM stack: 2048, Stack overflows: 0
	0.0012u 0.0069s 0:00 0%
	% time ep
	0.0005u 0.0038s 0:00 0%

But this does not tell you enough. We need to count instructions (with precision, so we minimize the use of statistics). So I use the aforementioned hardware performance monitor:

	% hpm -g0 ef
	STOP (called by EMPTY )
	CP: 0.001s, Wallclock: 0.008s, 0.8% of 8-CPU Machine
	HWM mem: 97666, HWM stack: 2048, Stack overflows: 0

	Group 0:
	CPU seconds            :    0.00   CP executing     :   193415
	Million inst/sec (MIPS):   44.02   Instructions     :    51086
	Avg. clock periods/inst:    3.79
	% CP holding issue     :   42.74   CP holding issue :    82674
	Inst.buffer fetches/sec:    0.78M  Inst.buf. fetches:      904
	Floating adds/sec      :    0.21M  F.P. adds        :      246
	Floating multiplies/sec:    0.23M  F.P. multiplies  :      267
	Floating reciprocal/sec:    0.05M  F.P. reciprocals :       54
	I/O mem. references/sec:    0.00M  I/O references   :        0
	CPU mem. references/sec:   14.70M  CPU references   :    17058
	Floating ops/CPU second:    0.49M

We are doing a lot of work to "do nothing gracefully." Let's see the C and Pascal cases:

	% hpm -g0 ec
	Group 0:
	CPU seconds            :    0.00   CP executing     :    35247
	Million inst/sec (MIPS):   46.92   Instructions     :     9923
	Avg. clock periods/inst:    3.55
	% CP holding issue     :   43.26   CP holding issue :    15249
	Inst.buffer fetches/sec:    0.66M  Inst.buf. fetches:      140
	Floating adds/sec      :    0.00M  F.P. adds        :        1
	Floating multiplies/sec:    0.00M  F.P. multiplies  :        0
	Floating reciprocal/sec:    0.00M  F.P. reciprocals :        0
	I/O mem. references/sec:    0.00M  I/O references   :        0
	CPU mem. references/sec:   17.24M  CPU references   :     3645
	Floating ops/CPU second:    0.00M

	% hpm -g0 ep
	Group 0:
	CPU seconds            :    0.00   CP executing     :    61878
	Million inst/sec (MIPS):   46.97   Instructions     :    17439
	Avg. clock periods/inst:    3.55
	% CP holding issue     :   42.90   CP holding issue :    26545
	Inst.buffer fetches/sec:    0.89M  Inst.buf. fetches:      332
	Floating adds/sec      :    0.00M  F.P. adds        :        1
	Floating multiplies/sec:    0.00M  F.P. multiplies  :        0
	Floating reciprocal/sec:    0.00M  F.P. reciprocals :        0
	I/O mem. references/sec:    0.02M  I/O references   :        6
	CPU mem. references/sec:   18.26M  CPU references   :     6781
	Floating ops/CPU second:    0.00M

Now this is on a loaded system, but I assert you can get important information even on a loaded system. The important figure, BTW, is the rightmost column; this is the raw data. The middle column of figures is a rounded approximation. Other interesting information can be taken from the HPM as well; I used only the Fortran version of the code to describe this "universe."

	% hpm -g1 ef
	STOP (called by EMPTY )
	CP: 0.001s, Wallclock: 0.004s, 1.5% of 8-CPU Machine
	HWM mem: 97666, HWM stack: 2048, Stack overflows: 0

	Group 1:  CPU seconds: 0.00116   CP executing: 193018

	Hold issue condition                  % of all CPs   actual # of CPs
	Waiting on semaphores              :      0.13               249
	Waiting on shared registers        :      0.00                 0
	Waiting on A-registers/funct. units:      9.43             18200
	Waiting on S-registers/funct. units:     27.62             53304
	Waiting on V-registers             :      1.38              2668
	Waiting on vector functional units :      0.00                 9
	Waiting on scalar memory references:      0.57              1103
	Waiting on block memory references :      1.91              3677

	% hpm -g2 ef
	STOP (called by EMPTY )
	CP: 0.001s, Wallclock: 0.002s, 4.1% of 8-CPU Machine
	HWM mem: 97666, HWM stack: 2048, Stack overflows: 0

	Group 2:  CPU seconds: 0.00116   CP executing: 192818

	Inst. buffer fetches/sec:  0.78M   total fetches   :     904
	                                   fetch conflicts :    1396
	I/O memory refs/sec     :  0.00M   actual refs     :       0
	   avg conflict/ref 0.00           actual conflicts:      37
	Scalar memory refs/sec  :  5.59M   actual refs     :    6462
	Block memory refs/sec   :  9.16M   actual refs     :   10600
	CPU memory refs/sec     : 14.75M   actual refs     :   17062
	   avg conflict/ref 0.07           actual conflicts:    1161
	CPU memory writes/sec   :  8.99M   actual refs     :   10399
	CPU memory reads/sec    :  5.76M   actual refs     :    6663

	% hpm -g3 ef
	STOP (called by EMPTY )
	CP: 0.001s, Wallclock: 0.003s, 2.1% of 8-CPU Machine
	HWM mem: 97666, HWM stack: 2048, Stack overflows: 0

	Group 3:  CPU seconds: 0.00116   CP executing: 192990

	(octal) type of instruction         inst./CPUsec  actual inst.  % of all inst.
	(000-017) jump/special             :    5.35M         6190          12.10
	(020-077) scalar functional unit   :   33.12M        38350          74.96
	(100-137) scalar memory            :    5.58M         6462          12.63
	(140-157,175) vector integer/log.  :    0.01M           14           0.03
	(160-174) vector floating point    :    0.00M            2           0.00
	(176-177) vector load and store    :    0.12M          141           0.28

	type of operation                    ops/CPUsec    actual ops    avg. VL
	Vector integer&logical             :    0.12M          138          9.86
	Vector floating point              :    0.20M          232        116.00
	Scalar functional unit             :   33.12M        38350

That took four executions to learn all that, for a simple program which does nothing gracefully. That's quite a cost; you can't do that with some real programs. There is more to performance than execution time. If we want to design faster machines (worry about buying them later), we must potentially make observations in this much detail. This isn't possible on many machines. Future machines must have this kind of environment. I hope you can begin to see why we MUST come to some kind of consensus on equivalence, or we won't get anywhere with our comparisons. Do you really know what your programs are doing late in the evenings? 8^)

Next... should we deal with the problem of resolution? Adding things to these empty programs and seeing how influences such as optimization, etc. affect execution.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.
eugene@eos.arc.nasa.gov (Eugene Miya) (12/21/90)
BTW: In case you missed it, the thread I am trying to start included the summary of an unknown program (actually EMPTY) using HPM group 0. Let me give you a few hints of things to come:

1) Starting with this calibration (not a fair one, we shall see, but it appears fair), I'll add (really) simple work.
2) I'll try to show real examples of "over-optimization."
3) How to work around one or two of these.
4) Try to show hardware and software artifacts.
5) Consider sampling strategies: one or two proposals which might be radical.
6) Show how a few programs might have deceptive execution.
7) Consider interesting analogies for performance measurement. (Mine are from photography: Muybridge [Stanford], Edgerton [MIT & EG&G]. Rafael's is audio equipment. Others might use cars, etc.)
8) Cover real "hard" topics like parallelism, data flow languages and machines, equivalence, timing scope, synchronization. Etc.

But first, skiing and climbing.

--e.n. miya, NASA Ames Research Center, eugene@eos.arc.nasa.gov
  {uunet,mailrus,most gateways}!ames!eugene
  AMERICA: CHANGE IT OR LOSE IT.