graham@convex.com (Marv Graham) (05/23/91)
No optimization:
Total time (sys+user) : 6.18 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 2/0
Context switches (vol/invol): 3/383
Total time (sys+user) : 6.04 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 1/0
Context switches (vol/invol): 2/186
Total time (sys+user) : 6.12 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 2/0
Context switches (vol/invol): 3/369
Basic block but no global optimization
Total time (sys+user) : 5.50 (bobstones)
Page faults (min/maj) : 3/62
Blocks in input/output : 0/0
Context switches (vol/invol): 1/333
Total time (sys+user) : 5.48 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 1/0
Context switches (vol/invol): 2/294
Total time (sys+user) : 5.34 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 2/0
Context switches (vol/invol): 3/139
Global optimization
Total time (sys+user) : 4.64 (bobstones)
Page faults (min/maj) : 3/62
Blocks in input/output : 0/0
Context switches (vol/invol): 1/122
Total time (sys+user) : 4.67 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 5/0
Context switches (vol/invol): 4/167
Total time (sys+user) : 4.70 (bobstones)
Page faults (min/maj) : 1/62
Blocks in input/output : 1/0
Context switches (vol/invol): 2/192
Marv Graham; Convex Computer Corp. {uunet,sun,uiucdcs,allegra}!convex!graham
graham@mozart.convex.com
graham@convex.com (Marv Graham) (05/23/91)
As several people have told me, I omitted to mention what machine my bobstone
numbers were for.
Convex C220...
Marv Graham; Convex Computer Corp. {uunet,sun,uiucdcs,allegra}!convex!graham
graham@mozart.convex.com
kcollins@convex.com (Kirby L. Collins) (05/24/91)
In posting results for the Convex C220, Marv neglected to mention that
these results are SCALAR only, with vectorization and parallelization
inhibited. In fact, the inner loop in this benchmark is quite amenable
to vectorization and parallelization:
Script started on Thu May 23 12:46:35 199
hurst [32]cc -ds -O3 -o bobstone bobstone.c
Optimization by Loop for Routine main
Line Iter. Reordering Optimizing / Special Exec.
Num. Var. Transformation Transformation Mode
-----------------------------------------------------------------------------
13 i Scalar
16 loc PARA/VECTOR SVZ
Line Iter. Analysis
Num. Var.
-----------------------------------------------------------------------------
13 i Inner loop has induction value with varying base or step
16 loc Parallel outer strip mine loop
hurst [33]uptime
12:47pm up 1 day, 19:38, 3 users, load average: 0.01, 0.35, 0.96
hurst [34]/bin/time bobstone
Total time (sys+user) : 1.66 (bobstones)
Page faults (min/maj) : 5/69
Blocks in input/output : 0/0
Context switches (vol/invol): 178/16
0.7 real 1.4 user 0.1 sys
script done on Thu May 23 12:47:15 199
Note that the wall clock time is less than the CPU time, since the
CPU cycles were distributed across multiple heads. Hurst is a C240,
with four processors, and was lightly loaded at the time. The speedup
from parallel execution was only a bit more than 2X, not uncommon for
loops which are both vectorized and executed in parallel. The speedup
would likely only approach 4X for much larger trip counts for the loc
loop.
Please note that the above is the result of exactly five minutes of
compile-execute-analysis. Thus I fall into the same trap I often
complain about...generating benchmark numbers without any meaningful
analysis of the results 8-{.
Kirby Collins
Strategic Planner
Convex Computer
ckp@grebyn.com (Checkpoint Technologies) (05/24/91)
Don't you really mean "bhobstone?" :-)
--
Richard Krehbiel, private citizen
ckp@grebyn.com (Who needs a fancy .signature?)