graham@convex.com (Marv Graham) (05/23/91)
No optimization:

Total time (sys+user)       : 6.18 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 2/0
Context switches (vol/invol): 3/383

Total time (sys+user)       : 6.04 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 1/0
Context switches (vol/invol): 2/186

Total time (sys+user)       : 6.12 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 2/0
Context switches (vol/invol): 3/369

Basic block but no global optimization:

Total time (sys+user)       : 5.50 (bobstones)
Page faults (min/maj)       : 3/62
Blocks in input/output      : 0/0
Context switches (vol/invol): 1/333

Total time (sys+user)       : 5.48 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 1/0
Context switches (vol/invol): 2/294

Total time (sys+user)       : 5.34 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 2/0
Context switches (vol/invol): 3/139

Global optimization:

Total time (sys+user)       : 4.64 (bobstones)
Page faults (min/maj)       : 3/62
Blocks in input/output      : 0/0
Context switches (vol/invol): 1/122

Total time (sys+user)       : 4.67 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 5/0
Context switches (vol/invol): 4/167

Total time (sys+user)       : 4.70 (bobstones)
Page faults (min/maj)       : 1/62
Blocks in input/output      : 1/0
Context switches (vol/invol): 2/192

Marv Graham; Convex Computer Corp.
{uunet,sun,uiucdcs,allegra}!convex!graham
graham@mozart.convex.com
graham@convex.com (Marv Graham) (05/23/91)
As several people have told me, I neglected to mention which machine my bobstone numbers were for. Convex C220...

Marv Graham; Convex Computer Corp.
{uunet,sun,uiucdcs,allegra}!convex!graham
graham@mozart.convex.com
kcollins@convex.com (Kirby L. Collins) (05/24/91)
In posting results for the Convex C220, Marv neglected to mention that these results are SCALAR only, with vectorization and parallelization inhibited. In fact, the inner loop in this benchmark is quite amenable to vectorization and parallelization:

Script started on Thu May 23 12:46:35 199
hurst [32]cc -ds -O3 -o bobstone bobstone.c

Optimization by Loop for Routine main

Line   Iter.   Reordering       Optimizing / Special   Exec.
Num.   Var.    Transformation   Transformation         Mode
-----------------------------------------------------------------------------
13     i       Scalar
16     loc     PARA/VECTOR  SVZ

Line   Iter.   Analysis
Num.   Var.
-----------------------------------------------------------------------------
13     i       Inner loop has induction value with varying base or step
16     loc     Parallel outer strip mine loop

hurst [33]uptime
12:47pm up 1 day, 19:38, 3 users, load average: 0.01, 0.35, 0.96
hurst [34]/bin/time bobstone

Total time (sys+user)       : 1.66 (bobstones)
Page faults (min/maj)       : 5/69
Blocks in input/output      : 0/0
Context switches (vol/invol): 178/16

0.7 real 1.4 user 0.1 sys

script done on Thu May 23 12:47:15 199

Note that the wall clock time is less than the CPU time, since the CPU cycles were distributed across multiple heads. Hurst is a C240, with four processors, and was lightly loaded at the time. The speedup from parallel execution was only a bit more than 2X, not uncommon for loops which are both vectorized and executed in parallel. The speedup would likely only approach 4X for much larger trip counts for the loc loop.

Please note that the above is the result of exactly five minutes of compile-execute-analysis. Thus I fall into the same trap I often complain about...generating benchmark numbers without any meaningful analysis of the results 8-{.

Kirby Collins
Strategic Planner
Convex Computer
ckp@grebyn.com (Checkpoint Technologies) (05/24/91)
Don't you really mean "bhobstone?" :-)

--
Richard Krehbiel, private citizen
ckp@grebyn.com
(Who needs a fancy .signature?)