[comp.sys.sgi] What is the real Speed of a 4D/240 ?

tim@VAX1.CC.UAKRON.EDU (Timothy H Smith) (04/25/89)

We have just obtained a 4D/240 and are using it for scalar
floating point operations.  We do not need a vector machine.
We were told the floating point speed was about 12 mflops.
Well, it turns out to be about 4 mflops.
When the Fortran compiler runs at any optimization level, the numbers
it generates are all bad.  Turn off the optimization and all is fine.
Also, when the machine is running some reasonable jobs, the interactive
response goes to nothing.  I mean 5 minutes for an ls.
I know the machine needs more memory, but I didn't expect this.


What is the state of the SGI compilers?  Not so good as far as I can
tell.  Try to run Power Fortran and none of the results are good.
Power Fortran was bought to utilize all 4 processors.  It would
be nice if it worked.  Can anyone from SGI comment on this?

Our machine is a 4D/240S with 16meg of memory.


				thanks,

				tim@vax1.cc.uakron.edu

*** my comments are mine and do not reflect my organization ***

bron@bronze.SGI.COM (Bron Campbell Nelson) (04/28/89)

In article <139@VAX1.CC.UAKRON.EDU>, tim@VAX1.CC.UAKRON.EDU (Timothy H Smith) writes:
> We have just obtained a 4D/240 and are using it for scalar
> floating point operations.  We do not need a vector machine.
> We were told the floating point speed was about 12 mflops.
> Well, it turns out to be about 4 mflops.

The double-precision Linpack benchmark number for 1 processor is
4 mflops.  The salesperson was probably talking about the collective
speed of the machine as a whole (although why they didn't therefore
claim it was a 16 mflop machine I don't know).  I have run some benchmark
jobs that see a 3x speedup when running on 4 CPUs (using the automatic
parallelizing Fortran tools).  Such a job can indeed run at 12 mflops.
In fact, I get over 20 mflops on one of the LLNL kernels (disclaimer:
parallel speedup is *very* application dependent; your mileage may vary).

> When the Fortran compiler runs at any optimization level, the numbers
> it generates are all bad.  Turn off the optimization and all is fine.

There is a known bug in the optimizer.  If you have the Power Fortran
option, there is something you can try: move /usr/lib/uopt to
/usr/lib/uopt.orig, and link /usr/lib/uopt_mp to /usr/lib/uopt.
This will cause the Multi-Processing optimizer to be used in place
of the normal optimizer (uopt_mp also works on normal scalar code). It
should do all the same optimizations as the normal optimizer, and in
addition has the bug we know about fixed.  We do not (yet) ship it
as the standard optimizer because, at the time of the software release, we
had not had enough time to be sure the bug fix wouldn't break something
else in one of the other languages (C, Pascal, PL/I, etc.).  Fortran
codes should be fine.  In fact, I believe that no problems have been
uncovered in any other language either up to this point, so the other
languages should be fine too.
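
The swap described above can be sketched as a small shell function.  This
is only an illustration, not an official procedure: swap_uopt is a
hypothetical helper name, the directory argument exists only so the steps
can be tried on a scratch copy first, and "link" is taken to mean a hard
link (a symbolic link made with ln -s should serve equally well).

```shell
# Sketch only: swap in the multi-processing optimizer as described above.
# swap_uopt is a hypothetical helper name; the directory defaults to /usr/lib.
swap_uopt() {
    libdir=${1:-/usr/lib}

    # Refuse to run twice, so an existing backup is never overwritten.
    if [ -f "$libdir/uopt.orig" ]; then
        echo "swap_uopt: $libdir/uopt.orig already exists" >&2
        return 1
    fi

    # Keep the original optimizer so the change can be undone.
    mv "$libdir/uopt" "$libdir/uopt.orig" || return 1

    # Make uopt refer to the multi-processing optimizer (hard link).
    ln "$libdir/uopt_mp" "$libdir/uopt"
}
```

Run as root, swap_uopt with no argument acts on /usr/lib.  To undo the
change, move /usr/lib/uopt.orig back over /usr/lib/uopt.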

> ...   Try to run Power Fortran and none of the results are good.
> Power Fortran was bought to utilize all 4 processors.  It would
> be nice if it worked.  Can anyone from SGI comment on this?

I've been running Power Fortran for a long time and I get quite good
results; perhaps I can help.  If you have a specific question or problem,
send me email and we'll try to resolve it.

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.

thant@horus.SGI.COM (Thant Tessman) (04/28/89)

In article <139@VAX1.CC.UAKRON.EDU>, tim@VAX1.CC.UAKRON.EDU (Timothy H Smith) writes:
> We have just obtained a 4D/240 and are using it for scalar
> floating point operations.  We do not need a vector machine.
[stuff deleted]
> Also, when the machine is running some reasonable jobs, the interactive
> response goes to nothing.  I mean 5 minutes for an ls.
> I know the machine needs more memory, but I didn't expect this.
[stuff deleted]
> 
> Our machine is a 4D/240S with 16meg of memory.
> 

I once worked with a 4D/120 with 8 meg.  It was next to useless.  I don't 
think they should sell them like that (with that little memory).

You have twice as much memory but you also have twice as many processors.
Run gr_osview to see if it is spending all its time swapping.

The compilers are from MIPS and are generally considered excellent.
If you really are getting different answers with optimised versus
non-optimised code, you should report it to the hotline as a bug.
(Narrow it down and post it to the net?)
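
One way to start narrowing it down is to build the same source twice and
compare what the two executables print.  The following is a rough sketch,
not a tested recipe: compare_builds is a hypothetical helper name, prog.f
stands in for your own source file, and the -O0 and -O2 flags shown in
the comments are an assumption about what f77 accepts on your release.

```shell
# Sketch only: run two builds of the same program and compare their output.
# compare_builds is a hypothetical helper; pass it the two executables.
#
# Assumed build steps (prog.f and the flags are placeholders):
#   f77 -O0 -o prog_noopt prog.f
#   f77 -O2 -o prog_opt   prog.f
#   compare_builds ./prog_noopt ./prog_opt
compare_builds() {
    noopt_out=${TMPDIR:-/tmp}/noopt.$$
    opt_out=${TMPDIR:-/tmp}/opt.$$

    # Capture everything each build prints.
    "$1" > "$noopt_out" 2>&1
    "$2" > "$opt_out" 2>&1

    # diff prints the differing lines and exits non-zero if the outputs differ.
    status=0
    diff "$noopt_out" "$opt_out" || status=$?

    rm -f "$noopt_out" "$opt_out"
    return $status
}
```

If the outputs differ, optimizing one source file at a time while leaving
the rest unoptimized should point to the routine that goes wrong.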

> 
> 				thanks,
> 
> 				tim@vax1.cc.uakron.edu
> 
> *** my comments are mine and do not reflect my organization ***

ditto

thant@sgi.com