[comp.sys.sgi] Questions about the 4D/120SX

ssroy@phoenix.Princeton.EDU (Steve Scot Roy) (05/03/89)

We are considering buying a 4D/120SX with 4 processors as a compute
server here and I would like to know if anyone out there has more
information about them.

How well do these things parallelize?  There were some postings a while
ago with one person claiming 75% (a factor of 3 speedup with 4
processors) and another claiming that his parallelizer had bugs.
How code dependent is the speedup?
These things claim to be -executable- compatible with single processor
machines, how is this done?  If you use the parallelizing optimizer on
it and then run it on a single processor machine, does it work?
Can you tell each processor to run independently, so that each of many
users sees a single processor?
Do all languages parallelize?  Specifically, does C parallelize?

How much better is the 2XX series than the 1XX series?  What is the
difference?  What speed do people -really- see on these beasts?

How much memory do these things need?  Some postings said they were
useless with 8Meg, another was complaining at 16Meg; how much is enough
and how much does it cost?

In general, how stable and bug free are these things?  Will I wonder
every time a program isn't working whether it is my fault or its?

Thanks a lot in advance.

Steve Roy
ssr@acm.princeton.edu

bron@bronze.SGI.COM (Bron Campbell Nelson) (05/03/89)

In article <8089@phoenix.Princeton.EDU>, ssroy@phoenix.Princeton.EDU (Steve Scot Roy) writes:
> 
> We are considering buying a 4D/120SX with 4 processors as a compute
> server here and I would like to know if anyone out there has more
> information about them.
Actually, I believe the marketing designation is 4D/140 for the 4cpu
version (the 4D/120 has 2cpus).

> How code dependent is the speedup <for parallelism> ?
Totally code dependent.  Parallelization is done at the Fortran DO loop
level.  Different iterations of a loop are executed in parallel on different
processors.  If your code spends most of its time executing inside
such a loop (or loops), and the loop(s) can be parallelized (not all
can be), you should see good speed up.
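
To make "not all can be" concrete, here is a minimal sketch in C rather
than Fortran (the function names are made up for illustration, and this
is not taken from the SGI compiler).  The first loop has independent
iterations and is the kind that can be split across processors; the
second carries a dependence from one iteration to the next, which is one
common reason a loop cannot be split as written.

```c
#include <assert.h>
#include <stddef.h>

/* Independent iterations: a[i] depends only on b[i], so the iterations
 * could be executed in any order, or on different processors. */
void scale(double *a, const double *b, size_t n, double k)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = k * b[i];
}

/* Loop-carried dependence: iteration i reads the value that iteration
 * i-1 just wrote, so the iterations must run in order as written. */
void running_sum(double *a, size_t n)
{
    assert(a != NULL);
    for (size_t i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];
}
```

Calling scale() and then running_sum() on a small array shows the
difference: the first result is the same whatever order the iterations
run in, the second is not.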

> These things claim to be -executable- compatible with single processor
> machines, how is this done?  If you use the parallelizing optimizer on
> it and then run it on a single processor machine, does it work?

Yes, it does.  In fact, you can compile and run the code on any of the
4D series of machines, and run the same executable on any other.  For
example, do code development on a Personal Iris (4D/20), and run the
result on the multi-processor.

As to how it works:  When the program starts up, the initialization
routines figure out how many processes you want.  The default is to
ask the o.s. how many cpus are on the machine, and use that.  Alternatively,
you can set a shell environment variable specifying the number.  This
number is remembered.  When the parallel loop is encountered,
the iterations are divided among the processes that are participating
in the job.  Division by 1 is perfectly ok.  Each process does its
piece, and then they all synchronize at the end.  This means that if
you run the version compiled for a multi-processor on a single processor
machine, it works, but runs a little bit slower than that same program
compiled for a uni-processor (you incur the multi-processing overhead
without benefit of an extra processor).  However, that executable can
now be transported and run on a multi-processor without change.
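
The startup logic described above can be sketched roughly as follows.
This is an illustration only, not the actual runtime: the environment
variable name DEMO_NUM_PROCS and both function names are hypothetical,
and sysconf(_SC_NPROCESSORS_ONLN) is a widely available (though not
strictly POSIX) way to ask a present-day Unix how many cpus it has.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical variable name, chosen for this sketch only. */
#define NPROCS_ENV "DEMO_NUM_PROCS"

/* Number of workers: the environment variable if it holds a positive
 * integer, otherwise the number of online cpus, and never less than 1. */
long worker_count(void)
{
    const char *s = getenv(NPROCS_ENV);
    if (s != NULL) {
        long v = strtol(s, NULL, 10);
        if (v > 0)
            return v;
    }
    long cpus = sysconf(_SC_NPROCESSORS_ONLN);
    return cpus > 0 ? cpus : 1;
}

/* Half-open range [*lo, *hi) of the iterations 0..n-1 handed to worker
 * `rank`.  Leftover iterations go one each to the lowest ranks.  With
 * nworkers == 1 the single worker simply gets the whole loop. */
void chunk_bounds(long n, long nworkers, long rank, long *lo, long *hi)
{
    assert(n >= 0 && nworkers > 0 && rank >= 0 && rank < nworkers);
    long base = n / nworkers;
    long extra = n % nworkers;
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

With 10 iterations and 4 workers this hands out the ranges [0,3),
[3,6), [6,8) and [8,10); with 1 worker it hands out [0,10), which is
the "division by 1" case.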

> Do all languages parallelize?  Specifically, does C parallelize?
Right now, only Fortran has compiler support for parallelism. C (or
any other language) can make use of the multi-processing library
routines, but you have to do the parallelism "by hand".
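
For a flavor of what "by hand" means, here is a hedged sketch of a
hand-parallelized sum in C.  It uses POSIX threads purely as a stand-in,
since the multi-processing library routines are not named in this
thread; the names task, sum_range, parallel_sum and MAX_WORKERS are all
invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16   /* arbitrary cap for this sketch */

struct task {
    const double *data;
    size_t lo, hi;       /* half-open range [lo, hi) for this worker */
    double partial;      /* this worker's share of the sum */
};

/* Each worker sums its own slice and stores the result privately. */
static void *sum_range(void *arg)
{
    struct task *t = arg;
    double s = 0.0;
    for (size_t i = t->lo; i < t->hi; i++)
        s += t->data[i];
    t->partial = s;
    return NULL;
}

/* Split the loop among nworkers threads, wait for all of them, then
 * combine the partial sums.  nworkers == 1 works the same way. */
double parallel_sum(const double *data, size_t n, int nworkers)
{
    assert(data != NULL && nworkers >= 1 && nworkers <= MAX_WORKERS);
    pthread_t tid[MAX_WORKERS];
    struct task task[MAX_WORKERS];
    int started[MAX_WORKERS];

    for (int w = 0; w < nworkers; w++) {
        task[w].data = data;
        task[w].lo = n * (size_t)w / (size_t)nworkers;
        task[w].hi = n * (size_t)(w + 1) / (size_t)nworkers;
        task[w].partial = 0.0;
        started[w] = (pthread_create(&tid[w], NULL, sum_range, &task[w]) == 0);
        if (!started[w])
            sum_range(&task[w]);   /* fall back to doing the slice inline */
    }

    double total = 0.0;
    for (int w = 0; w < nworkers; w++) {
        if (started[w])
            pthread_join(tid[w], NULL);   /* synchronize at the end */
        total += task[w].partial;
    }
    return total;
}
```

The point is the bookkeeping: the programmer, not the compiler, has to
split the range, start the workers, wait for them, and combine the
results.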

> How much better is the 2XX series than the 1XX series?  What is the
> difference?
The 2xx series uses the 25MHz chips, the 1xx uses 16MHz.  The 2xx also
has some significant memory interface changes, and a bigger 2nd level
cache.  For the number crunching codes I run, the 2xx cpus are pretty
much uniformly twice as fast as the 1xx cpus (much more than the 25/16,
or roughly 1.56x, ratio of their clock speeds would make you think).
Of course, your mileage may vary.

> How much memory do these things need?  Some postings said they were
> useless with 8Meg, another was complaining at 16Meg; how much is enough
Depends on the applications you run of course, but I for one wouldn't put
any less than 16meg on a 2cpu system, nor less than 24meg with 4cpus.
(Of course, more is always better! :-)

> In general, how stable and bug free are these things?  Will I wonder
> every time a program isn't working whether it is my fault or its?
I have never had the production hardware fail me.  There is (was) the
optimizer bug mentioned earlier, but amusingly enough, this was never
a problem for multi-processed codes! (The mp optimizer had the bug fix.)
Of course, I have a strong bias.

--
Bron Campbell Nelson
bron@sgi.com  or possibly  ..!ames!sgi!bron
These statements are my own, not those of Silicon Graphics.