[net.arch] Cray-2 impressions

eugene@ames.UUCP (Eugene Miya) (10/10/85)

I had call to go visit the Cray-2 downstairs.  It's tiny. It's
overwhelmed by the disk drives which surround the thing.  The
foot-print (C) takes up the space of 1 large disk drive.  We have
a beautiful blue and metal finish with clear windows showing the
fluorinert.  The console is an AT&T PC 6300.  I've run two
programs thus far for testing memory contention, but this was on
the LLNL 1 quad C-2.  Back to processing my travel forms.....

I don't think I offended any one by saying this.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb

rchrd@well.UUCP (Richard Friedman) (10/15/85)

In article <1189@ames.UUCP>, eugene@ames.UUCP (Eugene Miya) writes:
> I had call to go visit the Cray-2 downstairs.  It's tiny. ...

Gene:
  Is it true that the CRAY-2 cpu is really a CRAY-1  (not an X-MP)
cpu, meaning that it has only one path to memory and doesnt do
chaining?  So the only major speedup between the X-mp and -2
cpu's is the faster clock cycle of the 2,  4.1 ns.
Also, I understand that the 256K 64-bit memory is slower than the
memory on the x-mp, but there is a fast 16K memory cache per processor.
So it really looks like a CDC 7600!!
Question is, will the Y-MP be faster?  (16 processors, 64Mwords)

Have you looked at the Cray-2 compiler... I hear its based on the old
CFT1.10 and doesnt have character data (yet).
I'd like to see some comparison timings between the x-mp and the 2.



-- 
     
    [rchrd] = Richard Friedman
              Pacific-Sierra Research, 2855 Telegraph #415
              Berkeley, CA 94705 (415) 540 5216
    UUCP: {hplabs,ptsfa,dual}!well!rchrd

brooks@lll-crg.ARpA (Eugene D. Brooks III) (10/16/85)

>Gene:
>  Is it true that the CRAY-2 cpu is really a CRAY-1  (not an X-MP)
>cpu, meaning that it has only one path to memory and doesnt do
>chaining?
Yes, it's true.

>So the only major speedup between the X-mp and -2
>cpu's is the faster clock cycle of the 2,  4.1 ns.
yes

>Also, I understand that the 256K 64-bit memory is slower than the
>memory on the x-mp, but there is a fast 16K memory cache per processor.
Yes, the latency of the main memory is a real problem.

>So it really looks like a CDC 7600!!
I'm sure you would prefer the Cray 2.  The user does not
see the 16k local memory, the compiler does.

>I'd like to see some comparison timings between the x-mp and the 2.
When the xmp is benchmarked against the 2 the xmp usually wins unless
one can manage to effectively buffer vectors through the 16k cache and
make a lot of uses of the vector data.  If the cache can't be effectively
used and the 3 port architecture is usable on the loop, the xmp
wins.

PS  I haven't heard of a single person who has stood in the middle of the
Cray 2 and was not impressed.

eugene@ames.UUCP (Eugene Miya) (10/16/85)

I posted my original message after returning from the CUG meeting at
a moment of sweeping beauty upon seeing the C-2. Truly a sight to behold.

> >  Is it true that the CRAY-2 cpu is really a CRAY-1  (not an X-MP)
> >cpu, meaning that it has only one path to memory and doesnt do
> >chaining?
> Yes, it's true.

Agreed, but is a single data path the only criterion for a Cray-1?
The Cray-2 is in some ways a new machine and is not instruction-set compatible
with the 1s or Xs, thus upsetting many existing batch-oriented sites.

> >So the only major speedup between the X-mp and -2
> >cpu's is the faster clock cycle of the 2,  4.1 ns.
> yes

Oversimplification in some ways.  The 2 has four CPUs, so is the thing
4 times faster?  Architects have yet to discover Brooks's [not Eugene's]
Law [I guess it derives as Amdahl's Law].  The 2 also got rid of two
banks of CPU registers.  It is my understanding that there was controversy
inside Cray about the real effectiveness in chaining and those registers.
Time will tell.  
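
A rough way to see why four CPUs need not mean four times the speed is
Amdahl's Law: if only a fraction p of the work can be spread over n CPUs,
the speedup is 1 / ((1 - p) + p / n).  The sketch below just evaluates that
formula; the parallel fractions are assumed values for illustration, not
measurements from any Cray.

```python
def amdahl_speedup(parallel_fraction, n_cpus):
    """Speedup predicted by Amdahl's Law for a fixed-size job.

    parallel_fraction: share of the run time that can use all CPUs (0..1).
    n_cpus: number of processors (4 for the Cray-2 background CPUs).
    """
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_cpus)


# Assumed parallel fractions, chosen only to show the trend.
for p in (0.5, 0.9, 1.0):
    print(f"p = {p:.1f}: speedup on 4 CPUs = {amdahl_speedup(p, 4):.2f}")
```

Only a perfectly parallel job (p = 1.0) reaches the full factor of 4 in this
model; a job that is half serial gets 1.6.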

> >Also, I understand that the 256K 64-bit memory is slower than the
> >memory on the x-mp, but there is a fast 16K memory cache per processor.
> Yes, the latency of the main memory is a real problem.

We have an X-MP/1 [MOS] and an X-MP/2 [Bipolar] with exactly 2 MW memory
so they have precisely 16 banks of memory.  I have a memory contention test
which plots like the following:

Access time is the vertical dimension.

                     /\
                    /  \
         /\        /    \        /\  
 ___/\__/  \__/\__/      \__/\__/  \ . . .
 ---+----+----+------+------+----+---+---+---+--------
    4    8   12     16     20   24  28  32  36
		Stride

I knew I should have used dataplot.  I hate plotting on an ASCII device.
The beauty of this plot is that the curve has the same shape for the
X-MP/12 as for the X-MP/22.
The X-MP/12 takes about 50% longer to do a memory access than the X-MP/22.
I've been given various explanations, but I suspect it's strictly because
of the MOS vs bipolar memory technology.  Note the proportions of the peaks
are precisely factors of 2 higher than the surrounding overhead.
The floor of the graph is not 0, but the peaks are correctly positioned
over those numbers I indicated.
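
The shape of the plot can be reproduced with a toy model of interleaved
memory.  With 16 banks, a sweep at a constant stride only touches
16 / gcd(stride, 16) distinct banks, so strides that share large factors
with 16 keep hitting the same few banks.  The sketch below is not the test
program described above; the function names are made up, and the bank busy
time of 8 clocks is an assumption picked so that stride 4 shows a bump, not
a measured X-MP figure.

```python
from math import gcd


def banks_touched(stride, n_banks=16):
    """Number of distinct banks visited by a constant-stride sweep."""
    return n_banks // gcd(stride, n_banks)


def relative_access_time(stride, n_banks=16, bank_busy=8):
    """Toy model of clocks per word for a constant-stride sweep.

    bank_busy is an ASSUMED bank recovery time in clocks.  With k banks in
    rotation each bank is revisited every k clocks, so the sweep stalls
    whenever k is smaller than bank_busy.
    """
    k = banks_touched(stride, n_banks)
    return max(1.0, bank_busy / k)


# Print the modelled cost for the strides labelled on the plot.
for stride in range(4, 37, 4):
    print(stride, relative_access_time(stride))
```

In this model the small peaks (strides 4, 12, 20, 28) cost twice the floor,
the medium peaks (8, 24) twice that again, and the large peaks (16, 32)
twice that again, which matches the factor-of-2 steps noted above.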

I can also see noise on the 22 because [we think] it's a multiprocessor
and bank contention takes place because of the second CPU. 
So it's mostly in the memory speed.  [Again grossly oversimplified.]

> >So it really looks like a CDC 7600!!
> I'm sure you would prefer the Cray 2.  The user does not
> see the 16k local memory, the compiler does.
Some call it a cache.
> > Question is, will the Y-MP be faster?  (16 processors, 64Mwords)

Y-MP? What's a Y-MP?
Sorry, I cannot comment on the Y-MP.  Write Cray, they are on the net.
I have not signed a non-disclosure, but I once opened my mouth a tiny bit
too wide [emphasis on tiny] on this net and a tidal wave from the
MN/WI area hit me.
 
> > Have you looked at the Cray-2 compiler... I hear its based on the old
> > CFT1.10 and doesnt have character data (yet).
> > I'd like to see some comparison timings between the x-mp and the 2.
> When the xmp is benchmarked against the 2 the xmp usually wins unless
> one can manage to effectively buffer vectors through the 16k cache and
> make a lot of uses of the vector data.  If the cache can't be effectively
> used and the 3 port architecture is usable on the loop, the xmp
> wins.

CFT2 is based currently on the 1.09 version I believe.  I am uncertain
about all plans for upgrade.  CFT77 [formerly NFT] is written in Cray
Pascal in an attempt to ease maintenance, to make it easy to add new
vectorization and multi-tasking features, and so forth.  CFT77 will need
a considerable shakedown, as CFT (written in CAL) is quite mature in some ways.

Regarding performance: See my above test as to why.  Richard,
I've stopped by your office, and I welcome you to see my other performance
stuff on the X-MP and 2.  I just showed some of it to the LLNL people
[George Michael] the other day.  Bring your German parallel processor
bibliography with you.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb

husmann@uicsrd.CSRD.UIUC.EDU (10/17/85)

> In article <1189@ames.UUCP>, eugene@ames.UUCP (Eugene Miya) writes:
> > I had call to go visit the Cray-2 downstairs.  It's tiny. ...
> 
> Gene:
>   Is it true that the CRAY-2 cpu is really a CRAY-1  (not an X-MP)
> cpu, meaning that it has only one path to memory and doesnt do
> chaining?  So the only major speedup between the X-mp and -2
> cpu's is the faster clock cycle of the 2,  4.1 ns.
> Also, I understand that the 256K 64-bit memory is slower than the
> memory on the x-mp, but there is a fast 16K memory cache per processor.
> So it really looks like a CDC 7600!!
> Question is, will the Y-MP be faster?  (16 processors, 64Mwords)
> 
> Have you looked at the Cray-2 compiler... I hear its based on the old
> CFT1.10 and doesnt have character data (yet).
> I'd like to see some comparison timings between the x-mp and the 2.

The Cray-2 brochure I have lists the following facts about the Cray-2:

 o a 256 *million* word Common memory (not 256K),
 o *4* Background CPU's,
 o 1 Foreground CPU; it appears the background CPU's handle the computation and
   the Foreground CPU "supervises overall system activity among the Foreground
   Processor, Background Processors, Common Memory, and peripheral
   controllers,"
 o it looks like each Background CPU has only *one* memory port.

New things I noticed in the brochure:

 o UNIX (of course),
 o gather and scatter instructions,
 o semaphores for synchronization.

I couldn't find any information about chaining which suggests it's gone.

The brochure states the Cray-2 throughput is 6-12 times that of the Cray-1.
I have a note scribbled on the side that says "X-MP about 3 times."  I can't
remember if that means the Cray-2 is about three times the X-MP, or if the
X-MP is about three times the Cray-1. (?!)
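
The two readings of that note can be worked through with simple division.
The sketch below takes the brochure's 6-12x range and the "about 3 times"
from the note as given; the function name is made up for illustration, and
nothing here is a benchmark result.

```python
def implied_range(known_range, factor):
    """Divide a (low, high) throughput ratio range by a known factor."""
    low, high = known_range
    return (low / factor, high / factor)


cray2_over_cray1 = (6.0, 12.0)   # from the brochure
note_factor = 3.0                # "X-MP about 3 times", meaning unclear

# Reading A: X-MP is about 3x a Cray-1, so this is Cray-2 over X-MP.
print("Cray-2 / X-MP  :", implied_range(cray2_over_cray1, note_factor))

# Reading B: Cray-2 is about 3x an X-MP, so this is X-MP over Cray-1.
print("X-MP  / Cray-1 :", implied_range(cray2_over_cray1, note_factor))
```

Either way the missing ratio comes out at 2 to 4; the readings differ only
in which pair of machines that ratio describes.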



Harlan Husmann

Center for SuperComputer Research and Development
University of Illinois at Urbana-Champaign

usenet: husmann@uicsrd
csnet:  husmann@uicsrd.bitnet
bitnet: husmann at uiucvme