eugene@ames.UUCP (Eugene Miya) (10/10/85)
I had call to go visit the Cray-2 downstairs. It's tiny. It's overwhelmed by the disk drives which surround the thing. The foot-print (C) takes up the space of one large disk drive. We have a beautiful blue and metal finish with clear windows showing the Fluorinert. The console is an AT&T PC 6300. I've run two programs thus far for testing memory contention, but this was on the LLNL 1 quad C-2. Back to processing my travel forms.....

I don't think I offended anyone by saying this.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb
rchrd@well.UUCP (Richard Friedman) (10/15/85)
In article <1189@ames.UUCP>, eugene@ames.UUCP (Eugene Miya) writes:
> I had call to go visit the Cray-2 downstairs. It's tiny. ...

Gene:
Is it true that the CRAY-2 cpu is really a CRAY-1 (not an X-MP) cpu, meaning that it has only one path to memory and doesn't do chaining? So the only major speedup between the X-MP and -2 cpu's is the faster clock cycle of the 2, 4.1 ns. Also, I understand that the 256K 64-bit memory is slower than the memory on the X-MP, but there is a fast 16K memory cache per processor. So it really looks like a CDC 7600!! Question is, will the Y-MP be faster? (16 processors, 64Mwords)

Have you looked at the Cray-2 compiler... I hear it's based on the old CFT1.10 and doesn't have character data (yet). I'd like to see some comparison timings between the X-MP and the 2.

--
[rchrd] = Richard Friedman
Pacific-Sierra Research, 2855 Telegraph #415
Berkeley, CA 94705 (415) 540 5216
UUCP: {hplabs,ptsfa,dual}!well!rchrd
brooks@lll-crg.ARpA (Eugene D. Brooks III) (10/16/85)
> Gene:
> Is it true that the CRAY-2 cpu is really a CRAY-1 (not an X-MP)
> cpu, meaning that it has only one path to memory and doesn't do
> chaining?

Yes, it's true.

> So the only major speedup between the X-MP and -2
> cpu's is the faster clock cycle of the 2, 4.1 ns.

Yes.

> Also, I understand that the 256K 64-bit memory is slower than the
> memory on the X-MP, but there is a fast 16K memory cache per processor.

Yes, the latency of the main memory is a real problem.

> So it really looks like a CDC 7600!!

I'm sure you would prefer the Cray 2. The user does not see the 16k local memory, the compiler does.

> I'd like to see some comparison timings between the X-MP and the 2.

When the xmp is benchmarked against the 2, the xmp usually wins unless one can manage to effectively buffer vectors through the 16k cache and make a lot of uses of the vector data. If the cache can't be effectively used and the 3 port architecture is usable on the loop, the xmp wins.

PS I haven't heard of a single person who has stood in the middle of the Cray 2 and was not impressed.
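The point about buffering vectors through the 16k local memory and reusing them can be illustrated with a toy traffic count. This is only a sketch under stated assumptions: the function names, the idea of counting "words moved" to and from main memory, and the element-wise kernels are all hypothetical and are not taken from any Cray compiler or benchmark. The 16K-word block size is the only figure borrowed from the thread.

```python
LOCAL_WORDS = 16 * 1024  # 16K-word local memory per CPU, the figure quoted in the thread


def blocked_passes(data, kernels, local_words=LOCAL_WORDS):
    """Apply every kernel to data, staging one block at a time (hypothetical model).

    Returns (result, words_moved), where words_moved counts words loaded from
    or stored to main memory in this toy model.
    """
    result = list(data)
    words_moved = 0
    for start in range(0, len(result), local_words):
        block = result[start:start + local_words]   # one load from main memory
        words_moved += len(block)
        for kernel in kernels:                       # every reuse hits the local copy
            block = [kernel(x) for x in block]
        result[start:start + len(block)] = block     # one store back to main memory
        words_moved += len(block)
    return result, words_moved


def unblocked_passes(data, kernels):
    """Apply each kernel as a separate sweep over the whole vector."""
    result = list(data)
    words_moved = 0
    for kernel in kernels:
        result = [kernel(x) for x in result]         # each pass streams the full vector
        words_moved += 2 * len(result)               # one load and one store per word
    return result, words_moved
```

In this model both versions compute the same values, but the blocked version moves each word across the main-memory interface once in each direction no matter how many kernels are applied, while the unblocked version pays for every pass. That is the sense in which "a lot of uses of the vector data" helps hide slow main memory.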
eugene@ames.UUCP (Eugene Miya) (10/16/85)
I posted my original message after returning from the CUG meeting at a moment of sweeping beauty upon seeing the C-2. Truly a sight to behold.

> > Is it true that the CRAY-2 cpu is really a CRAY-1 (not an X-MP)
> > cpu, meaning that it has only one path to memory and doesn't do
> > chaining?
> Yes, it's true.

Agreed, but is a single data path the only criterion for a Cray-1? The Cray-2 is in some ways a new machine and not instruction set compatible with 1s or Xs, thus upsetting many existing batch-oriented sites.

> > So the only major speedup between the X-MP and -2
> > cpu's is the faster clock cycle of the 2, 4.1 ns.
> yes

Oversimplification in some ways. The 2 has four CPUs, so is the thing 4 times faster? Architects have yet to discover Brooks's [not Eugene's] Law [I guess it derives as Amdahl's Law]. The 2 also got rid of two banks of CPU registers. It is my understanding that there was controversy inside Cray about the real effectiveness of chaining and those registers. Time will tell.

> > Also, I understand that the 256K 64-bit memory is slower than the
> > memory on the X-MP, but there is a fast 16K memory cache per processor.
> Yes, the latency of the main memory is a real problem.

We have an X-MP/1 [MOS] and an X-MP/2 [Bipolar] with exactly 2 MW memory so they have precisely 16 banks of memory. I have a memory contention test which plots like the following. Access time is the vertical dimension.

                    /\
                   /  \
        /\        /    \        /\
___/\__/  \__/\__/      \__/\__/  \ . . .
---+----+----+------+------+----+---+---+---+--------
   4    8    12     16     20   24  28  32  36   Stride

I knew I should have used dataplot. I hate plotting on an ASCII device. The beauty of this plot is that the curve is identical for the 12 and the 22. The X-MP/12 takes about 50% longer to do a memory access than the X-MP/22. I've been given various explanations, but I suspect it's strictly because of the MOS vs bipolar memory technology. Note the proportions of the peaks are precisely factors of 2 higher than the surrounding overhead. The floor of the graph is not 0, but the peaks are correctly positioned over those numbers I indicated. I can also see noise on the 22 because [we think] it's a multiprocessor and bank contention takes place because of the second CPU. So it's mostly in the memory speed. [Again grossly oversimplified.]

> > So it really looks like a CDC 7600!!
> I'm sure you would prefer the Cray 2. The user does not
> see the 16k local memory, the compiler does.

Some call it a cache.

> > Question is, will the Y-MP be faster? (16 processors, 64Mwords)

Y-MP? What's a Y-MP? Sorry, I cannot comment on the Y-MP. Write Cray; they are on the net. I have not signed a non-disclosure, but I once opened my mouth a tiny bit too wide [emphasis on tiny] on this net and a tidal wave from the MN/WI area hit me.

> > Have you looked at the Cray-2 compiler... I hear it's based on the old
> > CFT1.10 and doesn't have character data (yet).
> > I'd like to see some comparison timings between the X-MP and the 2.
> > [rchrd] = Richard Friedman
> > Pacific-Sierra Research, 2855 Telegraph #415
> When the xmp is benchmarked against the 2 the xmp usually wins unless
> one can manage to effectively buffer vectors through the 16k cache and
> make a lot of uses of the vector data. If the cache can't be effectively
> used and the 3 port architecture is usable on the loop the xmp
> wins.

CFT2 is currently based on the 1.09 version, I believe. I am uncertain about all plans for upgrade. CFT77 [formerly NFT] is written in Cray Pascal in an attempt to ease maintenance, easily add new vectorization and multi-tasking features, and so forth. CFT77 will have to have a considerable shakedown, as CFT (written in CAL) is quite mature in some ways.

Regarding performance: See my above test as to why. Richard, I've stopped by your office, and I welcome you to see my other performance stuff on the X-MP and 2. I just showed some of it to the LLNL people [George Michael] the other day.
Bring your German parallel processor bibliography with you.

From the Rock of Ages Home for Retired Hackers:
--eugene miya
  NASA Ames Research Center
  {hplabs,ihnp4,dual,hao,decwrl,allegra}!ames!aurora!eugene
  emiya@ames-vmsb
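The shape of the stride plot in the post above can be reproduced with a toy interleaved-memory model. This is a minimal sketch, not the author's test program. It assumes 16 banks (the figure given in the post), a hypothetical bank busy time of 8 cycles chosen only so that the peaks come out in factors of 2 as described, and at most one access issued per cycle. The function name and its parameters are made up for illustration.

```python
from math import gcd


def cycles_per_access(stride, banks=16, bank_busy=8):
    """Steady-state cycles per word for a constant-stride sweep (toy model).

    banks comes from the post; bank_busy is an assumed value, not a measurement.
    """
    # A constant stride only ever touches banks / gcd(stride, banks) distinct banks.
    distinct_banks = banks // gcd(stride, banks)
    # Each bank accepts a new access every bank_busy cycles, and the CPU
    # issues at most one access per cycle in this model.
    return max(1.0, bank_busy / distinct_banks)


def stride_profile(max_stride=36, banks=16, bank_busy=8):
    """Return {stride: cycles per access} for strides 1..max_stride."""
    return {s: cycles_per_access(s, banks, bank_busy) for s in range(1, max_stride + 1)}
```

Under these assumptions the floor sits at 1 cycle per access, strides 4, 12, 20 and 28 cost 2, strides 8 and 24 cost 4, and strides 16 and 32 cost 8, which matches the factor-of-2 peak heights described. A real machine adds effects the model ignores, such as the second CPU's traffic on the 22.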
husmann@uicsrd.CSRD.UIUC.EDU (10/17/85)
> In article <1189@ames.UUCP>, eugene@ames.UUCP (Eugene Miya) writes:
> > I had call to go visit the Cray-2 downstairs. It's tiny. ...
>
> Gene:
> Is it true that the CRAY-2 cpu is really a CRAY-1 (not an X-MP)
> cpu, meaning that it has only one path to memory and doesn't do
> chaining? So the only major speedup between the X-MP and -2
> cpu's is the faster clock cycle of the 2, 4.1 ns.
> Also, I understand that the 256K 64-bit memory is slower than the
> memory on the X-MP, but there is a fast 16K memory cache per processor.
> So it really looks like a CDC 7600!!
> Question is, will the Y-MP be faster? (16 processors, 64Mwords)
>
> Have you looked at the Cray-2 compiler... I hear it's based on the old
> CFT1.10 and doesn't have character data (yet).
> I'd like to see some comparison timings between the X-MP and the 2.
>
> --
> [rchrd] = Richard Friedman
> Pacific-Sierra Research, 2855 Telegraph #415
> Berkeley, CA 94705 (415) 540 5216
> UUCP: {hplabs,ptsfa,dual}!well!rchrd

The Cray-2 brochure I have lists the following facts about the Cray-2:

  o a 256 *million* word Common memory (not 256K),
  o *4* Background CPU's,
  o 1 Foreground CPU; it appears the background CPU's handle the
    computation and the Foreground CPU "supervises overall system
    activity among the Foreground Processor, Background Processors,
    Common Memory, and peripheral controllers,"
  o it looks like each Background CPU has only *one* memory port.

New things I noticed in the brochure:

  o UNIX (of course),
  o gather and scatter instructions,
  o semaphores for synchronization.

I couldn't find any information about chaining, which suggests it's gone. The brochure states the Cray-2 throughput is 6-12 times that of the Cray-1. I have a note scribbled on the side that says "X-MP about 3 times." I can't remember if that means the Cray-2 is about three times the X-MP, or if the X-MP is about three times the Cray-1. (?!)
Harlan Husmann
Center for SuperComputer Research and Development
University of Illinois at Urbana-Champaign
usenet: husmann@uicsrd
csnet: husmann@uicsrd.bitnet
bitnet: husmann at uiucvme
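For readers unfamiliar with the "gather and scatter instructions" listed from the brochure, here is a minimal sketch of what those operations mean in general. The brochure is quoted only as naming them, so the function names and the list-based "memory" are illustrative assumptions, not the Cray-2 instruction set or its vector-register semantics.

```python
def gather(memory, indices):
    """Collect memory[i] for each i in indices into a dense vector."""
    return [memory[i] for i in indices]


def scatter(memory, indices, values):
    """Store each value back to memory at its paired index, in place."""
    for i, v in zip(indices, values):
        memory[i] = v
```

A typical use is sparse or indirectly addressed data: gather the scattered operands into a dense vector, run ordinary vector arithmetic on it, then scatter the results back to their original locations.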