[comp.arch] Cray 2 has 2GW address

fouts@orville.nas.nasa.gov (Marty Fouts) (02/24/88)

In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:

>While there are always problems which could be solved to another
>significant digit with more power, even on the Cray2, which can have up
>to 2GB of memory, few problems larger than 500MB are run because the CPU
>takes a lot of real time to clear/search/etc it. Until faster CPUs are
>common, I doubt that there will be a switch to much larger (ie. 64 bit)
>address space, because of market pressures.
>

You appear to be confusing two different limits on the Cray 2.  The
address space is 2^32 words, which are 64 bits.  The Cray 2 can
address 2GW = 16GB of memory.  The 2GB limit on the current
implementation is the number of 256kbit parts that can fit into the
form factor.  Given 8 times the memory density, (at the same power
consumption, etc) it would be possible to build a 16GB machine. . .
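
A rough sketch of the arithmetic behind those two limits, taking the
2GW figure above at face value.  The chip count ignores any extra bits
for error correction, which is an assumption made only to keep the sum
simple:

```python
WORD_BYTES = 8                 # 64-bit words
GIGA = 2 ** 30

addressable_words = 2 * GIGA                 # the 2GW figure quoted above
addressable_bytes = addressable_words * WORD_BYTES

installed_bytes = 2 * GIGA                   # 2GB in the current implementation
chip_bits = 256 * 1024                       # one 256kbit part

# Data bits only; check bits are left out (assumption).
chips_needed = installed_bytes * 8 // chip_bits
density_factor = addressable_bytes // installed_bytes

print(addressable_bytes // GIGA, "GB addressable")
print(chips_needed, "parts for 2GB;", density_factor, "x density to fill the address space")
```

So the addressing limit and the packaging limit differ by the factor
of 8 mentioned above.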

Actually, on each of the two Cray 2s at NASA Ames, there is
typically 1 problem running which occupies between 1/3 and 1/2 of
memory (2/3 to 1 GB), and several more problems in the 10 to 80 MB
range, in addition to the interactive unix load of 100+ processes of
~1MB in length.  (And the machines don't *ever* swap.)

Having a large memory is advantageous even if a single problem can't
occupy all of it.  The work load we run every day would cause most
supercomputers to spend much of their time doing memory management.

By keeping around 20 active jobs, we can still give good interactive
response, and "batch" throughput an order of magnitude better than
most mainframes.  You haven't lived until you've run on a machine
where GNU emacs is considered a small process. ;-)

One of the ways having a lot of memory helps with slow processor
machines is in the space/time tradeoff.  It is possible for an
application to compute all of the sin/cos values it is going to need
and stuff them away in a table, and then do table lookup, for example.
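
A minimal sketch of that table-lookup tradeoff, in Python purely for
illustration.  The table size and the function names are assumptions,
not anything taken from the codes run at Ames:

```python
import math

TABLE_SIZE = 4096  # assumed resolution; more entries trade memory for accuracy

# Pay the cost of sin() once, over one full period.
SIN_TABLE = [math.sin(2.0 * math.pi * i / TABLE_SIZE) for i in range(TABLE_SIZE)]


def fast_sin(x):
    """Approximate sin(x) by nearest-entry table lookup (no interpolation)."""
    index = round(x / (2.0 * math.pi) * TABLE_SIZE) % TABLE_SIZE
    return SIN_TABLE[index]


def fast_cos(x):
    """cos(x) = sin(x + pi/2), so the same table serves both."""
    return fast_sin(x + math.pi / 2.0)
```

With nearest-entry lookup the error is bounded by roughly
pi / TABLE_SIZE, so a bigger table (more memory) buys more digits
without any extra arithmetic per call.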

dave@micropen (David F. Carlson) (02/25/88)

In article <235@amelia.nas.nasa.gov>, fouts@orville.nas.nasa.gov (Marty Fouts) writes:
> In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
> 
> >While there are always problems which could be solved to another
> >significant digit with more power, even on the Cray2, which can have up
> 
> By keeping around 20 active jobs, we can still give good interactive
> response, and "batch" throughput an order of magnitude better than
> most mainframes.  You haven't lived until you've run on a machine
> where GNU emacs is considered a small process. ;-)

I know most of these CRAYs are used in DoD research on important things like
bombs and SDI, but my running an EDITOR (ie. slow interactive process)
on a CRAY--presumably paid for with my tax dollars.  Ouch!  Can't you
find any good emacs for a VT100 on a VAX11/780 to run twenty editor jobs?
(I bet every government facility has tons of workhorse CPU for editor
sessions rather than that premium CRAY time.)

-- 
David F. Carlson, Micropen, Inc.
...!{ames|harvard|rutgers|topaz|...}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/26/88)

In article <235@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
| In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
| 
| >While there are always problems which could be solved to another
| >significant digit with more power, even on the Cray2, which can have up
| >to 2GB of memory, few problems larger than 500MB are run because the CPU
| > [...]
|
| You appear to be confusing two different limits on the Cray 2.  The
| address space is 2^32 words, which are 64 bits.  The Cray 2 can
| address 2GW = 16GB of memory.  The 2GB limit on the current
| implementation is the number of 256kbit parts that can fit into the

As I said, the Cray2 can have 2GB. Since the memory is virtual I don't
*care* how much it can address, only what can be used. Currently that's
2GB. I don't *feel* confused ;->
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

swami@uiucdcsp.cs.uiuc.edu (02/27/88)

> Why run editors (slow interactive processes) on premium Cray-2 time?
Because if you only need a couple of small changes - like fixing that little
syntax error or bug in your fortran program - the cost of transferring the
file to a workstation for editing, and back again, can be more than that of
running an editor on the Cray.  Of course, for extended editing sessions, you
wouldn't want to waste Cray time.

swami@b.cs.uiuc.edu
{ihnp4, pur-ee, convex}!uiucdcs!swami

"Maybe this opinion does represent that of my employers?" :-)

brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (02/29/88)

In article <76700008@uiucdcsp> swami@uiucdcsp.cs.uiuc.edu writes:
>syntax error or bug in your fortran program - the cost of transferring the
>file to a workstation for editing, and back again, can be more than that of
>running an editor on the Cray.  Of course, for extended editing sessions, you
>wouldn't want to waste Cray time.
You mean that for extended editing sessions, you wouldn't want to WAIT for
Cray time.

elg@killer.UUCP (Eric Green) (02/29/88)

in article <416@micropen>, dave@micropen (David F. Carlson) says:
[NASA dude discusses running GNU Emacs on his Cray:]

>> By keeping around 20 active jobs, we can still give good interactive
>> response, and "batch" throughput an order of magnitude better than
>> most mainframes.  You haven't lived until you've run on a machine
>> where GNU emacs is considered a small process. ;-)
> 
> I know most of these CRAYs are used in DoD research on important things like
> bombs and SDI, but my running an EDITOR (ie. slow interactive process)
> on a CRAY--presumably paid for with my tax dollars.  Ouch!  Can't you
> find any good emacs for a VT100 on a VAX11/780 to run twenty editor jobs?

The reason GNU Emacs is slow on a Vax11/780 is simple: thrashing. A
well-engendered Vaxen has maybe 8 megabytes of main memory. The GNU Emacs core
image can get up to 2 megabytes large with no problem. Run 20 of those on a
Vax 780... well, you can see that you better have a hefty swap space, because
that baby's gonna be swappin' her heart out :-}.

But of course you don't have that thrashing on a Cray. Not with 2 Gigawords of
RAM! Not to mention that the lack of paging eliminates the task-switching task
of flushing (and eventually reloading) the MMU TLB's (and replaces it with the
chore of flushing the vector registers, alas -- although I wonder if processes
that do not use the vector registers do, in fact, flush them). I assume that
the "Ouch" that you're talking about is the character-at-a-time task switch
overhead. It appears that task switch overhead on a Cray would be no worse
than on a Vax 780 -- on a machine that's, well, slightly faster :-). Somehow,
I think the task switch time for 20 interactive Emacsen would be in the noise,
insofar as percentage of CPU time is concerned. "Slow interactive process"?
Surely you jest. You've been using an IBM 370 too long... 2 second interrupt
latencies went out with the 60's! (though IBM doesn't seem to have 
noticed :-). 

Somehow, I don't think that 20 Emacs processes are about to bring a Cray ][ to
its knees... I really truly doubt that the CPU time saved by doing editing
elsewhere, could be justified by paying all those programmers and scientists
for the time involved in moving that text over to the Cray (and it can be a
hassle... at the very least, you have to invoke FTP over a network, and at the
worst, we're talking major troubles).

Hmm. An architectural issue here, maybe. Does the lack of MMU refills etc.
REALLY speed up process switching time? Are Unix processes "lightweight" as
far as a Cray 2 is concerned? 

--
Eric Lee Green  elg@usl.CSNET     Snail Mail P.O. Box 92191      
{cbosgd,ihnp4}!killer!elg         Lafayette, LA 70509            

Come on, girl, I ain't looking for no fight/You ain't no beauty, but hey
you're alright/And I just need someone to love/tonight

msf@amelia.nas.nasa.gov (Michael S. Fischbein) (02/29/88)

In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
>in article <416@micropen>, dave@micropen (David F. Carlson) says:
>[NASA dude discusses running GNU Emacs on his Cray:]
>
>> I know most of these CRAYs are used in DoD research on important things like
>> bombs and SDI, but my running an EDITOR (ie. slow interactive process)
>> on a CRAY--presumably paid for with my tax dollars.  Ouch!  Can't you
>> find any good emacs for a VT100 on a VAX11/780 to run twenty editor jobs?
>
>the "Ouch" that you're talking about is the character-at-a-time task switch

One problem here that no one's brought up (so I will :-)).  The character
at a time problem isn't the Cray-2 context switch, (though it isn't a
context switching speed demon, it does do much better than a 780), but the
communications channel.  You don't hook a terminal to a Cray-2; you don't
even hook a CSMA/CD LAN to it.  You hook a fast token net (such as hyperchannel)
up so those big files (400MB and up) can be transferred in a reasonable
amount of time.  Of course, this leaves you scrambling for those one
character interactive packets.

If I remember correctly (I've got the reference here somewhere, but I'm
sure I'll get flamed if I'm far off -- or even just a little off), tests
on the first Ames Cray-2 showed twenty SIMULATED hot-and-heavy interactive
edits used about 8% of one cpu.  These were all running as canned scripts.
Unfortunately, they also simulated using more than half of the available
"outside world" i/o bandwidth.

		mike

-- 
Michael Fischbein                 msf@ames-nas.arpa
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/29/88)

In article <416@micropen> dave@micropen (David F. Carlson) writes:
|  [...]
| I know most of these CRAYs are used in DoD research on important things like
| bombs and SDI, but my running an EDITOR (ie. slow interactive process)
| on a CRAY--presumably paid for with my tax dollars.  Ouch!  Can't you
| find any good emacs for a VT100 on a VAX11/780 to run twenty editor jobs?
| (I bet every government facility has tons of workhorse CPU for editor
| sessions rather than that premium CRAY time.)

In the 50's the idea was to maximize use of the CPU, because it was
expensive.  The cost performance of all hardware has dropped, and the
price of software has gone up.  The investment of 2-4 minutes of a
programmer or engineer or physicist's time to move the file to a
"suitable" machine, edit, and move it back is simply not cost effective
as compared with doing short edits on the target machine.

When we first got Cray2 access the administrators didn't want to support
editors. Our argument was that we were paying for the resources and
wanted to use the CPU cycles as we saw fit. We now have a number of
editors on the Cray2, including MicroEMACS, and we feel that it is a
good investment on our part. The edits use a tiny fraction of the total
CPU and memory k-sec we need, and improve the productivity of software
developers.

I agree that the idea of using a Cray2 as an editor is intuitively poor,
but after consideration it is quite cost effective.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/29/88)

In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
| [...]
| The reason GNU Emacs is slow on a Vax11/780 is simple: thrashing. A
| well-engendered Vaxen has maybe 8 megabytes of main memory. The GNU Emacs core
| image can get up to 2 megabytes large with no problem. Run 20 of those on a
| Vax 780... well, you can see that you better have a hefty swap space, because
| that baby's gonna be swappin' her heart out :-}.

You're right, but the situation isn't as bad as you make it seem. First,
a well engineered VAX would have more memory than that... we use 16MB on
workstations now. Second, a big part of gemacs is the text portion,
which is sharable. The data is quite a bit smaller.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

fouts@orville.nas.nasa.gov (Marty Fouts) (03/01/88)

In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
>in article <416@micropen>, dave@micropen (David F. Carlson) says:
>[NASA dude discusses running GNU Emacs on his Cray:]
>
>Hmm. An architectural issue here, maybe. Does the lack of MMU refills etc.
>REALLY speed up process switching time? Are Unix processes "lightweight" as
>far as a Cray 2 is concerned? 

A reply for the NASA dude:

Unix processes on the 2 are variable weight.  The vector registers are
only a small part of context.  The 2 has a 32 KWord (256Kbyte) "local
memory" per processor which is part of context.  This memory is
protected by high water marks, so that the cpu can tell if all, 1/2,
1/4, etc of it were used and only memory up to the dirty highwater
mark is stored and flushed.  I don't have statistics on the typical
case, although I suspect that it is small, because our machines don't
spend very much time context switching.  (On the order of .01% of all
cycles appear to go to context switching.)

Worst case is pretty bad however.  The best local to common memory
transfer rate is one word / clock cycle, so a 32K word transfer takes
64K clock cycles (32K out - 32K in) which at 4.2 ns is 275
microseconds.  This puts a limit on the machine of about 3600 context
switches per second.  Since there are four processors, but only one
can be executing the context switch code, this comes down to about 900
switches per second on each processor.  I have verified this number by
running a program which forces context switches; I get about 800 per
second, with the difference resulting from the "work" the program is
doing between context switches.
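
The worst-case figures above can be reproduced with a few lines of
arithmetic.  This is only a sketch of the calculation as described;
the function name and its dirty_words parameter are made up for
illustration:

```python
LOCAL_MEMORY_WORDS = 32 * 1024   # 32 KWord local memory per processor
CLOCK_NS = 4.2                   # clock period quoted above
PROCESSORS = 4                   # only one can run the switch code at a time


def switch_time_us(dirty_words=LOCAL_MEMORY_WORDS, clock_ns=CLOCK_NS):
    """Microseconds to store dirty_words out and load as many back in,
    at the best-case rate of one word per clock cycle."""
    cycles = 2 * dirty_words
    return cycles * clock_ns / 1000.0


worst_us = switch_time_us()
machine_limit = 1_000_000 / worst_us       # switches per second, whole machine
per_processor = machine_limit / PROCESSORS

print(f"worst case: {worst_us:.0f} us per switch")
print(f"limit: {machine_limit:.0f}/s machine-wide, {per_processor:.0f}/s per processor")
```

Calling switch_time_us(LOCAL_MEMORY_WORDS // 4) models a process whose
high-water mark sits at one quarter of local memory, assuming the
incoming context is equally small.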

fouts@orville.nas.nasa.gov (Marty Fouts) (03/01/88)

In article <305@amelia.nas.nasa.gov> msf@amelia.nas.nasa.gov (Michael S. Fischbein) writes:

>If I remember correctly (I've got the reference here somewhere, but I'm
>sure I'll get flamed if I'm far off -- or even just a little off), tests
>on the first Ames Cray-2 showed twenty SIMULATED hot-and-heavy interactive
>edits used about 8% of one cpu.  These were all running as canned scripts.
>Unfortunately, they also simulated using more than half of the available
>"outside world" i/o bandwidth.
>

Actually we never ran those tests.  I believe that numbers from a test
like this were derived at the University of Minnesota, although I
don't know their results.  In attempting to drive the machine with
character traffic from a Vax, we couldn't drive it hard enough to have
a noticeable impact using 4 11/780s doing nothing but sending one
character packets to the 2.  The Vaxen just couldn't send packets fast
enough.

Back when we had an engineering branch they did some simulations that
could be interpreted as showing that interactive editing was very
expensive, but of course these started with assumptions that could be
loosely stated "interactive editing will be very expensive" and went
on to prove precisely that.

We don't have an engineering branch anymore. (;-)

leech@unc.cs.unc.edu (Jonathan Leech) (03/01/88)

In article <9720@steinmetz.steinmetz.UUCP> davidsen@crdos1.UUCP (bill davidsen) writes:
>The investment of 2-4 minutes of a
>programmer or engineer or physicist's time to move the file to a
>"suitable" machine, edit, and move it back is simply not cost effective
>as compared with doing short edits on the target machine.
>...
>I agree that the idea of using a Cray2 as an editor is intuitively poor,
>but after consideration it is quite cost effective.

    Um, does Cray Unix support NFS? This is another way to offload
editing, assuming that people are not using dumb terminals directly
connected to the Cray, of course. Is this a bad assumption?

    I redirected this to comp.misc since it seems to have nothing to
do with architecture anymore.

    Jon Leech (leech@cs.unc.edu)    __@/
    ``After all, the best part of a holiday is perhaps not so much to be
      resting yourself as to see all the other fellows busy working.''
	- Kenneth Grahame, _The Wind in the Willows_

aglew@ccvaxa.UUCP (03/02/88)

> ...although I wonder if processes that do not use the vector registers 
> do, in fact, flush them...

Gould NP has a VRIU (Vector Register IN Use) bit that is set whenever a
vector register is written to. So processes that do not use the vector
registers do not have to have them flushed (although for security they 
are cleared). Ditto, interrupt handlers do not have to save/restore
the vector registers unless they were in use, and the ISR wants to use them
for something like high speed copies. Moreover, this can apply to a
process that uses vectors in some phases, but not others.
    I imagine other vector machines have similar mechanisms. Can anybody
describe them for us?

While I'm at it, it occurs to me that another flag to cause a trap if a
vector register is read might have advantages.
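
A toy model of the "in use" bit idea, to make the save-only-if-dirty
logic concrete.  This is not Gould's implementation; the class, the
register counts and the function name are all hypothetical, and the
restore of the incoming process's registers is left out:

```python
class VectorContext:
    """Toy CPU vector state with an 'in use' flag (all names hypothetical)."""

    def __init__(self, num_registers=8, length=64):
        self.registers = [[0] * length for _ in range(num_registers)]
        self.in_use = False          # stands in for the VRIU bit

    def write(self, reg, values):
        self.registers[reg] = list(values)
        self.in_use = True           # set on any vector register write


def context_switch(cpu, save_area):
    """Save vector state only if the outgoing process wrote to it.

    The registers are cleared either way, so the next process cannot
    read stale data.  Returns True if a save was actually performed.
    """
    saved = False
    if cpu.in_use:
        save_area["vectors"] = [list(r) for r in cpu.registers]
        saved = True
    for reg in cpu.registers:
        for i in range(len(reg)):
            reg[i] = 0
    cpu.in_use = False
    return saved
```

A process that never touches the vector unit pays only for the clear,
not for a store to memory, which is the saving described above.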

ohbuchi@unc.cs.unc.edu (Ryutarou Ohbuchi) (03/03/88)

<aglew@ccvaxa.UUCP> writes:
>> ...although I wonder if processes that do not use the vector registers 
>> do, in fact, flush them...
>
>Gould NP has a VRIU (Vector Register IN Use) bit that is set whenever a
>vector register is written to. So processes that do not use the vector
>registers do not have to have them flushed (although for security they 
>are cleared). ......

As I recall, IBM people faced the same problem with their extension
of 370 architecture into VF (Vector Feature (or, was it 'Facility' ?)),
and did the same kind of thing (in 3090/VF).  I do not remember exactly,
but there is some kind of flag that tells whether the vector registers
are used or not, and not every context switch has to save/restore the
vector registers.  The literature describing this was in IBM Journal of 
Res. & Dev. (or something like that), in 1987.

==============================================================================
Any opinion expressed here is my own.
------------------------------------------------------------------------------
Ryutarou Ohbuchi	"Life's rock."   "Climb now, work later." and, now,
			"Life's snow."   "Ski now, work later."
ohbuchi@cs.unc.edu	<on csnet>
Department of Computer Science, University of North Carolina at Chapel Hill
==============================================================================

alan@mn-at1.UUCP (Alan Klietz) (03/05/88)

In article <308@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
<In article <305@amelia.nas.nasa.gov> msf@amelia.nas.nasa.gov (Michael S. Fischbein) writes:
<>If I remember correctly (I've got the reference here somewhere, but I'm
<>sure I'll get flamed if I'm far off -- or even just a little off), tests
<>on the first Ames Cray-2 showed twenty SIMULATED hot-and-heavy interactive
<>edits used about 8% of one cpu.  These were all running as canned scripts.
<>Unfortunately, they also simulated using more than half of the available
<>"outside world" i/o bandwidth.
<
<Actually we never ran those tests.  I believe that numbers from a test
<like this were derived at the University of Minnesota, although I
<don't know their results. 

I ran those tests in 1985.  The important results were,

1) Cray CPU time is not a significant overhead factor when performing
   simple single-keystroke operations by a reasonable number (10-20) of
   users.

2) The NSC Hyperchannel is not designed for transferring small packets
   of data.  An A130 will saturate at 300 keystrokes/sec, due to the
   large overhead of setting up and tearing down a virtual circuit for
   each message.  The Hyperchannel also does not perform well with
   asynchronous full duplex transmissions (e.g. TCP/IP).  This is due to
   reservation deadlocks between pairs of adapters that attempt to send
   to each other simultaneously over what is really a half duplex trunk.

Hence the development of "rvi" - remote vi that runs "ed" on the Cray-2.

The general solution is to multiplex large numbers of smaller packets
into larger Hyperchannel messages and send them in synchronous alternating
trains.  The problem is a general one, and applies to HSX, HSC, ULTRA,
CNT, VMEbus, as well as Hyperchannel.  See the paper, "An Investigation
and Analysis of High Performance Data-links for Supercomputers", MSCTR112
(MSC Tech Report 112) for a more detailed discussion.
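
As a rough illustration of the multiplexing idea (not the scheme from
the report), keystrokes from many sessions can be gathered into one
larger message so the per-message setup cost is paid once per batch.
The function name, the (session, character) tuple layout and the batch
size are assumptions:

```python
def batch_keystrokes(keystrokes, max_batch=64):
    """Group (session_id, char) pairs into messages of up to max_batch entries."""
    batch = []
    for item in keystrokes:
        batch.append(item)
        if len(batch) == max_batch:
            yield batch
            batch = []
    if batch:
        yield batch          # final partial message


# 150 keystrokes spread over 20 simulated sessions.
traffic = [(i % 20, "x") for i in range(150)]
messages = list(batch_keystrokes(traffic))
print(len(traffic), "keystrokes sent in", len(messages), "messages")
```

A real multiplexer would also flush on a timer so that a lone
keystroke is not held back waiting for a full batch; that part is
omitted here.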

--
Alan Klietz
Minnesota Supercomputer Center (*)
1200 Washington Avenue South
Minneapolis, MN  55415    UUCP:  alan@mn-at1.k.mn.org
Ph: +1 612 626 1836       ARPA:  alan@uc.msc.umn.edu  (was umn-rei-uc.arpa)

(*) An affiliate of the University of Minnesota