fouts@orville.nas.nasa.gov (Marty Fouts) (02/24/88)
In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>While there are always problems which could be solved to another
>significant digit with more power, even on the Cray2, which can have up
>to 2GB of memory, few problems larger than 500MB are run because the CPU
>takes a lot of real time to clear/search/etc it. Until faster CPUs are
>common, I doubt that there will be a switch to much larger (ie. 64 bit)
>address space, because of market pressures.
>

You appear to be confusing two different limits on the Cray 2.  The
address space is 2^32 words, which are 64 bits.  The Cray 2 can
address 2GW = 16GB of memory.  The 2GB limit on the current
implementation is the number of 256kbit parts that can fit into the
form factor.  Given 8 times the memory density (at the same power
consumption, etc.) it would be possible to build a 16GB machine...

Actually, on each of the two Cray 2s at NASA Ames, there is typically
one problem running which occupies between 1/3 and 1/2 of memory (2/3
to 1 GB), and several more problems in the 10 to 80 MB range, in
addition to the interactive unix load of 100+ processes of ~1MB in
length.  (And the machines don't *ever* swap.)

Having a large memory is advantageous even if a single problem can't
occupy all of it.  The work load we run every day would cause most
supercomputers to spend much of their time doing memory management.
By keeping around 20 active jobs, we can still give good interactive
response, and "batch" throughput an order of magnitude better than
most mainframes.  You haven't lived until you've run on a machine
where GNU emacs is considered a small process.  ;-)

One of the ways having a lot of memory helps on a machine with a slow
processor is in the space/time tradeoff.  For example, an application
can compute all of the sin/cos values it is going to need, stuff them
away in a table, and then do table lookup.
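A minimal sketch of that table-lookup idea in C, purely for
illustration: the language choice, table size, and names here are
assumptions, not from the original post.

    /* Space/time tradeoff sketch: precompute sin() once, then answer
     * later queries by table lookup.  Table resolution (one entry per
     * 0.001 radian) is an illustrative choice. */
    #include <math.h>
    #include <stdio.h>

    #define TABLE_SIZE 6284        /* covers 0 .. 2*pi in 0.001 rad steps */
    #define STEP       0.001

    static double sin_table[TABLE_SIZE];

    void init_sin_table(void)
    {
        int i;
        for (i = 0; i < TABLE_SIZE; i++)   /* pay the cost once, up front */
            sin_table[i] = sin(i * STEP);
    }

    /* Table lookup in place of a library call; memory spent, time saved. */
    double fast_sin(double x)
    {
        int i = (int)(x / STEP) % TABLE_SIZE;  /* assumes x >= 0 */
        return sin_table[i];
    }

    int main(void)
    {
        init_sin_table();
        printf("%f %f\n", fast_sin(1.0), sin(1.0));
        return 0;
    }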
dave@micropen (David F. Carlson) (02/25/88)
In article <235@amelia.nas.nasa.gov>, fouts@orville.nas.nasa.gov (Marty Fouts) writes:
> In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>
> >While there are always problems which could be solved to another
> >significant digit with more power, even on the Cray2, which can have up
>
> By keeping around 20 active jobs, we can still give good interactive
> response, and "batch" throughput an order of magnitude better than
> most mainframes.  You haven't lived until you've run on a machine
> where GNU emacs is considered a small process.  ;-)

I know most of these CRAYs are used in DoD research on important
things like bombs and SDI, but you're running an EDITOR (ie. a slow
interactive process) on a CRAY--presumably paid for with my tax
dollars.  Ouch!  Can't you find any good emacs for a VT100 on a
VAX11/780 to run twenty editor jobs?  (I bet every government
facility has tons of workhorse CPU for editor sessions rather than
that premium CRAY time.)

--
David F. Carlson, Micropen, Inc.
...!{ames|harvard|rutgers|topaz|...}!rochester!ur-valhalla!micropen!dave

"The faster I go, the behinder I get." --Lewis Carroll
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/26/88)
In article <235@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
| In article <9653@steinmetz.steinmetz.UUCP> davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
|
| >While there are always problems which could be solved to another
| >significant digit with more power, even on the Cray2, which can have up
| >to 2GB of memory, few problems larger than 500MB are run because the CPU
| > [...]
|
| You appear to be confusing two different limits on the Cray 2.  The
| address space is 2^32 words, which are 64 bits.  The Cray 2 can
| address 2GW = 16GB of memory.  The 2GB limit on the current
| implementation is the number of 256kbit parts that can fit into the

As I said, the Cray2 can have 2GB.  Since the memory is virtual I
don't *care* how much it can address, only what can be used.
Currently that's 2GB.

I don't *feel* confused ;->

--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
swami@uiucdcsp.cs.uiuc.edu (02/27/88)
> Why run editors (slow interactive processes) on premium Cray-2 time?
Because if you only need a couple of small changes - like fixing that little
syntax error or bug in your fortran program - the cost of transferring the
file to a workstation for editing, and back again, can be more than that of
running an editor on the Cray. Of course, for extended editing sessions, you
wouldn't want to waste Cray time.
swami@b.cs.uiuc.edu
{ihnp4, pur-ee, convex}!uiucdcs!swami
"Maybe this opinion does represent that of my employers?" :-)
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (02/29/88)
In article <76700008@uiucdcsp> swami@uiucdcsp.cs.uiuc.edu writes:
>syntax error or bug in your fortran program - the cost of transferring the
>file to a workstation for editing, and back again, can be more than that of
>running an editor on the Cray. Of course, for extended editing sessions, you
>wouldn't want to waste Cray time.

You mean that for extended editing sessions, you wouldn't want to
WAIT for Cray time.
elg@killer.UUCP (Eric Green) (02/29/88)
in article <416@micropen>, dave@micropen (David F. Carlson) says:
[NASA dude discusses running GNU Emacs on his Cray:]
>> By keeping around 20 active jobs, we can still give good interactive
>> response, and "batch" throughput an order of magnitude better than
>> most mainframes.  You haven't lived until you've run on a machine
>> where GNU emacs is considered a small process.  ;-)
>
> I know most of these CRAYs are used in DoD research on important
> things like bombs and SDI, but you're running an EDITOR (ie. a slow
> interactive process) on a CRAY--presumably paid for with my tax
> dollars.  Ouch!  Can't you find any good emacs for a VT100 on a
> VAX11/780 to run twenty editor jobs?

The reason GNU Emacs is slow on a Vax11/780 is simple: thrashing.  A
well-engineered Vax has maybe 8 megabytes of main memory.  The GNU
Emacs core image can get up to 2 megabytes with no problem.  Run 20
of those on a Vax 780... well, you can see that you better have a
hefty swap space, because that baby's gonna be swappin' her heart out
:-}.

But of course you don't have that thrashing on a Cray.  Not with 2
gigabytes of RAM!  Not to mention that the lack of paging eliminates
the task-switching chore of flushing (and eventually reloading) the
MMU TLBs (and replaces it with the chore of flushing the vector
registers, alas -- although I wonder if processes that do not use the
vector registers do, in fact, flush them).

I assume that the "Ouch" that you're talking about is the
character-at-a-time task switch overhead.  It appears that task
switch overhead on a Cray would be no worse than on a Vax 780 -- on a
machine that's, well, slightly faster :-).  Somehow, I think the task
switch time for 20 interactive Emacsen would be in the noise, insofar
as percentage of CPU time is concerned.

"Slow interactive process"?  Surely you jest.  You've been using an
IBM 370 too long... 2 second interrupt latencies went out with the
60's!  (Though IBM doesn't seem to have noticed :-).  Somehow, I
don't think that 20 Emacs processes are about to bring a Cray ][ to
its knees...

I really truly doubt that the CPU time saved by doing editing
elsewhere could be justified by paying all those programmers and
scientists for the time involved in moving that text over to the Cray
(and it can be a hassle... at the very least, you have to invoke FTP
over a network, and at the worst, we're talking major troubles).

Hmm.  An architectural issue here, maybe.  Does the lack of MMU
refills etc. REALLY speed up process switching time?  Are Unix
processes "lightweight" as far as a Cray 2 is concerned?

--
Eric Lee Green                        elg@usl.CSNET
Snail Mail P.O. Box 92191             {cbosgd,ihnp4}!killer!elg
Lafayette, LA 70509

Come on, girl, I ain't looking for no fight/You ain't no beauty, but
hey you're alright/And I just need someone to love/tonight
msf@amelia.nas.nasa.gov (Michael S. Fischbein) (02/29/88)
In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
>in article <416@micropen>, dave@micropen (David F. Carlson) says:
>[NASA dude discusses running GNU Emacs on his Cray:]
>
>> I know most of these CRAYs are used in DoD research on important
>> things like bombs and SDI, but you're running an EDITOR (ie. a slow
>> interactive process) on a CRAY--presumably paid for with my tax
>> dollars.  Ouch!  Can't you find any good emacs for a VT100 on a
>> VAX11/780 to run twenty editor jobs?
>
>the "Ouch" that you're talking about is the character-at-a-time task switch

One problem here that no one's brought up (so I will :-)).  The
character-at-a-time problem isn't the Cray-2 context switch (though
the Cray-2 isn't a context-switching speed demon, it does do much
better than a 780), but the communications channel.  You don't hook a
terminal to a Cray-2; you don't even hook a CSMA/CD LAN to it.  You
hook up a fast token net (such as Hyperchannel) so those big files
(400MB and up) can be transferred in a reasonable amount of time.  Of
course, this leaves you scrambling for those one-character
interactive packets.

If I remember correctly (I've got the reference here somewhere, but
I'm sure I'll get flamed if I'm far off -- or even just a little
off), tests on the first Ames Cray-2 showed twenty SIMULATED
hot-and-heavy interactive edits used about 8% of one cpu.  These were
all running as canned scripts.  Unfortunately, they also simulated
using more than half of the available "outside world" i/o bandwidth.

		mike

--
Michael Fischbein                 msf@ames-nas.arpa
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/29/88)
In article <416@micropen> dave@micropen (David F. Carlson) writes:
| [...]
| I know most of these CRAYs are used in DoD research on important
| things like bombs and SDI, but you're running an EDITOR (ie. a slow
| interactive process) on a CRAY--presumably paid for with my tax
| dollars.  Ouch!  Can't you find any good emacs for a VT100 on a
| VAX11/780 to run twenty editor jobs?
| (I bet every government facility has tons of workhorse CPU for editor
| sessions rather than that premium CRAY time.)

In the 50's the idea was to maximize use of the CPU, because it was
expensive.  Since then the cost/performance of all hardware has
dropped, and the price of software has gone up.  The investment of
2-4 minutes of a programmer or engineer or physicist's time to move
the file to a "suitable" machine, edit, and move it back is simply
not cost effective as compared with doing short edits on the target
machine.

When we first got Cray2 access the administrators didn't want to
support editors.  Our argument was that we were paying for the
resources and wanted to use the CPU cycles as we saw fit.  We now
have a number of editors on the Cray2, including MicroEMACS, and we
feel that it is a good investment on our part.  The edits use a tiny
fraction of the total CPU and memory k-sec we need, and improve the
productivity of software developers.

I agree that the idea of using a Cray2 as an editor is intuitively
poor, but after consideration it is quite cost effective.

--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
davidsen@steinmetz.steinmetz.UUCP (William E. Davidsen Jr) (02/29/88)
In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
| [...]
| The reason GNU Emacs is slow on a Vax11/780 is simple: thrashing.  A
| well-engineered Vax has maybe 8 megabytes of main memory.  The GNU
| Emacs core image can get up to 2 megabytes with no problem.  Run 20
| of those on a Vax 780... well, you can see that you better have a
| hefty swap space, because that baby's gonna be swappin' her heart
| out :-}.

You're right, but the situation isn't as bad as you make it seem.
First, a well engineered VAX would have more memory than that... we
use 16MB on workstations now.  Second, a big part of gemacs is the
text portion, which is sharable.  The data segment is quite a bit
smaller.

--
bill davidsen (wedu@ge-crd.arpa)
{uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
fouts@orville.nas.nasa.gov (Marty Fouts) (03/01/88)
In article <3534@killer.UUCP> elg@killer.UUCP (Eric Green) writes:
>in article <416@micropen>, dave@micropen (David F. Carlson) says:
>[NASA dude discusses running GNU Emacs on his Cray:]
>
>Hmm.  An architectural issue here, maybe.  Does the lack of MMU refills etc.
>REALLY speed up process switching time?  Are Unix processes "lightweight" as
>far as a Cray 2 is concerned?

A reply for the NASA dude:  Unix processes on the 2 are variable
weight.  The vector registers are only a small part of the context.
The 2 has a 32 KWord (256 Kbyte) "local memory" per processor which
is part of the context.  This memory is protected by high water
marks, so that the cpu can tell if all, 1/2, 1/4, etc. of it was
used, and only memory up to the dirty high water mark is stored and
flushed.

I don't have statistics on the typical case, although I suspect that
it is small, because our machines don't spend very much time context
switching (on the order of .01% of all cycles appear to go to context
switching).  Worst case is pretty bad, however.  The best local to
common memory transfer rate is one word per clock cycle, so a 32K
word transfer takes 64K clock cycles (32K out + 32K in), which at 4.2
ns is about 275 microseconds.  This puts a limit on the machine of
about 3600 context switches per second.  Since there are four
processors, but only one can be executing the context switch code at
a time, this comes down to about 900 switches per second on each
processor.  I have verified this number by running a program which
forces context switches; I get about 800 per second, with the
difference a result of the "work" the program is doing between
context switches.
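For illustration, a minimal sketch of the kind of switch-forcing
program described above, assuming a Unix pipe ping-pong approach (the
actual test program isn't shown, so this is a guess at the
technique).  Two processes bounce a byte through a pair of pipes, so
each round trip forces at least two context switches:

    /* Context-switch rate microbenchmark (hypothetical sketch). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <time.h>

    #define ROUNDS 100000

    int main(void)
    {
        int ptc[2], ctp[2];        /* parent->child, child->parent pipes */
        char byte = 'x';
        long i;
        time_t start, stop;

        if (pipe(ptc) < 0 || pipe(ctp) < 0) {
            perror("pipe");
            exit(1);
        }

        if (fork() == 0) {         /* child: echo every byte back */
            for (i = 0; i < ROUNDS; i++) {
                read(ptc[0], &byte, 1);
                write(ctp[1], &byte, 1);
            }
            exit(0);
        }

        start = time(NULL);
        for (i = 0; i < ROUNDS; i++) {   /* parent: send, wait for echo */
            write(ptc[1], &byte, 1);
            read(ctp[0], &byte, 1);
        }
        stop = time(NULL);

        /* Each round trip is >= 2 switches; report the observed rate. */
        printf("~%ld context switches/sec\n",
               2L * ROUNDS / (stop > start ? (long)(stop - start) : 1));
        return 0;
    }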
fouts@orville.nas.nasa.gov (Marty Fouts) (03/01/88)
In article <305@amelia.nas.nasa.gov> msf@amelia.nas.nasa.gov (Michael S. Fischbein) writes:
>If I remember correctly (I've got the reference here somewhere, but I'm
>sure I'll get flamed if I'm far off -- or even just a little off), tests
>on the first Ames Cray-2 showed twenty SIMULATED hot-and-heavy interactive
>edits used about 8% of one cpu.  These were all running as canned scripts.
>Unfortunately, they also simulated using more than half of the available
>"outside world" i/o bandwidth.
>

Actually, we never ran those tests.  I believe that numbers from a
test like this were derived at the University of Minnesota, although
I don't know their results.  In attempting to drive the machine with
character traffic from a Vax, we couldn't drive it hard enough to
have a noticeable impact, even using 4 11/780s doing nothing but
sending one-character packets to the 2.  The Vaxen just couldn't send
packets fast enough.

Back when we had an engineering branch, they did some simulations
that could be interpreted as showing that interactive editing was
very expensive, but of course these started with assumptions that
could be loosely stated as "interactive editing will be very
expensive" and went on to prove precisely that.  We don't have an
engineering branch anymore.  (;-)
leech@unc.cs.unc.edu (Jonathan Leech) (03/01/88)
In article <9720@steinmetz.steinmetz.UUCP> davidsen@crdos1.UUCP (bill davidsen) writes:
>The investment of 2-4 minutes of a
>programmer or engineer or physicist's time to move the file to a
>"suitable" machine, edit, and move it back is simply not cost effective
>as compared with doing short edits on the target machine.
>...
>I agree that the idea of using a Cray2 as an editor is intuitively poor,
>but after consideration it is quite cost effective.

Um, does Cray Unix support NFS?  This is another way to offload
editing, assuming that people are not using dumb terminals directly
connected to the Cray, of course.  Is this a bad assumption?

I redirected this to comp.misc since it seems to have nothing to do
with architecture anymore.

    Jon Leech (leech@cs.unc.edu)    __@/
    ``After all, the best part of a holiday is perhaps not so much to
    be resting yourself as to see all the other fellows busy
    working.''
    - Kenneth Grahame, _The Wind in the Willows_
aglew@ccvaxa.UUCP (03/02/88)
> ...although I wonder if processes that do not use the vector registers
> do, in fact, flush them...

The Gould NP has a VRIU (Vector Register In Use) bit that is set
whenever a vector register is written to.  So processes that do not
use the vector registers do not have to have them flushed (although
for security they are cleared).  Ditto, interrupt handlers do not
have to save/restore the vector registers unless they were in use and
the ISR wants to use them for something like high speed copies.
Moreover, this can apply to a process that uses vectors in some
phases, but not others.

I imagine other vector machines have similar mechanisms.  Can anybody
describe them for us?  While I'm at it, it occurs to me that another
flag to cause a trap if a vector register is read might have
advantages.
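For illustration, a minimal self-contained sketch of the lazy
save/restore logic such a dirty bit enables.  The structure names,
the simulated "hardware" state, and the helper routines are all
hypothetical, not Gould's actual interface:

    /* Lazy vector-register context switch (hypothetical sketch). */
    #include <stdio.h>
    #include <string.h>

    #define NVREGS 8                /* illustrative register-file size */
    #define VLEN   64

    static double hw_vregs[NVREGS][VLEN];  /* simulated vector registers */
    static int    hw_vriu;                 /* set on any vector write */

    struct proc {
        int    used_vectors;               /* VRIU snapshot at switch-out */
        double vsave[NVREGS][VLEN];        /* per-process save area */
    };

    /* Simulated vector write: hardware sets the dirty bit as a side effect. */
    static void vector_write(int reg, int elem, double val)
    {
        hw_vregs[reg][elem] = val;
        hw_vriu = 1;
    }

    static void switch_vector_context(struct proc *prev, struct proc *next)
    {
        prev->used_vectors = hw_vriu;      /* save only if prev touched them */
        if (hw_vriu)
            memcpy(prev->vsave, hw_vregs, sizeof hw_vregs);
        hw_vriu = 0;

        if (next->used_vectors) {
            memcpy(hw_vregs, next->vsave, sizeof hw_vregs);
            hw_vriu = 1;                   /* restored state is live again */
        } else {
            memset(hw_vregs, 0, sizeof hw_vregs);  /* clear for security */
        }
    }

    int main(void)
    {
        struct proc a = {0}, b = {0};

        vector_write(0, 0, 3.14);          /* process A uses vectors... */
        switch_vector_context(&a, &b);     /* ...so A is saved, B cleared */
        switch_vector_context(&b, &a);     /* B never wrote: nothing saved */
        printf("restored: %g\n", hw_vregs[0][0]);
        return 0;
    }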
ohbuchi@unc.cs.unc.edu (Ryutarou Ohbuchi) (03/03/88)
<aglew@ccvaxa.UUCP> writes:
>> ...although I wonder if processes that do not use the vector registers
>> do, in fact, flush them...
>
>Gould NP has a VRIU (Vector Register In Use) bit that is set whenever a
>vector register is written to.  So processes that do not use the vector
>registers do not have to have them flushed (although for security they
>are cleared). ...

As I recall, IBM people faced the same problem with their extension
of the 370 architecture into VF (Vector Feature (or was it
'Facility'?)), and did the same kind of thing in the 3090/VF.  I do
not remember exactly, but there is some kind of flag that tells
whether the vector registers are in use, so that not every context
switch has to save/restore the vector registers.  The literature
describing this was in the IBM Journal of Research and Development
(or something like that), in 1987.

==============================================================================
Any opinion expressed here is my own.
------------------------------------------------------------------------------
Ryutarou Ohbuchi    "Life's rock."  "Climb now, work later."
                    and, now, "Life's snow."  "Ski now, work later."
ohbuchi@cs.unc.edu  <on csnet>
Department of Computer Science, University of North Carolina at Chapel Hill
==============================================================================
alan@mn-at1.UUCP (Alan Klietz) (03/05/88)
In article <308@amelia.nas.nasa.gov> fouts@orville.nas.nasa.gov (Marty Fouts) writes:
<In article <305@amelia.nas.nasa.gov> msf@amelia.nas.nasa.gov (Michael S. Fischbein) writes:
<>If I remember correctly (I've got the reference here somewhere, but I'm
<>sure I'll get flamed if I'm far off -- or even just a little off), tests
<>on the first Ames Cray-2 showed twenty SIMULATED hot-and-heavy interactive
<>edits used about 8% of one cpu.  These were all running as canned scripts.
<>Unfortunately, they also simulated using more than half of the available
<>"outside world" i/o bandwidth.
<
<Actually we never ran those tests.  I believe that numbers from a test
<like this were derived at the University of Minnesota, although I
<don't know their results.

I ran those tests in 1985.  The important results were:

1) Cray CPU time is not a significant overhead factor when performing
   simple single-keystroke operations by a reasonable number (10-20)
   of users.

2) The NSC Hyperchannel is not designed for transferring small
   packets of data.  An A130 will saturate at 300 keystrokes/sec, due
   to the large overhead of setting up and tearing down a virtual
   circuit for each message.  The Hyperchannel also does not perform
   well with asynchronous full duplex transmissions (e.g. TCP/IP).
   This is due to reservation deadlocks between pairs of adapters
   that attempt to send to each other simultaneously over what is
   really a half duplex trunk.  Hence the development of "rvi" -- a
   remote vi that runs "ed" on the Cray-2.

The general solution is to multiplex large numbers of smaller packets
into larger Hyperchannel messages and send them in synchronous
alternating trains.  The problem is a general one, and applies to
HSX, HSC, ULTRA, CNT, and VMEbus, as well as Hyperchannel.  See the
paper, "An Investigation and Analysis of High Performance Data-links
for Supercomputers", MSCTR112 (MSC Tech Report 112), for a more
detailed discussion.

--
Alan Klietz
Minnesota Supercomputer Center (*)
1200 Washington Avenue South
Minneapolis, MN  55415
UUCP:  alan@mn-at1.k.mn.org        Ph: +1 612 626 1836
ARPA:  alan@uc.msc.umn.edu  (was umn-rei-uc.arpa)
(*) An affiliate of the University of Minnesota
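For illustration, a minimal sketch of the multiplexing idea Klietz
describes: batch many small keystroke packets into one larger message
before paying the per-message circuit setup cost.  The message size,
flush threshold, and function names here are illustrative
assumptions, not the MSC implementation.

    /* Packet-coalescing sketch (hypothetical). */
    #include <stdio.h>
    #include <string.h>

    #define MAX_MSG   4096    /* one Hyperchannel-style message */
    #define FLUSH_AT  1024    /* send once this much has accumulated */

    struct mux {
        char buf[MAX_MSG];
        int  len;
    };

    /* Stand-in for the expensive send: one circuit setup per call. */
    static void send_message(const char *data, int len)
    {
        printf("sent %d bytes in one message\n", len);
    }

    /* Queue one small packet; flush only when the train is full. */
    static void mux_put(struct mux *m, const char *pkt, int len)
    {
        if (m->len + len > MAX_MSG) {      /* no room: flush first */
            send_message(m->buf, m->len);
            m->len = 0;
        }
        memcpy(m->buf + m->len, pkt, len);
        m->len += len;
        if (m->len >= FLUSH_AT) {          /* enough for a full train */
            send_message(m->buf, m->len);
            m->len = 0;
        }
    }

    int main(void)
    {
        struct mux m = { {0}, 0 };
        int i;

        /* 2000 one-character "keystroke" packets become a few messages. */
        for (i = 0; i < 2000; i++)
            mux_put(&m, "x", 1);
        if (m.len > 0)
            send_message(m.buf, m.len);    /* final partial train */
        return 0;
    }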