[net.arch] Reasons For Large Main Memories

rjn@duke.UUCP (R. James Nusbaum) (08/30/86)

Some very good reasons for large main memories seem to have been ignored
in the recent discussion.  For a multiprocess machine large main memories
obviously eliminate paging, which is a significant system overhead.  There
are also certain single user systems which can easily use 10 meg or more
of address space.  These systems are Lisp systems.  It is easy and in fact
desirable to have a very large main memory so that the entire system can
reside in silicon and not on disk.  This very problem crops up in HP's new
line of AI workstations based on 68020s with 8 meg of main memory.  A simple
application can easily build a core image of well over 8 meg.  When it comes
time to garbage collect, the garbage collection process must swap pages in
and out of main memory.  Since the disks in these machines are fairly slow,
garbage collection can take 30 or 40 seconds.  Paging can also cause problems
when traversing a large data structure (data structures of over 2 meg are
easy to build) which resides on multiple pages.  While it is true that
sophisticated software can overcome some of the paging problems, in many
cases the cost of more memory is less than the total cost of developing
this software.  In today's world silicon is often cheaper than programmer time.
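
To make the garbage-collection point concrete, here is a rough sketch in C
(invented cell layout, not HP's or anyone's actual collector) of why even a
simple mark phase touches every live object, and therefore nearly every
resident page, of a big heap:

#include <stdio.h>
#include <stdlib.h>

/* Invented cons-cell layout, for illustration only. */
struct cell {
    int          marked;
    struct cell *car, *cdr;    /* NULL stands for an atom/leaf */
};

/* Mark everything reachable from c.  The walk must visit every live
 * cell, so on a machine whose heap exceeds physical memory it faults
 * in nearly every page the heap occupies. */
void mark(struct cell *c)
{
    while (c != NULL && !c->marked) {
        c->marked = 1;
        mark(c->car);          /* recurse down one branch...    */
        c = c->cdr;            /* ...and iterate down the other */
    }
}

int main(void)
{
    /* Tiny demo list; the real point is the traversal above. */
    struct cell *list = NULL;
    int i;
    for (i = 0; i < 3; i++) {
        struct cell *n = calloc(1, sizeof *n);
        n->cdr = list;
        list = n;
    }
    mark(list);
    printf("marked head: %d\n", list->marked);
    return 0;
}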

Jim Nusbaum
Duke University CS Dept.
Durham,  NC

peters@cubsvax.UUCP (Peter S. Shenkin) (09/01/86)

In article <duke.8494> rjn@duke.UUCP (R. James Nusbaum) writes:
>
>Some very good reasons for large main memories seem to have been ignored
>in the recent discussion....  There
>are... certain single user systems which can easily use 10 meg or more
>of address space.  These systems are Lisp systems....

And how about APL machines?

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

hsu@eneevax.UUCP (Dave Hsu) (09/01/86)

In article <8494@duke.duke.UUCP> rjn@duke.UUCP (R. James Nusbaum) writes:
>
>Some very good reasons for large main memories seem to have been ignored
>in the recent discussion.
>[deleted text about paging; large data structures]
>...  While it is true that
>sophisticated software can overcome some of the paging problems, in many
>cases the cost of more memory is less than the total cost of developing
>this software.  In today's world silicon is often cheaper than programmer time.
>
>Jim Nusbaum
>Duke University CS Dept.
>Durham,  NC

All good and well, Jim, but not the topic of discussion.  By `large' we
mean on the order of Gbytes, not Mbytes, at which point various other
factors in the system design can degrade performance substantially.
With a database of hundreds of Gbytes, the paging problems facing a
hypothetical programmer with one or two Gbytes of RAM can actually be much
worse than those facing a programmer with one or two Mbytes worth.
All it takes is a few zillion page faults, and the time lost in page-
search-and-replacement can easily exceed the time gained by a slightly
reduced number of accesses.

babbling again,

-dave
-- 
David Hsu  (301) 454-1433 || -8798 || -8715	"I know no-thing!" -eneevax
Communications & Signal Processing Laboratory	/ EE Systems Staff
Systems Research Center, Bldg 093		/ Engineering Computer Facility
The University of Maryland   -~-   College Park, MD 20742
ARPA: hsu@eneevax.umd.edu    UUCP: [seismo,allegra,rlgvax]!umcp-cs!eneevax!hsu

"Electro-nuclear carburetion seems fine..."

sewilco@mecc.UUCP (Scot E. Wilcoxon) (09/02/86)

Since most net.arch readers probably don't usually read Business Week
(I was reading it for tax change info :-) I'll point out "Giving Computers
an Elephant's Memory" in BW, Sept 1 86, pg 50.  Mostly about Princeton's M3
(Massive Memory Machine), but also mentions some applications for it and
several companies working on huge memories.

As a mostly-application programmer, I'm deferring comments on the present
pagefaults-and-other-problems discussion to those more knowledgeable.
As a system administrator and sometime OS tinkerer, I expect a machine
with a huge memory to be tunable through OS parameters (we're
currently replacing three machines partially because SCO didn't give
us a way to configure XENIX* on them).


* XENIX is a trademark of Microsoft Corporation.  Nobody else wanted it.
-- 
Scot E. Wilcoxon    Minn Ed Comp Corp  {quest,dicome,meccts}!mecc!sewilco
45 03 N  93 08 W (612)481-3507 {{caip!meccts},ihnp4,philabs}!mecc!sewilco
	Laws are society's common sense, recorded for the stupid.
	The alert question everything anyway.

rjn@duke.UUCP (R. James Nusbaum) (09/04/86)

In article <147@eneevax.UUCP> hsu@eneevax.UUCP (Dave Hsu) writes:
>
>All good and well, Jim, but not the topic of discussion.  By `large' we
>mean on the order of Gbytes, not Mbytes, at which point various other
>factors in the system design can degrade performance substantially.
>With a database of hundreds of Gbytes, the paging problems facing a
>hypothetical programmer with one or two Gbytes of RAM can actually be much
>worse than those facing a programmer with one or two Mbytes worth.
>All it takes is a few zillion page faults, and the time lost in page-
>search-and-replacement can easily exceed the time gained by a slightly
>reduced number of accesses.
>
>babbling again,
>
>-dave
>-- 
>David Hsu  (301) 454-1433 || -8798 || -8715	"I know no-thing!" -eneevax

Your point is also true.  However many Lisp systems are single user systems.
Having even a couple of hundred megawords of memory could totally eliminate
paging, as the entire system could be resident.  Take the Symbolics for
instance.  It has a 200Mb disk used for paging and some middle amount of 
memory (maybe 16Mb).  We have run many complex systems in this 200Mb size.
If we had 200Mb of memory then we would have no need for paging at all!  This
would significantly speed up many applications.

Even 200Mb won't be enough in the near future for some applications, mainly
the VLSI CAD systems I've worked on.  And considering the large word size
of some Lisp machine designs (in order to have tag bits), 200Mb can work out
to be only 40Mw.  My whole point, which I admit I didn't get across well, was
that some systems could totally eliminate paging by having huge amounts of
actual memory.  These systems would generally be single user systems, like
Lisp, APL and possibly Prolog machines.

Jim Nusbaum

bzs@bu-cs.BU.EDU (Barry Shein) (09/06/86)

From: rjn@duke.UUCP (R. James Nusbaum)
>Your point is also true.  However many Lisp systems are single user systems.
>Having even a couple of hundred megawords of memory could totally eliminate
>paging, as the entire system could be resident.  Take the Symbolics for
>instance.  It has a 200Mb disk used for paging and some middle amount of 
>memory (maybe 16Mb).  We have run many complex systems in this 200Mb size.
>If we had 200Mb of memory then we would have no need for paging at all!  This
>would significantly speed up many applications.

But you still miss the point. In the first place, move this to GigaBytes,
obviously adding a few more megabytes is not interesting (so why do people
keep going back to it?) Even 200MB is quite different than 5X that or
more.

Now, ok, you eliminated paging, garbage collection has become a field
service thing. But, could you do anything useful with all that memory
and that (relatively) itty-bitty processor? How long would it take you
to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
etc.  Paging is only an advantage at some interim and relatively low
point (100-200MB perhaps, probably less.)

Now, I could see a lisp-machine type saying that only garbage
collecting on Xmas day would be a big win, if they could actually
accomplish that, and would otherwise run 'reasonable' programs.
Somehow though I suspect it won't work that way. More like the machine
down every Sunday for garbage collection (though I suppose many
long-running applications could be checkpointed to disk and a simple
re-boot would suffice.)

There might be hope here, it's more or less like using memory in
a way similar to a write-once CD, you "never" reuse scratch memory,
when you run out you put a new disk in (ie. re-boot.)
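
That "write-once" style is essentially bump allocation: hand memory out by
advancing a pointer, never free anything, and when the arena runs out you
"put a new disk in", i.e. re-boot.  A rough sketch (my own toy example, not
any Lisp machine's actual allocator):

#include <stdio.h>
#include <stdlib.h>

/* Toy bump allocator: scratch memory is never reused. */
static char  *arena;
static size_t arena_size, arena_used;

int arena_init(size_t bytes)
{
    arena = malloc(bytes);
    arena_size = bytes;
    arena_used = 0;
    return arena != NULL;
}

void *bump_alloc(size_t bytes)
{
    void *p;
    if (arena_used + bytes > arena_size)
        return NULL;               /* arena exhausted: time to "re-boot" */
    p = arena + arena_used;
    arena_used += bytes;           /* never decremented; nothing is freed */
    return p;
}

int main(void)
{
    if (!arena_init(1 << 20))      /* 1MB arena for the demo */
        return 1;
    while (bump_alloc(64) != NULL)
        ;                          /* allocate until the "disk" is full */
    printf("arena exhausted after %lu bytes\n", (unsigned long)arena_used);
    return 0;
}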

Ok, but still obviously not cost-effective (the magic ingredient that
gets thrown out whenever this topic starts getting discussed, 1GB of
memory is going to cost you around $1M [there's more than chips
involved, don't just tell me the 1MB price * 1,000, you'll need
backplane, power-supplies etc etc, if your figures come to $500K,
fine, same thing], how many of us would buy a lisp-machine to run one
application for $500-$1M? $250K?)

	-Barry Shein, Boston University

henry@utzoo.UUCP (Henry Spencer) (09/07/86)

> ...many Lisp systems are single user systems.
> Having even a couple of hundred megawords of memory could totally eliminate
> paging, as the entire system could be resident...

Well, today it could be.  Two years from now, in the Lisp community, who
knows?  If you drop the ability to do virtual memory, then there is an
awfully sharp cost jump when a program goes from 200MW of memory to 200M+1W.
All of a sudden none of those 200MW machines can run it any more.

There is a lot to be said for memory-rich systems, particularly for Lisp
applications, but discarding virtual memory is throwing out the baby with
the bathwater.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,decvax,pyramid}!utzoo!henry

preece@ccvaxa.UUCP (09/08/86)

> /* Written  4:40 pm  Sep  5, 1986 by bzs@bu-cs.BU.EDU in ccvaxa:net.arch */

> Now, ok, you eliminated paging, garbage collection has become a field
> service thing. But, could you do anything useful with all that memory
> and that (relatively) itty-bitty processor? How long would it take you
> to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
> etc.
----------
Have you not heard of indexes?  Hash tables?  Arrays?  There are any
number of ways to use memory that don't involve scanning it.  A 2GB
database of bibliographic references (such as Dialog or LEXIS might
have) would run A LOT FASTER if it lived in main memory and wasn't
pounding on the disk interface all the time.  That's not a single-user
example in today's world (it could be eventually, marketed to
libraries and universities), but I have no doubt you can think of
applications that are fully indexed, doing few scans, but using
many, many bytes of data.  The reason you want it in memory is that
you don't know which bytes you'll want or what order you'll want them
in.
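
To put the point in code: a keyed lookup in an in-memory index touches a
handful of records, no matter how many bytes are resident.  A rough C sketch
(the record layout is invented and has nothing to do with Dialog or LEXIS):

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

#define NBUCKETS 4096

/* Invented bibliographic record, just for the sketch. */
struct ref {
    char        key[24];        /* e.g. an accession number */
    const char *title;
    struct ref *next;           /* hash-chain link          */
};

static struct ref *bucket[NBUCKETS];

static unsigned hash(const char *s)
{
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % NBUCKETS;
}

void insert(struct ref *r)
{
    unsigned b = hash(r->key);
    r->next = bucket[b];
    bucket[b] = r;
}

/* Touches only one short chain -- unlike MEMQ down a giant list,
 * which would touch every record (and, if paged, every page). */
struct ref *lookup(const char *key)
{
    struct ref *r;
    for (r = bucket[hash(key)]; r != NULL; r = r->next)
        if (strcmp(r->key, key) == 0)
            return r;
    return NULL;
}

int main(void)
{
    struct ref *r = calloc(1, sizeof *r);
    strcpy(r->key, "A1986-0001");
    r->title = "Giving Computers an Elephant's Memory";
    insert(r);
    printf("%s\n", lookup("A1986-0001")->title);
    return 0;
}
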
----------
> Ok, but still obviously not cost-effective (the magic ingredient that
> gets thrown out whenever this topic starts getting discussed, 1GB of
> memory is going to cost you around $1M
----------
Please allow the user to determine what's cost effective.  If what
you're doing is evaluating huge matrices of seismic data and you
need to do them in N hours because there's a rights auction coming
up, a million may not be an impediment, especially if the machine
can be used as a time-shared engine at times when there isn't a need
for haste.

Betting on those prices holding is about as sensible as betting on
access times holding...

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucdcs!ccvaxa!preece
arpa:	preece@gswd-vms

rjn@duke.UUCP (R. James Nusbaum) (09/08/86)

In article <1161@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>
>>From: rjn@duke.UUCP (R. James Nusbaum)
>>Your point is also true.  However many Lisp systems are single user systems.
>>Having even a couple of hundred megawords of memory could totally eliminate
	[ omitted to save space]
>>If we had 200Mb of memory then we would have no need for paging at all!  This
>>would significantly speed up many applications.
>
>But you still miss the point. In the first place, move this to GigaBytes,
>obviously adding a few more megabytes is not interesting (so why do people
>keep going back to it?) Even 200MB is quite different than 5X that or
>more.
>Now, ok, you eliminated paging, garbage collection has become a field
>service thing. But, could you do anything useful with all that memory
>and that (relatively) itty-bitty processor? How long would it take you
>to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
>etc.  Paging is only an advantage at some interim and relatively low
>point (100-200MB perhaps, probably less.)

Anyone who would have a list 100,000,000 objects long is a poor Lisp
programmer.  In case you haven't heard, Lisp has things called arrays, and
yes, I know of many things I could do with that itty-bitty
processor.  The following is an example:

I have a VLSI CAD system which divides the chip surface into small squares.
Each square can be occupied by a cell (a circuit description).  I would
love to have at least a 512x512 array of these cells.  Note: a cell may
be an empty cell, with no circuitry.  That means I need 262,144 locations in
my array, each containing a cell pointer (32 bits).  In a full array
each pointer would point to an object (not a Lisp object, an object-oriented
programming object) which I would like to be large.  The larger the
object, the faster it is to do things like repaint the screen, because of
less information sharing and pointer chasing.  Let's assume a 1k object.
We're already up to 268,435,456 bytes.  Now add simulation data to the
objects, now add color to the visual description of the objects, now
add different representations, now add the code to handle these objects.
We are already over 1Gb, I assure you.
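
For the record, the arithmetic works out like this (a back-of-the-envelope
sketch; the constants are just the assumptions stated above):

#include <stdio.h>

#define SIDE      512
#define CELLS     (SIDE * SIDE)     /* 262,144 grid locations            */
#define PTR_BYTES 4UL               /* one 32-bit cell pointer each      */
#define OBJ_BYTES 1024UL            /* assume roughly 1K per cell object */

int main(void)
{
    unsigned long ptrs = CELLS * PTR_BYTES;     /*   1,048,576 bytes */
    unsigned long objs = CELLS * OBJ_BYTES;     /* 268,435,456 bytes */

    printf("pointer array: %lu bytes\n", ptrs);
    printf("cell objects:  %lu bytes\n", objs);
    printf("total so far:  %lu bytes\n", ptrs + objs);
    /* ...before simulation data, color, alternate representations,
     * and the code itself, which push the total past 1Gb. */
    return 0;
}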

Now throw in the monkey wrench which would kill any smart paging scheme of
a virtual memory system.  It's called the user.  The user is viewing
these objects on his screen.  He can move his view to any location on
the array, he can scroll the screen, he can zoom in or out, he can
change voltage levels at any node in the circuit.

Now if I can hold all these objects in main memory then the user won't
have to wait for multiple pages to be made resident when he decides
to look at a different part of the circuit.  Users, especially non-
technical ones, hate to wait.

I'm not saying that an application like this cannot be done in much
less memory, it has been.  But even on a Symbolics (pretty much
the fastest widely available Lisp machine) it is slow, slow, SLOW.

>Now, I could see a lisp-machine type saying that only garbage
>collecting on Xmas day would be a big win, if they could actually
>accomplish that, and would otherwise run 'reasonable' programs.
>Somehow though I suspect it won't work that way. More like the machine
>down every Sunday for garbage collection (though I suppose many
>long-running applications could be checkpointed to disk and a simple
>re-boot would suffice.)

This is pretty much exactly what happens.  To my knowledge the Symbolics
at a certain university (not the one I'm at now) has never had to
garbage collect during a run session.  When a session is over the world
is saved and a garbage collection is run.  This is really a function
of a large virtual memory space, and has nothing much to do with
physical memory space.  

It certainly would take a long time to garbage collect 1-4Gb. Fortunately
there are things like incremental garbage collection, parallel garbage
collection, and garbage collection hardware to help out.  All I've ever 
said is that a global garbage collection would be much faster if the whole 
world were resident in physical memory.

>Ok, but still obviously not cost-effective (the magic ingredient that
>gets thrown out whenever this topic starts getting discussed, 1GB of
>memory is going to cost you around $1M [there's more than chips
>involved, don't just tell me the 1MB price * 1,000, you'll need
>backplane, power-supplies etc etc, if your figures come to $500K,
>fine, same thing], how many of us would buy a lisp-machine to run one
>application for $500-$1M? $250K?)
>
>	-Barry Shein, Boston University

If you're going to use today's cost figures, the whole discussion, which
originally asked about large physical memories for minicomputers, becomes
a little ridiculous.  I certainly expect memory prices to keep going
down until the physical limits of chip manufacturing are reached.  I
think we are still a good ways away from that wall.

I will argue no more.  I think that if you give me memory for free, I'll
take as much as I can get for a single-user system, up to the limit of
the logical address space.

Jim Nusbaum

jqj@gvax.cs.cornell.edu (J Q Johnson) (09/08/86)

Barry Shein writes, discussing large-memory lisp applications:

>Now, ok, you eliminated paging, garbage collection has become a field
>service thing. But, could you do anything useful with all that memory
>and that (relatively) itty-bitty processor? How long would it take you
>to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
>etc.  Paging is only an advantage at some interim and relatively low
>point (100-200MB perhaps, probably less.)

I think I, and perhaps others, have lost the thread of the argument.  Barry's
original attack on very large memories seemed to be that any system frequently
needed to perform background/service tasks that were linear in the size of
the memory (e.g. zeroing memory, or scanning for dirty pages to be written
to disk).  I'm not convinced that he has offered any evidence that any service
task in a reasonable lisp machine needs to be linear in memory.  GC can be
incremental (hence linear in CONSing rate).  CONSing itself is constant-time.
User access to memory is by definition linear in use, hence more (unused)
memory doesn't hurt. 

The quoted paragraph makes a slightly different argument, I think. It claims
that a lisp application can't benefit from a very large address space, WHETHER
it is virtual OR backed by real memory.  In fact, I think this argument
is incorrect; none of the really big lisp systems that I know of organize
data into a really big list to MEMQ down.

Having a few hundred million lisp objects is quite reasonable.  But generally
they will be in a very complex data structure, so that access to an arbitrary
lisp object requires sublinear time (e.g. a binary tree, implying that getting
a particular object would take only log(10^8) [maybe order 30?] accesses for
that 100 million example).
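
As a quick sanity check on that arithmetic (a throwaway sketch, not anyone's
actual index code): a balanced binary search over 10^8 keys needs about 27
probes, since 2^27 is the first power of two above 10^8.

#include <stdio.h>

static long probes;

/* Standard binary search over a sorted array, counting probes. */
long find(const long *keys, long n, long want)
{
    long lo = 0, hi = n - 1;
    while (lo <= hi) {
        long mid = lo + (hi - lo) / 2;
        probes++;
        if (keys[mid] == want)
            return mid;
        else if (keys[mid] < want)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

int main(void)
{
    long keys[1000], i, p = 0;
    for (i = 0; i < 1000; i++)
        keys[i] = 2 * i;                 /* small sorted demo array */
    find(keys, 1000, 1998);
    printf("probes for n=1000: %ld\n", probes);            /* about 10 */

    while ((1L << p) < 100000000L)       /* smallest p with 2^p >= 10^8 */
        p++;
    printf("worst case for n=10^8: about %ld probes\n", p); /* prints 27 */
    return 0;
}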

Have I missed something important here?

tuba@ur-tut.UUCP (Jon Krueger) (09/11/86)

In article <8529@duke.duke.UUCP> rjn@duke.UUCP (R. James Nusbaum) writes:
>I have a VLSI CAD system which divides the chip surface into small squares.
>The user is viewing these objects on his screen.  He can move his view to
>any location on the array, he can scroll the screen, he can zoom in or out,
>he can change voltage levels at any node in the circuit.
>
>Now if I can hold all these objects in main memory then the user won't
>have to wait for multiple pages to be made resident when he decides
>to look at a different part of the circuit.  Users, especially non-
>technical ones, hate to wait.

1) You have non-technical users who use your VLSI CAD system?

2) You find it unacceptable to wait for objects to page in, but it's
acceptable to wait for the entire working set to be made resident
every time you run the system or look at a new chip?  Seems like
the latter will take a lot longer than the former.  Now you'll
argue, "Sure, but it'll happen less often!"  How true.  It'll
also happen less often that a user will examine a less-recently-examined
object than a more-recently-examined one.

In other words, you don't want to make the user wait for some objects
more than others.  He should be able to perform any operation on any
object with good, predictable, response time.  It's ok that you may
waste memory in keeping resident most objects which are never required.
Perhaps your application is small enough and your memory is large
enough so you can afford to waste memory.  But there will be a cost
associated with doing business this way: startup time will be
the time required to make the entire chip representation resident,
roughly the speed of your disk.  You'll pay this price whenever you
save changes to your chip, examine a new chip, or quit the application.
If you think paging in a few pages is slow, wait until you see
how long it takes to completely read in or write out every page,
whether needed or not, accessed or not, modified or not.
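
Just to make that tradeoff visible, a throwaway calculation (every number
below is an assumption chosen for illustration, not a measurement of any
real system):

#include <stdio.h>

int main(void)
{
    /* All parameters are assumed, purely for illustration. */
    double image_mb      = 1024.0;  /* size of the whole chip representation   */
    double disk_mb_per_s = 1.0;     /* sustained disk transfer rate            */
    double faults        = 5000.0;  /* page faults over a demand-paged session */
    double secs_per_flt  = 0.03;    /* seek + transfer cost of one fault       */

    printf("load everything up front: %6.0f s\n", image_mb / disk_mb_per_s);
    printf("demand-page as you go:    %6.0f s\n", faults * secs_per_flt);
    return 0;
}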

>I will argue no more.  I think that if you give me memory for free, I'll
>take as much as I can get for a single-user system, up to the limit of
>the logical address space.
>
>Jim Nusbaum

If you give me address bits for free, I'll take as much as I can, up to some
reasonable limit (2^200 perhaps?), and build environments and applications
that make smooth and intelligent tradeoffs between current memory prices,
application sizes, and required performance levels, and that automatically
move the tradeoffs to reflect changes in memory prices.  A virtual memory
system with LRU demand paging is the popular solution.  I'm sure better ones
will emerge.  Ever-bigger physical configurations are not a solution.
They're a new set of problems...

					-- jon
-- 
--> Jon Krueger
uucp: {seismo, allegra, decvax, cmcl2, topaz, harvard}!rochester!ur-tut!tuba
Phone: (716) 275-2811 work, 473-4124 home	BITNET: TUBA@UORDBV
USMAIL:  Taylor Hall, University of Rochester, Rochester NY  14627 

rjn@duke.UUCP (R. James Nusbaum) (09/11/86)

In article <494@gvax.cs.cornell.edu> jqj@gvax.cs.cornell.edu (J Q Johnson) writes:
>Barry Shein writes, discussing large-memory lisp applications:
>
>>Now, ok, you eliminated paging, garbage collection has become a field
>>service thing. But, could you do anything useful with all that memory
>>and that (relatively) itty-bitty processor? How long would it take you
>>to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
>>etc.  Paging is only an advantage at some interim and relatively low
>>point (100-200MB perhaps, probably less.)
>
>I think I, and perhaps others, have lost the thread of the argument.  Barry's
>original attack on very large memories seemed to be that any system frequently
>needed to perform background/service tasks that were linear in the size of

	[deleted rebuttal to Mr. Shein]

>they will be in a very complex data structure, so that access to an arbitrary
>lisp object requires sublinear time (e.g. a binary tree, implying that getting
>a particular object would take only log(10^8) [maybe order 30?] accesses for
>that 100 million example).
>
>Have I missed something important here?

No, that's exactly what I have been trying to say.  Memory access in many
symbolic applications uses complex indexing and search strategies.  The access
is neither linear nor complete.  An application may build a complex data
structure and then not even use most of it!!  There are too many number
crunchers out there.  We aren't all doing matrix and array equations.

Jim Nusbaum


-- 
R. James Nusbaum, Duke University Computer Science Department,
Durham NC 27706-2591. Phone (919)684-5110.
CSNET: rjn@duke        UUCP: {ihnp4!decvax}!duke!rjn
ARPA: rjn%duke@csnet-relay

jlg@lanl.ARPA (Jim Giles) (09/12/86)

In article <672@ur-tut.UUCP> tuba@ur-tut.UUCP (Jon Krueger) writes:
>...
>2) You find it unacceptable to wait for objects to page in, but it's
>acceptable to wait for the entire working set to be made resident
>every time you run the system or look at a new chip?  Seems like
>the latter will take a lot longer than the former.  Now you'll
>argue, "Sure, but it'll happen less often!"  How true.  It'll
>also happen less often that a user will examine a less-recently-examined
>object than a more-recently-examined one.
>...

This assumes that the CAD system is written VERY badly.  The system
should be able to function perfectly well without its entire working
set.  It should only read in that part of the data which is required
to satisfy the user's request.

"Aha!" you say.  "That's what virtual memory does!"  And you're right.
But, by making this paging a responsibility of the CAD system instead
of the hardware, you allow those applications which don't need virtual
memory to proceed without the overhead of a slow memory interface.

J. Giles
Los Alamos

tuba@ur-tut.UUCP (Jon Krueger) (09/14/86)

In article <7418@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>The CAD system should be able to function perfectly well without its entire
>working set...It should only read in that part of the data which is required
>to satisfy the user's request...By making this paging a responsibility of the
>CAD system instead of the hardware, you allow those applications which don't
>need virtual memory to proceed without the overhead of a slow memory
>interface.

I can arbitrarily divide up applications into types as follows:

	1. applications which fit into physical spaces and
	   available physical memory sizes

	2. applications which can trivially manage their own
	   spaces, with acceptable development and maintenance
	   costs, which achieve comparable levels of performance
	   to their virtual memory equivalents

	3. applications which can trivially manage their own
	   spaces, with acceptable development and maintenance
	   costs, which achieve higher levels of performance
	   than their virtual memory equivalents

	4. applications which can be made to manage their own
	   spaces, but cost more or perform less than their
	   virtual memory equivalents

	5. applications with no known algorithms to manage their
	   own space, which could conceivably be made to do so at
	   some cost in development or performance

Applications of type 1 and 2 don't require large spaces, either physical or
virtual.  You point out that they would run faster without virtual address
translation.  You've got me there.  For a given space into which an
application fits, it will run faster if the space is physical.  If your
applications don't require any other benefits of virtual memory, such as
memory protection or easy relocation, you're free to run them in your
favorite physical space.  You can now buy small physical spaces cheaply and
easily.  The demands of such applications are not a significant
architectural problem in 1986.

I invite anyone to submit examples of type 3.  I haven't got any, but I
suspect there are a few.

Applications of type 4 and 5 require or benefit from large spaces.  It has
been shown elsewhere that virtual spaces provide the same spaces as physical
ones, at 10% of the cost for 90% of the performance.

The problem seems to be that you're running applications of type 1 and 2 on
systems designed to handle types 4 and 5.  I suggest you move them to small,
cheap, systems.  If you don't need the benefits of virtual memory, you
shouldn't be asked to pay its costs.  Has anyone seen a multiprocessor scheme
where some processors maintain physical spaces and others virtual?  I'll
take one where the kernel moves each image to the first free processor that
supports an appropriately sized space, where the larger spaces are more
likely to be virtual, and the smaller ones physical.  Anyone heard of such a
beast?

					-- jon
-- 
--> Jon Krueger
uucp: {seismo, allegra, decvax, cmcl2, topaz, harvard}!rochester!ur-tut!tuba
Phone: (716) 275-2811 work, 235-1495 home	BITNET: TUBA@UORDBV
USMAIL:  Taylor Hall, University of Rochester, Rochester NY  14627 

jlg@lanl.ARPA (Jim Giles) (09/16/86)

In article <684@ur-tut.UUCP> tuba@ur-tut.UUCP (Jon Krueger) writes:
>...
>	3. applications which can trivially manage their own
>	   spaces, with acceptable development and maintenance
>	   costs, which achieve higher levels of performance
>	   than their virtual memory equivalents
>...
>I invite anyone to submit examples of type 3.  I haven't got any, but I
>suspect there are a few.

Yes, quite a few!  It is my opinion that a large share of all scientific
computing falls into this category.  Certainly several of the codes I've
written (parts of) or used do.

I wrote an EMP simulation code (Electro-Magnetic Pulse), for example, which
was essentially a large 3-d grid with a satellite model inside - you hit it
with x-rays and iterate Maxwell's equations on the grid.  Maxwell's
equations require you to keep 9 numbers for each position in the grid: 3
Electric field components, 3 Magnetic field components, and 3 current
density components (x-rays knock free charges off the body of the satellite,
and the drift of these charges through the EM field causes current).

Now, take a rectangular grid that's 100 cells on a side, or 1 million cells
(100^3).  That's 9 million floating point numbers for the grid.  This
clearly didn't fit in the CDC 7600 I was using at the time (1975), so
I had to 'page' the grid.  This was made fairly easy by the cyclical
nature of the required accesses: as soon as I was done with a plane
(that is:  I had updated a plane of the grid and all its neighbors) I
could flush it out to disk and start reading the furthest ahead plane
that would now fit (so that the input MIGHT complete before I needed the
data).

Note that the 'paging' activity took place at a natural dividing point
of the code: outside the 'plane' loop.  As a result of this, the inner
2 loops of the code (a 3-d loop, remember) were optimized as if the whole
problem fit in memory.  Smaller grids were handled similarly - only the
'paging' might be done on a number of planes of the grid instead of one at
a time.  On a large grid, the 'paging' scheme could be moved into the next
lower loop and the 'pages' would have been half-planes or even strips of
the grid.  For a very small grid (one that almost fits memory), the paging
scheme could be modified to swap only a few planes (the actual technique
was to keep the whole grid of each component of the field that would fit
and only page the remaining components of the field).
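
A skeleton of that plane-at-a-time scheme, in C rather than the original
Fortran and with all the details invented (the real code is long gone;
read_plane, write_plane, and update_plane are stand-ins), might look like
this:

#include <stdio.h>

#define N      100                 /* grid is N x N x N (100^3 cells)       */
#define NCOMP  9                   /* E(3) + B(3) + J(3) per cell           */
#define NRES   4                   /* planes kept in core at once (assumed) */

#define PLANE_BYTES ((long)sizeof(double) * N * N * NCOMP)

/* In-core window: plane k lives in slot k % NRES. */
static double plane[NRES][N][N][NCOMP];

static void read_plane(FILE *f, int k, double buf[][N][NCOMP])
{
    fseek(f, k * PLANE_BYTES, SEEK_SET);
    fread(buf, 1, (size_t)PLANE_BYTES, f);
}

static void write_plane(FILE *f, int k, double buf[][N][NCOMP])
{
    fseek(f, k * PLANE_BYTES, SEEK_SET);
    fwrite(buf, 1, (size_t)PLANE_BYTES, f);
}

/* Placeholder for the Maxwell difference equations: the real update goes
 * here; the only point is that it sees three resident planes. */
static void update_plane(double below[][N][NCOMP], double cur[][N][NCOMP],
                         double above[][N][NCOMP])
{
    int i, j, c;
    for (i = 1; i < N - 1; i++)
        for (j = 1; j < N - 1; j++)
            for (c = 0; c < NCOMP; c++)
                cur[i][j][c] = 0.5 * cur[i][j][c]
                             + 0.25 * (below[i][j][c] + above[i][j][c]);
}

static void sweep(FILE *f)
{
    int k;
    for (k = 0; k < NRES; k++)                 /* prime the in-core window */
        read_plane(f, k, plane[k]);

    for (k = 1; k < N - 1; k++) {
        /* the inner 2-d loops run as if the whole grid were in memory */
        update_plane(plane[(k - 1) % NRES], plane[k % NRES],
                     plane[(k + 1) % NRES]);

        /* plane k-1 and all its neighbours are done: flush it and
         * start reading the furthest-ahead plane that now fits */
        write_plane(f, k - 1, plane[(k - 1) % NRES]);
        if (k - 1 + NRES < N)
            read_plane(f, k - 1 + NRES, plane[(k - 1) % NRES]);
    }
    write_plane(f, N - 2, plane[(N - 2) % NRES]);   /* flush the tail */
    write_plane(f, N - 1, plane[(N - 1) % NRES]);
}

int main(void)
{
    int k;
    FILE *f = fopen("grid.dat", "w+b");        /* ~72Mb scratch "grid" file */
    if (f == NULL)
        return 1;
    for (k = 0; k < N; k++)                    /* start from an all-zero grid */
        write_plane(f, k, plane[0]);
    sweep(f);
    fclose(f);
    return 0;
}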

Since all the paging activity was isolated in the outer loops of the code,
and since there was no additional address decoding circuitry in the memory
interface, I claim that this code clearly ran faster than it would
have on a VM system.  I had no VM computer to benchmark against at the
time, so I don't have numbers to prove this claim (even assuming you
would believe benchmark results that you couldn't test yourself).

Very many scientific codes have a similar structure to the one described
above: all finite-differencing codes, of course; finite-element codes;
etc.  Since this is the case, I claim that your class 3 set is well
occupied by real-world production programs.

J. Giles
Los Alamos

aglew@ccvaxa.UUCP (09/17/86)

... > Large main memories... Scientific codes can effectively do their
... > own memory management, and do not need to be penalized by VM HW.

J. Giles et al have pretty much persuaded me that virtual memory isn't
always a good thing (it wasn't hard - I was already there).

But, unless you are selling computers strictly to the non-virtual scientific
crowd, a computer manufacturer will probably want to have virtual memory for
his general purpose customers. Your market offering might run like this:
    1.	A general purpose machine/OS with virtual memory
    1'. The ability to turn off virtual memory in machine/OS 1,
	for efficiency of important scientific codes.
    2.  A strictly scientific machine that doesn't have virtual memory
	hardware at all, so scientific codes that don't use virtual
	memory pay no penalty at all.
Assumption: you will probably want these machines to be architecturally
compatible, to save development cost for peripherals and assembly language
code, and so that marketing can tell customers that you have a binary
compatible family of machines... (sigh).

Q1: What features should a machine have so that it can span both virtual and
    non-virtual systems?
Q2: In the non-virtual, scientific, machine 2 do you really want to be
    running a multi-tasking system?
Q3: In a multi-tasking OS, can you mix virtual and non-virtual processes?

If the answer to Q2 is no, you're running a batch system, and you don't
need much special support in the instruction set, at least, to span virtual
and non-virtual systems. Same thing even if you are doing batch style 
multitasking, so long as you don't need to swap processes out and move
them around in core.

What if your customers' answer to Q2 is yes? (Cray seems to think so, with 
their UNIX). If you need to swap processes out and dynamically relocate them
in core, what instruction-set features do you need?
    I think this question was answered a long time ago - base registers.
Now, my real question is what exactly you need to make base registers useful.
Any takers?
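
For anyone who hasn't met them, the base-register idea is just this (a toy
software model, not any particular machine's hardware):

#include <stdio.h>
#include <stdlib.h>

/* Toy model of base/limit relocation: every "physical" address is
 * base + logical offset, so a swapped-out process can be reloaded at a
 * different base without relinking -- no page tables involved. */
struct process {
    unsigned base;      /* where the image currently sits in core */
    unsigned limit;     /* size of the image                      */
};

unsigned translate(const struct process *p, unsigned offset)
{
    if (offset >= p->limit) {
        fprintf(stderr, "address fault: %u\n", offset);
        exit(1);
    }
    return p->base + offset;
}

int main(void)
{
    struct process p = { 0x40000, 0x10000 };
    printf("offset 0x100 -> physical 0x%x\n", translate(&p, 0x100));
    p.base = 0x80000;                     /* "swapped out and moved" */
    printf("offset 0x100 -> physical 0x%x\n", translate(&p, 0x100));
    return 0;
}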

I'm not sure that Q3 is an important question. You're only going to mix
virtual and non-virtual processes if there is a speed advantage to be gained
by turning off translation. Ie. running with the identity map would be
equivalent, except for TLB misses. 

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

joel@peora.UUCP (09/23/86)

>cheap, systems.  If you don't need the benefits of virtual memory, you
>shouldn't be asked to pay its costs.  Has anyone seen a multiprocessor scheme
>where some processors maintain physical spaces and others virtual?  I'll
>--> Jon Krueger

	I haven't seen it done on a processor basis, but I've seen it
	done on a task-by-task basis.  On our OS32MT operating system,
	most tasks run in real memory, but you can specify a virtual
	task management option at link time.  All tasks run with address
	relocation hardware enabled, while virtual tasks actually have
	routines to handle page faults linked in.  This means that the
	size of the working set of the task is fixed at load time, rather
	than sharing a common pool with other virtual tasks.  There is
	an additional problem in that the page size is a rather
	unwieldy 64KB.

	Also, on MVS on IBM mainframes there is a V=R option where the
	whole task is loaded into a contiguous chunk of real memory.
	This was generally used for small tasks with critical
	performance requirements.
-- 
     Joel Upchurch @ CONCURRENT Computer Corporation (A Perkin-Elmer Company)
     Southern Development Center
     2486 Sand Lake Road/ Orlando, Florida 32809/ (305)850-1031
     {decvax!ucf-cs, ihnp4!pesnta, vax135!petsd, akgua!codas}!peora!joel