[net.arch] VERY LARGE main memories: crypt

aglew@ccvaxa.UUCP (09/05/86)

>As I think I've mentioned before, it is believed that there are
>approximately 2^200 electrons in the universe.  Since it is unlikely that
>anybody would want to reference more things than there are electrons in the
>universe, 200 bits seems like a good upper bound for the length of a memory
>address.
>
>Roy Smith, {allegra,philabs}!phri!roy

At last, an upper limit on the number of address bits! And I thought that
the sequence 16, 32, 64, 128... would continue forever. But hold on - 200
bits will be the address size for one process's virtual address space;
how many processes are you going to have, i.e. how big will process id +
virtual address for the TLB be?

I'm smiling. Aren't you?

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

bzs@bu-cs.BU.EDU (Barry Shein) (09/07/86)

>As I think I've mentioned before, it is believed that there are
>approximately 2^200 electrons in the universe.  Since it is unlikely that
>anybody would want to reference more things than there are electrons in the
>universe, 200 bits seems like a good upper bound for the length of a memory
>address.
>
>Roy Smith, {allegra,philabs}!phri!roy

Yeah, but now think more like a physicist (I actually have little idea
how to do that):

We need snapshots of those 2^200 electrons from the point of the big
bang to the present in picoseconds. And then there are the "what-ifs"...
Not to mention their x,y,z position and various qualities like energy
level and spin...

You laugh? too bad, no?

	-Barry Shein, Boston University

bwong@ihwpt.UUCP (bruce wong) (09/08/86)

> 
> >As I think I've mentioned before, it is believed that there are
> >approximately 2^200 electrons in the universe.  Since it is unlikely that
> >anybody would want to reference more things than there are electrons in the
> >universe, 200 bits seems like a good upper bound for the length of a memory
> >address.
> >
> >Roy Smith, {allegra,philabs}!phri!roy
> 
> Yeah, but now think more like a physicist (I actually have little idea
> how to do that):
> 
> We need snapshots of those 2^200 electrons from the point of the big
> bang to the present in picoseconds. And then there are the "what-ifs"...
> Not to mention their x,y,z position and various qualities like energy
> level and spin...
> 
> You laugh? too bad, no?
> 
> 	-Barry Shein, Boston University

If there are only that many electrons in the whole universe, how
can you develop storage, using electrons, that is as large?

philm@astroatc.UUCP (Phil Mason) (09/09/86)

In article <1087@ihwpt.UUCP> bwong@ihwpt.UUCP (bruce wong) writes:
>> 
>> >As I think I've mentioned before, it is believed that there are
>> >approximately 2^200 electrons in the universe.  Since it is unlikely that
>> >anybody would want to reference more things than there are electrons in the
>> >universe, 200 bits seems like a good upper bound for the length of a memory
>> >address.
>> >
>> >Roy Smith, {allegra,philabs}!phri!roy
>> 
>> Yeah, but now think more like a physicist (I actually have little idea
>> how to do that):
>> 
>> We need snapshots of those 2^200 electrons from the point of the big
>> bang to the present in picoseconds. And then there are the "what-ifs"...
>> Not to mention their x,y,z position and various qualities like energy
>> level and spin...
>> 
>> You laugh? too bad, no?
>> 
>> 	-Barry Shein, Boston University
>
>If there are only that many electrons in the whole universe, how
>can you develop storage, using electrons, that is as large ?

Isn't 10^200 more like it? I didn't think those physicists used base two
very often!

If you were to build a memory THAT big, I guess you'd have one electron
in each cell, leaving the rest of the universe devoid of electrons, thus
blowing the universe up due to all of the exposed positive charges repelling
each other.

Whew, let's not think of that (im?)possibility!


-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Kirk  : Bones ?                |  Phil Mason, Astronautics Technical Center
Bones : He's dead Jim.         |  Madison, Wisconsin - "Eat Cheese or Die!"
- - - - - - - - - - - - - - - -|  
...seismo-uwvax-astroatc!philm |  I would really like to believe that my
...ihnp4-nicmad/               |  employer shares all my opinions, but . . . 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

bzs@bu-cs.BU.EDU (Barry Shein) (09/09/86)

From: bwong@ihwpt.UUCP (bruce wong)
>If there are only that many electrons in the whole universe, how
>can you develop storage, using electrons, that is as large ?

Obviously it has to be virtual :-)

And on another note, I write...
>How long would it take you
> to do a MEMQ of a list of a few HUNDRED MILLION lisp objects long?
> etc.

In reply to which everyone introduces me to arrays, thank you. Fine, now
do an (ADD1) to a few hundred million objects using hash arrays or some other
random storage scheme and see how much speedup you find (sheesh, it's amazing
how many people can't see the forest for the trees :-). This isn't a
pop quiz, it's a discussion; give the other side a little slack on
the trivia and stick to the issues. It is interesting, though.

	-Barry Shein, Boston University

preece@ccvaxa.UUCP (09/09/86)

> /* Written  4:16 pm  Sep  7, 1986 by bwong@ihwpt.UUCP in ccvaxa:net.arch */
> If there are only that many electrons in the whole universe, how can
> you develop storage, using electrons, that is as large ?
----------
Virtual memory.

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucdcs!ccvaxa!preece
arpa:	preece@gswd-vms

roy@phri.UUCP (Roy Smith) (09/09/86)

In article <553@astroatc.UUCP> philm@astroatc.UUCP (Phil Mason) writes:
> Isn't 10^200 more like it? I didn't think those physicists used base two
> very often!

	No, physicists don't usually think in binary, and no, I didn't mean
10^200th.  A physicist would say there are about 10^40 electrons and a
computer scientist would call that about 2^200 because it's more useful
that way.  And please, no nitpicking from people claiming these numbers are
off by an order of magnitude or two.
-- 
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

aglew@ccvaxa.UUCP (09/10/86)

> So VAX/Unix oriented, so blinded!
> 
> The Cray is a word oriented machine.  Take out the factor of 8 from all
> of your calculations.  You can tell a person's thinking by the words they
> choose.  The Cray-2 is a 256 MW not a 2 GB machine.  These are not the same
> because the conversion is non trivial.  Crays are word oriented machines.
> Anyone who says "2 GB" is showing a great deal of naive.  Get off your
> VAXen an try Univacs (36-bit), IBM's inverted bit order, and other systems.
> 
> eugene miya

You're off track here, Eugene. 2 GB = 256 MW on any machine where a double
precision floating point number occupies 64 bits, even on VAXen. The Cray's
`large' memory is comparable whenever you're talking about large scientific
problems (which about half of us are).

Your exasperation is worthy, but the timing is off.

Maybe I'm UNIX oriented, but I'm hardly VAX preoccupied. Look who I work for.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

jin@hropus.UUCP (Jerry Natowitz) (09/10/86)

> 	No, physicists don't usually think in binary, and no, I didn't mean
> 10^200th.  A physicist would say there are about 10^40 electrons and a
> computer scientist would call that about 2^200 because it's more useful
> that way.  And please, no nitpicking from people claiming these numbers are
> off by an order of magnitude or two.
> -- 
> Roy Smith, {allegra,philabs}!phri!roy
> System Administrator, Public Health Research Institute
> 455 First Avenue, New York, NY 10016

A mathematically trained ex-psychologist says that 10^40 is
2^132, more or less.  a^x = b^((x*log(a))/log(b)) - in this case
10^40 = 2^(40*1/.303) = 2^132 to three place precision.

I think correcting an error of over 20 orders of magnitude (200*.303 -40)
is not nitpicking, wouldn't you agree?
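
The conversion is easy to check mechanically; here is a throwaway sketch in
C (nothing machine-specific, just the exponent arithmetic):

	#include <stdio.h>
	#include <math.h>

	int main(void)
	{
	    double digits = 40.0;                /* 10^40, the figure above */
	    double bits = digits / log10(2.0);   /* 10^d = 2^(d / log10 2) */
	    printf("10^%.0f is about 2^%.1f\n", digits, bits);  /* ~2^132.9 */
	    return 0;
	}
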
-- 
Jerry Natowitz (HASA - A division)
Bell Labs HR 2A-214
201-615-5178 (no CORNET)
ihnp4!houxm!hropus!jin
or ihnp4!opus!jin

Isn't it interesting how the beautiful little red flower in the forest
becomes so ugly when you discover it's a candy wrapper.

preece@ccvaxa.UUCP (09/11/86)

I think Miya's point, in complaining that we were talking about
the Cray size in bytes when it is really in words, is that when
you're talking about address decoding, 256MW is NOT the same
problem as 2GB.  It's three bits less.  Whether that's an
important distinction or not, though...

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucdcs!ccvaxa!preece
arpa:	preece@gswd-vms

aglew@ccvaxa.UUCP (09/11/86)

> I think Miya's point, in complaining that we were talking about
> the Cray size in bytes when it is really in words, is that when
> you're talking about address decoding, 256MW is NOT the same
> problem as 2GB.  It's three bits less.  Whether that's an
> important distinction or not, though...
> 
> scott preece

Yeah, but then, for any machine with a cache, the true address is 
reduced by the cache line size. So, a 2GB memory in a machine with
a 128 bit cache line is actually only 27 bits, 4 bits less.

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

waynekn@tekig5.UUCP (Wayne Knapp) (09/11/86)

In article <2431@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> In article <553@astroatc.UUCP> philm@astroatc.UUCP (Phil Mason) writes:
> > Isn't 10^200 more like it? I didn't think those physicists used base two
> > very often!
> 
> 	No, physicists don't usually think in binary, and no, I didn't mean
> 10^200th.  A physicist would say there are about 10^40 electrons and a
> computer scientist would call that about 2^200 because it's more useful
> that way.  And please, no nitpicking from people claiming these numbers are
> off by an order of magnitude or two.
> -- 
> Roy Smith, {allegra,philabs}!phri!roy
> System Administrator, Public Health Research Institute
> 455 First Avenue, New York, NY 10016


     Try 10^40 electrons in 100 qubic kilometers of water.  You can do the
math.  Sorry to nitpick but I think you are only  by a hunderd or two hunderd
orders of magnitude.  You must think the universe is a very small place! 

                             Wayne Knapp

waynekn@tekig5.UUCP (Wayne Knapp) (09/11/86)

>      Try 10^40 electrons in 100 qubic kilometers of water.  You can do the
> math.  Sorry to nitpick but I think you are only  by a hunderd or two hunderd
> orders of magnitude.  You must think the universe is a very small place! 
> 
>                              Wayne Knapp

Better fix my mistakes before I'm flamed:

     Try 10^40 electrons in 100 cubic kilometers of water.  You can do the
math.  Sorry to nitpick but I think you are only off by a hunderd or two hunderd
orders of magnitude.  You must think the universe is a very small place! 
  
                               Wayne Knapp

beckenba@tybalt.caltech.edu.Caltech.Edu (Joseph R. Beckenbach) (09/12/86)

In article <5100124@ccvaxa> preece@ccvaxa.UUCP writes:
>> If there are only that many electrons in the whole universe, how can
>> you develop storage, using electrons, that is as large ?
>Virtual memory.
>scott preece

The point is that our memory is based on electrons.
Open eyes, please. Virtual memory must be supported somewhere, or else
when we turn our attention from that section of memory, it gets moved to
oblivion. In current (small on this scale) virtual memory implementations
this support is hardware or media -- physical things -- with electrons
(is this a chemistry/physics/science review for anyone?), or else 
ALL THAT INFORMATION GOES AWAY.  Permanently, which is not memory by any
imaginable definition. Meaning: virtual memory < #electrons in universe.
We need a few for electron-positron research :-), computer power,
and TV power :-(. 

End digression from virtual implementations. Please.
			--- JRB3

john@datacube.UUCP (09/12/86)

>As I think I've mentioned before, it is believed that there are
>approximately 2^200 electrons in the universe.  Since it is unlikely that
>anybody would want to reference more things than there are electrons in the
>universe, 200 bits seems like a good upper bound for the length of a memory
>address.
>
>Roy Smith, {allegra,philabs}!phri!roy

Oh, good, there is a physical upper bound! If there are about 2^200
electrons in the universe, and the most efficient possible memory
element uses one electron, then it is impossible to have more than
2^200 bits of memory :-).

				John Bloomfield

Datacube Inc. 4 Dearborn Rd. Peabody, Ma 01960 	617-535-6644
	
ihnp4!datacube!john
decvax!cca!mirror!datacube!john
{mit-eddie,cyb0vax}!mirror!datacube!john

desj@brahms.BERKELEY.EDU (David desJardins) (09/12/86)

> /* Written  4:16 pm  Sep  7, 1986 by bwong@ihwpt.UUCP in ccvaxa:net.arch */
> If there are only that many electrons in the whole universe, how can
> you develop storage, using electrons, that is as large ?

   There is no reason why the state of a single electron cannot be used
to store many, many bits of information.

   -- David desJardins

ken@argus.UUCP (Kenneth Ng) (09/12/86)

In article <5100124@ccvaxa>, preece@ccvaxa.UUCP writes:
> 
> > /* Written  4:16 pm  Sep  7, 1986 by bwong@ihwpt.UUCP in ccvaxa:net.arch */
> > If there are only that many electrons in the whole universe, how can
> > you develop storage, using electrons, that is as large ?
> ----------
> Virtual memory.
> scott preece
> gould/csd - urbana
> uucp:	ihnp4!uiucdcs!ccvaxa!preece
> arpa:	preece@gswd-vms

Wouldn't this be more of a virtual universe?

-- 
Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey  07102
uucp(for a while) ihnp4!allegra!bellcore!argus!ken
     ***   WARNING:  NOT ken@bellcore.uucp ***
           !psuvax1!cmcl2!ciap!andromeda!argus!ken
bitnet(prefered) ken@orion.bitnet

--- Please resend any mail between 10 Aug and 16 Aug:
--- the mailer broke and we had billions and billions of
--- bits scattered on the floor.

john@datacube.UUCP (09/12/86)

>/* Written  9:26 am Sep  9, 1986 by preece@ccvaxa.UUCP in datacube:net.arch */
>> /* Written  4:16 pm  Sep  7, 1986 by bwong@ihwpt.UUCP in ccvaxa:net.arch */
>> If there are only that many electrons in the whole universe, how can
>> you develop storage, using electrons, that is as large ?
>---------
>Virtual memory.

Well OK, but don't expect your nonvolatile storage, disk, organic memory,
or whatever, to be that big either; it too requires at least one electron
for each bit stored :-)

				John Bloomfield

Datacube Inc. 4 Dearborn Rd. Peabody, Ma 01960 	617-535-6644
	
ihnp4!datacube!john
decvax!cca!mirror!datacube!john
{mit-eddie,cyb0vax}!mirror!datacube!john

roy@phri.UUCP (Roy Smith) (09/12/86)

	I claimed that 200 bits is a good upper limit on address size
because there are 2^200 electrons in the universe.  I was taken to task on
this and it was suggested that I meant 10^200.  I reiterated that I meant
2^200, which is what other people would call about 10^40th.

	In article <668@hropus.UUCP> jin@hropus.UUCP (Jerry Natowitz)
pointed out that 10^40 is off by rather a lot.  My apologies.  I used a
calculator program on Unix to calculate 2^200 and came up with 1.7e38.
Stupid me, by now I should recognize that number as something magic -- the
largest number you can represent in floating point on a Vax.  It seems that
2^200 produces floating point overflow and the calculator program I used
doesn't catch overflows.  And yes, I did think 1.7e38 was a bit low (I was
expecting more like 1e60, or about 3.something bits per decimal digit).
There is a discussion going on in RISKS-DIGEST (aka mod.risks) about people
trusting computers too much.  Now I find myself just as guilty.  Mea Culpa.

	Anyway, I stand by my assertion that there are 2^200th electrons in
the universe.  If you prefer to work in decimal, work out the conversion
yourself.  For the purposes of this discussion, I consider any arithmetic
error in that conversion (even a factor of 10^20!) to be trivial.

	A reasonable argument that 200 bits is not a good upper bound has
been made by various other people, however.  There are often times when a
sparse address space is useful.  If you take my home address as a number,
RoySmith222UnionStreetBrooklynNY11231, you need more than 200 bits (7-bit
ASCII).  For a world data base, it might be convenient to use this directly
as an index into a table, and the hell with hashing.
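
Counting it out (a throwaway C sketch; the string and the 7-bits-per-character
figure are the ones above):

	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
	    const char *addr = "RoySmith222UnionStreetBrooklynNY11231";
	    int chars = (int)strlen(addr);
	    int bits = 7 * chars;               /* 7-bit ASCII per character */
	    printf("%d characters need %d bits\n", chars, bits);
	    return 0;                           /* 37 characters -> 259 bits */
	}
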
-- 
Roy Smith, {allegra,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

robison@uiucdcsb.CS.UIUC.EDU (09/13/86)

For the curious, my freshman physics text[1] contains the estimate:

	No. of nucleons in known universe ~ 10^80

This would be ~ 2^266.  Even if a memory this size could be built, there
is still the problem of finding enough power for it.  There are quantum
limits for the minimum power of switching devices[2].   

[1] *Mechanics*, Kittel, Knight, Ruderman, Helmholz, and Moyer. 
    (2nd edition, 1973)

[2] "Physics of Computational Systems, " in *Introduction to VLSI Systems*,
    Mead and Conway, 1980.

Arch D. Robison
University of Illinois at Urbana-Champaign

guy@sun.uucp (Guy Harris) (09/13/86)

> I think Miya's point, in complaining that we were talking about
> the Cray size in bytes when it is really in words, is that when
> you're talking about address decoding, 256MW is NOT the same
> problem as 2GB.  It's three bits less.

THAT depends on how your memory subsystem works.  One can imagine a memory
system that works in 32-bit or 64-bit words; the memory address would be the
address of a 32-bit or 64-bit memory word, and the low-order 2 or 3 bits
would be used by the CPU to pick off the byte from that word.

In fact, somebody *did* imagine such a memory subsystem; a company called
Digital Equipment Corporation came out with a machine in the Virtual Address
eXtension line of machines which had a Synchronous Backplane Interconnect to
which memory was attached; addresses were 28-bit addresses of 32-bit
longwords.

Actually, in this case the memory controller may have handled byte selection
and stuffing; the SBI has "read masked" and "write masked" commands that
used a 4-bit mask to indicate which bytes appeared on the data lines during
the operation.  However, this stuff may have been there mostly for the
benefit of the UBA, where the bytes selected would actually affect the sorts
of UNIBUS operations performed; the memory controller could always fetch a
32-bit longword and just use some NAND gates or something on the "read
masked" and use read-modify-write cycles and 2-1 multiplexors on the "write
masked" operations; it doesn't have to select one of four bytes, just one of
two.

If the VAX hadn't descended from a line of machines with a byte-addressable
bus, it might have been possible not to bother giving the "which bytes are
of interest" information to nexi on the SBI at all, and do *all* of the byte
handling in the CPU.  
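
To make the byte-picking concrete, here is a rough C model of a word-granular
memory with the CPU doing the byte selection (purely illustrative, assuming
VAX-style little-endian byte order; it says nothing about what the SBI
actually did):

	#include <stdint.h>

	/* Memory is modeled as an array of 32-bit longwords.  The memory
	   system sees only byte_addr >> 2; the CPU uses the low 2 bits to
	   pick the byte out of the returned longword (little-endian). */
	uint8_t fetch_byte(const uint32_t *memory, uint32_t byte_addr)
	{
	    uint32_t longword = memory[byte_addr >> 2]; /* word-granular fetch */
	    unsigned shift = (byte_addr & 3u) * 8;      /* 0, 8, 16, or 24 */
	    return (uint8_t)(longword >> shift);
	}
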
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

aglew@ccvaxa.UUCP (09/14/86)

...> Large memory making paging unnecessary.
...> Discussion of page replacement strategies.

...> OS designers being "too lazy" to develop good policies for scientific
...> code, vs. "it's easy" to predict data usage patterns.
The point is that OS mechanisms are nearly always responsive, not predictive.
They observe a usage pattern and assume that it is going to go on for a
while. LRU is a particularly bad choice for scientific codes, but it
does almost as badly for some AI-type problems. Consider a large tree walk
- you want to use LRU for pages above you in the tree, but before throwing
any of those out you want to first throw out the pages below you that
you've already visited. Explicit code of the form

		visit(node)
		    prepare-for-prefetch(right,LRU)
		    visit(left)
		    throw-away(left)
		    visit(right)
		    throw-away(right)

is easy and obvious (though it gets more complicated if you try for left
prediction). And it accurately _predicts_, instead of responding.
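
Rendered as C, the walk might look like the sketch below. The hint calls are
hypothetical stand-ins for whatever advisory interface the OS or a library
provides, and the sketch assumes each subtree occupies one contiguous block
of memory starting at its root:

	struct node {
	    struct node *left, *right;
	    /* ... payload ... */
	};

	/* Hypothetical advisory calls; a real system might map these onto
	   something like vadvise() or a library-level pager.  Stubbed out
	   here so the sketch stands alone. */
	static void prefetch_hint(void *start, unsigned long len)
	    { (void)start; (void)len; }
	static void discard_hint(void *start, unsigned long len)
	    { (void)start; (void)len; }

	void visit(struct node *n, unsigned long subtree_bytes)
	{
	    if (n == 0)
	        return;
	    if (n->right)                 /* predict: right subtree comes later */
	        prefetch_hint(n->right, subtree_bytes / 2);
	    visit(n->left, subtree_bytes / 2);
	    if (n->left)                  /* done with the left subtree for good */
	        discard_hint(n->left, subtree_bytes / 2);
	    visit(n->right, subtree_bytes / 2);
	    if (n->right)
	        discard_hint(n->right, subtree_bytes / 2);
	}
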

If you've done control systems, you'll know that the ideal feedback mechanism
has negative delay, but in the real world you need integrators. Knowledge of
the algorithm is like knowing the correlation of your noise - it's not great,
but it's better than nothing.

A useful discussion might be, how can we extend vadvise() to be more useful?
A simple vadvise(CYCLIC) isn't satisfactory, since the operating system would
have to figure out the bounds of the cyclically accessed region. But maybe
vadvise(CYCLIC,a,b,d), where a-d is the bounds, and b indicates the dimension
of the wavefront. More primitive, a call I'm-going-to-need-this-page-soon()
and I'm-never-going-to-need-this-page-again() might be useful. For
abstraction, say memory region (indicated by bounds) instead of page.
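
Spelled out as C declarations, the suggestion might look something like this
(entirely hypothetical - none of these calls exist and the names are made up):

	#define VADV_CYCLIC 1   /* made-up behavior code */

	/* Advise that the region [start, start+len) is accessed cyclically;
	   'wavefront' describes how much of it is worked on at a time. */
	int vadvise_region(int behavior, void *start,
	                   unsigned long len, unsigned long wavefront);

	/* The more primitive forms mentioned above. */
	int will_need_soon(void *start, unsigned long len);
	int never_need_again(void *start, unsigned long len);
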

Such things probably do not need to be system calls - they could be library
functions, although with a lot of system knowledge built in. As for the
actual page strategies used by hardware, they are likely to become simpler,
not more complex, since the devices you are paging from are getting faster.
Consider paging from a non-directly-addressable bank of `slow' CMOS memory
on an uninterruptible power supply...

...> What about code usage patterns?
I've seen automatic overlay generators on small systems - I even used one,
once, but not for long. It had the option of generating overlays completely
automatically. This was particularly easy to do for the top level of
classic Pascal programs:
	    PROGRAM foo;
		BEGIN
		    init();
		    stage1();
		    stage2();
		    stage3();
		    cleanup();
		 END.
where the overlay strategy for the top level is obvious if the stages don't
share too many routines - even if they do, call-graph decomposition isn't
too hard. And, as above, this was predictive, not responsive.
		
...> John Krueger: can we measure the win?
I'm sure my coworkers will correct me, but I've heard of as much as 50%
speed improvement using vmadvise() for non-scientific applications. More
precise control will probably do even better. Can anybody at NASA respond?

			    ---

I'd like to break for discussion for a bit. I'm fairly sure that memories
large enough to drastically reduce the importance of paging are imminent.
I am not so certain of how the page tables, the address translation
mechanism, will respond. How big can we make our TLBs? How quickly can we
handle TLB misses?

Certainly, tree structured page tables are unlikely to be any good. How about
discussion of inverted page tables, hashing, O(1) probabilistic translation
on TLB misses? Are the hash tables going to have to be so large that they
themselves will be paged? - something I do not like because it almost
definitely requires microcode.

As for TLBs, maybe we can squeeze more into them by comparing for groups of
pages instead of individual pages. I've proposed a scheme for power of two
sized page ranges that makes the range check into a strict AND/OR, carryless
operation - which isn't bad, except that it adds 2-3 gate delays to the most
critical path of the machine. I know the tradeoff curves, but I don't know
when we will cross them. Are there any other ideas out there?
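
For illustration only (this is the general idea of a carryless range match,
not the actual proposal): if a TLB entry covers a naturally aligned,
power-of-two-sized group of pages, the hit test and the translation are pure
AND/OR operations on the virtual page number.

	#include <stdint.h>

	struct tlb_entry {
	    uint32_t vpn_tag;   /* virtual page number of the range base */
	    uint32_t mask;      /* zeros in the bits that vary across the
	                           range, e.g. ~0u << 3 for 8 pages */
	    uint32_t pfn_base;  /* physical frame number of the range base */
	};

	int tlb_hit(const struct tlb_entry *e, uint32_t vpn)
	{
	    return (vpn & e->mask) == (e->vpn_tag & e->mask);  /* no carries */
	}

	uint32_t tlb_translate(const struct tlb_entry *e, uint32_t vpn)
	{
	    return e->pfn_base | (vpn & ~e->mask);  /* OR in the low page bits */
	}
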

Andy "Krazy" Glew. Gould CSD-Urbana.    USEnet:  ihnp4!uiucdcs!ccvaxa!aglew
1101 E. University, Urbana, IL 61801    ARPAnet: aglew@gswd-vms

In article <7331@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>It's not just that the operating system or hardware designers are too lazy
>to come up with a good scheme.  The problem is that any scheme they DO come
>up with must work for general cases.  That is, it can't take advantage of
>special knowledge of a specific algorithm....The individual applications
>programmer CAN take advantage of such knowledge.  To be sure, this is an
>expensive and difficult programming project...[but] there are some types of
>algorithm for which it is extremely easy to predict the data usage
>patterns.  Now, it might be convenient to implement virtual memory schemes
>which are useful in this context, but I doubt that the extra overhead in the
>memory interface would be justified - especially since the explicit methods
>for dealing with them are fairly easy to implement.

1) Doubtless you can show me cases where it's "extremely easy" to predict
data usage patterns.  Can you show me one where it's easy to predict the
code usage patterns?  I want VM to free me from overlays, or code space
management, not file structuring, or data space management.

>if you've just spent $10-$20 million on a fast machine, you aren't going
>to balk at a few million more in programmer man-hours to get the speed
>that you shelled out so much cash for.
2) Can we measure the win?  Can you provide figures on performance
improvements for either data or code space management by application
programmers over mechanisms provided by operating system or hardware
designers?  Have you any actual examples, how much was the improvement for a
specific application you're familiar with?  Can we state a general rule,
expected returns on applications programmers managing their own code and/or
data spaces?  Can you state the breakeven point, how many millions of
dollars I should be willing to spend before I improve on the operating
system's paging?

					-- jon
-- 
--> Jon Krueger
uucp: {seismo, allegra, decvax, cmcl2, topaz, harvard}!rochester!ur-tut!tuba
Phone: (716) 275-2811 work, 473-4124 home	BITNET: TUBA@UORDBV
USMAIL:  Taylor Hall, University of Rochester, Rochester NY  14627 
/* End of text from ccvaxa:net.arch */

bzs@bu-cs.BU.EDU (Barry Shein) (09/15/86)

Am I missing something? What's the huge problem with this hypothetical
vadvise(CYCLIC)? Isn't it just MRU (most recently used)?

It seems to me that the conjecture was that scientific number crunchers
tend to just

	DO 100 I=1,N
	ARR(I) = F(ARR(I))
100	CONTINUE

or something analogous to that, and when done, do it again. Thus, the
most recently used page is the one that will be used again the
furthest in the future, and the least recently used is the one that
will be used soonest. Thus, they do hand-to-hand combat with LRU
algorithms. Therefore, it seems they want LRU for the text space
(being as there's no particular reason to believe the text is any
different than other programs, locality of reference applies) and MRU
for their (impure) data space (probably LRU for the rest.) Nothing's
perfect, but might that not be better? Anyone know if anyone's tried
it? I assume so.

	-Barry Shein, Boston University

peters@cubsvax.UUCP (Peter S. Shenkin) (09/15/86)

In article <uiucdcsb.5600054> robison@uiucdcsb.CS.UIUC.EDU writes:
>
>						...There are quantum
>limits for the minimum power of switching devices[2].   
>
>[2] "Physics of Computational Systems, " in *Introduction to VLSI Systems*,
>    Mead and Conway, 1980.

This has been the subject of long debate.  Rolf Landauer (I think) and someone 
else reviewed the subject in Scientific American about a year ago.  I believe
the bottom line is that everyone (i.e., even people who publicly supported the
opposite position) now believes that there are no such limits.

Peter S. Shenkin	 Columbia Univ. Biology Dept., NY, NY  10027
{philabs,rna}!cubsvax!peters		cubsvax!peters@columbia.ARPA

bs@faron.UUCP (Robert D. Silverman) (09/15/86)

> >      Try 10^40 electrons in 100 qubic kilometers of water.  You can do the
> > math.  Sorry to nitpick but I think you are only  by a hunderd or two hunderd
> > orders of magnitude.  You must think the universe is a very small place! 
> > 
> >                              Wayne Knapp
> 
> Better fix my mistakes before I'm flamed:
> 
>      Try 10^40 electrons in 100 cubic kilometers of water.  You can do the
> math.  Sorry to nitpick but I think you are only off by a hunderd or two hunderd
> orders of magnitude.  You must think the universe is a very small place! 
>   
>                                Wayne Knapp

I'm not going to nitpick your spelling because it's irrelevant to the discussion.
Currently, cosmologists are still debating whether the universe is open
or closed. There are quite good estimates on a LOWER bound for the mass of the
universe. If the mass is actually TWICE the current known lower bound then
the universe IS closed. The point of all this is that the known mass can be
used to get an estimate on the number of atoms in the universe. It is something
like 10^78 or 10^79. There are therefore at least that number of electrons.

Reference: 
"Red Giants and White Dwarfs", can't remember author's name but it begins with a'J'. Possibly Jastrow?

jlg@lanl.ARPA (Jim Giles) (09/16/86)

In article <1276@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>
>Am I missing something? What's the huge problem with this hypothetical
>vadvise(CYCLIC)? Isn't it just MRU (most recently used)?
>...
>most recently used page is the one that will be used again the
>furthest in the future and the least recently used is the one that
>will be used soon. Thus, they do hand to hand combat with LRU
>algorithms. Therefore, it seems they want LRU for the text space
>(being as there's no particular reason to believe the text is any
>different than other programs, locality of reference applies) and MRU
>for their (impure) data space (probably LRU for the rest.) ...

There are two things wrong with this scheme.  1) Suppose that your problem
fails to fit in main memory by only a few pages.  The above scheme
will cycle each and every page of the whole data structure into memory
and back out again on every time step.  The knowledgeable programmer,
however, can overlay just a few pages and therefore cut down on the
total I/O requirement of the code.  2) As with any built-in VM system,
ALL references to memory must be decoded for page address - which slows
down the memory interface.  This means that all memory traffic is slower
than it would have been without the VM system - even for those codes (or
parts of codes) which don't require the virtual memory.

J. Giles
Los Alamos

rb@cci632.UUCP (Rex Ballard) (09/18/86)

In article <7537@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>In article <1276@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>
>>Am I missing something? What's the huge problem with this hypothetical
>>vadvise(CYCLIC)? Isn't it just MRU (most recently used)?
>>...
>>most recently used page is the one that will be used again the
>>furthest in the future and the least recently used is the one that
>>will be used soon. Thus, they do hand to hand combat with LRU
>>algorithms. Therefore, it seems they want LRU for the text space
>>(being as there's no particular reason to believe the text is any
>>different than other programs, locality of reference applies) and MRU
>>for their (impure) data space (probably LRU for the rest.) ...

Has anybody tried MAU (Most Average Use) approaches?

>The are two thing wrong with this scheme.  1) Suppose that you problem
>fails to fit in main memory by only a few page sizes.  The above scheme
>will cycle each and every page of the whole data structure into memory
>and back out again on every time step.  The knowledgable programmer,
>however, can overlay just a few pages and therefore cut down on the
>total I/O requirement of the code.  2) As with any built-in VM system,
>ALL references to memory must be decoded for page address - which slows
>down the memory interface.  This means that all memory traffic is slower
>that it would have been without the VM system - even for those codes (or
>parts of codes) which don't require the virtual memory.
>
>J. Giles
>Los Alamos

Question: is it a requirement that the data and/or text structure be a loop?
If there are too many in-line macro expansions, "almost the same" code,
or anything that can be put into a tree, MAU might be a solution.

This should ideally be combined with look-ahead pre-fetch.  For example,
on the code side,

	main:		main:
	do a		call a
	if cond		if cond
	macro b		call b
	macro x
	else		else
	macro c		call c
	macro x
	endif		endif 
	macro x		call x
		
			b:
			call j
			call x
			return
			c:
			call k
			call x

Under MAU, main and x would be in core, the prefetch could pick up the first
few instructions of b and c, and only j and k would cause a slowdown.

In the data case:

	struct btree		struct hier
	{			{
	   key			   struct keyrange {keylow,keyhigh,*hierp }[N]
	   btree_less*		   hier_less*
	   btree_greater*	   hier_greater*
	}			}

allows several keys to be examined before looking at the next block.
This also reduces the number of blocks that need to be examined.

Other examples, such as circular queues, can "rubber band" into multiple
queues, each a convenient size, with the middle elements being flushed
until needed.

franka@mmintl.UUCP (Frank Adams) (09/19/86)

In article <7537@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>In article <1276@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>Am I missing something? What's the huge problem with this hypothetical
>>vadvise(CYCLIC)? Isn't it just MRU (most recently used)?
>The are two thing wrong with this scheme.  1) Suppose that you problem
>fails to fit in main memory by only a few page sizes.  The above scheme
>will cycle each and every page of the whole data structure into memory
>and back out again on every time step.

No it won't.  Only the number of pages which don't fit are read in each
cycle.  (E.g., if there are 103 pages and 100 will fit, 3 are read each
cycle.)  In fact, MRU behaves optimally[1][2] for a purely cyclic access
pattern.  In particular, it has less I/O overhead than setting aside a few
pages at the end to be overlaid.
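
A quick demand-paging simulation (only a sketch: it counts faults and ignores
any overlap of I/O with computation) shows the difference for the
103-pages-in-100-frames case. LRU faults on every reference once the data no
longer fits, while evicting the most recently used page settles down to a few
faults per cycle:

	#include <stdio.h>

	#define M 103       /* pages in the cyclically accessed data set */
	#define N 100       /* page frames available */
	#define CYCLES 4

	/* when[p] = time page p was last touched, or -1 if not resident.
	   policy = +1 evicts the most recently used resident page,
	   policy = -1 evicts the least recently used. */
	static long simulate(int policy)
	{
	    long when[M], faults = 0, clock = 0;
	    int resident = 0;

	    for (int p = 0; p < M; p++)
	        when[p] = -1;

	    for (int c = 0; c < CYCLES; c++) {
	        long cycle_faults = 0;
	        for (int p = 0; p < M; p++) {
	            clock++;
	            if (when[p] < 0) {                  /* page fault */
	                faults++;
	                cycle_faults++;
	                if (resident == N) {            /* pick a victim */
	                    int victim = -1;
	                    for (int q = 0; q < M; q++) {
	                        if (when[q] < 0)
	                            continue;
	                        if (victim < 0 ||
	                            policy * (when[q] - when[victim]) > 0)
	                            victim = q;
	                    }
	                    when[victim] = -1;
	                    resident--;
	                }
	                resident++;
	            }
	            when[p] = clock;                    /* reference the page */
	        }
	        printf("  cycle %d: %ld faults\n", c + 1, cycle_faults);
	    }
	    return faults;
	}

	int main(void)
	{
	    printf("evict most recently used:\n");
	    long mru = simulate(+1);
	    printf("evict least recently used:\n");
	    long lru = simulate(-1);
	    printf("totals over %d cycles: MRU %ld, LRU %ld\n", CYCLES, mru, lru);
	    return 0;
	}
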

>The knowledgable programmer,
>however, can overlay just a few pages and therefore cut down on the
>total I/O requirement of the code.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

[1] Assuming that only demand-loading is used.  As long as I/O can be done
simultaneously with computation, lookahead will improve performance.  Even
with lookahead, MRU is still optimal for the selection of the next page to
be swapped out.

[2] This also assumes that all accesses to external store are equivalent.
If the complete data set does not fit on one cylinder of the disk, this
assumption will likely be violated, at least if no other program is executing
on the same machine.  The optimal solution in the general case is not
obvious.

franka@mmintl.UUCP (Frank Adams) (09/19/86)

In article <645@faron.UUCP> bs@faron.UUCP writes:
> The point of all this is that the known mass can be
>used to get an estimate on the number of atoms in the universe. It is
>something like 10^78 or 10^79. There are therefore at least that number
>of electrons.

This assumes that most of the mass in the universe is in the form of
ordinary matter -- protons, neutrons, and electrons.  This assumption is
very much in question these days.  Proposed alternatives include everything
from neutrinos to topological singularities left over from shortly after the
big bang.

The visible matter in the universe is about an order of magnitude less than
the amount needed to close the universe.  This is a better lower bound on the
number of electrons -- it seems unlikely that stars contain any significant
quantity of such exotic mass.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

jlg@lanl.ARPA (Jim Giles) (09/24/86)

In article <1819@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>In article <7537@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>>In article <1276@bu-cs.bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>>Am I missing something? What's the huge problem with this hypothetical
>>>vadvise(CYCLIC)? Isn't it just MRU (most recently used)?
>>The are two thing wrong with this scheme.  1) Suppose that you problem
>>fails to fit in main memory by only a few page sizes.  The above scheme
>>will cycle each and every page of the whole data structure into memory
>>and back out again on every time step.
>
>No it won't.  Only the number of pages which don't fit are read in each
>cycle.  (E.g., if there are 103 pages and 100 will fit, 3 are read each
>cycle.)  In fact, MRU behaves optimally[1][2] for a purely cyclic access
>pattern.  In particular, it has less I/O overhead than setting aside a few
>pages at the end to be overlaid.
>
The MRU algorithm (as far as I can see) does the following:
    Assume M pages of data are to be used cyclically and only N pages
    will fit.  Initially, the first page is loaded and the code begins
    executing.  At the end of page 1, the next page is loaded, and so
    on for N pages.  Now you have a choice to make, do you flush page
    1 or page N to make room for N+1.  The MRU algorithm presumably
    flushes page N (the most recently used).  And so you continue up
    to page M.  Now on the second cycle through the data, the first N-
    1 pages are already there, so no page fault occurs until page N is
    needed.  Then a page fault occurs for every page until page M.

This doesn't seem optimal to me.  Or did I get the algorithm wrong?
In this case, even LRU (least recently used) seems more efficient since
some, at least, of the I/O is done asynchronously.  The most efficient
algorithm would be to load N pages at the start, when done with the first
page flush it out and load the N+1st page - and so on: always flushing the
most recently used and loading the most distantly required.  This makes
all the I/O as asynchronous as possible.

If M-N is small compared to N, you could reduce the I/O requirement by
loading N-M pages, skipping every M-Nth one, and then after every M-N
pages, flush the most recent and load the most distantly needed (as
before except that only M-N pages are being swapped in and out).

There are other techniques (that I've mentioned in previous
submissions) that reduce the I/O load or increase the amount of
asynchronous I/O in many cases.  Most of these techniques would be
difficult to provide in a general purpose VM system.

You might argue that MRU should be modified to include the technique
I just mentioned.  But how does a demand driven system know what the value
of M is?  How does it know that M remains unchanged from one cycle to the
next?  What is some other data structure (like an equation of state, for
example) is being referenced following a random pattern of use - at the
same time that the cyclically used data is being traversed: how do
I advise the VM system that some of the pages should be referenced
MRU and some should be LRU?

I can solve all the above problems explicitly if I have control over
memory contents.  I can't off hand see any solution that a VM system
could take advantage of.  A lot of work has been done in the
optimization of memory management for scientific codes.  Many of the
techniques that have resulted are not well suited to any demand driven
context but, instead, rely upon knowledge of the data use patterns of
specific algorithms.

J. Giles
Los Alamos

franka@mmintl.UUCP (Frank Adams) (09/25/86)

In article <7847@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>The MRU algorithm (as far as I can see) does the following:
>    Assume M pages of data are to be used cyclically and only N pages
>    will fit.  Initially, the first page is loaded and the code begins
>    executing.  At the end of page 1, the next page is loaded, and so
>    on for N pages.  Now you have a choice to make, do you flush page
>    1 or page N to make room for N+1.  The MRU algorithm presumably
>    flushes page N (the most recently used).  And so you continue up
>    to page M.  Now on the second cycle through the data, the first N-
>    1 pages are already there, so no page fault occurs until page N is
>    needed.  Then a page fault occurs for every page until page M.
>
>This doesn't seem optimal to me.  Or did I get the algorithm wrong?
>In this case, even LRU (least recently used) seems more efficient since
>some, at least, of the I/O is done asynchronously.

As I noted in my footnote, it is only optimal assuming no overlap of I/O and
processing.  This will be true for any demand-driven paging scheme.  In
particular, an LRU scheme will differ from this in having a page fault for
*every* page reference (after the first N); there will be no asynchronous
I/O.

>The most efficient
>algorithm would be to load N pages at the start, when done with the first
>page flush it out and load the N+1st page - and so on: always flushing the
>most recently used and loading the most distantly required.  This makes
>all the I/O as asynchronous as possible.

Right; and this is still a MRU algorithm in the sense that the page chosen
to be flushed is the most recently used.  It is not a demand-driven paging
algorithm.

>What if some other data structure (like an equation of state, for
>example) is being referenced following a random pattern of use - at the
>same time that the cyclically used data is being traversed: how do
>I advise the VM system that some of the pages should be referenced
>MRU and some should be LRU?

This is indeed the fatal flaw in this scheme -- it works fairly well for
cyclic access, but it is terrible if *any* non-cyclic access is being
performed.  It will similarly perform quite badly for certain nearly cyclic
access patterns; e.g.,

for j = 1 to M;
   for i = 1 to j;
      access page i;

Since *very* few applications have a *purely* cyclic access pattern, an MRU
option is essentially worthless.

>I can solve all the above problems explicitly if I have control over
>memory contents.  I can't off hand see any solution that a VM system
>could take advantage of.  A lot of work has been done in the
>optimization of memory management for scientific codes.  Many of the
>techniques that have resulted are not well suited to any demand driven
>context but, instead, rely upon knowledge of the data use patterns of
>specific algorithms.

I believe that most, if not all, of these *can* be implemented in a VM
system which has a "load this page for later reference" primitive, albeit
with some performance loss.  Whether this is worthwhile depends on the
overall usage pattern for the machine in question.  I don't doubt that there
are many instances where a non-virtual memory machine is the correct choice.
Such cases may in fact be in the process of becoming more common, as
personal computers/workstations become a viable alternative for less
demanding tasks.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Multimate International    52 Oakland Ave North    E. Hartford, CT 06108

chris@umcp-cs.UUCP (Chris Torek) (09/25/86)

In article <7847@lanl.ARPA> jlg@a.UUCP (Jim Giles) writes:
>The MRU [Most Recently Used] algorithm (as far as I can see) does
>the following:
>    Assume M pages of data are to be used cyclically and only N pages
>    will fit.  Initially, the first page is loaded and the code begins
>    executing.  At the end of page 1, the next page is loaded, and so
>    on for N pages.  Now you have a choice to make, do you flush page
>    1 or page N to make room for N+1.  The MRU algorithm presumably
>    flushes page N (the most recently used).  And so you continue up
>    to page M.  Now on the second cycle through the data, the first N-
>    1 pages are already there, so no page fault occurs until page N is
>    needed.  Then a page fault occurs for every page until page M.
>
>This doesn't seem optimal to me.  Or did I get the algorithm wrong?

Yes, in two important ways: pageouts occur not when free pages are
exhausted, but when free pages are low; and pageins are performed
in advance if you have marked your process cyclic.  Let us assume
that N pages will fit, but that the OS wants to keep F pages free
and that it operates in `clusters' of C pages.  Initially, the
first C pages are loaded, then the next, and so on for N-F pages.
At this point pages N-F-2C through N-F-C+1 are flushed, with N-F
through N-F+C-1 being paged in.  The last few (V) pages are not
marked valid.  When the process touches page N-F+C-V, those V pages
are marked valid and the OS begins bringing in pages N-F+C through
N-F+2C-1, simultaneously paging out pages N-F-C through N-F-1.

To put it another way, the OS will try to keep a range of 2C pages
available.  If C is large enough, the only price is for the extra
faults that the O/S needs to discover that you are indeed about to
use the next group of pages.  (This is not necessarily negligible.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu