[comp.arch] 64-bit addresses

shekita@provolone.cs.wisc.edu (E Shekita) (02/10/90)

Could any of you hardware designers in the trenches out there 
tell me whether 64-bit addresses will become reality anytime soon.
If so, how soon. If not, why... And how about in the distant future, say,
10 years, which is virtually an eternity in hardware design.

Here at UW we have some database applications that might benefit 
from 64-bit addresses. We were wondering whether a research
design based on the assumption of 64-bit addresses in the near
future would be too optimistic.

thanks in advance -- Gene

joel@cfctech.cfc.com (Joel Lessenberry) (02/10/90)

In article <9708@spool.cs.wisc.edu> shekita@provolone.cs.wisc.edu (E Shekita) writes:
>Could any of you hardware designers in the trenches out there 
>tell me whether 64-bit addresses will become reality anytime soon.
>If so, how soon. If not, why... And how about in the distant future, say,
>10 years, which is virtually an eternity in hardware design.
>
	The IBM AS/400 processor, with its single-level storage, uses
	a 64 bit virtual address space.

 Joel Lessenberry, Distributed Systems | +1 313 948 3342
 joel@cfctech.UUCP                     | Chrysler Financial Corp.
 joel%cfctech.uucp@mailgw.cc.umich.edu | MIS, Technical Services
 {sharkey|mailrus}!cfctech!joel        | 2777 Franklin, Sfld, MI

baum@Apple.COM (Allen J. Baum) (02/10/90)

[]
>In article <20270@cfctech.cfc.com> joel@cfctech.cfc.com (Joel Lessenberry) writes:
>>Could any of you hardware designers in the trenches out there 
>>tell me whether 64-bit addresses will become reality anytime soon.
>>If so, how soon. If not, why... And how about in the distant future, say,
>>10 years, which is virtually an eternity in hardware design.

The HP Precision has a 48 or 64 bit segmented address space (2^32 byte segs).
The TERA computer will have a 48 bit flat address space.

--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

jkenton@pinocchio.Encore.COM (Jeff Kenton) (02/10/90)

In article <9708@spool.cs.wisc.edu> shekita@provolone.cs.wisc.edu (E Shekita) writes:
>Could any of you hardware designers in the trenches out there 
>tell me whether 64-bit addresses will become reality anytime soon.
>If so, how soon. If not, why... And how about in the distant future, say,
>10 years, which is virtually an eternity in hardware design.
>

Here's a guess from the land of software, related to a discussion I had
yesterday with a friend (thanks Carl):

	o  When the change comes, it will be to 64 bits -- not 40, 48 or 60.

	o  Multi-processor systems are starting to overflow the 32 bit
		address space even now.

	o  Despite this, in the short term we will see patches instead of full
		scale solutions:

			o  32 bit processors for several more years.

			o  MMU's with a small number of extra address bits,
				extending total memory but keeping the
				32 bit limit for single processes.

	o  Within 5 years ( 2 or 3 ? ) we will certainly have 64 bit logical
		address space, along with 64 bit registers.


As long as we're inventing the future here, let's pose an extra question:

	When compute speed and disk and network speed have increased 1000
	times, and memory and disk capacity have done the same (5 years ?),
	how will it change what we do (and how we do it) with computers?
	And what new peripherals will we need to interact with?


Post your thoughts.

jkrueger@dgis.dtic.dla.mil (Jon) (02/12/90)

A question:

Are 64 bit spaces being held back by hardware: complexity, fabrication,
critical paths?  Or software (what we used to call systems software):
virtual memory operating systems that can give processes 64 bit virtual
spaces at acceptable cost?  E.g. without having to create 2 MB of
resident page tables per process, allow something lazier instead.

Anyone know which is the culprit here?

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

nelson@udel.EDU (Mark Nelson) (02/12/90)

In article <11112@encore.Encore.COM> jkenton@pinocchio.UUCP (Jeff Kenton) writes:
>In article <9708@spool.cs.wisc.edu> shekita@provolone.cs.wisc.edu (E Shekita) writes:
>>Could any of you hardware designers in the trenches out there 
>>tell me whether 64-bit addresses will become reality anytime soon.
>>If so, how soon. If not, why... And how about in the distant future, say,
>>10 years, which is virtually an eternity in hardware design.
>>
>
>Here's a guess from the land of software, related to a discussion I had
>yesterday with a friend (thanks Carl):
>
>	o  When the change comes, it will be to 64 bits -- not 40, 48 or 60.
>
I would personally prefer a 64 bit word size with either straight
48 bit addresses or else 64 bit addresses with the top 16 bits used
as a segment number.  Why?  Because it is very hard to fit
64 bits of address into a 64 bit instruction and still have room
for e.g. the opcode.  I'm a real proponent of fixed size instructions,
and I don't think two instructions should be required just to load
an address.

Actually, I'll go further and say I'm not sure that a 64 bit machine
really needs to fully support 64 bit integers.  There should definitely
be 64 bit add/subtract, boolean, shifts, and anything else which scales
linearly with word size, but I think the largest integer multiply
supported need only be 48x48 -> 96, with the corresponding divide
96/48 -> 48,48.  A 48 bit multiplier is huge, let alone a 64 bit,
and this would let the floating point multiplier handle integer
multiplies (assuming a 16 bit exponent 48 bit fraction floating
point representation).
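
For the rare code that really does want a full 64x64 product on such a
machine, it can be pieced together in software from narrower multiplies.
A rough C sketch (an editorial illustration only; it assumes a compiler
that provides a 64-bit unsigned long long):

/* Editorial sketch: build a 64x64 -> 128 multiply from 32x32 -> 64
 * partial products.  "u64" is assumed to be exactly 64 bits wide.
 */
typedef unsigned long long u64;

void mul64x64(u64 a, u64 b, u64 *hi, u64 *lo)
{
    u64 al = a & 0xFFFFFFFF, ah = a >> 32;
    u64 bl = b & 0xFFFFFFFF, bh = b >> 32;

    u64 p0 = al * bl;                   /* low  x low   */
    u64 p1 = al * bh;                   /* low  x high  */
    u64 p2 = ah * bl;                   /* high x low   */
    u64 p3 = ah * bh;                   /* high x high  */

    u64 mid  = p1 + (p0 >> 32);         /* cannot overflow 64 bits */
    u64 mid2 = p2 + (mid & 0xFFFFFFFF); /* likewise                */

    *lo = (mid2 << 32) | (p0 & 0xFFFFFFFF);
    *hi = p3 + (mid >> 32) + (mid2 >> 32);
}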

Does anyone have an application which frequently multiplies 49-64
bit integers?  I'm not talking about indefinite precision arithmetic,
but about numbers of exactly that many bits.  Please let me know.

Of course, I'm not an electrical engineer, and I'm sure that device
densities will probably get to the point where nobody worries about
space for an extra 64x64 multiplier.

wayne@dsndata.uucp (Wayne Schlitt) (02/12/90)

In article <753@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
> 
> Are 64 bit spaces being held back by hardware: complexity, fabrication,
> critical paths?  Or software (what we used to call systems software):
> virtual memory operating systems that can give processes 64 bit virtual
> spaces at acceptable cost?  

[ blatant opinion warning ]

i believe, and i have been saying it for about 5 years now, that 64
bit computers are not going to be common for a _long_ time to come.
(5 years ago when 32 bit micros started to become available a fair
number of people were real excited about when the 64 bit micros were
going to be here and wondering if they should skip the 32 bit micros
and wait for the 64 bit micros.  *gack*)

of course before you can argue about when 64 bit computers are common,
you must first define what a "64 bit computer" is, and what it means
to be "common".  to me, a computer system has become common when you
can buy a usable system for less than $10k.  i believe the different
parts of the computer will become "64 bit" at different times.  the
following is a list of the parts that i can think of and blatant
guesses about when i expect them to become 64 bit.

integer registers (ALU) 20-30 years
floating point regs     64 bit now, 128 bit in the next 3-10 years.
                        you will never see more than 128 bits.
address registers       10-20 years
virtual address space   the 386 has a 48 bit virtual address space via
                        segments now, but that's only because it was
                        segmented to begin with.  other architectures
                        like the 68k and most risc's will be forced to
                        have segment registers in the next 5-15 years.
                        HPPA (HP's risc) has them now.
physical address space  2-5 years for 48 bit, 5-20 for a full 64 bit.
data path               64 bit memory paths would be used to fill
                        caches quicker, but you might see harvard
                        style separate 32 bit data and instruction
                        paths coming off chip first.  from the
                        software point of view, this is the _least_
                        noticeable and when it will come around depends
                        on how good chip designers are at getting lots
                        of pins on the chips.
instruction path        see data path


the reason for all of this is simply practicality.  8 bits cant hold
the time of day.  8 bit computers didnt last that long.  16 bits are
much better, but lots of numbers go beyond 5 digits and 64k is much
too small for program code.  64k isnt too bad for array sizes, but it
gets in the way fairly often.  32 bits can hold the income in pennies
of 99% of companies in the world.  very few programs need more than 4
gigabytes in either code or data.  sure you can do things like mapping
your 20 gig of disc into memory, but do you _really_ need to?

this all gets back to the risc idea of studying what you really need
and seeing if the extra stuff will slow you down.  although i am not
an expert, i would be willing to bet that 64 bit paths, registers and
ALU's are going to be _very_ expensive in terms of chip area and
speed.  you are going to see a lot more benefit from making 32 bit
computers faster than by making 64 bit computers.

using a regular 32 bit computer you can do 64 bit adds and subtracts
quite quickly using two instructions instead of one.  64 bit shifts
are a little bit harder depending on your instruction set, but on the
68k it only takes about a dozen instructions.  64 bit multiplies and
divides are slow, but hey, they are going to be slow anyway.  most of
the multiplies are going to be for array indexing anyway and you can
get around that via shifts, either explicit or implied.
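
to make the "two instructions" concrete, here is roughly what the
compiler or assembly programmer does, sketched in portable c (the
comparison below just models the hardware carry flag):

/* editorial sketch: a 64 bit add from 32 bit halves -- on a real 32 bit
 * machine this is one add plus one add-with-carry.  assumes long is at
 * least 32 bits; the masks keep the halves to 32 bits either way.
 */
typedef struct { unsigned long hi, lo; } u64pair;

u64pair add64(u64pair a, u64pair b)
{
    u64pair r;
    r.lo = (a.lo + b.lo) & 0xFFFFFFFF;
    r.hi = (a.hi + b.hi + (r.lo < a.lo)) & 0xFFFFFFFF;  /* carry out of low */
    return r;
}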

i just dont believe that 64 bit flat address spaces are going to be
here any time soon.  in order to have a flat 64 bit address space you
are going to have to be able to do 64 bit address manipulation, which
means 64 bit ALU's and registers.  for the few times that you really
need 64 bits, you are putting a very large burden on the rest of the
system.

even now, the 386 is mostly run in 16 bit mode and people are living
with it long after it has become a problem.  the reason why they are
living with it is because of all the software that can actually run
and be useful in 16 bit's (and 20 bits of addressing).  just think
about how much software there is going to be when 32 bits is starting
to run out of steam.  ibm mainframes have used 32 bits for 20
years and i dont see too much push for a full 64 bit computer even
there.  (mainframes do have 64/128 bit data paths and such, but from
the software's point of view, that doesnt make any difference).


i would love to see some more _fact_ to confirm or refute my
arguments.  these are mostly gut feelings that i have had for a long
time.  


-wayne

lm@snafu.Sun.COM (Larry McVoy) (02/12/90)

In article <11112@encore.Encore.COM> jkenton@pinocchio.UUCP (Jeff Kenton) writes:
>As long as we're inventing the future here, let's pose an extra question:
>
>	When compute speed and disk and network speed have increased 1000
>	times, and memory and disk capacity have done the same (5 years ?),
>	how will it change what we do (and how we do it) with computers.
>	And what new peripherals will we need to interact with?
>
>Post your thoughts.

OK, here's a thought.   Compute speed I see going up.  Network speed I see
going up (assuming we dump the current implementation of ethernet).  Disk
speed is another question.  Yeah, yeah, I know I'm going to get flamed by
the disk farm people.  You know, disk farms are cool, they make dd(1)'s go
really fast.  They don't do sh*t for the read of that first byte, in fact
they probably make it worse since there's an extra layer of software.  I
don't see disks getting significantly faster any time soon.

---
What I say is my opinion.  I am not paid to speak for Sun, I'm paid to hack.
    Besides, I frequently read news when I'm drjhgunghc, err, um, drunk.
Larry McVoy, Sun Microsystems     (415) 336-7627       ...!sun!lm or lm@sun.com

johnl@esegue.segue.boston.ma.us (John R. Levine) (02/12/90)

In article <753@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>A question:
>
>Are 64 bit spaces being held back by hardware: complexity, fabrication,
>critical paths?  Or software ...

I suspect it's two things, both mostly software.  Hardware is no problem,
many new chips have 64 bit buses.  One problem is the enormous amount of
software written for 32 bit machines, the other is the lack of perceived
need.  Although the largest systems like IBM 3090s are running out of
address space, it is just these systems that have the largest pile of crufty
old software, so in the case of the 3090 we get ESA mode which is an
impressively kludgy segmented addressing scheme.

For smaller machines, e.g. workstations which are mostly programmed in C and
Fortran and so are less wedded to a particular architecture, most are a long
way away from overflowing a gigabyte address space.  I've never seen as much
as 2^27 bytes of memory on a workstation and that's still 4 or 5 bits away
from running out of address space.

I realize that it'd be nice to map your 4GB disk drive but hacks like the
386's segmented addressing provide address spaces considerably larger than
32 bits without having to go to full 64 bit addressing.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

csimmons@jewel.oracle.com (Charles Simmons) (02/12/90)

In article <WAYNE.90Feb11165528@dsndata.uucp>, wayne@dsndata.uucp (Wayne
Schlitt) writes:
> of course before you can argue about when 64 bit computers are common,
> you must first define what a "64 bit computer" is, and what it means
> to be "common".  to me, a computer system has become common when you
> can buy a usable system for less than $10k.  i believe the different
> parts of the computer will become "64 bit" at different times.  the
> following is a list of the parts that i can think of and blatant
> guesses about when i expect them to become 64 bit.

While not common, 64-bit computers aren't completely uncommon.  I'm
sure someone will correct me if I'm wrong about the Cray being a 64-bit
computer.

Also, there's the NCube 2.  This little beastie implements an ALU, FPU,
memory controller, and 28 bit-serial DMA channels in about 250,000
transistors.  There are 16 64-bit registers that can hold either floating
point values or integer values.  You could probably buy these for
around $10,000 per node if you were willing to settle for 4 Megabytes
of memory per node.  [You can glue up to 8192 of these nodes into a
hypercube.]

> integer registers (ALU) 20-30 years
> floating point regs     64 bit now, 128 bit in the next 3-10 years.
>                         you will never see more than 128 bits.

There is something to be said for having truly general purpose 64-bit
registers that can hold either floating point values or integer values.
Now that FPUs and ALUs are being implemented on the same chip, maybe
it would make sense to implement a single register set instead of
two separate register sets.  It would probably simplify certain aspects
of register allocation in compilers.  Of course, the 88K seems to have
kept two separate register sets...

> -wayne

-- Chuck

mshute@r4.uucp (Malcolm Shute) (02/12/90)

Of course, once we are all programming our massively parallel MIMD machines,
addresses will have to include the name of the processor, as well as the
name of the data item on that processor.  These two addresses (the former
being p-bits long, and the latter d-bits) could be combined either as a
single address in (p+d)-bit address-space, or as p-bits of segment information
into 2^d-sized segments.
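
Concretely, the first option is just shifting and OR-ing (a sketch only;
P_BITS and D_BITS are illustrative values, not any real machine's):

/* Editorial sketch: pack a processor number and a local address into
 * one flat (p+d)-bit global address, and take it apart again.
 */
typedef unsigned long long gaddr;       /* assumed >= p+d bits wide */

#define P_BITS 16                       /* processor name,  p bits  */
#define D_BITS 32                       /* local addresses, d bits  */

gaddr make_global(gaddr proc, gaddr local)
{
    proc  &= ((gaddr)1 << P_BITS) - 1;  /* keep p bits of processor name */
    local &= ((gaddr)1 << D_BITS) - 1;  /* keep d bits of local address  */
    return (proc << D_BITS) | local;
}

gaddr proc_of(gaddr g)  { return g >> D_BITS; }
gaddr local_of(gaddr g) { return g & (((gaddr)1 << D_BITS) - 1); }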

Malcolm Shute.         (The AM Mollusc:   v_@_ )        Disclaimer: all

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/12/90)

In article <10795@snow-white.udel.EDU> nelson@udel.edu (Mark Nelson) writes:

| Does anyone have an application which frequently multiplies 49-64
| bit integers?  I'm not talking about indefinite precision arithmetic,
| but about numbers of exactly that many bits.  Please let me know.

  Well, yes, but I freely agree that the operation is some 55 bits x 4
bits or things like that, since the result doesn't overflow on a Cray. I
like the idea of using part of the FPU, but I am not sure the gates
saved would justify the complexity, or that gates would be saved at all.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/13/90)

In article <WAYNE.90Feb11165528@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:

|                                                                 the
| following is a list of the parts that i can think of and blatant
| guesses about when i expect them to become 64 bit.
| 
| integer registers (ALU) 20-30 years

  I have a hard time believing that anyone will not have the int able to
hold a physical address, so this doesn't match your prediction below. I
say 5 years.

| floating point regs     64 bit now, 128 bit in the next 3-10 years.
|                         you will never see more than 128 bits.

  I hate to say never, but it's hard to imagine a good use for it, other
than to cover the sins of bad numerical analysis. Lots of bits can hold
off roundoff. I did note that when we moved applications from Honeywell
to IBM and Cray, the IBM couldn't do some programs without recoding the
algorithm, and the Cray needed d.p. for some. This is due to the 72 bit
d.p. on the Honeywell.

| address registers       10-20 years

  Again, I think 5, and the arithmetic and address registers will be the
same size.

| virtual address space   the 386 has a 48 bit virtual address space via
|                         segments now, but that's only because it was
|                         segmented to begin with.  other architectures
|                         like the 68k and most risc's will be forced to
|                         have segment registers in the next 5-15 years.
|                         HPPA (HP's risc) have them now.

  I think you're right on about 48 bits, I'm not sure if seg regs will
come or just 64 bit addressing.

| physical address space  2-5 years for 48 bit, 5-20 for a full 64 bit.

  I think you were talking typical, and I doubt this. Typical is 24 bits
used (16MB) and large is 28 bits (256MB). Machines like Convex and MIPS
are using 32, but few machines are actually configured that way.

  Assuming that memory drops in cost by 1 bit every three years, the big
workstations will not run out of 32 bit addressing before the millennium.
If the same CPUs are used in personal applications (a distinct
possibility) then their growth will push past 32 bits. My guess is
8 years for 48 bits, 12 for 64.

| data path               64 bit memory paths would be used to fill
|                         caches quicker, but you might see harvard
|                         style separate 32 bit data and instruction
|                         paths coming off chip first.  from the
|                         software point of view, this is the _least_
|                         noticeable and when it will come around depends
|                         on how well chip designers are at getting lots
|                         of pins on the chips.
| instruction path        see data path

  You make a very good point here, but I think the problem will be
solved quickly, because memory speed is not changing fast enough, and
because wide busses are probably cheaper to add than memory speed. I
predict 5 years on this one.

  How about a "popsicle" chip carrier, with connections on the bottom
and four faces of a chip, and a "stick" on top to ease removal? ;-)/2
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

jgk@osc.COM (Joe Keane) (02/13/90)

In article <10795@snow-white.udel.EDU> nelson@udel.edu (Mark Nelson) writes:
>I would personally prefer a 64 bit word size with either straight 48 bit
>addresses or else 64 bit addresses with the top 16 bits used as a segment
>number.

I view with great suspicion any comment which says that smaller or segmented
addresses are desirable.  Let's see the reasoning...

>Why?  Because it is very hard to fit 64 bits of address into a 64 bit
>instruction and still have room for e.g. the opcode.

This assumes that you want to put all your constant data in the instruction
stream.  Back on the PDP-11 it was a neat idea to use `PC auto-increment' to
get constants, because registers were so scarce.  Then on the VAX this idea
got bloated so the instruction stream contained all sizes of `immediate' data
plus three different sizes of offsets.  Did you know there's a format where a
floating point number takes up six bits?  (I'm using the PDP-11 and VAX only
as representative examples.)

If you think about it, there's no good reason to put instructions and constant
data together.  In fact, there are good reasons not to; on a pipelined machine
(any machine these days) you don't have to spend silicon trying to figure out
whether a particular word (or byte on the VAX) is going to be an instruction
or data.  If you've got good (small and fast) load instructions, use them.
Take another register and have it point to the `constant pool'.  I don't think
you'll miss this one register.
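
A rough source-level picture of what that looks like (a sketch only; in
practice the compiler, not the programmer, gathers the pool):

/* Editorial sketch: wide constants collected into one table, reached
 * through a single base pointer with short-offset loads, instead of
 * being embedded in the instruction stream as full-width immediates.
 */
static const unsigned long pool[] = {
    0x00FF00FFUL,                       /* pool[0] */
    0x12345678UL,                       /* pool[1] */
};

unsigned long mask_and_add(unsigned long x)
{
    const unsigned long *cp = pool;     /* the dedicated pool register */
    return (x & cp[0]) + cp[1];         /* two short loads, no long immediates */
}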

>I'm a real proponent of fixed size instructions, and I don't think two
>instructions should be required just to load an address.

I heartily agree with this.

>Actually, I'll go further and say I'm not sure that a 64 bit machine really
>needs to fully support 64 bit integers.  There should definitely be 64 bit
>add/subtract, boolean, shifts, and anything else which scales linearly with
>word size, but I think the largest integer multiply supported need only be
>48x48 -> 96, with the corresponding divide 96/48 -> 48,48.

You can do that, but what you get is not completely a 64-bit machine.  It's
somewhere between 48-bit and 64-bit, which in my opinion isn't desirable.  For
a while i've been tempted to design a machine with something like 67-bit
instructions, 53-bit integers, and 47-bit pointers.  It'd be neat, but somehow
i don't think it'd catch on.

dmocsny@uceng.UC.EDU (daniel mocsny) (02/13/90)

In article <WAYNE.90Feb11165528@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:
> very few programs need more than 4
>gigabytes in either code or data.  sure you can do things like mapping
>your 20 gig of disc into memory, but do you _really_ need to?

Perhaps you mean to say, "very few programs *are able* to take
advantage of 4 gigabytes in either code or data, and this is due to
only arithmetic increases in the software industry's ability to
increase the scope and complexity of its code, while simultaneously
preserving or improving reliability and usability."

I have every reason to believe that I could benefit from a
well-written program that required >4 GB address space. However, given
the shortage of people with the type of scalable problems of low
Kolmogorov complexity that can readily expand to fill such an
architecture, I expect to see the "software gap" widening
continuously, until it becomes the main bottleneck to further progress
in the computer industry. Might we get to the point where
fantastically powerful hardware is widely available at practically no
cost, but the entire human race grinds to a standstill because all
available intellectual capacity must grapple with trying to program
it?

Dan Mocsny
dmocsny@uceng.uc.edu

henry@utzoo.uucp (Henry Spencer) (02/13/90)

In article <753@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>Are 64 bit spaces being held back by hardware: complexity, fabrication,
>critical paths?  Or software (what we used to call systems software):
>virtual memory operating systems that can give processes 64 bit virtual
>spaces at acceptable cost?...

Neither, really.  They're being held back by limited customer demand.
Cranking up all the pointers to 64 bits -- and probably all the integers
too, given the sloppy assumptions in a lot of code -- will cost quite a
bit in space even if the time cost is nil.  Also, given certain other
sloppy assumptions, porting to such a machine will be a lot of work.
There are relatively few applications that would find this cost-effective
right now; *most* programs find 32 bits ample.
-- 
SVR4:  every feature you ever |     Henry Spencer at U of Toronto Zoology
wanted, and plenty you didn't.| uunet!attcan!utzoo!henry henry@zoo.toronto.edu

dgr@hpfcso.HP.COM (Dave Roberts) (02/13/90)

If a chip production and packaging person is reading this, their head
is probably spinning.  All those signals coming out of a chip will take
a lot of chip area around the edge of a die, hence driving physical die
sizes up (and chip yields down).  They will also take a good deal of
engineering to package them.  I don't doubt that we can eventually do it,
but there are problems other than just whether people need the space.
Note that this assumes that you have true 64 bit flat addresses and
64 bit data registers.

The problem isn't just that you have the extra 128 lines (64 A, 64 D) but
that you also need the extra powers and grounds to drive all of it and
not have the whole chip bounce like a super ball when you put out an
address and data.

Because of this, I don't think that you'll be seeing true 64 bit address
and data for a while, at least not in the easily available range of
pricing where standard PGA microprocessors are today.

Dave Roberts
Hewlett-Packard Co.
dgr@hpfcla

amos@nsc.nsc.com (Amos Shapir) (02/13/90)

In article <WAYNE.90Feb11165528@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:
>instruction path        see data path
>

Not so fast.  Each byte in the instruction space was put there by a human,
or translated from code written by a human. I-space is the only(?) factor
whose growth will always be linear rather than exponential.

-- 
	Amos Shapir
National Semiconductor, 2900 semiconductor Dr.
Santa Clara, CA 95052-8090  Mailstop E-280
amos@nsc.nsc.com or amos@taux01.nsc.com 

tihor@acf4.NYU.EDU (Stephen Tihor) (02/13/90)

My work with NYU Academic Computing Facility's computer-of-the-month
club membership (actually more of a mini-super of the month) has
exposed me to a number of vendors' systems that tried to provide 64 bits
or better (because their FP was 64 bits or better and 64 bit integers
were cheap and very useful for their target codes).

Most of these systems have failed to support the 64 bits for long (ELXSI
is a particularly blatant example).  Their problems were mostly the growth
of UNIX with the concomitant use of C and the lousy portability of BSD
(although much GNU code is also written with the ingrained knowledge
that int = long = 32 bits, etc.).  And FORTRAN deserves at least a third
of the blame.

mo@flash.bellcore.com (Michael O'Dell) (02/13/90)

Prisma's C compiler had (long)==int32 and (long long)==(int64)

making (long)==(int64) is just asking for trouble.
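
A small illustration of the kind of code that silently changes meaning
when long is widened (an editorial sketch, not Prisma's compiler or any
particular program):

#include <string.h>

struct on_disk { long offset; };        /* file layout shifts if long grows  */

void pack32(long value, unsigned char out[4])
{
    memcpy(out, &value, 4);             /* copies only half of a 64-bit long */
}

long long wide = 1LL << 40;             /* spelling the wide type explicitly */
                                        /* leaves 32-bit code undisturbed    */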

	-Mike

PS - yes, (long long) is the recommended type.
	-Mike O'Dell

"I can barely speak for myself, much less anyone else!"
----------------------------------------
The Center for Virtual Reality --
"Solving yesterday's problems tomorrow!"

cik@l.cc.purdue.edu (Herman Rubin) (02/13/90)

In article <13589@nsc.nsc.com>, amos@nsc.nsc.com (Amos Shapir) writes:
> In article <WAYNE.90Feb11165528@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:
> >instruction path        see data path
> >
> 
> Not so fast.  Each byte in the instruction space was put there by a human,
> or translated from code written by a human. I-space is the only(?) factor
> whose growth will always be linear rather than exponential.

It is frequently the case that very long loops can be unrolled.  Also, in
some heavily branched situations, it is possible to combine multiple branches
as a single branch, and to use a transfer table.  It may even be advantageous
to have a fixed spacing between the branch addresses, which can add size to
the instruction code.  Thus, code written by a human can easily translate
into 2^20 or more instructions.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu (Internet, bitnet, UUCP)

jkrueger@dgis.dtic.dla.mil (Jon) (02/13/90)

dmocsny@uceng.UC.EDU (daniel mocsny) writes:

>Perhaps you mean to say, "very few programs *are able* to take
>advantage of 4 gigabytes in either code or data, and this is due to
>only arithmetic increases in the software industry's ability to
>increase the scope and complexity of its code, while simultaneously
>preserving or improving reliability and usability."

Nah, must be quantum leaps in our ability to produce space-efficient
codes :-)  :-)  :-)  :-)  :-)

>I have every reason to believe that I could benefit from a
>well-written program that required >4 GB address space. However, given
>the shortage of people with the type of scalable problems of low
>Kolmogorov complexity that can readily expand to fill such an
>architecture, I expect to see the "software gap" widening
>continuously, [until terrible things happen]

Sounds pretty scary.  Now, you'd never guess it from what appears
in comp.arch, but the chief use of cycles is pushing around 8 bit
unsigned quantities that by convention stand for printable symbols
representing a curious code called the "alphabet".  So my challenge
to you (or anyone) is to tell me what your word processing would do
with 32+ addressing bits?  Ground rules: no Emacs jokes, can only
include the space costs of support tools when clearly part of the
word processing, and no credit for mere time advantages.  Well?

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

mcdonald@aries.scs.uiuc.edu (Doug McDonald) (02/14/90)

In article <757@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:
>
>Sounds pretty scary.  Now, you'd never guess it from what appears
>in comp.arch, but the chief use of cycles is pushing around 8 bit
>unsigned quantities that by convention stand for printable symbols
>representing a curious code called the "alphabet".  So my challenge
>to you (or anyone) is to tell me what your word processing would do
>with 32+ addressing bits?  Ground rules: no Emacs jokes, can only
>include the space costs of support tools when clearly part of the
>word processing, and no credit for mere time advantages.  Well?
>

You have left yourself wide open to an embarrassing attack on
your ability to come up with useful prognostications of the future.
You are in good company, along with e.g. IBM, who pooh-poohed the 
Xerox machine, and those who claimed that the world would never need
more than 10 computers.

The answer to your question is simple: my word processor with 32+
address bits would take my spoken dictation in English and 
send it to the printer (or direct over the network, or whatever -
my crystal ball is hazy here) with all the spelling errors fixed,
in English, German, and Japanese.

Doug McDonald

jonah@db.toronto.edu (Jeffrey Lee) (02/14/90)

jkrueger@dgis.dtic.dla.mil (Jon) writes:
>Sounds pretty scary.  Now, you'd never guess it from what appears
>in comp.arch, but the chief use of cycles is pushing around 8 bit
>unsigned quantities that by convention stand for printable symbols
>representing a curious code called the "alphabet".

Or at least until a few years ago.  Now most of the machines that I use
spend most of their time pushing around 1 to 32 bit unsigned quantities
that by convention stand for "pixels" so that *I* can spend most of my
time shuffling printable symbols representing a curious code called the
"alphabet".

>                                                    So my challenge
>to you (or anyone) is to tell me what your word processing would do
>with 32+ addressing bits?  Ground rules: no Emacs jokes, can only
>include the space costs of support tools when clearly part of the
>word processing, and no credit for mere time advantages.  Well?

Picture a "docuverse" [TM Xanadu Corp.] where all the machine readable
text in existence can be mmap()ed into your address space.  Mind you
even 64-bits isn't enough -- which is why Ted Nelson dreamed up
"tumblers" [Thedore H. Nelson, "Managing Immense Storage", in BYTE,
January 1988, pp225-238].  Why isn't 64-bits enough?  Because it only
allow for 4GB of shared "global" space per Internet host.

More plausibly, shared virtual address spaces (or *shudder* segments)
may allow programs and processors to exchange data with something akin
to a page-level equivalent of memory caching.  The word processor of
the future will *not* stand alone, but will be an integrated piece of
your environment -- potentially communicating with anything that wants
to edit or display text.  To which end, we need to ensure that the
systems of the future can support the following *very* cheaply:

	multi-processing and context switching
	inter-process communication / remote procedure calls
	"network services"

Of these, multi-processing and context switching, and IPC / RPC appear
to be comp.arch issues.  The use of a shared 64-bit address space is
merely one approach to IPC.

Jeff Lee -- jonah@cs.toronto.edu

jkrueger@dgis.dtic.dla.mil (Jon) (02/14/90)

jonah@db.toronto.edu (Jeffrey Lee) writes:

>Picture a "docuverse" [TM Xanadu Corp.] where all the machine readable
>text in existence can be mmap()ed into your address space.  Mind you
>even 64-bits isn't enough

Why would anyone want the docuverse mapped into address space?  It fits
into namespace.  The addressable unit of namespace is the document, not
any point in its contents.

Or do you really want to go back to the days when name sizes were
tied to machine architecture?  As I remember, the 36 bit machines
tended to impose six character limits (times six bits per character).

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

wilkes@mips.COM (John Wilkes) (02/14/90)

In article <8840009@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
>
>Because of [the amount of pins required] I don't think that you'll be
>seeing true 64 bit address and data for a while, at least not in the
>easily available range of pricing where standard PGA microprocessors are
>today.
>
>Dave Roberts

Perhaps this will become one of the primary differences between Killer
Micros and traditional Big Iron?

-wilkes
-- 

John Wilkes

wilkes@mips.com   -OR-   {ames, decwrl, pyramid}!mips!wilkes

lindsay@MATHOM.GANDALF.CS.CMU.EDU (Donald Lindsay) (02/14/90)

In article <8840009@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
>All those signals coming out of a chip will take
>a lot of chip area around the edge of a die, hence driving physical die
>sizes up (and chip yields down).

The "small" system of the future would map a 48-bit or 64-bit virtual
address to a 32-bit physical address. With an on-chip MMU, the pin
count would be the same as it is now.
-- 
Don		D.C.Lindsay 	Carnegie Mellon Computer Science

paulr@mips.COM (Paul Richardson) (02/14/90)

In article <35954@mips.mips.COM> wilkes@mips.COM (John Wilkes) writes:
>In article <8840009@hpfcso.HP.COM> dgr@hpfcso.HP.COM (Dave Roberts) writes:
>>
>>Because of [the amount of pins required] I don't think that you'll be
>>seeing true 64 bit address and data for a while, at least not in the
>>easily available range of pricing where standard PGA microprocessors are
>>today.
>>
>>Dave Roberts
>
>Perhaps this will become one of the primary differences between Killer
>Micros and traditional Big Iron?
>
>-wilkes
>-- 
>
>John Wilkes
>
>wilkes@mips.com   -OR-   {ames, decwrl, pyramid}!mips!wilkes


I think Real I/O (inside joke for above 2 folx) will be the other acid test
separating Big Iron from Killer Micro-based machines.

---
-- 
/pgr
"A smilin' face,a thumpin' bass, for a lovin' race" - Jazzy B (soul II soul)
{ames,prls,pyramid,decwrl}!mips!paulr or paulr@mips.com

iyengar@grad2.cis.upenn.edu (Anand Iyengar) (02/18/90)

In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>One might note that implementing a 64-bit address space
>with 64 KB pages and 8 bytes/page in the page table requires
>2 billion megabytes of page-table space.  Hmmmmm.....
	This is for the final level of the translation, and it need not be
fully resident (!).  Each successive level of indirection adds
2^(64 - 13*K) bytes of page tables (1 <= K <= 4, counting K=1 as the
leaf level).  Only the topmost level (highest K, a mere 2^12 bytes)
need be resident.  However, you do buy the space with time
(successive look-ups).  
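
	A quick way to see the per-level numbers (editorial sketch, under
the same assumptions: 64 KB pages, 8-byte entries, hence 13 index bits
per level):

#include <stdio.h>

/* Fully-populated page-table bytes per level for a 64-bit virtual
 * address with 64 KB pages (16 offset bits) and 8-byte entries,
 * i.e. 2^13 entries per table page.  K = 1 is the leaf level.
 */
int main(void)
{
    int k;
    for (k = 1; k <= 4; k++)
        printf("level K=%d: 2^%d bytes if fully populated\n", k, 64 - 13 * k);
    return 0;
}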

	What's the general feeling on future page sizes (I'm guessing that
64K was picked here to make the page-tables smaller)?  Is it reasonable
to assume that they are going to increase also (as bandwidth/latency
increases), or will we lose more by making pages larger?  

							Anand.  

--
"Surely you're not happy, you no longer play the game..."
{inter | bit}net: iyengar@eniac.seas.upenn.edu
uucp: !$ | uunet
--- Lbh guvax znlor vg'yy ybbx orggre ebg-guvegrrarg? ---

nelson@udel.EDU (Mark Nelson) (02/21/90)

In article <1990Feb9.235927.5984@ultra.com> ted@ultra.com (Ted Schroeder) writes:
>
>Use a Cray, they've got 64 bit addressing now.
>
>      Ted Schroeder                   ted@Ultra.com
>
>Disclaimer:  I don't even believe what I say, why should my company?

It's a good thing you don't believe what you say, because this is false.
Crays have either 24 or 32 bit addressing:

Cray-1:     24
Cray X/MP:  24
Cray-2:     32
Cray Y/MP:  24-bit Instruction, 32-bit Data
-- 

Mark Nelson                   ...!uunet!udel!nelson or nelson@udel.edu
This function is occasionally useful as an argument to other functions
that require functions as arguments. -- Guy Steele, Jr.

jgk@osc.COM (Joe Keane) (02/21/90)

In article <2027@osc.COM> i write:
>If you think about it, there's no good reason to put instructions and constant
>data together.  In fact, there are good reasons not to; on a pipelined machine
>(any machine these days) you don't have to spend silicon trying to figure out
>whether a particular word (or byte on the VAX) is going to be an instruction
>or data.  If you've got good (small and fast) load instructions, use them.
>Take another register and have it point to the `constant pool'.  I don't think
>you'll miss this one register.

In article <162@gollum.twg.com> warner@twg.com (Warner Losh) writes:
>I really don't see how this will solve your problems.  You still need a
>constant to offset into the constant pool.  It would also make your compilers
>much harder to produce since they would also have to manage a constant pool.

I should say that when i wrote the original post i was thinking specifically
about the ROMP architecture.  After IBM's recent announcements, it looks like
this will become one of the major architectures of the next couple years.
There's no reason why it shouldn't, even though their first implementation was
lame.  So if you want to know how a `constant pool' works, what a typical
linkage convention is, and how to compile for it, look at the RT.  The
one-line summary is that it's not hard to do, although there's a little bit of
hair for the RT because the BALAX instruction only has a 24-bit range.

>This isn't so bad until you try to optimize for space (why would I want to
>have five copies of the number four anyway) between modules.  I don't see that
>it would buy you anything, at the cost of additional complexity.

I agree this is hard, but it's an interesting optimization and can only
improve your performance.  Of course it's completely impossible if your
constants are embedded in the instruction stream.

>At best the solution to this problem is to always use int constants and not
>try to deal with saving a few bytes here or there.

I'm mostly trying to be a good RISC person and make the CPU simpler and
faster, and if we can save a few bytes along the way, that's good too.

seanf@sco.COM (Sean Fagan) (02/21/90)

In article <1990Feb9.235927.5984@ultra.com> ted@ultra.com (Ted Schroeder) writes:
>Use a Cray, they've got 64 bit addressing now.

Uhm, since when?  A registers are 32-bits (4 Gigawords).  Cray-1's had, I
believe, only 24-bit addressing (I think the A registers were still 32-bits,
though; however, I could be wrong).

-- 
Sean Eric Fagan  | "Time has little to do with infinity and jelly donuts."
seanf@sco.COM    |    -- Thomas Magnum (Tom Selleck), _Magnum, P.I._
(408) 458-1422   | Any opinions expressed are my own, not my employers'.

gd@geovision.uucp (Gord Deinstadt) (02/21/90)

In article <8275@cbnewsh.ATT.COM> dwc@cbnewsh.ATT.COM (Malaclypse the Elder) writes:
 
>i'm not a 'disk farmer' but with a certain amount of redundancy,
>the read of the first byte can be improved.  in the simplest case,
>if the byte is on two separate disks, then all you have to do is
>wait for the FIRST one to come.  what this does is decrease the
>expected seek and latency times which is the major component
>of the delay to get the first byte.  and if you explicitly place
>that byte at 'opposite ends' of the two disks, then you have decreased
>the minimum and maximum delay.

From what I have read, I have the impression that the various disks in
the farm are NOT synchronized with each other.  They are just a bunch of
off the shelf drives, so they each rotate independently at their own
rate.  So, even if the blocks were opposite when written, they wouldn't
likely be later.

You could still do it within a single drive, though; just write the data
twice.  You would get a certain amount of fault tolerance, too, though
not as much as a real disk mirror.  Does anybody do this?

Ahhh, probably not... they'd only gain on the rotational latency
anyways, and seek time dominates.
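
(Quantifying the rotational part: with two copies of a block at
independent, uniformly distributed angles, the expected wait for
whichever copy comes around first is 1/3 of a revolution instead of
1/2 -- a real gain, but small next to a seek.)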
--
Gord Deinstadt   gdeinstadt@geovision.UUCP

serafini@amelia.nas.nasa.gov (David B. Serafini) (02/21/90)

In article <1990Feb9.235927.5984@ultra.com> ted@ultra.com (Ted Schroeder) writes:
>
>Use a Cray, they've got 64 bit addressing now.
>

Not quite.  Cray Y-MP's, 2's and the last of the X's all have 32 bit addresses.
For various implementation reasons, none of them can actually have more than 
2^29 words of memory.  The older X's and 1's had 24 bit addresses.

>      Ted Schroeder                   ted@Ultra.com
>      Ultra Network Technologies      ...!ames!ultra!ted
>      101 Daggett Drive           
>      San Jose, CA 95134          
>      408-922-0100
>
>Disclaimer:  I don't even believe what I say, why should my company?

David B. Serafini			serafini@ralph.arc.nasa.gov
Rose Engineering and Research

Disclaimer: I don't speak for the U.S. Govt, but unfortunately they sometimes
speak for me.

mshute@r4.uucp (Malcolm Shute) (02/21/90)

>In article <36080@mips.mips.COM>, mash@mips.COM (John Mashey) writes:
>> Barry has a good analysis, but I'd observe a few other things:
and he proceeds to observe that most people who think they want 64bit
addressing haven't thought through their reasoning, and realised that
they *could* live without it if they redesigned their proposed solution
properly.  But isn't this the point in current computer usage: hardware
is cheap, salaries are expensive.  If there is a 'natural' sledgehammer
approach which will ensure that the job is finished quickly, and
correctly (this follows from it being a 'natural' approach wrt the
human minds that designed it) then it beats the more thought-intensive
solutions.

However, to change sides now, and argue the other way,
In article <168@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar) writes:
>Today, I'm sure codes could use two-digit GB D-spaces (10-99 GB per
>file) if they had it.  [...]

I agree with this completely, as suggested by my paragraph above,
but this is just the first hurdle.  Assuming that it is accepted now
that applications *do* exist which *can* use such amounts of address
space, we now have the problem of establishing that the laws of
physics will *allow* them to use it.  "Barry's Good Analysis",
alluded to above, is a convincing one: how can we expect to see
machines with 64bit adrspc until we have machines that can do useful
things with it in small finite time?

The answer to the original poster's original question must surely
still be: "Not until instruction speeds are of the order of 4.0e9
times faster than they are at present (i.e. to cover 2^64 locations
in about the time that current processors take to cover 2^32 locations)",
or "Not until we can get 4.0e9 times as many processors sharing the
job of massaging all of those locations".
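
(To put rough numbers on that: at, say, 10^8 memory touches per second,
sweeping 2^32 bytes takes about 43 seconds, while sweeping 2^64 bytes
takes on the order of 5,800 years.)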

Malcolm Shute.         (The AM Mollusc:   v_@_ )        Disclaimer: all

ps@fps.com (Patricia Shanahan) (02/22/90)

In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>One might note that implementing a 64-bit address space
>with 64 KB pages and 8 bytes/page in the page table requires
>2 billion megabytes of page-table space.  Hmmmmm.....
>
>--Peter Golde


If the real memory is small compared to the address space, don't use page
tables. If you must use page tables, design them as sparse arrays.
If the real memory is large, you would probably want a bigger page
size anyway.

Those systems I am familiar with that seemed to have a good fit between
page size and memory size had the page size approximately the square root
of the memory size of a largish system. Anyone know of any counter-examples?
--
	Patricia Shanahan
	ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
	phone: (619) 271-9940

gerry@zds-ux.UUCP (Gerry Gleason) (02/22/90)

In article <168@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar) writes:
>In summary, there are lots of codes that could be "scaled" (e.g. finer
>grids) to consume just about any D-space feasible today.  If the
>vendors want a piece of that action, >32 bits of D-space is necessary.
>Not to mention issues like memory bandwidth, I/O rates, etc.  1/2 :-)

Aren't these typically compute bound problems also?  Doesn't this mean
supercomputers?  Isn't this discussion about high volume microprocessors?

I don't remember who it was that presented the argument based on the
length of time taken to zero, let alone process 4G, but this looks like
a valid line of reasoning.  Just how many MIPS does it take before you
can process a 4G space in less than a week?  I think we're still far
from that point.

Gerry Gleason

preston@titan.rice.edu (Preston Briggs) (02/22/90)

In article <2054@osc.COM> jgk@osc.osc.COM (Joe Keane) writes:
>In article <162@gollum.twg.com> warner@twg.com (Warner Losh) writes:
>>I really don't see how this will solve your problems.  You still need a
>>constant to offset into the constant pool.  It would also make your compilers
>>much harder to produce since they would also have to manage a constant pool.

"Much harder to write" is an overstatement.  A constant pool adds a little 
bit of extra tedium, but it's no big deal.

>I should say that when i wrote the original post i was thinking specifically
>about the ROMP architecture.  After IBM's recent announcements, it looks like
>this will become one of the major architectures of the next couple years.
>There's no reason why it shouldn't, even though their first implementation was
>lame.

The announcement is for a new architecture (America, Rios, 6000, whatever).
It's not the same as the ROMP.  What will happen to the ROMP and RT?
I've heard they're gone.  Does anybody know?

Preston Briggs
preston@titan.rice.edu

firth@sei.cmu.edu (Robert Firth) (02/22/90)

In article <2054@osc.COM> jgk@osc.osc.COM (Joe Keane) writes:

>I agree this [pooling constants] is hard, but it's an interesting
>optimization and can only
>improve your performance.  Of course it's completely impossible if your
>constants are embedded in the instruction stream.

Sorry, I don't see that.  Since the average constant is smaller than
the average address, taking constants out of line and pooling them
seems to me a guaranteed pessimisation:

(a) you don't save bits in the instruction, and may need more

(b) the extra indirection is one more memory reference, which
    is pure overhead

(c) you have reduced locality by adding a gratuitous reference to
    another part of the address space

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/22/90)

In article <6998@celit.fps.com> ps@fps.com (Patricia Shanahan) writes:
>In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>>One might note that implementing a 64-bit address space
>>with 64 KB pages and 8 bytes/page in the page table requires
>>2 billion megabytes of page-table space.  Hmmmmm.....
>If the real memory is small compared to the address space, don't use page
>tables. If you must use page tables, design them as sparse arrays.

1) Several posters have mentioned that there is some unspecified but obvious
(to them) major problem with using inverted page tables together with
memory mapped files.  I wonder if someone could enlighten us regarding
this problem, since apparently it isn't obvious to every system architect :-)

2) It isn't apparent to me that page size is really a function of physical
memory size so much as it is of processing speed.  It might be noted that
the Cyber 205, which had virtual memory and vector processing, used two
page sizes, because potentially the system ate through memory at a rate of
about one new page every 512/3 clock cycles average with "small" (4 KByte)
pages.  Associative register (~TLB) misses cost, on that system, ~150 CPU
cycles, so the overhead was high enough that it was worth it to add
large pages (512 KBytes).  But, if you can keep the overhead on a TLB miss
down, I see no reason why you need larger pages.  Does anyone have any
hard data on how many clock cycles are chewed up on TLB misses on recent
RISC systems? 
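
(The 512/3 figure presumably comes from a 4 KByte page holding 512
64-bit words: each of the three vector streams (two operands, one
result) enters a new page every 512 cycles, so on average some stream
touches a new page every 512/3 cycles.)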


  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

pcg@aber-cs.UUCP (Piercarlo Grandi) (02/22/90)

In article <38764@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
  
  First of all, IBM has patents on their particular implementation of a
  reverse map MMU.  This does not imply they invented it, nor does it

They have *very* wide-ranging claims. Remember also that this is the company
that spent millions of dollars advertising their discovery of virtual memory.

  imply that they believe that they invented it, nor does it mean that they
  can do anything to anyone who wants to publish a paper about it.
  
  Secondly, if you can prove that you worked on something and that it
  was in the public domain prior to when IBM claimed they invented it
  (public domain in this context means that it was not kept a trade
  secret, I believe), then you can actually invalidate IBMs patent
  claims, should they conflict.

Yes, spending a few hundred thousand dollars. IBM, if they cared about their
patent, could sue *me*, conceivably, to protect it, if I made such a claim.
  
  Thirdly, I'd like to see the paper...

Not so easy. You join a long queue of people to whom I promised a copy
of some paper or other (mine or otherwise). Don't hold your breath. :-(.

  Note that IBMs implementation is a straight reverse map; it comes with a
  'hash anchor table' bit of indirection first, which presumably improves
  performance. This may be a bit of non-obviousness which makes the patent
  really valid.

That's the scary thing. I could not get hold of the IBM patent (all the info
I know comes from a Byte article on the ROMP), but the two key points of my
design are a hash table (actually two) to randomize access to the TLB (an
idea I got, for another application, from Prof. Ian Watson of Manchester
University), and an optional indirect token table to support segment
sharing efficiently (yes, you can do that with a reverse map MMU, but it
requires a *very* subtle idea).

-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

rpeglar@csinc.UUCP (Rob Peglar) (02/22/90)

In article <6998@celit.fps.com>, ps@fps.com (Patricia Shanahan) writes:
> In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
> >One might note that implementing a 64-bit address space
> >with 64 KB pages and 8 bytes/page in the page table requires
> >2 billion megabytes of page-table space.  Hmmmmm.....
> >
> >--Peter Golde
> 
> 
> If the real memory is small compared to the address space, don't use page
> tables. If you must use page tables, design them as sparse arrays.
> If the real memory is large, you would probably want a bigger page
> size anyway.
> 
> Those systems I am familiar with that seemed to have a good fit between
> page size and memory size had the page size approximately the square root
> of the memory size of a largish system. Anyone know of any counter-examples?
> --

The virtual-memory supers in the CDC Star-100/Cyber 20x/ETA-10 line solved
this problem by having multiple page sizes.

The early machines (Stars and 203's) had anywhere from 4 to 8MB of 
physical memory.  VM translation by pages had two flavors, "small"
and "large".  The size of a small page was 4KB; a large, 512KB.  Thus,
the small page was anywhere from .5*sqrt(sizeof(PM)) to .25 (ditto).
The large page was there due to the use of associative registers to
hold the last 16 page table entries.  In an 8MB machine, 16 AR's held
translation for the entire physical memory, when large pages were used.
Any process was allowed to have any mix of small and large pages.

The later machines were similar, but the OS'es were extended to support
multiple sizes of small pages - 4, 16, and 64KB - selectable at boot
time.  Large pages were still 512KB.  The exception was the ETA-10,
which had two sizes of large pages - the 512KB and the "giant" page,
which was 4MB.  The ETA-10 still had 16 AR's for VM translation;  thus,
16 "giant" pages corresponded to 64MB of physical memory, which was the
base amount for all ETA-10 processors.  NB, the "giant" page was never
given OS support.  

Moral of the story?  As physical memory size grows, virtual page size
*ought to* grow as well.  For simplicity's sake, if nothing else.
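
(As a rough check against the square-root rule mentioned upthread:
sqrt(8 MB) is about 2.9 KB, close to the 4 KB small page; sqrt(64 MB)
is 8 KB; and a 4 GB machine would come out at 64 KB pages.)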

Rob
> 	Patricia Shanahan
> 	ps@fps.com
>         uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
> 	phone: (619) 271-9940


-- 
Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar		612-631-7800

The posting above does not necessarily represent the policies of my employer.

johnl@esegue.segue.boston.ma.us (John R. Levine) (02/23/90)

>In article <38764@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>  
>  First of all, IBM has patents on their particular implementation of a
>  reverse map MMU.  This does not imply they invented it, ...

Who did invent reverse map MMUs?  The earliest one I'm aware of is the IBM
System/38, nearly 15 years ago.  Was there an earlier one?  (For that matter,
how did the Atlas' pager work?)
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/23/90)

In article <986@m1.cs.man.ac.uk> mshute@r4.UUCP (Malcolm Shute) writes:
>>In article <36080@mips.mips.COM>, mash@mips.COM (John Mashey) writes:
>>> Barry has a good analysis, but I'd observe a few other things:
>addressing haven't thought through their reasoning, and realised that

>However, to change sides now, and argue the other way,
>In article <168@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar) writes:
>>Today, I'm sure codes could use two-digit GB D-spaces (10-99 GB per
>>file) if they had it.  [...]
>physics will *allow* them to use it.  "Barry's Good Analysis",
>alluded to above, is a convincing one: how can we expect to see
>machines with 64bit adrspc until we have machines that can do useful
>things with it in small finite time?

>still be: "Not until instruction speeds are of the order of 4.0e9
>times faster than they are at present (i.e. to cover 2^64 locations
>in about the time that current processors take to cover 2^32 locations)",
>or "Not until we can get 4.0e9 times as many processors sharing the
>job of massaging all of those locations".

I don't follow this.  A Cray Y-MP memory subsystem already has 
40 GBytes/sec of memory bandwidth, and has the processors to chew up most of 
that bandwidth.  If you follow the rule of thumb that the processor should be
able to read/write/zero or whatever the memory in 1-10 seconds, you have 
already reached 100 GBytes.  Today.  With next generation systems
expected to have bandwidth requirements 16 times those of today, you should
be able to address Terabytes of data.  With 16 Mbit DRAMs coming out, you
can expect physical memories to reach 64 GBytes before too long.  
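
A hedged check of that back-of-the-envelope: the 40 GBytes/sec figure is the
one quoted above, the 1-10 second sweep and the 16x scaling follow the post,
and the rest is just arithmetic (compile with -lm).

    #include <stdio.h>
    #include <math.h>

    /* Address bits needed to cover a given number of bytes. */
    static int bits_needed(double bytes)
    {
        return (int)ceil(log2(bytes));
    }

    int main(void)
    {
        double bw = 40e9;                /* bytes/second of memory bandwidth */

        printf("1 s sweep:  %.0f GB -> %d bits\n", bw / 1e9, bits_needed(bw));
        printf("10 s sweep: %.0f GB -> %d bits\n", 10 * bw / 1e9,
               bits_needed(10 * bw));
        printf("16x faster, 10 s sweep: %d bits\n", bits_needed(160 * bw));
        return 0;
    }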

These numbers imply, to me, that next generation architectures need *at least* 
40 bit addressing.  It doesn't make sense to design your system with odd sized
addresses that are just barely big enough- surely we can learn something from
the history of computing.  Using the next sized power of two and keeping
address size consonant with one of the supported integer sizes is, IMHO, the
only sensible thing to do.  

Now, if you add to that the requirement for sparse address spaces, 
due to memory mapped files, distributed applications, global addressing,
and object/capability programming, and 64 bits is actually quite reasonable
*anyway*. 

Now, you may be asking, what do systems with 64 Gbytes of memory have to
do with micros, which, in the same time frame, will have maybe 256 MBytes
of memory?  Answer: If you want your KM to be used in multiple processor
systems with 32-? processors, you will easily exceed the capability of 32 bit
addressing before long.  I note that a number of *Big Iron* folks are already
designing with multiple micros.  I am certain that a system with clean
flexible 64 bit linear addressing will have a significant leg up on the 
competition when it comes to being chosen for next generation parallel 
micro-based systems.

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

baum@Apple.COM (Allen J. Baum) (02/23/90)

[]
>In article <1654@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <38764@apple.Apple.COM> baum@apple.UUCP (Allen Baum) writes:
>  
>  First of all, IBM has patents on their particular implementation of a
>  reverse map MMU.  This does not imply they invented it, nor does it
>
>They have *very* wide ranging claims. Remember also that this is the company
>that spent millions of dollars advertising their discovery of virtual memory.

Um, later on you say you can't get copies of the patents- so how do you
know that the claims are very wide ranging? I found them fairly limited.

>  Secondly, if you can prove that you worked on something and that it
>  was in the public domain prior to when IBM claimed they invented it
>  (public domain in this context means that it was not kept a trade
>  secret, I believe), then you can actually invalidate IBMs patent
>  claims, should they conflict.
>
>Yes, spending a few hundred thousand dollars. IBM, if they cared about their
>patent, could sue *me*, conceivably, to protect it, if I made such a claim.

Well, no. If you tried to sell a product that they felt conflicted, or
you sued to have their patent deemed invalid, then they'd probably come
after you.  Mostly, they just come to you and say: look, we have 50
bazillion patents you are violating (almost certainly true if you look
at what IBM has invented, as opposed to what their marketing people
might claim they invented), so we'll let you use ours if you let us
use yours.

In any case, they can't sue if you publish a paper, even if you claim it
is novel and they claim it's patented.  We're talking patents here, not
copyright, and the rules are quite different.

>That's the scary thing. I could not get hold of the IBM patent (all the info
>I know comes from a Byte article on the ROMP),

See above


--
		  baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum

AI.Gadbois@MCC.COM (David Gadbois) (02/23/90)

    From: mshute@r4.uucp (Malcolm Shute)
    Date: 21 Feb 90 12:22:22 GMT
    
    >In article <36080@mips.mips.COM>, mash@mips.COM (John Mashey) writes:
    >> Barry has a good analysis, but I'd observe a few other things:
    and proceeds to observe that most people who think they want 64bit
    addressing haven't thought through their reasoning, and realised that
    they *could* live without it if they redesigned their proposed solution
    properly.  But isn't this the point in current computer usage: hardware
    is cheap, salaries are expensive.  If there is a 'natural' sledgehammer
    approach which will ensure that the job is finished quickly, and
    correctly (this follows from it being a 'natural' approach wrt the
    human minds that designed it) then it beats the more thought-intensive
    solutions.

Shute has pointed out a tradeoff that I think is too often overlooked.
Having needlessly big address spaces makes it possible to get answers to
problems that would be too expensive in human time to solve otherwise.

For example, lately I have been doing manipulations of address traces.
The (brute force) algorithms I use to get the results need space
potentially much larger than the size of the traces.  That's OK for
small data sets, but I would be happy to look at gigabytes of addresses.

With my dumb algorithms, I can set up a data run in a few hours, get it
started, and then go play nethack until it finishes.  On the other hand,
if I had to worry about running out of address space, the setup time
goes from hours to days.  I'd have to bend over backwards writing
intermediate results to the filesystem or coming up with tricky
space-efficient algorithms.  It's just not worth it.

Also, as Shute implies, the constraints of the small-address-space approach
carry a greater potential for incorrect results.  Lacking the will and the
verification techniques, how do you find subtle bugs in gigabytes of
output?

I wouldn't even mind if it took a week or two to process a really big
trace.  It would be a waste of machine time, not my time.

(Actually, even if I had a processor with a 64-bit address space, I
certainly couldn't afford all the secondary storage necessary to use the
whole space.  Assuming that backing store costs, say, as little as $1.00
per megabyte, I'd still have to buy $17,592,186,044,416.00 worth of
disks to use every bit in the address space.  Even a measly 40-bit space
would eat up $1,048,576.00 in disk space at those prices.)
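
For what it's worth, the arithmetic checks out; here is a throwaway C
verification.  The $1.00/MB price is Gadbois's assumption, the rest is
just shifting.

    #include <stdio.h>

    int main(void)
    {
        unsigned long long mb64 = 1ULL << (64 - 20);  /* 2^64 bytes, in MB */
        unsigned long long mb40 = 1ULL << (40 - 20);  /* 2^40 bytes, in MB */

        printf("64-bit space at $1/MB: $%llu\n", mb64);  /* 17,592,186,044,416 */
        printf("40-bit space at $1/MB: $%llu\n", mb40);  /* 1,048,576 */
        return 0;
    }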

--David Gadbois
various affiliations
Disclaimer: I'm not a real computer architect, but I play one in grad school

jgk@osc.COM (Joe Keane) (02/23/90)

In article <2054@osc.COM> i write:

>I agree this is hard, but it's an interesting optimization and can only
>improve your performance.  Of course it's completely impossible if your
>constants are embedded in the instruction stream.

I'm not sure what the antecedent of `this' is in my post, but what i meant to
be referring to is replacing multiple instances of the same constant with
multiple references to one instance.  This is what i claim can only help.

The first step, though, is to take the constants out of line.  I don't think
this usually helps; the best i hope for is that it doesn't hurt.

In article <6190@bd.sei.cmu.edu> firth@sei.cmu.edu (Robert Firth) writes:
>Sorry, I don't see that.  Since the average constant is smaller than
>the average address, taking constants out of line and pooling them
>seems to me a guaranteed pessimisation

The relevant size to compare is that of the offset from the base register, not
the effective address.  In particular, on many RISC architectures, you get an
8-bit or so offset free with your load instruction.

>(a) you don't save bits in the instruction, and may need more

True enough, if you don't count the immediate data as part of its instruction.

>(b) the extra indirection is one more memory reference, which
>    is pure overhead

This is not true.  You either fetch the constant from the constant pool or the
instruction stream.  If the instruction sizes are the same, the number of
fetches is the same in either case.

>(c) you have reduced locality by adding a gratuitous reference to
>    another part of the address space

I've replaced some number of fetches from the instruction stream with the same
number of fetches from the constant pool.  Whether this helps or hurts depends
on a bunch of things about your caches.  In the best case it can make a loop
fit completely in the I-cache which didn't before.
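
A hedged C illustration of the pooling being argued over; the names and
constants below are made up, and the point is only that repeated uses of a
wide constant become loads at a small offset from one pool base register
rather than repeated inline immediates.

    /* One copy of each large constant, addressed off a single base. */
    static const unsigned long pool[] = {
        0x9e3779b9UL,                 /* pool[0]: too wide for a small immediate */
        0x00ff00ffUL,                 /* pool[1] */
    };

    unsigned long hash_step(unsigned long x, unsigned long y)
    {
        /* Both uses of pool[1] become loads at pool base + small offset; the
         * constant itself stays out of the instruction stream, and the second
         * use can share the first one's register or cache line. */
        return (x & pool[1]) + (y & pool[1]) + pool[0];
    }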

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (02/23/90)

In article <11679@nigel.udel.EDU> nelson@ee.udel.edu (Mark Nelson) writes:

| It's a good thing you don't believe what you say, because this is false.
| Crays have either 24 or 32 bit addressing:

  I think the 32 bit versions can only use 29 bits, but our Cray guru is
in Boston today, so I can't verify that.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
            "Stupidity, like virtue, is its own reward" -me

pcg@aber-cs.UUCP (Piercarlo Grandi) (02/23/90)

In article <43367@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
  In article <6998@celit.fps.com> ps@fps.com (Patricia Shanahan) writes:
  >In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
  >>One might note that implementing a 64-bit address space
  >>with 64 KB pages and 8 bytes/page in the page table requires
  >>2 billion megabytes of page-table space.  Hmmmmm.....
  >If the real memory is small compared to the address space, don't use page
  >tables. If you must use page tables, design them as sparse arrays.
  
  1) Several posters have mentioned that there is some unspecified but obvious
  (to them) major problem with using inverted page tables together with
  memory mapped files.  I wonder if someone could enlighten us regarding
  this problem, since apparently it isn't obvious to every system architect :-)

There are actually two problems:

A) To do address translation you want a direct (virtual-to-physical) map.
With only a reverse map, filling the TLB means scanning the reverse map
until you find the wanted virtual address.  This scan can be organized in
many ways, of course.

B) You have difficulty supporting shared pages.  A physical page cannot
contain an arbitrarily long list of virtual addresses mapped to it.  There
are two workarounds: to forbid shared memory entirely (the best solution, as
I have already argued, also from a logical point of view), and to share
*segments* using indirect segment capabilities (which slows things down a
bit on every reference).  There is a third workaround, which is to change the
virtual address associated with every shared page on every context switch.
This is practical only for relatively small memories, or with hardware
assist, and in any case is undesirable because it raises the cost of context
switching.
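
For concreteness, a hedged sketch of the scan in (A) as it is commonly
organized: a hashed inverted page table with one entry per physical frame,
searched by (address-space id, virtual page number) on a TLB miss.  The
sizes, field names, and hash function below are illustrative only.

    #define NFRAMES 4096                 /* physical frames (illustrative) */

    struct ipt_entry {
        unsigned long vpn;               /* virtual page number mapped here */
        unsigned int  asid;              /* owning address space */
        int           next;              /* next frame on this hash chain, -1 = end */
    };

    static struct ipt_entry ipt[NFRAMES];
    static int hash_anchor[NFRAMES];     /* bucket -> first frame, -1 = empty;
                                            assumed initialized elsewhere */

    /* Return the physical frame holding (asid, vpn), or -1 if not resident. */
    int ipt_lookup(unsigned int asid, unsigned long vpn)
    {
        int frame = hash_anchor[(vpn ^ asid) % NFRAMES];

        while (frame != -1) {
            if (ipt[frame].vpn == vpn && ipt[frame].asid == asid)
                return frame;            /* the frame number is the translation */
            frame = ipt[frame].next;     /* walk the collision chain */
        }
        return -1;                       /* not in physical memory: page fault */
    }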

The p.r. problem with reverse map MMUs is that the ROMP reverse MMU caused
more than a few pains to the CMU people porting MACH, which is heavily
shared memory oriented.

  2) It isn't apparent to me that page size is really a function of physical
  memory size so much as it is of processing speed. [ .... ]

Page size ought to be *only* determined by what makes the working set
smallest, and this points to very small page sizes, indeed it points to
object memory, like the Burroughs. In practice you also have that fine
granularity is swamped by high overheads, mainly the IO ones, so you have to
make do with coarser granularity than you'd like.

The solution is not to make pages larger to amortize overheads over a larger
granule; this is the Berkeley/SUN style hack. Nice people try to reduce
overheads (CCDs and bubbles were hailed as what could make object virtual
memory practical) or try to use dynamic, adaptive grouping of small objects.

A large page size implies coarse, static grouping of objects; it
approximates the correct policy, and poorly, only in the case of sequential
access, for which it also approximates prefetching.  Otherwise it is
a loss, as Ritchie (or Thompson) observed in "The UNIX IO system" (or
"The UNIX system") paper, on the issue of whether to double the block
size to 1024 bytes.

The eight kilobyte page size of SUNs (one of the two nasty consequences of
their regrettable MMU design) is probably one of the two major factors
in the extremely poor memory-wise performance of SunOS 4.
-- 
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

peralta@pinocchio.Encore.COM (Rick Peralta) (02/24/90)

In article <193@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>... how many MIPS does it take before you can process a 4G space...
> I think we're a still far from that point.

I suspect that these things are closer than you think.
Current designs are using multiple external busses to
hook together many CPU clusters.  There are plenty of
100-MIPS single-bus machines, and 1000 MIPS is just around
the corner.  So the horsepower argument is rather weak.


 - Rick

richard@aiai.ed.ac.uk (Richard Tobin) (02/24/90)

In article <36080@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>	a) What applications have you seen that wanted >32 bits of space?

Consider a language (prolog for example) that is implemented with
multiple stacks (maybe 2-4).  It's nice (a) not to have to check for
stack overflow and (b) to be able to expand the stacks without
relocating and fixing up all the pointers.  If you can spread your
stacks around virtual memory and unmap pages at the ends of the
stacks, you can do this.  In general, 4Gbytes of virtual address space
is plenty.

Now consider the same language with multiple threads, each requiring
(say) 2 stacks.  It's no longer clear that 4Gbytes is enough (though
it often will be - threads are often used for fairly simple tasks, 
with perhaps one really big one doing the hard work).
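
A hedged sketch of the layout Richard describes, in C on a system with
mmap(): reserve a large, unbacked region per stack and commit pages only as
the stack grows, so no pointers ever need relocating.  MAP_ANONYMOUS and the
256MB reservation size are my assumptions, not anything in the post.

    #include <sys/mman.h>
    #include <stddef.h>

    #define STACK_RESERVE  (256UL * 1024 * 1024)   /* address space, not memory */

    /* Reserve address space for one stack; nothing is committed yet. */
    void *reserve_stack(void)
    {
        void *p = mmap(NULL, STACK_RESERVE, PROT_NONE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    /* Commit `len` more bytes at `top` when the stack is about to grow into
     * it; the region was contiguous all along, so nothing gets fixed up. */
    int grow_stack(void *top, size_t len)
    {
        return mprotect(top, len, PROT_READ | PROT_WRITE);
    }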

>	b) Did they need it, or just take it because they had it?

Well, I haven't seen such a system on a machine with more than 32 bits
of address space, and people manage without.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

robk@altos86.Altos.COM (Rob Kleinschmidt) (02/24/90)

A couple of questions re additional memory needed to run in a 64 bit
address space:

1)	Would anyone care to speculate on page sizes and layouts
	for mmu descriptors ? Seems like the dimensions of these
	items get very big quickly. Does the problem get better if
	we avoid building knowledge of mmu tables/levels into hardware ?

2)	How much additional memory would be needed to run a garden
	variety program that is happy in a 32 bit environment ?
	My bet would be that data space doubles.

3)	Other space wasted by allocation due to larger page sizes ?

4)	Possible increases in bus traffic due to normal stack accesses
	etc. ? Possible increase in context switch overhead ?

5)	Applicable data from early 16 to 32 bit ports ?



Rob Kleinschmidt

feustel@well.sf.ca.us (David Alan Feustel) (02/24/90)

This may be nit picking, but the 386 has only a 46 bit address space.
2 bits in the segment register are used for other functions than
addressing.

mash@mips.COM (John Mashey) (02/25/90)

In article <193@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>In article <168@csinc.UUCP> rpeglar@csinc.UUCP (Rob Peglar) writes:
>>In summary, there are lots of codes that could be "scaled" (e.g. finer
>>grids) to consume just about any D-space feasible today.  If the
>>vendors want a piece of that action, >32 bits of D-space is necessary.
>>Not to mention issues like memory bandwidth, I/O rates, etc.  1/2 :-)

>Arn't these typically compute bound problems also?  Doesn't this mean
>supercomputers?  Isn't this discussion about high volume microprocessors?
>
>I don't remember who it was that presented the argument based on the
>length of time taken to zero, let alone process 4G, but this looks like
>a valid line of reasoning.  Just how many MIPS does it take before you
>can process a 4G space in less than a week.  I think we're a still far
>from that point.

Micros (one can argue about what you call high-volume :-) have been
there for a while: certainly any of the current crop of RISCs have
no problem doing something with 4GB of memory.
For example, consider an M/2000 with a 25MHz R3000 (4Q89).
It can zero memory at 50MB/sec.  4GB / (50 MB/sec) = 80 seconds.

Let's try another one.  If accessing data in reasonable size hunks,
you can certainly get 1MB/sec read rates off vanilla SCSI disks.
(Of course you can do better; I'm picking something really trivial,
without even disk-striping, SMD, or IPI.)
Hence it takes 4000 seconds, or a little over an hour to get
this amount of data from a disk, with vanilla technology.  4GB of
disk is certainly something you can put into a PC/AT-size box
already, or at least very soon, with off-the-shelf stuff.
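
A trivial check of the two timings, using the rates quoted above:

    #include <stdio.h>

    int main(void)
    {
        double bytes = 4e9;                 /* ~4GB */
        double zero_rate = 50e6;            /* 50 MB/s memory zeroing */
        double disk_rate = 1e6;             /* 1 MB/s vanilla SCSI */

        printf("zero 4GB: %.0f seconds\n", bytes / zero_rate);   /* 80 */
        printf("read 4GB: %.0f seconds\n", bytes / disk_rate);   /* 4000 */
        return 0;
    }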

Finally, if the issue is not having enough physical memory to back up the
virtual memory, there are already microprocessor systems with 128MB
of memory that have been shipping for a while with 1Mbit DRAMs, and they can
certainly go to 512MB with 4Mbit parts, and there are others around with a
factor of 2X bigger on the way. 

Lastly, at least part of the issue here is the ability to use
algorithms that burn up address space, even if they don't necessarily
touch every bit of data, every time.

In any case, let me make a really conservative bet:
No later than 1991, somebody will purchase:
	a) A microprocessor-based system.
	b) Delivered with 1GB of memory.
	c) And they'll really want more physical memory
	d) And they'll dislike an addressing limit of 2GB-4G.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/26/90)

In article <1662@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
>In article <43367@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>  In article <6998@celit.fps.com> ps@fps.com (Patricia Shanahan) writes:
>  >In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>  >tables. If you must use page tables, design them as sparse arrays.
>  1) Several posters have mentioned that there is some unspecified but obvious
>  (to them) major problem with using inverted page tables together with

>to fill the TLB you have to scan the reverse map until you find the wanted
>virtual addr.

With hashing, the problem (large amounts of physical memory) also provides
the solution.  In practice, this overhead appears to be small.  

>B) You have difficulty supporting shared pages. A physical page cannot
>contain an arbitrarily long list of virtual addresses mapped to it.

Well, I suppose not, but is this a problem?  In particular, most uses of shared
memory will limit the number of virtual addresses mapped to it to the number
of processes :-)  since it would normally be unusual for a process to map an
address more than once.  Realistically, the question is: how does an IPT scheme
compare to the alternatives for the top half of the kernel and shared libraries?
Other uses will generally not involve every process in the system.

*If* you can demonstrate that you can support:
A) A large number of processes sharing a limited amount of memory (kernel and
shared libraries), and
B) A small number of processes sharing a large memory (the other usual 
applications) then the scheme is adequate for the present.

How much overhead is incurred when accessing shared libraries, for example?
And how does the overhead compare with *not* using shared libraries?

>  There
>are two workarounds, to forbid shared memory entirely (the best solution, as
>I have already argued, also from a logical point of view),

It is the best solution only if you can demonstrate that *not* providing
shared memory gives better performance on the set of problems which the
capability is intended to address.  It isn't obvious that the overhead of
reading through a file sequentially, unnecessarily, is lower than that of
spending a certain percentage of time reloading a TLB with somewhat
increased overhead.  Sequential I/O is *very general*, but some algorithms
may require other capabilities for performance reasons...

> and sharing
>*segments* using indirect segment capabilities (which slows down things a
>bit on every reference).

Page table support of shared memory sections (I hate to use the word *segment*
since so many people think *Intel*...) makes perfect sense.  No argument there.

>Page size ought to be *only* determined by what makes the working set
>smallest, and this points to very small page sizes, indeed it points to
>object memory, like the Burroughs.

I was intending to address an entirely different question.  Why did the
Cyber 205 have two page sizes?  Because it had a small TLB combined with
a fairly large TLB load time.  At vector processing speeds, the TLB reloads
on small pages consumed too much time.  So, the overhead was reduced with
large pages.  The pages need to be big enough that the overhead of doing
a TLB reload is amortized over enough memory accesses.  One way to do this
is to use dual page sizes.  Another way is to make TLB reload faster- then
you can keep one size of pages.  
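
The amortization argument in numbers, as a hedged sketch: the 16-entry
figure is the associative-register count mentioned earlier in the thread,
and the page sizes are examples.

    #include <stdio.h>

    int main(void)
    {
        int entries = 16;                              /* TLB / AR entries */
        unsigned long sizes[] = { 4096, 65536, 524288 };
        int i;

        for (i = 0; i < 3; i++)
            printf("%6lu-byte pages: %5lu KB of address space covered\n",
                   sizes[i], entries * sizes[i] / 1024);
        return 0;
    }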

Now, what size should those pages be? The goals are that:
A) You can support large numerical simulations and databases, and
B) Large numbers of small Unix processes accessing small files, and
C) Object oriented environments where every object may have its own attributes,
etc.  Unfortunately, a trade off is required...

It is interesting that you mentioned Burroughs.  I wonder how viable such
a scheme is when memory sizes get to the Gigabyte range?  It seems to me
that management of memory fragmentation would start to incur a tremendous
overhead.  The best method known so far is to use fixed sized pages, and
require all objects to be a multiple of them.  Dual sized pages are also
very little work.  It is probably worth exploring how far you could go with
multiple page sizes, all powers of two.  Arbitrary object lengths, on the
other hand, look like an unsolved problem to me.

BTW, why is "making the working set smallest" a primary goal?  Working set
is a performance question, and the overall best performance is the goal.

>A large page size implies coarse, static grouping of objects, and
>approximates the correct policy, and poorly, only in the case of sequential
>access, 

It also approximates correct policy on array dominated simulations where you
can, on some systems, require touching 24 bytes or more of data for every 
CPU cycle.


  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)694-6117       

lkaplan@bbn.com (Larry Kaplan) (02/27/90)

In article <36439@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>
>In any case, let me make a really conservative bet:
>No later than 1991, somebody will purchase:
>	a) A microprocessor-based system.
>	b) Delivered with 1GB of memory.
>	c) And they'll really want more physical memory
>	d) And they'll dislike an addressing limit of 2GB-4G.
>-- 
>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
>UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
>DDD:  	408-991-0253 or 408-720-1700, x253
>USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

Too late, such a system has already been SOLD.  BBN ACI just announced the
sale of a 126-node microprocessor-based system (TC2000) to Lawrence Livermore
National Labs.  The first part of the order calls for a 63-node machine
to be delivered in 3/90.  This system has 1008 Megabytes of physical memory
(16 Megabytes short of 1 Gigabyte).  The 126-node machine 
(having 2016 Megabytes) is due sometime near the end of 1990.
See the article in the 2/21/90 issue of the San Francisco Chronicle for
some more details of the sale.

#include "standard_disclaimer"
_______________________________________________________________________________
				 ____ \ / ____
Laurence S. Kaplan		|    \ 0 /    |		BBN Advanced Computers
lkaplan@bbn.com			 \____|||____/		10 Fawcett St.
(617) 873-2431			  /__/ | \__\		Cambridge, MA  02238

ps@fps.com (Patricia Shanahan) (02/27/90)

In article <1662@aber-cs.UUCP> pcg@cs.aber.ac.uk (Piercarlo Grandi) writes:
...
>Page size ought to be *only* determined by what makes the working set
>smallest, and this points to very small page sizes, indeed it points to
>object memory, like the Burroughs. In practice you also have that fine
>granularity is swamped by high overheads, mainly the IO ones, so you have to
>make do with coarser granularity than you'd like.
>
...
>-- 
>Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
>Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
>Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk


I disagree. Page size, as part of the overall memory design, should be
determined  mainly by what gives the lowest average memory access time 
for target workloads, for a given cost. Consideration should also be given
to minimizing the probability of any given workload having a very long
average memory access time.

Average memory access time includes the time to do a successful virtual to
real translation for anything that is not in a virtual-addressed cache. The
more complicated and finer granularity the virtual to real mapping, the harder
it is to make it really fast.

For some workloads, minimizing the working set may be the way to optimize
the memory access time. It does not seem a good sole objective for something
as important to memory hierarchy design as the page size.
--
	Patricia Shanahan
	ps@fps.com
        uucp : {decvax!ucbvax || ihnp4 || philabs}!ucsd!celerity!ps
	phone: (619) 271-9940

ldm@texhrc.UUCP (Lyle Meier) (02/27/90)

We have a class of problems that need greater than 4 gb of address space.
One such problem is a 1000x1000x1000 wave propagation problem. We would not
like to have to write our own paging and swapping code to do this.  While
currently this is a requirement on mini-supers etc., workstations are
getting close to that power level, and we would like to be able to run the
problem on the workstation.  A segmented architecture is OK if it doesn't
restrict the application program in the size of arrays etc. I.e. the
application program should think the address space is flat.
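
A hedged sizing of that grid, assuming one 8-byte word per grid point (the
post doesn't say what is stored at each point):

    #include <stdio.h>

    int main(void)
    {
        double points = 1000.0 * 1000.0 * 1000.0;
        double bytes  = points * 8.0;                  /* one double per point */

        printf("%.1f GB for a single field\n",
               bytes / (1024.0 * 1024.0 * 1024.0));    /* ~7.5 GB: past 32 bits */
        return 0;
    }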

dmocsny@uceng.UC.EDU (daniel mocsny) (02/27/90)

In article <52651@bbn.COM> lkaplan@BBN.COM (Larry Kaplan) writes:
>In article <36439@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>>No later than 1991, somebody will purchase:
>>	a) A microprocessor-based system.
>>	b) Delivered with 1GB of memory.
>>	c) And they'll really want more physical memory
>>	d) And they'll dislike an addressing limit of 2GB-4G.
>
>Too late, such a system has already been SOLD.  BBN ACI just announced ...

Now that this barrier is history, how long will we wait for the following?

1. The value of computers sold with 1 GB of physical memory to
exceed 1% of the total computer market value.

2. The above fraction to exceed 10%.

Every limit constrains *someone*, but until it constrains enough people,
industry does not respond.

I am sure that someday 2-4GB addressing limits will be constraining,
but this would seem to require major advances in software technology.
1 MB sounded like a lot back in 1980, but even then a single
programmer could readily write, from scratch, an application that
would use more memory than that. So you had a built-in barrier that
was already too low for the types of jobs people could do at the
time.

I don't know many people who can single-handedly write code that chews
up >1 GB. One way to do that is to have repetitive data structures
(e.g., arrays), but many applications are too irregular to make this
easy. The only way for most people to use up 1 GB is to have vastly
more complex software. How will we create such software? (Using CASE
tools? Organizing bigger collective code-writing efforts?
Standardizing the hell out of everything, so everybody's work can go
into everybody else's library?) Unfortunately, the complexity of
reliable software seems to be growing at best linearly with time. A
single human can only manage some roughly fixed amount of complexity,
and this fact may emerge as the limiting factor in exploiting large
address spaces.

Dan Mocsny
dmocsny@uceng.uc.edu

sbf10@uts.amdahl.com (Samuel Fuller) (02/27/90)

In article <3786@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>In article <52651@bbn.COM> lkaplan@BBN.COM (Larry Kaplan) writes:
>>In article <36439@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>>>No later than 1991, somebody will purchase:
>>>	a) A microprocessor-based system.
>>>	b) Delivered with 1GB of memory.
>>>	c) And they'll really want more physical memory
>>>	d) And they'll dislike an addressing limit of 2GB-4G.
>>
>>Too late, such a system has already been SOLD.  BBN ACI just announced ...
>
>Now that this barrier is history, how long will we wait for the following?
>
>1. The value of computers sold with 1 GB of physical memory to
>exceed 1% of the total computer market value.

Most likely, this limit has already been broken.  A good percentage of
the mainframes sold today are shipped with more than 1GB of memory
installed. The maximum memory configuration sold on Amdahl mainframes
is 512M main and 2G expanded.  I believe that most other mainframe
suppliers are in the same ballpark.

-- 
---------------------------------------------------------------------------
Sam Fuller / Amdahl System Performance Architecture

I speak for myself, from the brown hills of San Jose.

UUCP: {ames,decwrl,uunet}!amdahl!sbf10 | USPS: 1250 E. Arques Ave (M/S 139)
INTERNET: sbf10@amdahl.com             |       P.O. Box 3470
PHONE: (408) 746-8927                  |       Sunnyvale, CA 94088-3470
---------------------------------------------------------------------------

wayne@dsndata.uucp (Wayne Schlitt) (02/27/90)

In article <4cES02Kd8cLZ01@amdahl.uts.amdahl.com> sbf10@uts.amdahl.com (Samuel Fuller) writes:
> In article <3786@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
> >>>	[ ... ]
> >
> >Now that this barrier is history, how long will we wait for the following?
> >
> >1. The value of computers sold with 1 GB of physical memory to
> >exceed 1% of the total computer market value.
> 
> Most likely, this limit has already been broken.  I good percentage of
> the mainframes sold today are shipped with more than 1GB of memory
> installed. The maximum memory configuration sold on Amdahl mainframes
> is 512M main and 2G expanded.  I believe that most other mainframe
> suppliers are in the same ballpark.


umm, i am a little bit confused by this...  you are saying that the
most memory you can put in an Amdahl is 512M main, but that a "good
percentage" of mainframes are shipped with more than 1G of main
memory?

what is the difference between "main" memory and "expanded" memory?
if "expanded" memory is just a back store for paging and disk cache,
then i dont think that including "expanded" memory is what people
normally think of as "physical memory".  (you can also exclude ram
disks and the memory on high res video cards)


-wayne

peralta@pinocchio.Encore.COM (Rick Peralta) (02/28/90)

In article <3786@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>Now that this barrier is history, how long will we wait for ...

This may not be the best place for this, but...

What about getting rid of conventional secondary storage?
Simply keep everything in virtual memory and have the OS
maintain whatever secondary store it sees fit?

This is more of an OS issue, but if the iron isn't there...


 - Rick

johnl@esegue.segue.boston.ma.us (John R. Levine) (02/28/90)

In article <3786@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
>Now that this barrier is history, how long will we wait for the following?
>
>1. The value of computers sold with 1 GB of physical memory to
>exceed 1% of the total computer market value.

Probably already has happened.  I suspect that IBM's highest end machines are
now routinely sold with > 1GB of RAM.   After all, a year ago they introduced
a segmented architecture called ESA/370 to circumvent the 2GB per process
address space limit of the XA architecture they introduced about 5 years
before that.  (Before that it was 16MB virtual, 64MB physical.)

At $10M apiece, these machines constitute a large fraction of the computer
market by value, if not by numbers.

>I am sure that someday 2-4GB addressing limits will be constraining,
>but this would seem to require major advances in software technology.

Not so, you can be sure that they wouldn't have done ESA if the customers
hadn't been screaming for it.
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

meissner@osf.org (Michael Meissner) (02/28/90)

In article <11254@encore.Encore.COM> peralta@pinocchio.Encore.COM
(Rick Peralta) writes:

| In article <3786@uceng.UC.EDU> dmocsny@uceng.UC.EDU (daniel mocsny) writes:
| >Now that this barrier is history, how long will we wait for ...
| 
| This may not be the best place for this, but...
| 
| What about getting rid of conventional secondary storage?
| Simply keep everything in virtual memory and have the OS
| maintain whatever secondary store it sees fit?
| 
| This is more of an OS issue, but if the iron isn't there...

Hmm, you seem to have just proposed the System/3x (and AS/400 I think)
from IBM, which uses a single level store.  Of course the only
languages supported on it are business-type languages (RPG and Cobol).
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA

Catproof is an oxymoron, Childproof is nearly so

cet1@cl.cam.ac.uk (C.E. Thompson) (03/02/90)

In article <WAYNE.90Feb27094326@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) asks:
>what is the difference between "main" memory and "expanded" memory?
>if "expanded" memory is just a back store for paging and disk cache,
>then i dont think that including "expanded" memory is what people
>normally think of as "physical memory".  (you can also exclude ram
>disks and the memory on high res video cards)
>
On IBM 3090s (probably the mainframe that the previous poster had in mind)
"expanded storage" is RAM which is not directly addressable, but is moved
in 4Kbyte chunks to and from directly addressable storage by synchronous
instructions. It is used by MVS and VM as a very-high-speed paging device;
microcode assists can do the transfers without a software page fault in
some circumstances.

If someone would like to explain what the economic advantages are to 
providing extra storage in this form, I would be interested to listen.


Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) (03/02/90)

In article <43688@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh
LaMaster) writes:

   Arbitrary object lengths, on the other hand, look like an unsolved
   problem to me.

Tell that to Burroughs... :-). There have been studies (Smalltalk: bits
of history, words of advice) in object memory *with* pages; you have
object memory in core (which is good), and then secondary storage
transactions are done in pages, and you try to stuff a page with many
small objects, hopefully related. A compacting garbage collector helps
here, of course.

   BTW, why is "making the working set smallest" a primary goal?  Working set
   is a performance question, and the overall best performance is the goal.

For general purpose applications/architectures, minimizing the working
set gives the overall best performance.

   >A large page size implies coarse, static grouping of objects, and
   >approximates the correct policy, and poorly, only in the case of sequential
   >access, 

   It also approximates correct policy on array dominated simulations where you
   can, on some systems, require touching 24 bytes or more of data for every 
   CPU cycle.

As usual: if we are talking about special purpose architectures, then
things may dramatically change. After all Cray is fond of base+limit, and
for good reason, for his vector machines. We were discussing 64 bit
machines; not all machines with 64 bit addressing will be used for
vector processing. Somebody said that they may well be used for
hypermedia, or databases, or many other things that resemble the classic
timesharing computer utility concept.

What I really don't like is SUN having large (8KByte) pages on machines
that are rarely used for sequential access, and their only excuse being
having misdesigned the MMU architecture. :-(.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

grunwald@foobar.colorado.edu (Dirk Grunwald) (03/02/90)

Also, according to articles in ICCD '89, and conversation with people
who used RIOS/America at that time, the America processor (what the
hell is the right name: America, RIOS, POWERthing, or PC/6000?)  has
support for memory mapped file stores and transaction processing.

Dunno if AIX 3.0 uses it though.

slackey@bbn.com (Stan Lackey) (03/02/90)

In article <20534@netnews.upenn.edu> iyengar@grad2.cis.upenn.edu.UUCP (Anand Iyengar) writes:
>In article <29718@brunix.UUCP> phg@cs.brown.edu (Peter H. Golde) writes:
>>One might note that implementing a 64-bit address space
>>with 64 KB pages and 8 bytes/page in the page table requires
>>2 billion megabytes of page-table space.  Hmmmmm.....

There is a cost of virtual memory.  It takes memory space for page
tables, code space for the software for paging, performance when you
miss, and both hardware and software complexity to make them work.  It
is the most common single cause of ECOs, both hardware and software.
Why do we pay all those costs?  For the functionality.  Imagine a new
architecture that runs physical only.  (for rec.humor.funny, I guess.)

Note that to map a 32-bit virtual address space [fully] with 4K byte
pages requires four megabytes for just the lowest level.  It doesn't
sound like much today, but at the time the VAX came out, one megabyte
of physical memory was a LOT.
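
A hedged check of the two page-table figures in this thread: the 4MB bottom
level above (32-bit space, 4KB pages, assuming 4-byte PTEs), and the
"2 billion megabytes" quoted earlier for a flat 64-bit space with 64KB pages
and 8-byte entries.

    #include <stdio.h>

    int main(void)
    {
        unsigned long long small = (1ULL << (32 - 12)) * 4;   /* bytes of PTEs */
        unsigned long long big   = (1ULL << (64 - 16)) * 8;   /* bytes of PTEs */

        printf("32-bit, 4KB pages:  %llu MB of page table\n", small >> 20);
        printf("64-bit, 64KB pages: %llu MB of page table\n", big >> 20);
        return 0;
    }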

We are not asking for 64 bit addressing to build full 64-bit address
spaces tomorrow, neither virtual nor physical.  The early 32-bit
machines typically had far fewer than 32 bits of physical address.  We
are asking to not do what happened the last time - kludge more bits
into a 16-bit architecture with nonsense like segments, making us go
to all the trouble to make it work, then eventually increasing real
addressing later, making us do the work over again, the way it should
have been done in the first place.

We would like a clean path for the future.  It's OK to give us 64 bits
of virtual address but only say 40 or 48 bits of physical address for
say 1992 or 1993.  Let technical needs drive up the number of physical
address bits over time.

I have stayed out of this one up to now.  But one thing I can see
(besides the desire to build large systems of processors that
communicate though the use of a unified memory addressing scheme,
BBN-style) is the ability to map the entire file system into a user's
virtual address space.  Naturally, I am assuming the architecture is
fully functional, with the ability to share page tables, and page page
tables.
-Stan

gerry@zds-ux.UUCP (Gerry Gleason) (03/03/90)

In article <20534@netnews.upenn.edu> iyengar@grad2.cis.upenn.edu.UUCP (Anand Iyengar) writes:
>	What's the general feeling on future page sizes (I'm guessing that
>64K was picked here to make the page-tables smaller)?  Is it reasonable
>to assume that they are going to increase also (as bandwidth/latency
>increases), or will we lose more by making pages larger?  

A past coworker of mine was proposing that bigger pages would be a good idea
several years ago, and I tend to agree.  The only thing you lose with
bigger pages is memory utilization because of breakage (probably > page size/2
since segment size is not uniformly distributed), and you gain on several
fronts:  fewer page faults (loading 64k in one page fault takes much less
than faulting 16 times for 4k pages), fewer TLB misses, and generally less
paging overhead since all the data structures are smaller (page tables, etc.).
Since memory is cheap relative to anything you can do to improve CPU
throughput, bigger pages should be a win.

Of course, on a 4M machine you can only have 64 64K pages, which isn't a very
big working set.  But then, 4M is rapidly becoming a small machine, so this
probably isn't a big problem.  If your MMU design can handle short pages (i.e.
ones smaller than the page size, with the tail unallocated), then you get the
best of both worlds, except for a little more complexity since pages aren't
all one size anymore.
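
A hedged back-of-the-envelope for this tradeoff; the 1MB region and the
half-a-page-wasted-per-segment assumption are mine, not Gerry's figures.

    #include <stdio.h>

    int main(void)
    {
        unsigned long region = 1024 * 1024;        /* 1MB of touched data */
        unsigned long small = 4096, big = 65536;

        printf("faults at 4KB pages:  %lu\n", region / small);   /* 256 */
        printf("faults at 64KB pages: %lu\n", region / big);     /* 16 */
        printf("expected waste per segment: %lu KB vs %lu KB\n",
               small / 2 / 1024, big / 2 / 1024);                 /* 2 vs 32 */
        return 0;
    }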

Gerry Gleason

rpeglar@csinc.UUCP (Rob Peglar) (03/03/90)

In article <20534@netnews.upenn.edu>, iyengar@grad2.cis.upenn.edu (Anand Iyengar) writes:
(some deletions)

> 
> 	What's the general feeling on future page sizes (I'm guessing that
> 64K was picked here to make the page-tables smaller)?  Is it reasonable
> to assume that they are going to increase also (as bandwidth/latency
> increases), or will we lose more by making pages larger?  

Well, if the current "trend" of incorporating mainframe/supercomputer-ish
architecture concepts (e.g. superscalar) continues, history has given us
page sizes as large as 512KB (Cyber 205) and even 2MB (ETA-10, although
the OS'es never actually supported it).  

The VAT mechanism in these machines was slightly different, in that the
205 had a global page table and the 10 used local (one per task) page
tables.  The seemingly large page sizes solved both the page table
size issue and the long vector ops without faulting issue.

Rob
-- 
Rob Peglar	Control Systems, Inc.	2675 Patton Rd., St. Paul MN 55113
...uunet!csinc!rpeglar		612-631-7800

The posting above does not necessarily represent the policies of my employer.

bzs@world.std.com (Barry Shein) (03/03/90)

>On IBM 3090s (probably the mainframe that the previous poster had in mind)
>"expanded storage" is RAM which is not directly addressable, but is moved
>in 4Kbyte chunks to and from directly addressable storage by synchronous
>instructions. It is used by MVS and VM as a very-high-speed paging device;
>microcode assists can do the transfers without a software page fault in
>some circumstances.
>
>If someone would like to explain what the economic advantages are to 
>providing extra storage in this form, I would be interested to listen.
>
>
>Chris Thompson

In the first place, the 370 is basically limited to a 24-bit (16MB)
address space. Several instructions used the high eight bits of
addresses to store other things and, probably more importantly, so did
applications (such as type fields for pointers.)
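
A hedged illustration of the habit Barry describes: stashing a tag in the
unused high byte of a 24-bit address, which is exactly the code that breaks
when addressing later grows to 31 or 32 bits.  The function names are made
up for illustration.

    /* Pack a one-byte tag above a 24-bit address; fine on a 16MB machine. */
    unsigned long tag_pointer(unsigned long addr24, unsigned char tag)
    {
        return (addr24 & 0x00ffffffUL) | ((unsigned long)tag << 24);
    }

    /* Recover the address; correct at 24 bits, silently wrong at 31/32 bits. */
    unsigned long strip_tag(unsigned long tagged)
    {
        return tagged & 0x00ffffffUL;
    }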

They also use a base+displacement addressing scheme which usually (one
can in theory code around it) requires one, and sometimes more,
registers to be dedicated to base addressing. The displacement is in
the instruction and is limited to 12-bits (4K.)

A few years ago IBM came up with XA which added some new instructions
(e.g. BASR) replacing the ones which used the hi eight bits in
addresses and declared the machine now safe, from their side, for
using the full 32 bits of address. Of course, that didn't change the
applications, so you had to have well behaved applications or be
willing to rewrite them (usually not a horrendous rework, but
definitely a chore.)

At that point of course they're out of juice, and about all one can do
is what was described, use any more memory as a fast place to store
things while they're not in use rather than going to disk. I wonder
why they didn't re-invent the separate I&D space of the PDP-11?
Perhaps DEC holds a patent on that scheme, it would at least double
the address space (although it's likely that it's almost all needed in
the data space, I doubt many people have gigabytes of code that needs
to be resident.)

The major economic advantage is not having to move to a non-370
architecture.

The 370 goes back to the 360 which goes back to the early 1960's.

Needless to say it's not an environment that was driven by
portability, a lot of applications are written in assembler, not to
mention that even higher level language applications can be full of
tons of System/370 specific stuff (JCL and all that.) And this is a
world with megalines of code, you don't just sit down and rewrite it
all if you can avoid it.

The 3090 still provides fairly awesome I/O for the price, this is the
class of machines used on those terabyte databases people like
Mastercard or J.C. Penney's have to manage.

It's not a cost-effective compute server if you were starting out
today with fresh code (of course, even in the compute server world
there are people who have tons of code locked into the 370
architecture.) It probably still is a cost-effective database
transaction machine for very large databases and other similar
data-intensive tasks. People have told me that some Crays are
competitive, but few other machines (note that Amdahls are 370 clones,
so they're in the same ballpark.) It hardly has any place in an
academic computing environment even though some diehards maintain them
against all reason.

So, there's reasons to keep kicking that dead whale down the beach.
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

sbf10@uts.amdahl.com (Samuel Fuller) (03/03/90)

In article <1786@gannet.cl.cam.ac.uk> cet1@cl.cam.ac.uk (C.E. Thompson) writes:
>On IBM 3090s (probably the mainframe that the previous poster had in mind)
>"expanded storage" is RAM which is not directly addressable, but is moved
>in 4Kbyte chunks to and from directly addressable storage by synchronous
>instructions. It is used by MVS and VM as a very-high-speed paging device;
>microcode assists can do the transfers without a software page fault in
>some circumstances.
>
>If someone would like to explain what the economic advantages are to 
>providing extra storage in this form, I would be interested to listen.

Expanded storage is used because it is much faster than disk for paging
operations, and cheaper than mainstore for normal data storage.  On most
mainframes the mainstore is built out of ECC protected static rams.
Expanded storage is still ECC protected (I think) but is built out of
slower, cheaper dynamic ram.

Current mainframes have not yet found it necessary to implement an entire
2Gigabyte physical mainstore.  Building such an array out of high speed
static rams is still too expensive even for mainframes.  The lower cost
workaround is to provide expanded storage to be used as swap or paging
or data space (at the OS's discretion).  This has proven to be faster
than using disk and cheaper than building all of the memory out of sram.

-- 
---------------------------------------------------------------------------
Sam Fuller / Amdahl System Performance Architecture

I speak for myself, from the brown hills of San Jose.

UUCP: {ames,decwrl,uunet}!amdahl!sbf10 | USPS: 1250 E. Arques Ave (M/S 139)
INTERNET: sbf10@amdahl.com             |       P.O. Box 3470
PHONE: (408) 746-8927                  |       Sunnyvale, CA 94088-3470
---------------------------------------------------------------------------

littauer@uts.amdahl.com (Tom Littauer) (03/03/90)

In article <1990Mar2.232735.6071@world.std.com> bzs@world.std.com (Barry Shein) writes:
>
> ... a good summary of 370 architecture history, but there're a few nits...
>
>At that point of course they're out of juice, and about all one can do
>is what was described, use any more memory as a fast place to store
>things while they're not in use rather than going to disk. I wonder
>why they didn't re-invent the separate I&D space of the PDP-11?

Well, they did. In the *NEW* architecture, ESA, they've added an addressing
mode which allows one instruction to work in two address spaces at the
same time, while living in a third. I forget the absolute limit on the
number of address spaces that can be so used, but it's over 4K. The buzzname
for this is Hiperspaces (probably TM IBM).

>The major economic advantage is not having to move to a non-370
>architecture.

... and a non-trivial one, as you point out.

>               It probably still is a cost-effective database
>transaction machine for very large databases and other similar
>data-intensive tasks.

It is indeed, especially if you use UNIX (definitely TM AT&T) as the
operating system. Double especially if you can run the older proprietary
stuff at the same time on the same machine.

>So, there's reasons to keep kicking that dead whale down the beach.
>-- 
>        -Barry Shein

Thank you for your support :-)
Tom Littauer
-- 
UUCP:  littauer@amdahl.amdahl.com
  or:  {sun,decwrl,hplabs,pyramid,ames,uunet}!amdahl!littauer
DDD:   (408) 737-5056
USPS:  Amdahl Corp.  M/S 278,  1250 E. Arques Av,  Sunnyvale, CA 94086

I'll tell you when I'm giving you the party line. The rest of the time
it's my very own ravings (accept no substitutes).

pcg@teachc.cs.aber.ac.uk (Piercarlo Grandi) (03/03/90)

In article <MEISSNER.90Feb28094823@curley.osf.org> meissner@osf.org (Michael Meissner) writes:

   In article <11254@encore.Encore.COM> peralta@pinocchio.Encore.COM
   (Rick Peralta) writes:
   | What about getting rid of conventional secondary storage?
   | Simply keep everything in virtual memory and have the OS
   | maintain whatever secondary store it sees fit?

   Hmm, you seem to have just proposed the System/3x (and AS/400 I think)
   from IBM, which uses a single level store.  Of course the only
   languages supported on it are business-type languages (RPG and Cobol).

The old single level storage dream! And capabilities as well...

I think that the best starting point for an investigation of this issue
is "Computer systems with a very large address space and garbage
collection", an MIT Dissertation and TechRep (#178) by Peter Bishop.

Legend has it that the S/38, from which the AS/400 evolved, was an
implementation of that thesis done by the people that had previously
done the S/3 accounting machine (what a joke! they made a few mistakes
:-), one of which was particularly glaring and crucial in the design of the
reverse map MMU).

	Just to tell you how funny this is, the S/3 was a _card based_
	machine programmed in RPG, while Bishop's architecture was a computer
	utility with a very lispish/capabilitish flavour. The S/38
	exhibits traits from *both* ancestors.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

jkrueger@dgis.dtic.dla.mil (Jon) (03/04/90)

bzs@world.std.com (Barry Shein) writes:

>The 3090 still provides fairly awesome I/O for the price...
>...It probably still is a cost-effective database
>transaction machine for very large databases

Barry,

It's folklore in the database world that things are as you say...but
nobody ever seems to have any numbers.  It's my opinion that you're
right, but does anyone have any facts?

It should also be noted that good DBMSs avoid i/o's at very reasonable
(and these days, affordable) costs in computation and memory.  The
basic tools are query optimizing, access methods, caching, intelligent
buffering, indices, readahead, data clustering, deferred writes.  The
next generation will add optimizing placement of data on disk (and disk
array, yum yum!).  At that point the bottleneck will be bus throughput,
which I suppose brings us back to dead whales.  Or will Futurebus solve
this?  Actually, I suppose that just brings me back to my first point:
how big is the gap?  Measurement of real operations on very large
databases using currently available DBMS on KM and big iron would be
very interesting indeed.  Might even help free the masses.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

rwwetmore@grand.waterloo.edu (Ross Wetmore) (03/04/90)

In article <42998@ames.arc.nasa.gov> lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:
>In article <36080@mips.mips.COM> mash@mips.COM (John Mashey) writes:
> >costs, speed tradeofss, etc, already discussed].  Then I ask them what
> >they want it for.  [64 bit addressing. -H.L.] 
> > Usually, the answer is independent of physical memory size,
> >i.e., they want several things:
> >	a) Map big files ,and bunches of files.
> This is the primary reason as I see it.  Even in the past, I have seen
> requirements to map anywhere from 16-4000 files or separate memory
> areas [sections, segements [[not "Intel" segments]] or whatever].
> >	b) Use object management techniques that speed performance by
> >	assuming they've got monster address spaces available, evn if they
> >	aren't using very much of it at once
> This is what I see as a good reason for pushing from 48 bits to 64 bits.
> The possibility of assigning every object a memory section and therefore
> making it sharable.  

  The above seems to be a slightly incorrect partitioning of the answer.
The use of large address spaces might better be characterized by splitting
the problems into ...

a)  Large indivisible objects (eg big files)
    This is the case that did in the 80286, since 16 bits was insufficient
to handle such items even at the time the chip came out. There is no real
workaround for such cases by definition.

b)  Mapping small objects into a large address space 
    This is usually done to provide uniform addressing for shared global
access, or to gain context-switching efficiency by giving processes their
data within a single linear address space.

  Unfortunately, true class a) problems are a very small subset of highly 
specialized applications, though many more are cast in this mode by choice
of algorithm rather than necessity. Class b) problems are the typical case
and I would hypothesize particularly bad cases for justifying an extension
of the linear address size.

  It seems to me particularly wasteful to carry around a large number of
extra address bits on every memory reference and storage location when all 
that is often needed is a small offset value into any particular object.
The problem is inherently two-tiered: find an object, then access a location
within that object, and a two-dimensional addressing technique is a far
better match to these kinds of operations.
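
A hedged C sketch of the two-tiered scheme being argued for: an address is a
(segment, offset) pair, and the segment table, not the program, carries each
object's current base, length, and protection.  All the names below are made
up for illustration.

    #include <stddef.h>

    struct segment {
        char         *base;      /* where the object currently lives */
        unsigned long limit;     /* object length in bytes */
        unsigned int  prot;      /* protection bits, checked per access */
    };

    struct seg_addr {
        unsigned int  seg;       /* which object */
        unsigned long off;       /* small offset within it */
    };

    /* Resolve a two-dimensional address against a per-process segment table. */
    void *resolve(struct segment *table, size_t nsegs, struct seg_addr a)
    {
        if (a.seg >= nsegs || a.off >= table[a.seg].limit)
            return NULL;                         /* out of bounds: fault */
        return table[a.seg].base + a.off;        /* relocation is per object */
    }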

  In addition, when the particular mapping of an object is changed (e.g.
one removes a node from a multiprocessor and replaces it with one at a
'different' address), this means a re-link of any program(s) that have
embedded static notions of any such addressing.  Programs will be
compelled to adopt the dynamic approach of retrieving a base pointer at
run-time and writing the code as offsets from this pointer. That is
emulating the two tiered algorithm, but in an ad hoc and non-hardware
supported way. 

  One also gets into all sorts of regulatory problems in how the global 
address space (now a restricted resource) gets allocated. Alternatively, 
a 2-dimensional address space with a natural hardware supportable split
permits one to *reference* the segment internally as a logical or virtual
entity, and have the hardware automatically handle the addressing *dynamics*
including such things as protection, security and whatnot on an efficient
object by object basis. Such things can be carried out on a local level
without the need for global considerations about how someone 3000 miles
away might be allocating their global memory.

  Unfortunately, most programmers think of a process address space as a
vector, rather than an array or vector of vectors. Innate conservatism
coupled with an emotional reaction to alternative hardware or software
models tends to slow any mass movement to such new ways of thinking and
practice. Also, commercial practices do not readily lend themselves to
radical changes in direction. But perhaps it is time to rethink some of
these things and ask whether blind extension of the linear dimension of
addresses is really the appropriate choice for all current needs.

Ross W. Wetmore                 | rwwetmore@waterloo.BITNET
University of Waterloo          | rwwetmore@math.Uwaterloo.ca
Waterloo, Ontario N2L 3G1       | {uunet, ubc-vision, utcsri}
(519) 885-1211 ext 4719         |   !watmath!rwwetmore

pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) (03/05/90)

In article <780@dgis.dtic.dla.mil> jkrueger@dgis.dtic.dla.mil (Jon) writes:

   It should also be noted that good DBMSs avoid i/o's at very reasonable
   (and these days, affordable) costs in computation and memory.  The
   basic tools are query optimizing, access methods, cacheing, intelligent
   buffering, indices, readahead, data clustering, deferred writes.

Let me disagree. Unfortunately, almost all databases for which a
mainframe is worth having are so immense that their working set is
always much larger than the real memory available. You still rate DBMS
performance in IO operations per transaction; main memory is used
essentially only as a scratchpad. No clever strategy or architecture
is going to change this.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) (03/05/90)

In article <204@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:

   In article <20534@netnews.upenn.edu> iyengar@grad2.cis.upenn.edu.UUCP
   (Anand Iyengar) writes:

   >	What's the general feeling on future page sizes (I'm guessing that
   >64K was picked here to make the page-tables smaller)?  Is it reasonable
   >to assume that they are going to increase also (as bandwidth/latency
   >increases), or will we lose more by making pages larger?  

We will lose a lot. For *general purpose* machines the number of useful
bytes that you can pack on a page is not very large.

   The only thing you lose with bigger pages is memory utilization
   because of breakage (probably > page size/2 since segment size is
   not uniformly distributed),

The 8KB page size of most SUNs tends to waste, I suspect, about 40% of
available memory. On general purpose timesharing/workstation systems
most files are much smaller than 8KB. Greater fragmentation implies
poor locality, and the last thing we need now is even less locality.
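
To put a rough number on that, here is a small C sketch (the file sizes
are invented for illustration) of how much memory goes to internal
fragmentation when each object is rounded up to a whole number of pages:

    /* Rough illustration of internal fragmentation ("breakage"): each
     * object occupies a whole number of pages, so on average about half
     * a page is wasted per object.  File sizes are invented. */
    #include <stdio.h>

    int main(void)
    {
        static const long sizes[] = { 300, 1200, 2500, 4100, 7900, 15000 };
        const int  n       = (int)(sizeof sizes / sizeof sizes[0]);
        const long pages[] = { 4096, 8192, 65536 };

        for (int p = 0; p < 3; p++) {
            long page = pages[p], used = 0, alloc = 0;
            for (int i = 0; i < n; i++) {
                long npg = (sizes[i] + page - 1) / page;   /* round up */
                used  += sizes[i];
                alloc += npg * page;
            }
            printf("%6ld-byte pages: %ld bytes live in %ld allocated (%.0f%% waste)\n",
                   page, used, alloc, 100.0 * (alloc - used) / alloc);
        }
        return 0;
    }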

   and you gain on several fronts: fewer page faults (loading 64k in
   one page fault takes much less than faulting 16 times for 4k pages),

The difference is less marked than you might expect, and might be
vanishingly small, if the IO subsystem is supportive (both HW & SW),
e.g.  hardware scatter/gather and software fetch-ahead and
discard-behind (as described in Knuth's Vol. 1)...

   fewer TLB misses,
   and generally less paging overhead since all the data structures are
   smaller (page tables, etc.).

This really matters only if you use direct map MMUs...

   Since memory is cheap relative to anything you can do to improve CPU
   throughput, bigger pages should be a win.

A resource being cheap is no reason to squander it. Unless you are a
manufacturer and you sell it to your customers with a huge markup...

   Of course, on a 4M machine, you can only have 64 64k pages, which isn't
   a very big working set.  But then, 4M is rapidly becoming a small
   machine, so this probably isn't a big problem.

Here you should reconsider your opinions. 64 64K pages is not a
problem, if locality of reference is such that the working set of an
application does not increase because of the larger page granularity.
This depends on how much related data you can manage to cluster
together on a page. If the working set of an application rises in direct
proportion to the page size, raising the page size is not going to give
you *anything*, except kudos from MITI.

If you can only manage to cluster say 2-4KB of data frequently accessed
together on a page, using 64KB pages will just waste 60KB of real memory
per page. As the Unix designers observed when deciding whether to double
the Unix filesystem block size from 512B to 1KB, this works only if the
amount of useful data per page also (almost) doubles, and that is not by
any means guaranteed (unless you are doing strictly sequential access).
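
A back-of-the-envelope sketch (the 3KB-per-page clustering density is an
assumption, not a measurement) shows how fast the cost grows:

    /* Back-of-the-envelope: real memory needed to keep 512KB of hot data
     * resident when you only manage to cluster about 3KB of it per page.
     * The numbers are assumptions for illustration only. */
    #include <stdio.h>

    int main(void)
    {
        const double hot     = 512 * 1024.0;    /* hot data to keep resident */
        const double dense   = 3 * 1024.0;      /* hot bytes clustered/page  */
        const double psize[] = { 4096.0, 65536.0 };

        for (int i = 0; i < 2; i++) {
            double resident = (hot / dense) * psize[i];
            printf("%6.0f-byte pages: %5.1f MB resident for 0.5 MB of hot data\n",
                   psize[i], resident / (1024 * 1024));
        }
        return 0;
    }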

In other words, let me repeat that large page sizes are a poor, static
way of doing clustering, and one that is too rigid and inefficient.
Clustering should be dynamic. If you do static clustering only
applications that do sequential access will benefit. Even worse,
applications will be written with the expectation that only sequential
access is desirable, which is a sorry state of affairs.

   If your MMU design can handle short pages (i.e. ones smaller than
   the page size, with the tail unallocated), then you get the best of
   both worlds, except for a little more complexity since pages aren't
   all one size anymore.

Why don't we go all the way towards Burroughs-like variable-length
descriptor machines, then? It is actually a good idea, as long as you have
ways (dynamic clustering, object secondary store, whatever) to compensate
for the well-known fact that the average segment size on a descriptor
machine is only a dozen words long.
--
Piercarlo "Peter" Grandi           | ARPA: pcg%cs.aber.ac.uk@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcvax!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk

bzs@world.std.com (Barry Shein) (03/05/90)

From: jkrueger@dgis.dtic.dla.mil (Jon)
>bzs@world.std.com (Barry Shein) writes:
>
>>The 3090 still provides fairly awesome I/O for the price...
>>...It probably still is a cost-effective database
>>transaction machine for very large databases
>
>Barry,
>
>It's folklore in the database world that things are as you say...but
>nobody ever seems to have any numbers.  It's my opinion that you're
>right, but does anyone have any facts?

I think the standard reference is still TP1 (Codd & Date?) and it
bears this ``folklore'' out. I don't have numbers handy, and my memory
may be way off, but I seem to remember seeing dozens of TPS for
supermicros, around 100+ for modern minis, and approaching 1000 for
machines like the 3090. I assume someone from a mainframe company can
set this straight; it's their bread and butter.

But that's not the only consideration, and many vendors (particularly
in the super-mini and parallel-mini class) are getting closer to the
mainframes' numbers.

My personal feeling is that the 3090-class machines have gotten so
huge that many people who think they need one don't, it's a religion
left over from the days when smaller systems were a joke, particularly
in regards to I/O. Today they may be a relative "joke", but have those
databases really grown 10X or more? The smaller systems have.

It's, of course, driven by a lot of rational considerations also,
like the installed software base (though it's not clear that some folks
wouldn't be better off spending a few million on converting their
databases and a million on a new super-mini rather than buying the
latest and greatest multi-million-dollar mainframe).

Look, when things get BIG there's a whole different set of
considerations that kick in. There are 3090-class machines out there
with over 10,000 simultaneous interactive sessions. Most systems which
might be sufficient in other ways can't even plug in 1,000 interactive
sessions, let alone manage them. And service and other considerations
are a big issue: can that super-mini vendor afford to take a total
loss on your system (and maybe more), millions of dollars, to save its
reputation, possibly taking the long view? Will they even take an RFP
with absolute performance requirements? Not vague mumbles about
benchmarks, but terms under which, if their machine doesn't cut it, they
take it back, probably losing hundreds of thousands on install/de-install?

Those who need it probably *do* need it, but we're talking
databases approaching a terabyte at this point in history, probably a
Fortune 1000 company with millions of customers (Mastercard, JC
Penney, American Express, etc.).
-- 
        -Barry Shein

Software Tool & Die    | {xylogics,uunet}!world!bzs | bzs@world.std.com
Purveyors to the Trade | Voice: 617-739-0202        | Login: 617-739-WRLD

johnl@esegue.segue.boston.ma.us (John R. Levine) (03/05/90)

>bzs@world.std.com (Barry Shein) writes:
>>The 3090 still provides fairly awesome I/O for the price...
>>...It probably still is a cost-effective database
>>transaction machine for very large databases
>Barry,
>
>It's folklore in the database world that things are as you say...but
>nobody ever seems to have any numbers. ...

I saw an article by someone from IBM in, as I recall, TODS that showed that
a smaller number of faster processors would give you better database
throughput than a larger number of slower processors of equivalent aggregate
mips.  The crux of the argument is that when you have a lot of transactions,
there gets to be considerable contention and the faster processors have a
shorter hold time per transaction and fewer simultaneous transactions, hence
fewer collisions.  I'll see if I can dig it up.
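
A crude toy model (mine, not the article's analysis) shows the shape of
the argument: if part of every transaction must run while holding a shared
lock, that part is serialized, so aggregate throughput is capped by how
fast a *single* processor can get through the locked section.

    /* Toy serialization model, not the TODS analysis: each transaction
     * runs 1M instructions, 200K of them while holding one shared lock.
     * The locked part is serial, so throughput can never exceed
     * per-CPU speed / locked instructions, however many CPUs you add. */
    #include <stdio.h>

    static double tps(int ncpu, double mips, double c_instr, double l_instr)
    {
        double raw  = ncpu * mips * 1e6 / c_instr;   /* ignoring the lock */
        double lock = mips * 1e6 / l_instr;          /* serial lock limit */
        return raw < lock ? raw : lock;
    }

    int main(void)
    {
        /* One 40-MIPS CPU vs. eight 5-MIPS CPUs, same aggregate mips. */
        printf("1 x 40 MIPS: %4.0f TPS\n", tps(1, 40.0, 1e6, 2e5));
        printf("8 x  5 MIPS: %4.0f TPS\n", tps(8,  5.0, 1e6, 2e5));
        return 0;
    }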

Also, people who aren't familiar with the current state of mainframes may not
realize how different mainframe I/O is from workstations'.  A 370's channel
probably has more CPU horsepower than a Sparcstation (there's reputed to be
an 801 RISC mini hidden inside), not to mention far more RAM.  It's quite
common to have 128MB and up of RAM buffer in the channel.  There are multiple
data paths at each stage (disk to controller, controller to channel, and
channel to processor memory), and the channel assigns paths on the fly,
keeping track of such things as the relative angular position of each disk
platter so it can tell when the desired data are about to come under the
head and a path needs to be assigned.  Having the channel wait idle while
the platter comes around, like every workstation I know of still does, went
out of style in mainframes around 1972.
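
To make the rotational-position trick concrete, here is a toy C sketch
(rotation speed and sector count invented) of the decision the channel is
making: compute when the wanted sector will pass the head, and commit a
data path only then.

    /* Toy sketch of rotational-position-sensing style scheduling:
     * instead of tying up a data path for a whole rotation, estimate
     * when the wanted sector will pass under the head and reconnect a
     * path just before that.  The disk parameters are invented. */
    #include <stdio.h>

    #define SECTORS_PER_TRACK 64
    #define ROTATION_MS       16.7            /* roughly 3600 RPM */

    /* Milliseconds until sector `want` reaches the head. */
    static double ms_until(int current, int want)
    {
        int ahead = (want - current + SECTORS_PER_TRACK) % SECTORS_PER_TRACK;
        return ahead * (ROTATION_MS / SECTORS_PER_TRACK);
    }

    int main(void)
    {
        double wait = ms_until(10, 7);        /* head at sector 10, want 7 */
        printf("wanted sector is %.1f ms away; assign a path then,\n", wait);
        printf("instead of holding one for the full %.1f ms rotation\n",
               ROTATION_MS);
        return 0;
    }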

Workstation I/O is probably as far behind mainframe I/O as workstation CPU
performance was behind mainframe CPU in 1975.  Of course, at the current rate
of change, that means the killer micros will catch up in about 1993.

-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
"Now, we are all jelly doughnuts."

wcs@erebus.att.com (Bill Stewart) (03/05/90)

In article <193@zds-ux.UUCP> gerry@zds-ux.UUCP (Gerry Gleason) writes:
>... how many MIPS does it take before you can process a 4G space...
> I think we're a still far from that point.

5 or so years ago, Peter Honeyman and gang at Princeton were
building the "Massive Memory Machine Project" to find out what you could
do with lots of memory if you could afford it.  They had a toy machine,
which they didn't consider massive, which was a VAX with 128MB.
(Unfortunately, they had bought DEC memory instead of third-party,
so they were paying through the nose for maintenance, but....)
Today, 128MB costs about $12.8K, and 4GB costs about $400K - at that
cost, a lot of people can find fun things to do with 4GB.
For instance, why swap?  Why not start re-inventing Multics's mapped
files, only really map them?  Why think about files at all - maybe
you should be thinking about data structures with persistence?
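
As a concrete toy: on systems that have mmap(2) (SunOS, for one), a file
can already be treated as a persistent data structure rather than as a
byte stream.  A minimal sketch, with error handling trimmed and the file
name invented:

    /* Sketch: a file used as a persistent data structure via mmap(2),
     * rather than as a byte stream to read() and write().  Error
     * handling is abbreviated; the file name is illustrative. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    struct counters { long hits; long misses; };

    int main(void)
    {
        int fd = open("counters.dat", O_RDWR | O_CREAT, 0644);
        if (fd < 0 || ftruncate(fd, sizeof(struct counters)) < 0)
            return 1;

        struct counters *c = mmap(NULL, sizeof *c, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (c == MAP_FAILED)
            return 1;

        c->hits++;                      /* ordinary memory operations ... */
        printf("hits so far: %ld\n", c->hits);

        munmap(c, sizeof *c);           /* ... that persist across runs   */
        close(fd);
        return 0;
    }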

Certainly, in the UNIX world, it's nice to have a file represented
by a single-word data pointer rather than by ugly segments of 1-4GB.
We have 2GB disks on the market now, and striped file systems
that span multiple drives - why should we treat disks the way an
8086 treats RAM?  40 bits is only 1 terabyte, and StorageTek makes
tape-beasts that large while optical jukeboxes are getting close
rapidly.  48 bits might be better.

If you've got a machine with, say 10-50 MIPS (a Killer micro or
small Pyramid or Sequent) and 100-1000GB of database, maybe you
could keep the indices in RAM instead of wasting time with disk.
That way, your processors could concentrate on communications or
query-hacking, and not worry about all this disk-space business all
the time.
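
A toy sketch of the idea (structure and sizes invented): keep a hash index
in RAM mapping keys to byte offsets in the data file, so a lookup is one
memory probe plus a single disk read for the record itself.

    /* In-RAM index over on-disk records: the table maps a key to a byte
     * offset in the data file.  Names and sizes invented; about 1MB of
     * RAM for 64K index slots here. */
    #include <stdio.h>

    #define NBUCKETS 65536

    struct slot { long key; long file_offset; };
    static struct slot index_tab[NBUCKETS];

    static void insert(long key, long file_offset)
    {
        unsigned h = (unsigned long) key % NBUCKETS;
        while (index_tab[h].key != 0)            /* linear probing */
            h = (h + 1) % NBUCKETS;
        index_tab[h].key = key;
        index_tab[h].file_offset = file_offset;
    }

    static long lookup(long key)
    {
        unsigned h = (unsigned long) key % NBUCKETS;
        while (index_tab[h].key != 0 && index_tab[h].key != key)
            h = (h + 1) % NBUCKETS;
        return index_tab[h].key == key ? index_tab[h].file_offset : -1;
    }

    int main(void)
    {
        insert(42L, 8192L);     /* key 42 -> record at byte 8192 on disk */
        printf("record 42 lives at file offset %ld\n", lookup(42L));
        return 0;
    }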
-- 
# Bill Stewart AT&T Bell Labs 4M312 Holmdel NJ 201-949-0705 erebus.att.com!wcs
# Fax 949-4876.  Sometimes found at Somerset 201-271-4712
# He put on the goggles, waved his data glove, and walked off into cyberspace.
# Wasn't seen again for days.

jkrueger@dgis.dtic.dla.mil (Jon) (03/06/90)

bzs@world.std.com (Barry Shein) writes:

>I think the standard reference is still TP1 (Codd & Date?) and it
>bears this ``folklore'' out.

TP1 is a simplified, non-standard version of the DebitCredit
benchmark.  DebitCredit was published: Anon et al., "A Measure of
Transaction Processing Power", DATAMATION, April 1985.  TP1
is neither as useful nor as comparable as DebitCredit.  Its results
are less meaningful and more sensitive to factors not controlled
or reported.  It's cheaper to run, easier to manipulate to get
higher numbers, and harder to be held to as a standard.  And
DebitCredit itself is not perfect.

But again, it's my belief that you're right, that dead whales generate
more and faster i/o's.  But I have no DATA. TP1 would be better than
nothing, but DebitCredit would be the beginning of real usefulness.
Anyone have any DATA?  If TP1, the results are meaningless without
information about the table sizes used and the method used to
simulate users (tty vs. network vs. multiple concurrent batch).

>[good points, material deleted]

>Will [other vendors] even take an RFP
>with absolute performance requirements?

A continuing and profitable source of business for dead whales is right
here in defense land.  A popular MO for such deals is to take the RFP,
sign the requirements, deliver the system, verify that it doesn't meet
the requirements, and wait until the customer adjusts the
requirements.  The alternative, remember, is starting over.  The vendor
knows all along that the customer will have his personal reputation
heavily staked on the deal by then.  So it's impressive to state
absolute performance requirements in writing, but it's even more
impressive to meet them.  One hopes it's better when the customer is
Mastercard or J. C. Penney.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

jkrueger@dgis.dtic.dla.mil (Jon) (03/06/90)

johnl@esegue.segue.boston.ma.us (John R. Levine) writes:

>I saw an article by someone from IBM in, as I recall, TODS that showed that
>a smaller number of faster processors would give you better database
>throughput than a larger number of slower processors of equivalent aggregate
>mips.  The crux of the argument is that when you have a lot of transactions,
>there gets to be considerable contention and the faster processors have a
>shorter hold time per transaction and fewer simultaneous transactions, hence
>fewer collisions.  I'll see if I can dig it up.

The argument rests on a very questionable assumption, to wit:
transaction latency is bottlenecked by computation.  If so, great,
SPECmarks/processor will predict TPS reliably and validly.  If not,
then the argument serves only to remind us on which side the bread is
buttered for its author.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

jkrueger@dgis.dtic.dla.mil (Jon) (03/06/90)

pcg@odin.cs.aber.ac.uk (Piercarlo Grandi) writes:

>Let me disagree. Unfortunately almost all databases for which a
>mainframe is worth having are so immense that their working set is
>always much larger than real memory available.

You misunderstand.  I do not refer to keeping things resident.  Query
optimization, for instance, has to do with minimizing search paths, not
with optimizing use of fast store (although good query optimizers do
figure in the costs of network latencies).  If your database is large,
you might concern yourself with avoiding unnecessary search components.
For another example, deferred writes have to do with the timing
characteristics of disks, such as the interesting fact that it costs
about as much to write one byte as it does to write a bunch of them.  If
your database is large and active, you might find this method
worthwhile.  Neither method depends on memory being large relative to
the database; both remain appropriate precisely where memory will
always be much smaller than the data.
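
A toy sketch of the deferred-write point (mine, not how any particular
DBMS does it): queue updates in memory and push them to disk in one larger
write, since the cost of an I/O is dominated by per-operation overhead
rather than by byte count.

    /* Toy illustration of deferred, coalesced writes: updates accumulate
     * in a memory buffer and go to disk in one large write().  Not how
     * any particular DBMS implements it; error handling omitted. */
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define LOGBUF 8192

    static char   logbuf[LOGBUF];
    static size_t logfill;

    static void log_flush(int fd)
    {
        if (logfill > 0) {
            (void) write(fd, logbuf, logfill);   /* one I/O, many updates */
            logfill = 0;
        }
    }

    static void log_append(int fd, const void *rec, size_t len)
    {
        if (logfill + len > LOGBUF)
            log_flush(fd);                       /* flush only when full  */
        memcpy(logbuf + logfill, rec, len);      /* otherwise just copy   */
        logfill += len;
    }

    int main(void)
    {
        int fd = open("updates.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        for (int i = 0; i < 100; i++)
            log_append(fd, "debit/credit record\n", 20);
        log_flush(fd);                           /* 100 updates, 1 write() */
        close(fd);
        return 0;
    }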

Perhaps I misled you by my mention of the memory costs of avoiding
i/o.  This is simply a reference to the space required to store the
code that computes the query execution plan, the data that's waiting
to synchronize a table update with a committed logged write, and so
on.  None of this space represents storage for the database at any
time; buffering, for instance, is the one technique that might count
as such.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.

csimmons@jewel.oracle.com (Charles Simmons) (03/06/90)

In article <783@dgis.dtic.dla.mil>, jkrueger@dgis.dtic.dla.mil (Jon) writes:
> But again, it's my belief that you're right, that dead whales generate
> more and faster i/o's.  But I have no DATA. TP1 would be better than
> nothing, but DebitCredit would be the beginning of real usefulness.
> Anyone have any DATA?  If TP1, results are meaningless without
> including information about table sizes used and method used to
> simulate users (tty vs. network vs. multiple concurrent batch).

Here at Oracle, we run TP1s on lots of different platforms.  Currently
the Amdahl 5990 is the fastest TP1 machine.  [Hmmm...  Unless Amdahl
has announced something more recent than the 5990.]  A good industrial
strength mainframe can generate something on the order of 300 or so
transactions per second.  [It might be 250, it might be 400, I don't
remember the exact number.  But it isn't 1000, unless you cheat a lot.]

A good fast mini such as a Sequent with 20 or so processors can execute
on the order of 120 transactions per second.  I think you could get
a Sun 3/50 or 3/60 to do around 10 transactions per second.  For the
current generation of RISC workstations, 20-30 TPS is the figure to shoot for.

What kind of hardware do you need for high TPS rates?  Fast disks.
When we're doing performance analysis of a port, we play lots of little
games like using a real small database that fits in memory, etc.  On
a 5990, if your database fits in memory, and if you turn off logging
[Kids!  Don't try this at home!] you can get 1000 or so TPS.  But as
soon as you start doing I/O, your performance drops dramatically.

For example, the Apollo Prism (4 processors) can do around 120 TPS...as long
as you don't do any I/O.  Start doing I/O and the number drops to 30 or
40.  Also, EDASD helps your TPS rates a lot.

An interesting measure is cost per TPS.  A mainframe costs around
$50,000 per TPS.  A PC costs about $1,000 per TPS.

Currently, the various development groups around here are having a bit
of a contest to see who can put together the first machine to execute
1000 TPS (with I/O turned on).  The next goal will be to do 1000 TPS
at a cost of $1,000 per TPS.

-- Chuck