[comp.arch] 64 Vs 32

ram@nucsrl.UUCP (03/06/87)

     As the 32-bit CPU is everywhere, is the mini-class of machines
moving towards 64 bits?  In a decade we have seen the migration from
4->8->16->32.  Will there be a 64-bit microprocessor soon?
      

     Any predictions?  Do we really need 64 bit processing power?



					  renu raman
				   	....ihnp4!nucsrl!ram
						  ^^^^^^
						     +--
					        (not alpha)
		                   (our sendmail has screwed up this name)

yerazuws@rpics.UUCP (03/13/87)

In article <3810013@nucsrl.UUCP>, ram@nucsrl.UUCP (Raman Renu) writes:
> 
>      As the 32-bit CPU is everywhere, is the mini-class of machines 
> moving towards 64 bit?.  In a decade we have seen the migration from 
> 4->8->16->32.  Would there be a 64-bit micro-processor soon?
 
Yep.  Consider- it's now a standard configuration for VAX 8800's
to come with 512 megs of memory (I've got the part number around
somewhere).  A VAX has 32 bits- so if we assume (*) that all 32
can be used as memory address, a VAX (or other 32-bit processor)
can have AT MOST 4 GIG of memory.
	
If available memory continues to double every 18 months, we have
about 3.5 doubling periods left, or about 5 years.
	
   (*) assumptions like this are never valid.
-- 
	-Bill Yerazunis "VAXstation Repo Man"
-->Copyright 1987. Restricted Redistribution via Stargate PROHIBITED <--

fouts@orville.UUCP (03/13/87)

In article <3810013@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:

>     Any predictions?  Do we really need 64 bit processing power?

Dusting off my crystal ball, I come to the following conclusion:

It depends.

As things currently stand, much of what is done with a computer is
"naturally" done with smaller word sizes;  bytes for character
processing, 16 bit integers for many things, 32 bit for others and 32
bit floating point data is often adequate.  32 bit address spaces will
probably be adequate for five or six years.

64 bit systems buy you more floating point precision at the cost of
complexity in nearly everything else.  If the memory addressable unit
is also 64 bits, then partial word computations become relatively more
expensive, because of the need to shift and mask.  If the memory
addressable unit is 32, 16, or 8 bits, then subword swapping can
become a local problem, address alignment complexity creeps in, and
memory reference operations can become more expensive.
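
To make the shift-and-mask cost concrete, here is a minimal C sketch (the
memory array and function names are hypothetical, not any real machine's
code) of fetching one byte when the smallest addressable unit is a 64-bit
word:

    #include <stdint.h>

    /* Hypothetical 64-bit-word-addressable memory; illustration only. */
    static uint64_t mem64[1 << 20];

    /* Fetch the byte at byte address 'addr': one word load, then a
     * shift and a mask to isolate the wanted byte. */
    static unsigned char load_byte(uint64_t addr)
    {
        uint64_t word  = mem64[addr >> 3];          /* word containing it */
        unsigned shift = (unsigned)(addr & 7) * 8;  /* byte within word   */
        return (unsigned char)((word >> shift) & 0xff);
    }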

Given a lack of perceived need which would drive a requirement for 64
bit systems, I would suspect most designers will concentrate on
building 32 bit systems which are better than current systems by some
metric such as speed, power consumption, or chip count.

Marty

faustus@ucbcad.UUCP (03/13/87)

I just read somewhere that the new HP Precision Architecture supports "up
to 64-bit virtual addresses".  It seems to have a 32-bit internal datapath,
however.  Can anybody explain what the 64-bit addresses are?

	Wayne

fouts@orville.UUCP (03/14/87)

In article <985@rpics.RPI.EDU> yerazuws@rpics.RPI.EDU (Crah) writes:

>Yep.  Consider- it's now a standard configuration for VAX 8800's
>to come with 512 megs of memory (I've got the part number around
>somewhere).  A VAX has 32 bits- so if we assume (*) that all 32
>can be used as memory address, a VAX (or other 32-bit processor)
>can have AT MOST 4 GIG of memory.
>	

Although this is true, historically when the limit is first reached,
processors aren't redesigned to use larger address spaces, but rather
work arounds are found.  When 16 bits wasn't enough address space for
micros, "bank selection" was utilized.  There are machines around
which use 24 bit addresses and manage > 24 bit memory spaces by
segmentation (IBM), or memory hierarchy (Cray).  Another possibility
is to change the granularity of address.  A word addressable 32 bit
word VAX is more likely than a byte addressable 64 bit VAX, and that
alone buys two more doublings in memory size.

Besides, with 4Mb memory chips, 4Gb is still (not counting ECC) 8
THOUSAND chips; so we aren't likely to see very many machines with
that many memory parts, soon.

I still hold out for ten years, if at all.

rgs@sdiris1.UUCP (03/15/87)

Control Data's current line of mainframes is 64 bits. While the question was
asked for micros, micros tend to follow the lines of the larger machines but
with a few year lag. So I think this question may best be answered by looking
at the advantages and disadvantages of 64 bits in a mainframe environment.

The biggest advantage is, of course, floating point. Doing everything in what
is in effect double precision can be quite nice. The vast majority of code we
work with has at least some double precision floating point. I have never
worked with an analysis package that didn't make extensive use of it. In fact
our cad/cam package running on 32 bit machines has been exclusively double
precision for quite some time now. 32 bit reals just don't hack it for
serious analysis (or even most ball park analysis) work. The newest entry in
the Cyber line seems to run about x times faster than a VAX, but that's a
VAX running single precision compared to a Cyber running in effect double
precision. Comparing a Cyber running single to a VAX running double is a
more accurate benchmark. It turns out to be quite an impressive number when you
consider the current bottom of the line Cyber costs about the same as the
8600. That, and just try to do reasonably fast 128 bit floating point on a
32 bit machine (a Cyber's double precision is 128 bits).

The other neat thing they do is with address space. The Cyber uses a 48 bit
virtual address bus. This is divided into 3 (4 for purists) fields. These
are:
-------------------------------------------------------------------------
| Ring  | Segment     | Indirection |              Address              |
-------------------------------------------------------------------------
 47   44 43         32 31         31 30                                0

Each process has 4096 segments of 2 Gigabytes each. Code resides in one
segment, stack in another, heap in another, data in another, and each virtually
addressed file in yet another. This is kind of like a virtual version of an 8086
architecture, but on a BIG scale. The other two fields are ring and indirection.
The indirection bit is for indirection chains through memory, and the ring
field (16 rings) is used to separate levels of the OS from the application.
The whole idea was to make it very difficult to run out of virtual memory even
if you segment your data structures into different bases.
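
As a rough illustration of that layout, a hypothetical C sketch that splits
such an address into its fields (widths taken from the diagram above; this is
not actual Cyber code):

    #include <stdint.h>

    /* Field widths as drawn above (ring 47-44, segment 43-32, indirection
     * bit 31, byte offset 30-0); illustrative C, not Cyber code. */
    struct cyber_va {
        unsigned ring;       /*  4 bits: 16 rings               */
        unsigned segment;    /* 12 bits: 4096 segments          */
        unsigned indirect;   /*  1 bit : indirection chain flag */
        uint32_t offset;     /* 31 bits: 2 GB per segment       */
    };

    static struct cyber_va split_va(uint64_t va)
    {
        struct cyber_va f;
        f.ring     = (unsigned)((va >> 44) & 0xf);
        f.segment  = (unsigned)((va >> 32) & 0xfff);
        f.indirect = (unsigned)((va >> 31) & 0x1);
        f.offset   = (uint32_t)(va & 0x7fffffff);
        return f;
    }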

Other than that, the advantage of 64 bit integers isn't that great. As has been
mentioned, most things fit in 32 bit integers. It does seem to lessen the worry
about integer overflows, though. The biggest cost for 64 bit integers is, of
course, the memory it uses. But let's face it, memory is not quite the cost
consideration it used to be.

I wish I could give more details on the architecture used, but I just haven't
worked with the machines, so I don't really know other than what I've read.
I also wish I could give real numbers about cost and performance comparisons,
but little things like "company private" labels dissuade me.
-- 
UUCP: ...!sdcsvax!jack!man!sdiris1!rgs |  Rusty Sanders
Work : +1 619 450 6518                 |  Control Data Corporation (CIM)
                                       |  4455 Eastgate Mall, 
Insert standard disclaimers here.      |  San Diego, CA  92121

aglew@ccvaxa.UUCP (03/15/87)

...> 64 bit machines

One respondent seemed to be under the impression that a "64 bit machine"
implied that the minimum addressable unit was 64 bits. That's another
tradeoff entirely. What I mean, and what I think is being talked about,
is machines that have address sizes (and integers, etc.) of 64 bits.

>32 bit machines are already here: the IBM RT ROMP has a 40 bit
virtual address and I've seen articles saying that the 80386 has a 16T
address space (Milutinovc in last month's computer).  What are the
exact specs on the 80386?

chuck@amdahl.UUCP (03/15/87)

In article <755@ames.UUCP> fouts@orville.UUCP (Marty Fouts) writes:
>Besides, with 4Mb memory chips, 4Gb is still (not counting ECC) 8
>THOUSAND chips; so we aren't likely to see very many machines with
>that many memory parts, soon.

A 4Gb memory is roughly 1.5 memory generations away from the
current 512Mb memory.  (Memory density increases by a factor
of four from one generation to the next.)  A 4Gb memory would
thus be built with 16Mb chips or even 64Mb chips, reducing the chip
count to 2,048 or 512 chips.

greg@utcsri.UUCP (Gregory Smith) (03/15/87)

In article <755@ames.UUCP> fouts@orville.UUCP (Marty Fouts) writes:
>In article <985@rpics.RPI.EDU> yerazuws@rpics.RPI.EDU (Crah) writes:
>
>>Yep.  Consider- it's now a standard configuration for VAX 8800's
>>to come with 512 megs of memory (I've got the part number around
>>somewhere).  A VAX has 32 bits- so if we assume (*) that all 32
>>can be used as memory address, a VAX (or other 32-bit processor)
>>can have AT MOST 4 GIG of memory.
>>	
>Although this is true, historically when the limit is first reached,
>processors aren't redesigned to use larger address spaces, but rather
>work arounds are found.  When 16 bits wasn't enough address space for
>micros, "bank selection" was utilized.  There are machines around[...]

Even if you have 128 GIG of physical memory, it is still reasonable
to restrict user processes to 4 GIG of virtual memory. Some PDP-11's
have a 22-bit physical address space and all have a 16-bit virtual
address space ( is the 22-bit figure right? ). A vax running a process
in PDP-11 mode gives it only a 64K byte virtual address space.

The point of all this being that you can still get by with 32-bit user
registers and data width - you just have to widen the MMU. 

Another question: If you get to the point where you are using more
than 2 GIG of virtual space on your 32-bit machine, and thus have
'negative' addresses, how many compiler (and other) bugs are suddenly
going to show up? You had to be careful with comparisons on the
PDP-11, and I've seen some sloppy code for 32-bit machines which assumes
that pointers are always 'positive'.
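
A contrived C fragment showing the failure mode (purely illustrative):

    /* Purely illustrative: address comparisons done through signed longs
     * misbehave once an address has its top bit set (above 2 GIG). */
    int in_buffer(char *p, char *base, unsigned long len)
    {
        return (long)p >= (long)base &&
               (long)p <  (long)base + (long)len;   /* signed compares! */
    }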

>Besides, with 4Mb memory chips, 4Gb is still (not counting ECC) 8
>THOUSAND chips; so we aren't likely to see very many machines with
>that many memory parts, soon.

I have used an 11/34 with 256K, made of 2Kx1 static rams. That's 1024
chips, (which probably used much more power per chip than a 1M chip)
and they were all on a single hex UNIBUS card, and not very densely,
either. A double-sided board.

>I still hold out for ten years, if at all.
Maybe 10 years for the 64-bit data path on a chip, but not for >32 bits
address in an external MMU chip.

-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

hansen@mips.UUCP (Craig Hansen) (03/16/87)

In article <1308@ucbcad.berkeley.edu>, faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
> I just read somewhere that the new HP Precision Architecture supports "up
> to 64-bit virtual addresses".  It seems to have a 32-bit internal datapath,
> however.  Can anybody explain what the 64-bit addresses are?

The 64-bit addresses are segmented addresses, with a 30-32 bit offset, and
an (address) space identifier of up to 32 bits. Space identifiers are held
in separate registers from the general registers, and are referenced either
explicitly by the instruction or implicitly by the high-order bits of the
offset.
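
A minimal sketch of how such a segmented address could be formed, assuming
the full 32-bit space identifier level (illustrative C, not HP's definition;
space-register selection is omitted):

    #include <stdint.h>

    /* Illustrative only: a space identifier of up to 32 bits concatenated
     * with a 32-bit offset gives the "up to 64-bit" virtual address.
     * Space-register selection and the 16-bit-ID level are omitted. */
    static uint64_t precision_va(uint32_t space_id, uint32_t offset)
    {
        return ((uint64_t)space_id << 32) | offset;
    }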

The purist would complain that this isn't at all the same as having a 64-bit
linear virtual address, and he'd be right - it isn't. However, as you can
see from the datapath width (32 bits), Precision isn't a 64-bit machine, and
it wasn't really expected that a single process would need more than a
32-bit address space.

Remembering that what we're talking about is virtual addresses, the size of
the virtual address space really reflects the fact that Precision was
designed for virtual-addressed caches, and you'd like to avoid flushing the
caches on context switches.  The first machine that HP has released has
16-bit space identifiers, which reflects the number of simultaneous
processes that are expected to run on the machine.

I'd be interested in hearing opinions on the necessary sizes of both virtual
and physical address spaces for machines over the next five to ten years,
both per-process and per-system.  (Without trying to color the responses, my
personal view is that per-system physical addresses will have to grow beyond
32 bits on high-end machines within five years, and that virtual addresses
(per-process) will have to go beyond 32 bits in the same time frame.)

Looks like memory is becoming cheap and memory addressing is becoming dear...

-- 

Craig Hansen			|	 "Evahthun' tastes
MIPS Computer Systems		|	 bettah when it
...decwrl!mips!hansen		|	 sits on a RISC"

tuba@ur-tut.UUCP (Jon Krueger) (03/16/87)

Bill Yerazunis "VAXstation Repo Man" writes:
> A VAX has 32 bits- so if we assume (*) that all 32
>can be used as memory address, a VAX (or other 32-bit processor)
>can have AT MOST 4 GIG of memory.
>   (*) assumptions like this are never valid.

VAX architecture limits each process to a virtual 32 bit space.  The
processor may generate physical addresses of different sizes.  VAX
implementations have supported physical spaces as small as 24 bits
(microvax) and as large as 30 bits (11/780).  Therefore, DEC could
choose to do with VAX as they did with PDP-11s, support virtual spaces
smaller than available physical memory.  Of course, that might not
be the Thing To Do, for a variety of reasons.

-- 
Jon Krueger   Department of Necessary Evil   University of Rochester
uucp: {seismo, allegra, decvax}!rochester!ur-tut!tuba   BITNET: TUBA@UORDBV

greg@utcsri.UUCP (Gregory Smith) (03/16/87)

In article <28200016@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>...> 64 bit machines
>
>One respondent seemed to be under the impression that a "64 bit machine"
>implied that the minimum addressible unit was 64 bits. That's another
		 ?maximum?
>tradeoff entirely. What I mean, and what I think is being talked about,
>is machines that have address sizes (and integers, etc.) of 64 bits.

I discussed in my last posting machines with >32 bit addresses. Out of
context, though, I take '64-bit-machine' to imply a 64-bit wide data
path and memory system. I suspect the practice of giving the size of
the address originated when IBM called the 8088 a 16-bit CPU and
Apple retaliated by calling the 68000 a 32-bit CPU.

Remember (back around '83 or so) when 8080 (CP/M) and 8088 (IBM PC)
software were distinguished in the consumer mags simply by being called
'8-bit' and '16-bit' software? Bizarre market.
>
>>32 bit machines are already here: the IBM RT ROMP has a 40 bit
>virtual address and I've seen articles saying that the 80386 has a 16T
>address space (Milutinovc in last month's computer).  What are the
>exact specs on the 80386?

This is largely marketingese. The 386 has ten 32-bit registers,
corresponding to ten 16-bit registers on the 86/286 (and six of them
corresponding to six 16-bit registers on the 8080 hee hee haw haw).

It also has six 16-bit segment registers. Four of these correspond to
the 86's registers; two are just extra extra segments (FS and GS).
Like on the 286, these are 'magic' registers; when you load a number
into one of them in any way, the CPU goes off behind your back and reads
the segment descriptor for that segment. Thus there are six sets of
hidden registers, one for each segment: A 'physical base' register,
giving the base address of the segment (unlike the 86, this can be
specified down to the byte); A 'size' register, giving the length of the
segment ( stack segments can grow backwards, like on the 286 ); and
access information. The access information indicates whether you can
read, write, or execute the segment (in a nutshell, anyway). Intel
calls all these registers a 'segment cache', although the loading
of the registers is under program control. If you load the same
segment number twice into a segment reg, it will load the info
twice ( at least on the 286 ).
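
For illustration, a hypothetical C rendering of that hidden per-segment state
(field names are mine, not Intel's):

    #include <stdint.h>

    /* One entry of the "segment cache" described above. */
    struct seg_cache {
        uint32_t base;      /* physical base (byte granular on the 386) */
        uint32_t limit;     /* segment length                           */
        uint8_t  access;    /* read/write/execute and related bits      */
    };

    static struct seg_cache hidden[6];  /* CS, SS, DS, ES, FS, GS */

    /* Loading a selector refills the matching entry by reading the
     * descriptor from memory, even when the same selector is loaded
     * twice (at least on the 286, per the text above). */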

The difference between the 286 and the 386 in this area is this:
The 286 allows segment bases to be specified as a 24-bit quantity, and
segment lengths to be given in 16 bits. This gives 64K segments within
a 16 Meg address space. The 386 extends both by 8 bits; thus the segment
base is specified in 32 bits and the length in 24. This allows 16-meg
segments in a 4 GIG physical space. (This is gleaned from a 286 book
and a glossy article (Micro Dec 85) on the 386). The article says
that 4-Gig segments are possible, so there must be an option in the
386 segment descriptor to select 'huge' segments where the size is
given in units of 256 bytes. Or something.

The article says that the virtual address space is 2^46, or 64 Terabytes.
This is arrived at as follows: A virtual address is given by a 32-bit
offset  and a 16-bit segment selector. Two of the selector bits are
a privilege level, and do not contribute to the selection process, so
you get 14+32 = 46.

If you are using segmentation, you will probably have <1000 segments
active per process, and segment sizes varying from a few bytes to a few
100k. Thus this 64TB address space is extremely sparse in practice,
which can be seen as a good thing in terms of fault protection. It is
obtained at the expense of constantly loading the segment registers
whenever you have to look at something you may not have looked at
recently (and the associated cost of reading 2 doublewords of
descriptor).

Another difference: the 386 has a conventional paging unit, which can
be used to translate the 32-bit address produced by the segmentation
unit into a physical memory address. Thus you can scrap the segment
registers (by pointing them all to the same huge segment and ignoring
them) and just use paging, e.g. to run BSD. Under this model, the
virtual address space is 4 Gig. Of course, you can use the segment
registers as intended and use the paging unit too (It slices, it dices...)

An inference: from the description in the article, I have to conclude
that the 386 has its own instruction set ( the 286 opcode space is too
full, and does not contain the kind of instructions that you would want
in a 32-bit machine, or the addressing modes described in the article).
Thus the 386 has two instruction modes (386/286 opcodes). I imagine the
286 instructions are available with different encoding (and probably
different addressing) within the 386 set. Maybe they'll make one that
runs 6502 code too so we can make Apple-compatible PC's.

One of the problems with the 286 is that, although it can emulate the
86's segmentation (base = 16*seg #, length = 64k, access = any), it
can only do so when the more powerful segmentation (and task switching)
stuff is disabled ( I.e. in Real Address Mode). This means that you
can run an 8086 OS on your 286, but you can't run an 8086 program under
an operating system which uses the 286's features.

The 386 has a mode (VM86) which can be selected to provide 8086 segment
emulation on a per-process basis, and presumably protect the rest of
the system from the internally unprotected 8086 process. The 386, like
the 286, powers up in 'global' 86 mode.
-- 
----------------------------------------------------------------------
Greg Smith     University of Toronto      UUCP: ..utzoo!utcsri!greg
Have vAX, will hack...

faustus@ucbcad.berkeley.edu (Wayne A. Christopher) (03/16/87)

In article <985@rpics.RPI.EDU>, yerazuws@rpics.RPI.EDU (Crah) writes:
> Yep.  Consider- it's now a standard configuration for VAX 8800's
> to come with 512 megs of memory (I've got the part number around
> somewhere).  A VAX has 32 bits- so if we assume (*) that all 32
> can be used as memory address, a VAX (or other 32-bit processor)
> can have AT MOST 4 GIG of memory.
> 	
> If available memory continues to double every 18 months, we have
> about 3.5 doubling periods left, or about 5 years.

I don't know if this is the right question to ask....  A machine could
have over 4G of real memory, but the most one process could address is
4G of virtual memory.  In the case of the VAX, 1/4 of this is "regular"
data and text space, 1/4 is stack space, and the other half is system
space.

What we need to ask is, who will need more than ~1G of memory?  I think
the only applications that currently could use this much memory are
scientific programs that run on Crays (which I think are addressable to
the 64-bit word anyway).  I certainly haven't been running into the 1G
limit too often lately.

I think the tradeoff here is between speed and address space -- going from a
32-bit internal datapath to a 64-bit datapath is a big step in terms of
chip complexity, and if you're trying to get the most speed per square
mm, you'd be better off sticking to 32 bits and adding more registers
or whatever.

	Wayne

kds@mipos3.UUCP (03/16/87)

In article <28200016@ccvaxa> aglew@ccvaxa.UUCP writes:
>virtual address and I've seen articles saying that the 80386 has a 16T
>address space (Milutinovc in last month's computer).  What are the
>exact specs on the 80386?

the 386 has a 32-bit physical address bus brought out of the chip.  Internally,
the chip can address 2**14 segments of 2**32 bytes each, so that is where the
maximum size of the address space (2**46) comes in.  Note that internally, the
segmented address space is transformed into a 32-bit "linear" address space
which is presented to the paging unit, which transforms it to the 32-bit
physical address space.  So, what you get is a 32-bit "flat" machine, or
a 32-bit machine with a much larger segmented memory space.  As was said about
the HP "Precision" machine, this may not quite what this discussion is about,
since it still is a 32-bit machine.
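
Schematically, in hypothetical C (the one-level page_table[] here is a
simplification; the real chip walks a two-level directory/table structure):

    #include <stdint.h>

    static uint32_t page_table[1 << 20];   /* linear page -> physical page */

    static uint32_t segment_to_linear(uint32_t seg_base, uint32_t offset)
    {
        return seg_base + offset;          /* the 32-bit "linear" address  */
    }

    static uint32_t linear_to_physical(uint32_t linear)
    {
        uint32_t page = linear >> 12;      /* 4 KB pages                   */
        return (page_table[page] << 12) | (linear & 0xfff);
    }
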
-- 
The above views are personal.

The primary reason innumeracy is so pernicious is the ease with which numbers
are invoked to bludgeon the innumerate into dumb acquiescence.
			- John Allen Paulos

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

lamaster@pioneer.arpa (Hugh LaMaster) (03/16/87)

In article <205@dumbo.mips.UUCP> hansen@mips.UUCP (Craig Hansen) writes:
>I'd be interested in hearing opinions on the necessary sizes of both virtual
>and physical address spaces for machines over the next five to ten years,
>both per-process and per-system.  (Without trying to color the responses, my
>personal view is that per-system physical addresses will have to grow beyond
>32 bits on high-end machines within five years, and that virtual addresses
>(per-process) will have to go beyond 32 bits in the same time frame.)
>Looks like memory is becoming cheap and memory addressing is becoming dear...

First a statement of personal bias:  Most of the machines that I have done
numerical computing on for the past 15 years have been CDC and Cray; with
single precision floating point precision of 60-64 bits.  The Cyber 205 has
full 32 bit support, but as "half precision".  Most of the work that I have
done is either engineering and scientific numerical computing, or support of
people doing the same.  Therefore, I have always seen the world as a 64 bit
world for DATA, with 32 bits when you could get by with it and needed it to do
a larger problem.  It is my experience that 32 bits is not sufficient often
enough (for floating point, which is what I usually think about), that 64 bits
is a much more natural word size.  Actually, I have also used Burroughs 48 bit
word machines, and in some respects 48 bits is really the natural single
precision word size for many floating point problems.  However, it is very
inconvenient (for those of us who need to move binary data over a network
between machines of different types) not to have the basic word size a power
of two, and 64 bits is the next convenient size.

It is also my observation that there are already machines being built
today for which 32 bit byte addresses are at the limit.  The Cray-2
has 256 MW of 8 byte, 64 bit, words = 2GB, and the next version will
have 512 MW.  Within 4 years, 4 Mbit memory chips will be commonplace
and 16 Mbit may be available, so a machine with 32-128 GB is likely.  
Therefore, physical memory will probably be beyond the 32 bit limit, even
for word addresses, on high end machines, within 5 years.  Furthermore,
there are plenty of applications for memories this size, so don't assume
that no one will know what to do with them if they are available.

One machine that I have used which has a very large linear address
space is the Cyber 205, which has 48 bit addresses (and 48 bit integers)
which fit inside 64 bit words (the other 16 bits are used for something
else - the vector length).  These 48 bit addresses are actually bit addresses
(a feature that I wish more machines had - bit vectors are very useful)
so the machine is capable of addressing about 2 trillion words per process,
enough for the next five years, but maybe not for the next fifteen years.
Large linear address spaces are practical and in use today.

Finally, there is the internal data path size.  On the Crays, this is 64 bits.
On the Cyber 205, this is 512 bits (a "super-word").  Some machines with caches
use 64-128 bits, even with 32 bit words.  So an internal data path size of
64 bits or more is certainly useful.

Therefore, for my purposes, I would like to see the next generation of
architectures use 64 bits for data and 64 (or 63) bits for (preferably bit, not byte
or word) addresses.  There shouldn't be any architectural reasons for limiting
physical memory size to less than 2**63 bits.  The internal data path size
should be flexible, 64 bits to 512 bits, depending on the implementation, size
of the processor, and cache considerations.  



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

news@cit-vax.Caltech.Edu (Usenet netnews) (03/17/87)

Organization : California Institute of Technology
Keywords: 
From: jon@oddhack.Caltech.Edu (Jon Leech)
Path: oddhack!jon

In article <1310@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>What we need to ask is, who will need more than ~1G of memory?  I think
>the only applications that currently could use this much memory are
>scientific programs that run on Crays (which I think are addressible to
>the 64-bit word anyway).  I certainly haven't been running into the 1G
>limit too often lately.

	People doing graphics. One of the people here has ray-traced
a picture containing 4e11 polygons. The only way he could get away with
it right now is by having multiple levels of instantiation of objects,
resulting in a very boring scene, but eventually we will want that many
DIFFERENT objects. There will never be enough memory.

    -- Jon Leech (jon@csvax.caltech.edu || ...seismo!cit-vax!jon)
    Caltech Computer Science Graphics Group
    __@/

eugene@pioneer.arpa (Eugene Miya N.) (03/17/87)

Wayne A. Christopher writes:

>What we need to ask is, who will need more than ~1G of memory?  I think
>the only applications that currently could use this much memory are
>scientific programs that run on Crays (which I think are addressible to
>the 64-bit word anyway).  I certainly haven't been running into the 1G
>limit too often lately.
>
>	Wayne

I think Wayne displays a bit of shortsightedness which has been common
throughout the computer industry since its inception.  (Don't take this
personally, I don't write many 1G programs either.)  I can think of lots
of potential uses like Star Wars (Lucasfilm movies).  Graphics is very
compute intensive ("Welcome to the Teraflop Club" as J.C. would say at
Caltech).  I think the need grows to fill the void.  I also think this
example shows that introspection is not a good tool for design.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

lamaster@pioneer.arpa (Hugh LaMaster) (03/17/87)

In article <1310@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
:
>
>What we need to ask is, who will need more than ~1G of memory?  I think
>the only applications that currently could use this much memory are
>scientific programs that run on Crays (which I think are addressible to
>the 64-bit word anyway).  I certainly haven't been running into the 1G
>limit too often lately.
>
There could be other applications besides the often mentioned scientific
and graphics programs which could potentially benefit from machines with
larger address spaces.  Consider database applications, for example.
Some sites have hundreds of 3380 disk drives holding databases.  Currently,
there is no way to address more than about one disk worth (3380's 
are 2.5-5.0 GB) in IBM's 32 bit address space.  With a 64 bit address space,
the entire disk farm could be addressed.  All this data need not be 
accessed at one time in order to justify doing this: there are efficiency
improvements in access time which would be expected by using the pager
as the access method.  There may be other applications which would justify
the use of a large address space. 

The proper metric to use is the size of the system secondary storage.  The
virtual address space should be big enough to encompass the size of secondary
storage in a virtual memory world.  I know of one system that allows virtual
memory to be kept on tape as well as disk (obviously the access time is
slower, but once an access is made the entire memory segment is restored,
which is efficient).  At a minimum, the size of the largest OBJECT should fit in the
address space, and this object may be a database, a process address space for
an engineering, scientific, or graphical program, or other applications.  What
other applications or objects have people encountered that needed more than 
1 GB?

Since today there are deskside servers available for 4 MIPS workstations with
560 MB on a disk, I don't think the day is far off when even "small" systems
may find the 32 bit limit a burden.  I foresee a rosy future for the company
which chooses to start developing its 64 bit systems now in anticipation of 
that future.  



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

baum@apple.UUCP (03/17/87)

--------
[]
>In article <1308@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>I just read somewhere that the new HP Precision Architecture supports "up
>to 64-bit virtual addresses".  It seems to have a 32-bit internal datapath,
>however.  Can anybody explain what the 64-bit addresses are?
>
>	Wayne

Each memory reference instruction in the Precision architecture has
two or three bits to select a virtual space register. These can be
viewed as a set of eight segment registers, although the 'segments'
don't overlap in a virtual address space; there are 2^32 possible
addresses spaces of 2^32 bytes, so essentially each space has its own
page map.  Aliasing is possible, but dangerous. There are some
shortcuts so that a 32-bit pointer can be used for some subset of
addresses and space registers (the space register number is taken
from the top 2 bits of the pointer.) There are three architectural
levels with 0, 16, and 32 bit space registers. Details of how the
registers are managed can be found in 'Precision Architecture and
Instruction Reference Manual', manual part number 09740-90014 (Nov.
86).

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

kevin@zeke.UUCP (Kevin Buchs) (03/17/87)

In article <784@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
>In article <1310@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>>What we need to ask is, who will need more than ~1G of memory?  I think
>>the only applications that currently could use this much memory are
>>scientific programs that run on Crays (which I think are addressible to
>>the 64-bit word anyway).  I certainly haven't been running into the 1G
>>limit too often lately.
>>
>There could be other applications besides the often mentioned scientific
>and graphics programs which could potentially benefit from machines with
>larger address spaces.  Consider database applications, for example.

Perhaps we should break free from thinking about virtual memory in the
first place.  What about programs and files "permanently" resident in
memory?  I believe that once one starts thinking this way, the uses of
large memories are abundant - disk access is slower than memory.  Also
the lack of the hardware and operating system overhead to support 
virtual memory could speed up processing.


-- 
Kevin Buchs   3500 Zycad Dr. Oakdale, MN 55109  (612)779-5548
Zycad Corp.   {rutgers,ihnp4,amdahl,umn-cs}!meccts!zeke!kevin

kludge@gitpyr.gatech.EDU (Scott Dorsey) (03/17/87)

Wayne A. Christopher writes:

>What we need to ask is, who will need more than ~1G of memory?  I think
>the only applications that currently could use this much memory are
>scientific programs that run on Crays (which I think are addressible to
>the 64-bit word anyway).  I certainly haven't been running into the 1G
>limit too often lately.
>
>	Wayne

   "You don't really need 64K.  48K is so much memory that nobody 
    really knows what do with all of it.  " 
                               -- Ithaca Audio Salesman

   "It's possible to upgrade the machine to 640K, but there isn't 
    any software that uses more than 256K"
                               -- IBM PC Salesman
-- 
Scott Dorsey   Kaptain_Kludge
ICS Programming Lab (Where old terminals go to die),  Rich 110,
    Georgia Institute of Technology, Box 36681, Atlanta, Georgia 30332
    ...!{akgua,allegra,amd,hplabs,ihnp4,seismo,ut-ngp}!gatech!gitpyr!kludge

howard@cpocd2.UUCP (Howard A. Landman) (03/18/87)

In article <1310@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>What we need to ask is, who will need more than ~1G of memory?

For quick construction of hypothetical memory-hogs, I always think of image
processing.  Let's say you're working on data from an imaging spectrophotometer
which has 256 frequency bands and 16-bit resolution in each band, and your
image is 4K by 4K pixels.  That's 2^(8+1+12+12) = 2^33 = 8 GB memory to hold
just *ONE* *IMAGE*.  Sure would be nice to get it all in memory at once.  Of
course, it would also be nice to have a processor-per-pixel; that's only
16 M processors, each with at least 512 bytes of working memory, is that too
much to ask?  Today, you might be able to do it with 256 Connection Machines
($1 M each) connected in a big array or hypercube arrangement ... shouldn't
cost more than a quarter billion if we can get a volume discount. :-)
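
The arithmetic, as a trivial C check of the figure quoted above:

    #include <stdio.h>

    int main(void)
    {
        /* 4K x 4K pixels, 256 bands, 16 bits (2 bytes) per band. */
        double bytes = 4096.0 * 4096.0 * 256.0 * 2.0;
        printf("%.0f bytes = %.0f GB\n",
               bytes, bytes / (1024.0 * 1024.0 * 1024.0));
        return 0;   /* prints: 8589934592 bytes = 8 GB */
    }
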
-- 

	Howard A. Landman
	...!intelca!mipos3!cpocd2!howard

faustus@ucbcad.berkeley.edu (Wayne A. Christopher) (03/18/87)

In article <3302@gitpyr.gatech.EDU>, kludge@gitpyr.gatech.EDU (Scott Dorsey) writes:
> >What we need to ask is, who will need more than ~1G of memory?
> 
>    "You don't really need 64K.  48K is so much memory that nobody 
>     really knows what do with all of it.  " 

Hmm, I seem to have said this the wrong way...  I was asking what
applications that people run right now are up against the 1G limit, not
suggesting that there are no such applications.  For my purposes, and I
think for the purposes of 95% of computer users, there are better ways
to use chip area right now than for 64-bit datapaths.  Sure, it would
be nice to have a 100 MIPS machine with a 128-bit datapath, but we have
a bit of time to wait until it's on our desks.

	Wayne

sher@rochester.ARPA (David Sher) (03/19/87)

Remember that any kind of calculation can be speeded up and made more
accurate by using table lookup.  Thus instead of using cumbersome and
slow calculations to compute sin, just do a table lookup.  (You'd be surprised
how many applications could use a fast sin function).  The more memory
the more functions can be economically made into tables.  Thus one can
justify larger memory for any user who has critical functions that are slower
than a few memory references.
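
A minimal sketch of the idea in C, assuming arguments already reduced to
[0, 2*pi) and ignoring interpolation (table size and names are arbitrary):

    #include <math.h>

    #define TABLE_SIZE 4096
    #define TWO_PI     6.28318530717958647692

    static double sin_tab[TABLE_SIZE];

    /* Precompute sin() at evenly spaced points; later calls become an
     * index computation and a fetch. */
    static void init_sin_tab(void)
    {
        int i;
        for (i = 0; i < TABLE_SIZE; i++)
            sin_tab[i] = sin(TWO_PI * i / TABLE_SIZE);
    }

    static double fast_sin(double x)        /* expects x in [0, 2*pi) */
    {
        return sin_tab[(int)(x * (TABLE_SIZE / TWO_PI)) % TABLE_SIZE];
    }
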
-- 
-David Sher
sher@rochester
{allegra,seismo}!rochester!sher

lawrenc@nvanbc.UUCP (Lawrence Harris) (03/19/87)

In article <4372@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>In article <28200016@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>An inference: from the description in the article, I have to conclude
>that the 386 has its own instruction set ( the 286 opcode space is too
>full, and does not contain the kind of instructions that you would want
>in a 32-bit machine, or the addressing modes described in the article).
>Thus the 386 has two instruction modes (386/286 opcodes). I imagine the
>286 instructions are available with different encoding (and probably
>different addressing) within the 386 set. Maybe they'll make one that
>runs 6502 code too so we can make Apple-compatible PC's.
>
No, the 386 uses the same binary instruction encoding as the 286.  It
adds a new mode byte for extended addressing modes, and a bit in the
segment descriptor for a code segment which tells the processor whether
the code in this segment uses 16 or 32 bit operands.  There is also
a prefix byte to switch an individual instruction to the opposite size.

I am told that XENIX V/286 runs on the Compaq 386 right out of the box
so the cpu's must be very compatible at the binary level.

Also there is another bit which says whether the segment size is in bytes
or increments of (I think) 16 bytes, this is how you can have segments up
to 1Mb now.

I believe there was a recent article in byte magazine on the 386.

rbl@nitrex.UUCP (03/20/87)

In article <784@ames.UUCP> lamaster@pioneer.UUCP (Hugh LaMaster) writes:
>In article <1310@ucbcad.berkeley.edu> faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
>:
>>
>>What we need to ask is, who will need more than ~1G of memory?  ...
>There could be other applications besides the often mentioned scientific
>and graphics programs which could potentially benefit from machines with
>larger address spaces.  Consider database applications, for example.
>...
>... What other applications  .. have people encountered that needed more 
> more than 1 GB?
Possible applications needing more than 1 GB:
o AI rule bases (close to a database)
o Text files (sometimes data for an AI application)
o Signal processing and image processing, including seismic.  I've heard that
one seismic series can be a semi trailer full of mag tapes.
o Interpretation of spectra from chemical analytic instruments.
o Dynamic system modeling, including weather forecasting, atmospheric dispersion
of hazardous materials (Chernobyl anyone?).
o Statistical analyses.

What would the net present value of "real" memory be if the "tape farms" and
"disk farms" could be replaced?  Allow for maintenance charges for electro-
mechanical equipment and operator time if the tape library is not automated.

Disclaimer:  I never have done seismic.  Any opinions are my own professional
thoughts and are not those of my employer.

Rob Lake
decvax!cwruecmp!nitrex!rbl
(cwruecmp soon to be domain-ized)

davidsen@steinmetz.UUCP (03/21/87)

The discussion of 32 and 64 bits is the best thing on this group
for a long time! One factor which hasn't been discussed yet is
the *need* for 64 bit machines.

Eight bit machines could do office automation quite well, and
many were used with Concurrent CP/M as multiuser machines. The
16 bit machines are being used to run small businesses, schools,
etc. The 68020 and 80386 have enough power to run large
businesses, schools, city and county government, etc.

The reason that machines like PCs sold so well is that many
people really needed them, and the large market brought
competition and economies of scale. The market for 32 bit
computers goes all the way to running most countries. Who then
needs 64 bits?

The engineering types with large problems, the graphics hackers,
the people doing realtime Mandelbrot displays: even including
software developers, that percentage of the market is pretty
small. All the reasons for the rapid migration to 16 bits and
the steady evolution to 32 are missing.

I predict that the micro and mini computer markets will be
largely 32 bit for 5-8 years. I also predict that UNIX will
become much more common, due to the cost of developing new
software. The 32 bit computers we have now will run businesses
and governments up to the national level, provide cheap
multiuser office automation, run both classes and administration
in schools, etc. IBM has gone with 32 bits for 20 years, I see
no reason to think they will change. They have left the 64 bit
market to CDC and Cray as a low volume niche.

The issue is economic rather than technical, computers are
subject to the same market factors as automobile engines: it's
fun to have a big one, but most people won't spend a lot to get
more than they need. The market (units and $) is in the small
family sedan, 12 passenger vans and dump trucks are sold only to
those who need them enough to pay, and sports cars are bought by
people who want to project a flashy image.

Disclaimer:
 my opinion only.

Suggestion: before you write that flame, talk to someone who has
had some formal training in economics, or dig out your old text.
-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

kds@mipos3.UUCP (03/21/87)

>Also there is another bit which says whether the segment size is in bytes
>or increments of (I think) 16 bytes, this is how you can have segments up
>to 1Mb now.

actually, the bit indicates whether the segment size is in bytes or pages
(4k) increments.  This allows segments up to 4 GBytes.
-- 
The above views are personal.

The primary reason innumeracy is so pernicious is the ease with which numbers
are invoked to bludgeon the innumerate into dumb acquiescence.
			- John Allen Paulos

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

bobmon@iuvax.UUCP (03/21/87)

davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>
>Eight bit machines could do office automation quite well, and
>many were used with Concurrent CP/M as multiuser machines. The
>16 bit machines are being used to run small businesses, schools,
>etc. The 68020 and 80386 have enough power to run large
>businesses, schools, city and county government, etc.
>
>The reason that machines like PCs sold so well is that many
>people really needed them, and the large market brought
>competition and ecomonics of scale. The market for 32 bit
>computers goes all the way to running most countries. Who then
>needs 64 bits?
>	[...]
>I predict that the micro and mini computer markets will be
>largely 32 bit for 5-8 years.  [...]

I replaced an 8-bit 6502 machine with an 8088 (V20 now), and wish I had
something "one size larger."  Not because of the address space, 640K is
adequate for what I'm doing at the moment.  The issue for me personally is
one of speed.  If/when I move on to more memory, the ad hoc methods used to
access it on my current machine will make that access quite slow; as it is, I
start some runs, go to bed, and look for results in the morning.

In the microcomputer arena the increased word sizes have been accompanied by
increased clock rates, and I think that most people who want larger machines
want the speed at least as much as the memory; given virtual memory, speed is
the entire issue.  (No, not always, I agree.)

So I agree that for word size per se, 32 bits is surely adequate; 16 bits is
probably sufficient for most things that people do (floating point and address
space are the two exceptions I can think of).  BUT if somebody comes out with
a 64-bit CPU that also moves at 50MHz (or whatever), I think there will be a
large market for that CPU in applications that are insensitive to the word size
but are quite sensitive to the speed.

I don't see that 64 bits implies a faster machine when working on characters,
"normal-size" integers, etc., and to that extent I agree that 32-bit machines
will probably dominate until people stop making them faster.  As Bill points
out, IBM has been making them faster for a long time and will probably continue
to do so.

chuck@amdahl.UUCP (03/22/87)

In article <3436@iuvax.UUCP> bobmon@iuvax.UUCP (Che' Flamingo) writes:
>davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>>
>>etc. The 68020 and 80386 have enough power to run large
>>businesses, schools, city and county government, etc.

Kinda makes you wonder why the Fortune 500 spends on the order of
$5,000,000 for an IBM mainframe or Amdahl mainframe when 68020
based workstations can be had for on the order of $50,000.

>So I agree that for word size per se, 32 bits is surely adequate; 16 bits is
>probably sufficient for most things that people do (floating point and address
>space are the two exceptions I can think of).

16 bits is clearly not sufficient for most things people do.  Consider
games like rogue which have a tendency toward creative bugs caused by
the fact that some numbers are stored in 16 bit integers as opposed
to 32 bit integers.

Consider any moderately sized company with a gross income of a few
hundred thousand dollars a year.  It would be nice, when generating
reports describing where the money came from and where it went, to
store these figures as fixed point numbers.  But they certainly won't
fit in 16 bit integers.

Any moderately sized database will contain more than 64K records.

I used to use 36 bit integers and had far fewer problems with
integer overflow than people have using 32 bit integers.

Let's think up some applications where 32 bit integers are a little
too small...  People have already mentioned floating point.  How
about the federal government?  Let's see...  they deal with a budget
on the order of a trillion dollars?  100 trillion cents?  I'll bet
they wouldn't mind having a 64-bit bcd word for holding their monetary
variables in their cobol programs.  Even Fortune 500 companies deal
with amounts of money larger than 4 billion cents.

How about games?  An othello board can be stored very nicely in
two 64 bit words.  (Actually, 72 bit words work even nicer, but
Honeywell machines are hard to come by.)  Many of the algorithms
for manipulating othello boards are easier to code if you can
perform logical operations on 64 bit words instead of 32 bit
words.
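
A sketch of that representation in C (illustrative only; move generation is
not shown):

    #include <stdint.h>

    /* One 64-bit mask of squares held by black, one for white (bit i =
     * square i), so whole-board operations are single logical ops. */
    struct board { uint64_t black, white; };

    static uint64_t occupied(const struct board *b)
    {
        return b->black | b->white;
    }

    static uint64_t empty_squares(const struct board *b)
    {
        return ~(b->black | b->white);
    }

    /* Flipping a captured run of discs from white to black is two XORs. */
    static void flip_to_black(struct board *b, uint64_t run)
    {
        b->white ^= run;
        b->black ^= run;
    }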

Then there are those people over in net.math who like to look for
godawfully large prime numbers.  I'm sure they wouldn't mind
arithmetic operations on words larger than 32 bits.

By the by...  Amdahl mainframes (and hence probably IBM mainframes
as well) do support 64 bit integers.  Past experience suggests that
anything that exists on a mainframe (cache, virtual memory, time-sharing,
protected address spaces...) will eventually appear on a micro.  Two
items that exist on mainframes, but not yet on micros, are 128-bit data
paths between the cpu and memory, and operations on 64-bit words.

To summarize, there are many current applications that use, or would
like to use, 64-bit words.  When chip makers start building chips
with 64-bit internal data paths, the chips will sell, and the 64-bit
data paths will be useful.

-- Chuck

roy@phri.UUCP (03/22/87)

In article <985@rpics.RPI.EDU> yerazuws@rpics.RPI.EDU (Crah) writes:
> A VAX has 32 bits- so if we assume (*) that all 32 can be used as memory
> address, a VAX (or other 32-bit processor) can have AT MOST 4 GIG of memory.

	Remember when the 11/45 came out and they managed to wedge an
18-bit physical address space onto a 16-bit machine and then the 11/70 came
along and they managed to extend that to 22 bits?  Anybody willing to guess
when we'll see 48 bit physical addresses on 32 bit machines?  Anybody for
demand paging and bank switching at the same time?

	A question: I remember reading somewhere that the top 2 bits in a
VAX address signify user/kernel and data/text segments, which really leaves
you with "only" a 30-bit address space.  Is that a hardware feature or just
the way VMS sets up the memory map?
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

"you can't spell deoxyribonucleic without unix!"

anton@utai.UUCP (03/23/87)

In article <390@nvanbc.UUCP> lawrenc@nvanbc.UUCP (Lawrence Harris) writes:
>In article <4372@utcsri.UUCP> greg@utcsri.UUCP (Gregory Smith) writes:
>>In article <28200016@ccvaxa> aglew@ccvaxa.UUCP writes:
>>
>>An inference: from the description in the article, I have to conclude
>>that the 386 has its own instruction set ( the 286 opcode space is too
>>full, and does not contain the kind of instructions that you would want
>>in a 32-bit machine, or the addressing modes described in the article).
>>Thus the 386 has two instruction modes (386/286 opcodes). I imagine the
>>286 instructions are available with different encoding (and probably
>>different addressing) within the 386 set. Maybe they'll make one that
>>runs 6502 code too so we can make Apple-compatible PC's.
>>
>No, the 386 uses the same binary instruction encoding as the 286.  It
>adds a new mode byte for extended addressing modes, and a bit in the
>segment descriptor for a code segment which tells the processor whether
>the code in this segment is to be 16 or 32 bit integer.  There is also
>a prefix byte to switch an individual instruction to the opposite size.

Intel was planning ahead when they designed the 286, or perhaps the 286
was just a temporary measure before the 386 arrived.  Intel reserved
an extra word (16 bits) and a few more bits in the segment descriptors
of the 286.  In the 386 the first 16 bits go to extend the offset of
the segment to 32 bits (this is ignored, forced to 0, in the 286).
There is also a reserved bit elsewhere in the descriptor.  It 
becomes the granularity bit.  Hence, you can calculate the segment
size in bytes or pages.  There are 4 extra bits for the segment size.

>I am told that XENIX V/286 runs on the Compaq 386 right out of the box
>so the cpu's must be very compatable at the binary level.
>
XENIX V/286 uses a subset of 386 segment descriptors, precisely the
286 subset (with all Intel reserved bits set to 0).

>Also there is another bit which says whether the segment size is in bytes
>or increments of (I think) 16 bytes, this is how you can have segments up
>to 1Mb now.
>
We can have segments of 1M units (16 bits from 286 and 4 new bits in 386 
= 20 bits total).  With granularity bit set to 0 (as in 286) this becomes
1Mb.  Otherwise, it is 1M pages or 1M * 4096b/page = 4Gb for full 32 bit
addressing.  This is very clever, since it allows the natural
memory protection mechanism (segmentation) to sit on top of paged
virtual memory.
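
In hypothetical C, the size calculation that falls out of the 20-bit limit
field plus the granularity bit:

    #include <stdint.h>

    /* 20-bit limit field counted in bytes (G=0, the 286-compatible case)
     * or in 4096-byte pages (G=1): at most 1 MB or 4 GB per segment.
     * Names are illustrative. */
    static uint64_t segment_bytes(uint32_t limit20, int g_bit)
    {
        uint64_t units = (uint64_t)(limit20 & 0xfffff) + 1;   /* 20 bits */
        return g_bit ? units * 4096 : units;
    }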

>I believe there was a recent article in byte magazine on the 386.

I am posting this because I have not seen a single correct
description of the 386 memory management mechanism on the net, yet.
However, I wrote this from memory without consulting my 386 book.

l-aron@obelix.UUCP (03/23/87)

	-- Quotation --
In article <3810013@nucsrl.UUCP> ram@nucsrl.UUCP (Raman Renu) writes:
>     As the 32-bit CPU is everywhere, is the mini-class of machines 
>moving towards 64 bit?.  In a decade we have seen the migration from 
>4->8->16->32.  Would there be a 64-bit micro-processor soon?
>
>     Any predictions?  Do we really need 64 bit processing power?

	-- Discussion --
Why do computers have a certain "word length"?

1) To contain a machine code instruction.
There is no need for even 2^16 instructions in any computer.

2) To contain a binary address to its memory.
Since silicon RAMs tend to double in density (one more address bit per
chip) every 18 months, it would take 48 years to go from 32 to 64
address bits. Heavier use of virtual memory could make this go faster.

3) To contain "atomic" data like numbers.
64 bits is good for floats and will be (is?) used by the number-
crunching society. They probably have no need for 128 bits.

4) To contain some other relevant but "atomic" information.
Consider the 36 bit wide pdp-10. 36 bits is 6 bits (six-bit code)
times 6 characters (a file name or a Fortran identifier).
This motivates 64 or even 128 bit words, since 128 bits is 16
characters times 8 bits (extended ascii).

	-- Prediction --
Most certainly, we will see 64 bit and even 128 bit computers in say 10
and 15 years respectively, but I doubt that 256 bit computers will
ever be commercially available. Instead, new and more parallel
architectures will be invented.

	-- New question --
I'd like to modify the original question:

Is there already, or will there ever (or soon) be a computer with
truly variable word length? I.e. not machines reading "enough
bytes", but some kind of smart memory that is given orders like:

* On address ADDR, write word DATA, that is LEN bytes long.
* Read DATA at address ADDR+1 and tell LEN how long that data is.
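
A hypothetical C sketch of those two orders, as a pure software model of such
a length-tagged memory (sizes and names are invented):

    #include <stdint.h>
    #include <string.h>

    /* Each cell carries its own length tag, returned on a read. */
    struct cell { uint8_t len; uint8_t bytes[16]; };
    static struct cell smart_mem[1024];

    static void smart_write(unsigned addr, const void *data, uint8_t len)
    {
        smart_mem[addr].len = len;              /* assumes len <= 16 */
        memcpy(smart_mem[addr].bytes, data, len);
    }

    static uint8_t smart_read(unsigned addr, void *data)  /* returns LEN */
    {
        memcpy(data, smart_mem[addr].bytes, smart_mem[addr].len);
        return smart_mem[addr].len;
    }
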
-- 
Name:	Lars Aronsson
Snail:	Rydsvagen 256 A:10, S-582 48 Linkoping, Sweden
UUCP:	{mcvax,seismo}!enea!liuida!obelix!l-aron
ARPA:	l-aron@obelix.ida.liu.se

meissner@dg_rtp.UUCP (03/23/87)

In article <5954@amdahl.UUCP> chuck@amdahl.UUCP (Charles Simmons) writes:
> In article <3436@iuvax.UUCP> bobmon@iuvax.UUCP (Che' Flamingo) writes:
> >davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
> >>
> >>etc. The 68020 and 80386 have enough power to run large
> >>businesses, schools, city and county government, etc.
> 
> Kinda makes you wonder why the Fortune 500 spends on the order of
> $5,000,000 for an IBM mainframe or Amdahl mainframe when 68020
> based workstations can be had for on the order of $50,000.

While the 68020, 80386 (and whatever the latest 32*32 National makes) are
fine in their niches, they are not up to the task of running a moderately
large business.  Yes the address space is reasonable, but that is not all
that goes into making a computer.  Ever see a DASD disk farm?  (a disk farm
is a roomful of large disk drives, and possibly a larger room behind it of
cartridges, tapes, etc -- these things store massive amounts of data).  The
thing that the big guys specialize in is I/O -- with intelligent controllers
[channels] offloading the main CPU.  I seriously doubt a 68020 workstation
could put 100 or so 500M disk drives on it (or if you could, what the response
time would be).  Data General is a supermini maker, and the largest disk
configuration we support is on the order of 15 gigabytes (it may be higher or
lower, but that is a ballpark figure).  That is small compared to the large
mainframes (it also costs near $1M just for the disks, not to mention the
processor needed to support those disks).
-- 
	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

It is 11pm, do you know what your sendmail and uucico are doing?

markp@valid.UUCP (03/23/87)

> [ Lots of good arguments for the use of 64-bit integers ]
> 
> To summarize, there are many current applications that use, or would
> like to use, 64-bit words.  When chip makers start building chips
> with 64-bit internal data paths, the chips will sell, and the 64-bit
> data paths will be useful.
> 
> -- Chuck

Nobody is arguing that there isn't a use for extended precision integers
and faster handling of double-precision floating point operands.  There's
nothing to prevent the implementation of a 64-bit type in any processor, be
it mainframe, mini, or microprocessor, and 64-bit operations can be executed
in a fraction over twice the time of comparable 32-bit operations (except
multiply, divide, etc.).  The harder question is: "Is a 64-bit datapath with
64-bit registers the best use of silicon area in order to attain higher
performance over a reasonable range of applications?" If you were faced with
a choice between going 64-bit and including an I-cache or a pipelined TLB,
there's little doubt that you would choose the latter (unless this machine
was going to do nothing but run double-precision Fortran programs all day).
When there're enough transistors to go around, then things get a bit more
interesting, and then you're looking at the tradeoff between increased logic
propagation delay from the deeper decoders, carry lookaheads, etc. vs. the
increased performance possible with single-cycle 64-bit operations.
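
For what it's worth, a minimal C sketch of why a 64-bit add costs only about
twice a 32-bit add on a 32-bit datapath (assuming 32-bit unsigned longs, as on
the machines under discussion; the type and function names are made up):

typedef struct { unsigned long lo, hi; } u64;   /* two 32-bit halves */

u64 add64(u64 a, u64 b)
{
    u64 r;
    r.lo = a.lo + b.lo;                     /* add the low halves */
    r.hi = a.hi + b.hi + (r.lo < a.lo);     /* add the high halves plus the carry-out */
    return r;
}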

In fact, the choice of 64 vs. 32-bit datapaths is a lot more complicated
than many people are willing to admit.  Application mix (which depends on
the market for the machine), chip area and yield considerations, on-chip
delays (both logic and length-related), board area available for buffers/
latches/wider caches, and connector density going off-board must all play an
important part in the decision for any given processor.  Computer architecture
is an N-dimensional task, where N is extremely large and all the dimensions
intersect in strange ways.

	Mark Papamarcos
	Valid Logic
	{ihnp4,hplabs}!pesnta!valid!markp

hank@masscomp.UUCP (03/23/87)

In message <5954@amdahl.UUCP> chuck@amdahl.UUCP (Charles Simmons) writes:
> 
> By the by...  Amdahl mainframes (and hence probably IBM mainframes
> as well) do support 64 bit integers.  
>
Well I haven't got a POO here just my trusty yellow card, the System
370 Reference Summary, but I don't remember any 64 bit integer
operations on system 370.  There are decimal operations that can go up
to 16 digits or 8 bytes, maybe more, I never actually used them, but
integer operations are limited to 32 bits.
     What Chuck may be thinking about are the 64 bit long integers
supported by UTS C.  These things used to be called longs but when UTS
went public people were porting vax and pdp-11 UNIX applications to
UTS and the tendency of DEC programmers to treat longs and ints
interchangeably led to problems when longs were 64 bits.  Personally I
liked having chars shorts ints and longs all different lengths but the
market prevailed and longs became long longs (Yecch!!).  I don't know
if this is still the case.  Whatever they are called the 64 bit longs
in UTS C are all implemented as compound operations utilizing multiple
machine registers and multiple instructions.
     The motivation for 64 bit longs in UTS C is simple, there are several
objects in the 370 world that are 64 bits long.  First and foremost is
the PSW, the Program Status Word for you non IBM types.  The PSW
contains the program counter and condition code bits among other
things.  The Channel Command Word or CCW and Channel Status Word or
CSW are also 64 bit objects that the operating system must be able to
manipulate easily.  Bit field operations on 64 bit quantities are
particularly important in developing an operating system for system
370.  One should be able to define fields that span the 32 bit boundary
without worrying about alignment.  
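
A minimal C sketch of the kind of thing being described: a 64-bit object kept
in two 32-bit halves, with a field pulled out across the 32-bit boundary by
shifting and masking.  The layout and names are invented for illustration;
this is not the real PSW format.

struct dword {                      /* a 64-bit object as two 32-bit halves */
    unsigned long hi, lo;           /* assuming 32-bit unsigned long */
};

/* extract WIDTH bits (WIDTH < 32) starting at bit POS, where bit 0 is the
 * least significant bit of the low word; works even when the field
 * straddles the 32-bit boundary */
unsigned long get_field(struct dword d, int pos, int width)
{
    unsigned long r;

    if (pos >= 32)                          /* entirely in the high word */
        r = d.hi >> (pos - 32);
    else if (pos + width <= 32)             /* entirely in the low word */
        r = d.lo >> pos;
    else                                    /* straddles the boundary */
        r = (d.lo >> pos) | (d.hi << (32 - pos));
    return r & ((1UL << width) - 1);
}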

	Hank Cohen	masscomp!hank
	MASSCOMP	7315 Wisconsin Ave. Suite 1245W 
	Bethesda Md. 	20814	(301) 657-9855

davidsen@steinmetz.UUCP (03/23/87)

One of the things which is not always present on 64 bit machines
is byte addressing. The Cray2 is very fast on some things, but
character manipulation is VERY expensive, and hardly worth
moving off a cheaper system.

Depending on whose benchmarks you use, the Cray2 is 10-30 times
faster than an IBM 3090, but seems only about 2x faster for
character manipulation. The Cray I access doesn't have nroff, so
I can't really give hard numbers. An IBM 3081 times out about
10x a VAX for nroff.

Disclaimer: numbers are quoted roughly from informal benchmarks.
-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

ark@alice.UUCP (03/24/87)

In article <1500@masscomp.UUCP>, hank@masscomp.UUCP (Hank Cohen) writes:
> Well I haven't got a POO here just my trusty yellow card, the System
> 370 Reference Summary, but I don't remember any 64 bit integer
> operations on system 370.  There are decimal operations that can go up
> to 16 digits or 8 bytes, mabe more I never actually used them, but
> integer operations are limited to 32 bits.

True, more or less.  There are 64-bit shift instructions.  Moreover,
it is very easy to simulate a 64-bit add or subtract (3 instructions).
Multiply and divide are a good deal harder.

In general, the 370 architecture supports 64-bit arithmetic
about as well as the 68000 supports 32-bit arithmetic.

gil@pajaro.UUCP (03/24/87)

In article 645 Lars Aronsson asks:
> Is there already ... a computer with truly variable word length?

The Fairchild SymbolIIR (R for research) computer, circa 1970, was VFL with
hardware supported array structured memory having no limitations on
manipulation. The size or type of data did not have to be specified in the 
instruction since they were contained in the data fields. As one might
suspect, the primary drawback was speed when compared with conventional
fixed field length machines.

Gil Chesley{usual disclaimers}

kent@xanth.UUCP (03/24/87)

In article <1498@dg_rtp.UUCP> meissner@dg_rtp.UUCP (Michael Meissner) writes:
[ lots of recursive excerpting omitted ]
>While the 68020, 80386 (and whatever the latest 32*32 National makes) are
>fine in their niches, they are not up to the task of running a moderately
>large business. [ ... ]  I seriously doubt a 68020 workstation
>could put 100 or so 500M disk drives on it (or if you could, what the response
>time would be).
>	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

I'm not sure it is fair to blame the processors here.  At least the IBM (tm)
architectures put a processor per i/o channel.  68000's are easily cheap
enough now to do that.

More important, if mass RAM or hard disk is used as a staging-device type of
secondary memory, and most of the data stored is on tertiary memory, LP-sized
video disks, last I looked, could store 12 Gbytes per side, within a factor
of 4 of your whole disk farm.  Rewriteable video disks are working in the lab
(Sony says) and expected out (if I remember right) within 15 months or so.
Consumer grade read-only video disk drives are being shown now at shows with
prices in the $5000 range, within reach of the well-to-do computer hobbyist.

Thus it is not unreasonable to expect a 68020 or equivalent based system,
with, say, 100 Mbyte of hard disk and 50 Gbyte of video disk to be available
for home use in two or three years, for less than the price of a good new car.
This would be the equivalent of most governmental bodies' computing resources
of 5 years ago.  Such a system could probably be put together now for about
four times the price (per delivered unit in quantity) as a government funded
R&D effort.

The (to me) unrealistic part of your retort was in assuming that the disk farm
has long to live.  My impression is that it is being rapidly overtaken by new
technology, and will soon be superseded by something with a much lower
cost-to-maintain.  The online, staged, tertiary storage concept has been proved
by several hardware releases with corresponding operating system functionality;
IBM's cartridge tape system comes to mind at once, and BBN's terabyte memory
was another attempt in this direction.  Then again, we may see fast access
digital video disk someday soon.

For the real data hogs, RCA or Phillips, I forget which, was working on a 100
video disk "jukebox" for defense data storage back in the late 1970's.  I
don't know what became of it, but with a times 4 density increase from current
best, that would be 5 Tbytes in the space of a large disk drive.  Yum.  ;-)
--
Kent Paul Dolan, "The Contradictor", 25 years as a programmer, CS MS Student
at ODU, Norfolk, Virginia, to find out how I was supposed to be doing this
stuff all these years.  3D dynamic motion graphics a specialty.  Work wanted.

Unemployment is soooo nice though...I never have to disclaim anything!

UUCP  :  kent@xanth.UUCP   or    ...seismo!decuac!edison!xanth!kent
CSNET :  kent@odu.csnet    ARPA  :  kent@xanth.cs.odu.edu
Voice :  (804) 587-7760    USnail:  P.O. Box 1559, Norfolk, Va 23501-1559
Wisdom:  "Peace in mankind's lifetime.  Why leave a whole universe unexplored?"

bjorn@alberta.UUCP (03/25/87)

In article <2610@phri.UUCP>, roy@phri.UUCP (Roy Smith) writes:
> 	A question: I remember reading somewhere that the top 2 bits in a
> VAX address signify user/kernel and data/text segments, which really leaves
> you with "only" a 30-bit address space.  Is that a hardware feature or just
> the way VMS sets up the memory map?

No, there is no direct connection between processor mode
and addresses.  There is a practical distinction between
(data/text) vs. stack "segments" though.

Half of the address space is reserved to the system and is
shared by all processes, ie. the system is in the virtual
address space of every single process on a Vax.  Conversely
when the system receives an interrupt/exception the address
map is set to the current process and consequently the system
is operating in the context of the current process, a major
feature which does not exist on any memory managed processor
that I'm intimately familiar with.

The half of the address space that is reserved to the current
process is split into two regions (program and control), the
program region as the name suggests usually holds your code
and data while the control region is used to hold the stack.
Each of these regions has a 30 bit virtual range and they grow
in opposite directions.  If the Vax had a two level paging
scheme there would be no real reason for making this region
distinction as it would become much cheaper to map a sparse
address space.  However as things stand there are separate page
tables for each of the regions, that map one (virtually) contiguous
chunk of each.  All this just so you don't have to have page tables
covering the "gap" between the program and control regions.

DEC could design a Vax with a multi-level paging scheme without
impacting user programs, even if memory management is described
in the "VAX Architecture Handbook" B-).  Can anyone that has the
"VAX Architecture Reference Manual" handy, tell me if it explicitly
defines the regions and the single level paging store?

			Bjorn R. Bjornsson
			alberta!bjorn

aeusesef@csun.UUCP (03/25/87)

In article <1498@dg_rtp.UUCP>, meissner@dg_rtp.UUCP (Michael Meissner) writes:
  In article <5954@amdahl.UUCP> chuck@amdahl.UUCP (Charles Simmons) writes:
  > In article <3436@iuvax.UUCP> bobmon@iuvax.UUCP (Che' Flamingo) writes:
  > >davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
  > >>
  > >>etc. The 68020 and 80386 have enough power to run large
  > >>businesses, schools, city and county government, etc.
  > 
  > Kinda makes you wonder why the Fortune 500 spends on the order of
  > $5,000,000 for an IBM mainframe or Amdahl mainframe when 68020
  > based workstations can be had for on the order of $50,000.
  
  While the 68020, 80386 (and whatever the latest 32*32 National makes) are
  fine in their niches, they are not up to the task of running a moderately
  large business.  Yes the address space is reasonable, but that is not all
  that goes into making a computer.  Ever see a DASD disk farm?  (a disk farm
  is a roomful of large disk drives, and possibly a larger room behind it of
  cartridges, tapes, etc -- these things store massive amounts of data).  The
  thing that the big guys specialize in is I/O -- with intelligent controllers
  [channels] offloading the main CPU.  I seriously doubt a 68020 workstation
  could put 100 or so 500M disk drives on it (or if you could, what the response
  time would be).  Data General is a supermini maker, and the largest disk
  configuration we support is on the order of 15 gigabytes (it may be higher or
  lower, but that is a ballpark figure).  That is small compared to the large
  mainframes (it also costs near $1M just for the disks, not to mention the
  processor needed to support those disks).

Well, a good example of what Michael is saying is the pre-180 level Cybers.
These are *very* quick machines (a low level one can get something like 3 or
4 MegaFlops), but the CPU does almost no I/O at all.  There is, outside the
computer, somewhere between 4 and 20 PPU's (Peripheral Processor Units),
which are true computers themselves, which handle all of the I/O.  All the
computer does is put a call in its memory space (usually, probably even
always, location 1), and then the PPU sees it, and away it goes.  If you
need to wait for the data (which you usually do), you force a context switch
(save all the registers in a relatively short period of time, load the
registers from the last time System allowed a job to run, let it decide what
to do), and next time you run, you have your data.
Only saving grace on a machine running NOS....
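
Roughly, in C, the mailbox-style handoff being described looks like the sketch
below: the CPU drops a request at a fixed, agreed-upon location, and the
peripheral processor completes the I/O while the job is switched out.  The
structure, the field layout, and the names are invented for illustration, not
the actual NOS/PPU interface.

struct io_request {                 /* hypothetical request block at a fixed address */
    volatile int ready;             /* CPU sets this; the PPU clears it when done */
    int          op;                /* read or write */
    long         block;             /* which disk block */
    char        *buf;               /* where the data goes */
};

static struct io_request mailbox;   /* stands in for "location 1" */

/* CPU side: post the request, then yield until the PPU finishes */
void start_io(int op, long block, char *buf)
{
    mailbox.op    = op;
    mailbox.block = block;
    mailbox.buf   = buf;
    mailbox.ready = 1;              /* the PPU is polling for this to go nonzero */
    while (mailbox.ready)
        ;                           /* in the real system, the job is switched out here */
}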

 -----

 Sean Eric Fagan		    ------\
 Computer Center 		    litvax \
 Cal State University, Northridge   rdlvax  \
 18111 Nordhoff St.  		    psivax   --> !csun!aeusesef
 Northridge, CA  91330 		    hplabs  /
 AGTLSEF@CALSTATE.BITNET	    ihnp4  /
				    ------/
   "I drank what?!" -- Socrates  |  My opinions *are* facts

mac@uvacs.UUCP (03/25/87)

> >could put 100 or so 500M disk drives on it

> The (to me) unrealistic part of your retort was in assuming that the disk
> farm has long to live.

Probably not.

Many systems encourage the use of user-owned removeable disk packs, each of
which may have something like a megabyte of real data.  Think of tapes.

Such systems often wind up with dozens of drives, just to avoid frequent
mounting & dismounting of packs.  Think tape drives.

guy@gorodish.UUCP (03/26/87)

>Conversely when the system receives an interrupt/exception the address
>map is set to the current process and consequently the system
>is operating in the context of the current process, a major
>feature which does not exist on any memory managed processor
>that I'm intimately familiar with.

I'm not sure what you mean.

I presume by "...when the system receives an interrupt/exception the
address map is set to the current process..."  that you mean "when
the interrupt/exception routine is running, the 'user' half of the
address space is mapped in where you think it is", and not "when an
exception occurs the system maps the current process in"?  VAXes
don't set the address map to map in the current process; they don't
have to.

I don't know what "memory managed processor" means here.  Sun-3s
certainly have memory management hardware, although the MMU is not on
the chip, and the current process' address space is mapped in "where
you think it is"; i.e., the kernel can (and does) copy data from or
to the user's address space by using the pointer the user provides,
without any relocation.  (This is not the case on a Sun-2.)

The Motorola 68851 MMU permits you to run with the same address space
in supervisor and user mode; in such a system, presumably the
exception or interrupt handler would have the current process'
context.  According to the data sheet I have for the NS16082 (*sic* -
it's an old data sheet), it also permits you to map memory this way.

On most processors I'm familiar with, interrupt and exception
handlers run with the current process' address space mapped in
(unless the OS deliberately arranges to do things differently).
Interrupt handlers, though, usually don't care (they have no business
monkeying with whatever random process happened to be running when
the interrupt occurred); exception handlers normally do care.  What
processors are you familiar with?  Why don't interrupt or exception
handlers run "in the context of" the current process on those
processors?  (Is this a function of the processor or the OS?)

>Can anyone that has the "VAX Architecture Reference Manual" handy, tell
>me if it explicitly defines the regions and the single level paging store?

It does - or, at least, the 20 May 1982 edition (Revision 6.1) does.
It explicitly indicates how addresses are translated.

davidsen@steinmetz.UUCP (03/27/87)

In article <1500@masscomp.UUCP> hank@masscomp.UUCP (Hank Cohen) writes:
% .....
%     What Chuck may be thinking about are the 64 bit long integers
%supported by UTS C.  These things used to be called longs but when UTS
%went public people were porting vax and pdp-11 UNIX applications to
%UTS and the tendency of DEC programmers to treat longs and ints
%interchangably led to problems when longs were 64 bits.  Personally I
%liked having chars shorts ints and longs all different lengths but the
%market prevailed and longs became long longs (Yecch!!).  I don't know

Unless the X3J11 standard has changed since I left the
committee, there was a beastie called a "long double" in the
language, at the request of Cray Research. I ran some stuff on a
Cray, and having long, short, and int all the same size is a
good portability test.
-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

davidsen@steinmetz.UUCP (03/27/87)

In article <1498@dg_rtp.UUCP> meissner@dg_rtp.UUCP (Michael Meissner) writes:
>In article <5954@amdahl.UUCP> chuck@amdahl.UUCP (Charles Simmons) writes:
>> In article <3436@iuvax.UUCP> bobmon@iuvax.UUCP (Che' Flamingo) writes:
>> >davidsen@kbsvax.steinmetz.UUCP (William E. Davidsen Jr) writes:
>> >>
>> >>etc. The 68020 and 80386 have enough power to run large
>> >>businesses, schools, city and county government, etc.
>> 
>> Kinda makes you wonder why the Fortune 500 spends on the order of
>> $5,000,000 for an IBM mainframe or Amdahl mainframe when 68020
>> based workstations can be had for on the order of $50,000.
>
>While the 68020, 80386 (and whatever the latest 32*32 National makes) are
>fine in their niches, they are not up to the task of running a moderately
>large business.  Yes the address space is reasonable, but that is not all

You're missing the point. All of the objections you have made
have nothing to do with CPU power. Obviously any machine
handling a large number of tasks at once, and/or many users,
disk, etc, will have to have a wide bus and i/o processors. If
you take an IBM 3090 and make it handle all the interrupts
itself I suspect it would run about as fast as a 32 micro.

A number of manufacturers are addressing this problem, and I
believe more will do so as the market opens for super micros in
larger environments. Unfortunately these things add cost, so I
would expect a 30-60% increase in the actual box, but not the
peripherals.

A case in point is an AT running Xenix. It may have plenty of
power to handle 8 users, but it becomes a total pig when loaded.
Replace the cheap serial card with one which has a processor and
the system is now as responsive as a VAX 11/780 with 40 users.
If you hate the 80*86 chip, replace it with your favorite; the
example still works.  See the Plexus line, which adds i/o
processors to its larger machines rather than using a faster
CPU.

-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

davidsen@steinmetz.UUCP (03/27/87)

In article <15525@sun.uucp> gil%pajaro@Sun.COM (Gilman Chesley) writes:
>
>In article 645 Lars Aronsson asks:
>> Is there already ... a computer with truly variable word length?
>
>The Fairchild SymbolIIR (R for research) computer, circa 1970, was VFL with
>hardware supported array structured memory having no limitations on
>manipulation. The size or type of data did not have to be specified in the 

I believe that the Intel 432 was VFL *and* was bit addressable. I have a 432
manual home, and can check over the weekend.

-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

meissner@dg_rtp.UUCP (03/31/87)

In article <731@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
> The (to me) unrealistic part of your retort was in assuming that the disk farm
> has long to live.  My impression is that it is being rapidly overtaken by new
> technology, and will soon be superseded by something with a much lower
> cost-to-maintain.  The online, staged, tertiary storage concept has been proved
> by several hardware releases with corresponding operating system functionality;
> IBM's cartridge tape system comes to mind at once, and BBN's terabyte memory
> was another attempt in this direction.  Then again, we may see fast access
> digital video disk someday soon.

I think analogues of the disk farm will always be with us.  Yes at any
given time you will have available on a "small" system as much storage
as you previously had on a large system.  That will only encourage the
large systems to add even more data to the disk farm.  There never seems
to be enough { memory, cpu cycles, etc. }
-- 
	Michael Meissner, Data General	Uucp: ...mcnc!rti-sel!dg_rtp!meissner

It is 11pm, do you know what your sendmail and uucico are doing?

henry@utzoo.UUCP (Henry Spencer) (03/31/87)

> ...If you take an IBM 3090 and make it handle all the interrupts
> itself I suspect it would run about as fast as a 32 micro.

More like an 8-bit micro!  Unless IBM has smuggled in some improvements
in their newer machines, the 360/370/... interrupt system is dreadful.
("Vectors?  What are they?")
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

johnl@ima.UUCP (04/01/87)

In article <2610@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>In article <985@rpics.RPI.EDU> yerazuws@rpics.RPI.EDU (Crah) writes:
>> A VAX has 32 bits- so if we assume (*) that all 32 can be used as memory
>> address, a VAX (or other 32-bit processor) can have AT MOST 4 GIG of memory.
>... Anybody willing to guess when we'll see 48 bit physical addresses on 32
> bit machines?  Anybody for demand paging and bank switching at the same time?

It's already happening.  The ROSETTA memory manager chip in the IBM PC RT
manages an address space that is 43 bits.  It manages 28-bit segments, of
which you can potentially have 32K.  (Actually, although the architecture
allows for 15 bit segment numbers, the current implementation limits you to
12, making your addresses 40 bits.)  The high 4 bits of a 32-bit address are
mapped to a 15 bit segment number using a fast 16-bit lookup which, now that
you mention it, is not unlike bank switching.  The current implementations of
the processor only support normal amounts of memory, e.g. 8MB, but you could
load it up  to a terabyte and user programs would only see fewer page faults.
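
A minimal C sketch of the translation described above, under the stated
numbers: the high 4 bits of the 32-bit address index a 16-entry table of
segment numbers, and the segment number is glued onto the low 28 bits to form
the long virtual address.  The table contents and names are invented, and a
64-bit type stands in for the 40-bit result.

static unsigned int segtab[16];     /* hypothetical table of 12-bit segment ids */

/* 32-bit program address -> "40-bit" virtual address */
unsigned long long translate(unsigned long addr32)
{
    unsigned int seg = segtab[(addr32 >> 28) & 0xF];    /* fast 16-way lookup */
    unsigned long offset = addr32 & 0x0FFFFFFF;         /* 28-bit segment offset */
    return ((unsigned long long)seg << 28) | offset;
}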

The 370 XA architecture used in the 3090 series defines a funny kind of
memory for paging which is addressable in 4K chunks, and the only thing you
can do with it is to blat a chunk into or out of a 4K chunk of regular memory.
Think of that as a brute-force approach to bank switching given that they
couldn't build fast enough address decoders to switch banks the usual way.
-- 
John R. Levine, Javelin Software Corp., Cambridge MA +1 617 494 1400
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
Where is Richard Nixon now that we need him?

mouse@mcgill-vision.UUCP (04/01/87)

In article <1310@ucbcad.berkeley.edu>, faustus@ucbcad.berkeley.edu (Wayne A. Christopher) writes:
> I don't know if this is the right question to ask....  [...] the most
> one process could address is 4G of virtual memory.  In the case of
> the VAX, 1/4 of this is "regular" data and text space, 1/4 is stack
> space, and the other half is system space.

Not quite.  1/4 is text+data (0x00000000-0x3fffffff, or P0 space), 1/4
is stack and assorted system trash (on UNIX, the u struct and the
process kernel stack live here) (0x40000000-0x7fffffff, or P1 space),
1/4 is normally used by the kernel (0x80000000-0xbfffffff, or system
space), and 1/4 is reserved (0xc0000000-0xffffffff).  Only the low half
(P0 and P1) is per-process; the other half (system and reserved) is
common to all processes.  The virtual memory hardware (firmware)
enforces this much of a distinction (though it doesn't enforce things
like system space being inaccessible to user-mode code; that's up to
the system software to set up).
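
In C terms, the split described above is keyed off the top two bits of the
32-bit address; a small sketch (the region names follow the description above):

/* classify a VAX virtual address by its top two bits */
const char *vax_region(unsigned long va)
{
    switch ((va >> 30) & 3) {
    case 0:  return "P0 (program: text + data)";        /* 0x00000000-0x3fffffff */
    case 1:  return "P1 (control: stack, u area)";      /* 0x40000000-0x7fffffff */
    case 2:  return "system";                           /* 0x80000000-0xbfffffff */
    default: return "reserved";                         /* 0xc0000000-0xffffffff */
    }
}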

> I certainly haven't been running into the 1G limit too often lately.

Neither have I :-).  But then, I haven't been doing image processing or
complicated spectrum analysis....

					der Mouse

Smart mailers: mouse@mcgill-vision.uucp
USA: {ihnp4,decvax,akgua,utzoo,etc}!utcsri!musocs!mcgill-vision!mouse
     think!mosart!mcgill-vision!mouse
ARPAnet: think!mosart!mcgill-vision!mouse@harvard.harvard.edu

stubbs@ncr-sd.UUCP (04/03/87)

In article <5954@amdahl.UUCP> chuck@amdahl.UUCP (Charles Simmons) writes:
>16 bits is clearly not sufficient for most things people do.  Consider
>
>hundred thousand dollars a year.  It would be nice, when generating
>reports describing where the money came from and where it went, to
>store these figures as fixed point numbers.  But they certainly won't
>fit in 16 bit integers.
>

Integer size and field size need not be restricted by hardware word
size. My NCR 9800 has up to 256-byte integer arithmetic (add, sub, etc.)
implemented in one machine instruction. Of course since it is
a 32 bit word machine this takes multiple cycles, but firmware
does the work. 
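
What the firmware presumably does, sketched in C: propagate a carry through
the operands a byte at a time.  This only illustrates multi-byte integer
arithmetic in general; it is not NCR's microcode, and the little-endian byte
order is chosen purely for convenience.

/* add two n-byte unsigned integers, least significant byte first: dst += src */
void addn(unsigned char *dst, const unsigned char *src, int n)
{
    unsigned int carry = 0;
    int i;

    for (i = 0; i < n; i++) {
        unsigned int sum = dst[i] + src[i] + carry;
        dst[i] = sum & 0xFF;            /* keep the low byte */
        carry  = sum >> 8;              /* carry into the next byte */
    }
}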

Maybe the real concern is performance. The unit
of memory access is selected based on many factors, including
average instruction size, register size, address size, average
data operand size, and today, how many pins you can fit 
on your chip package. Integer size is only one (important)
factor. Another important factor is what all the support
hardware uses, and today 32 bits is in, and will be
for a long long time.


Jan Stubbs    ....sdcsvax!ncr-sd!stubbs
619 485-3052
NCR Corporation
16550 W. Bernardo Drive MS4010
San Diego, CA. 92127

henry@utzoo.UUCP (Henry Spencer) (04/04/87)

> It's already happening.  The ROSETTA memory manager chip in the IBM PC RT
> manages an address space that is 43 bits...

Yes and no, mostly no.  The 43-bit (or 40-bit) address is mostly IBM bull.
The fact is, the instructions use 32-bit addresses.  Not 40, not 43, but 32.
The top 4 bits are used to pick out a segment, which after some slightly
unusual fiddling ends up determining the physical address.  If, repeat if,
you can convince the operating system to change segments for you, you can
of course access more by changing which segments you have in your address
space.  IBM claiming that this is 40-bit addressing is utter bilge.  By
the same standard, most any decent paged machine has INFINITE-bit addresses,
since you can get access to arbitrary amounts of data by having the operating
system change your page tables suitably.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

rgs@sdiris1.UUCP (04/05/87)

From what I understand of this discussion, it breaks down as follows:

64-bit advantages
o  Floating Point performance
o  Large, segmentable address space

32-bit advantages
o  Lower cost
o  Less memory wasted on integers

Other than in passing, however, I haven't noticed anyone mentioning
raw architectural performance. It seems to me that in this area the
64-bit machine has a distinct advantage.

Very few instructions are actually 64 bits in length, so multiple
instructions get packed into a single word. This means that in one memory
fetch (on a cache miss) up to 4 instructions are pulled into cache. This
should significantly improve cache performance, I would think. Likewise,
the hit rate for data cache would improve for sequentially accessed
data structures.

As it turns out tho, this has nothing to do with the CPU architecture,
but with the memory-cache interface. In fact, I know of at least one
32-bit mini computer that has a 64-bit cache to memory bus (Data General).

It would be interesting to see what this change would do to the simulated
performance of a machine (anyone with a good simulator want to try it out?).
What you would have would be a hybrid machine. One with a 32 bit cpu
and a 64-bit memory bus. This would be fairly simple to build if you
already have an outboard cache. I don't happen to know if the currently
popular busses (like VME) would conveniently allow a 64 bit path.

This would give you the cost and memory/integer advantage of the 32-bit CPU,
along with the performance advantage of the 64-bit machine. Additionally,
the floating point hardware now has 1 memory cycle access to a 64-bit
floating point on a cache miss, assuming you put a 64-bit path between
cache and the FP hardware (which you should have anyway).

Currently, even some 64-bit machines use larger memory data paths. One
machine I know which does this is the ETA (and it's predecessor, the
Cyber 205). This architecture has what is called a Super-word, or
sword for short. This is a 512 or 1024 bit memory access (at least last I
heard, they keep making bigger swords to improve performance). In this
case I believe it's used to improve the vector pipeline performance by
bringing in several vector elements on a memory access.

This does add an interesting twist to optimizing compilers. It would
improve program performance to have code segments start on a sword boundary.
An obvious thing would be to place all subroutine entries at a boundary.
However, any major branch-to location would probably also benefit. This
would mean that from 48 to 0 bits of the previous sword would be wasted (if
the code falls thru), so it may be worth it to define a fast "next sword"
instruction to skip the 2 or 3 NOPs you'd have to do. Data structures may
also benefit from being sword-aligned (especially if the structure is the
same size as a single sword). In this case, 64-bit floats should be sword
aligned.

The use of swords has the biggest benefit with "cache buster" types of
programs. Anything that jumps all over a large data structure would
probably benefit. Also, highly complex programs that repeat code sequences
rarely (I know our CAD program fits this class) would also benefit.
Basically, any program which gets a low cache hit rate currently, but which does
reasonably sequential access to data and program, would now get at least a
50% cache hit. In these cases, the larger the sword (128 bits, 256 bits)
the higher the cache hit rate.

The optimal sword size is probably related to CPU speed. Assume the following
"cache buster", a vector add:
vadd(size,a,b,c)
   int size;
   int a[],b[],c[];
{
   while (--size)
      c[size] = a[size] + b[size];
}
The loop (probably) compiles to something like:
       lda  D0,@size
       lda  A0,a
       lda  A1,b
       lda  A2,c
       lda  D4,"-1"
loop:  add  D0,D4,D0    ; size -= 1
       tst  D0
       beq  fini
       add  A0,D0,A3    ; A3=A0+D0
       lda  D1,@A3
       add  A1,D0,A4    ; A4=A1+D0
       lda  D2,@A4
       add  A2,D0,A5    ; A5=A2+D0
       add  D1,D2,D3    ; D3=D1+D2
       sta  D3,@A5
       jmp  loop
fini:  ...
This does 3 memory references for 11 instructions (without using any fancy
addressing modes; most CPUs will use fewer instructions). In this example,
if the CPU can execute 2 instructions or more in the time it takes to
access one word of external memory it's probably going to have to wait.
In this case, the larger the sword the better the cache hit rate. Coded
this way it'll do 3 memory accesses every other pass through the loop.
Recoding as follows:
vadd(size,a,b)
   int size;
   int a[2][],c[];
{
   while (--size)
      c[size] = a[0][size] + a[1][size];
}
This version really benefits from the sword.  As soon as a[0][size] is accessed,
a[1] is loaded into cache. Coded this way there is 1 memory access on
odd passes and 2 on even passes. This spreads the memory accesses out
better. The faster the CPU (and probably the larger the data structures),
the larger the sword should be.

All of this is off the cuff, so I really haven't had time to actually work
out some timings on paper. I really would be interested to hear if anyone
has done some simulations (or has first hand experience with this on
32 bit machines).
-- 
UUCP: ...!sdcsvax!jack!man!sdiris1!rgs |  Rusty Sanders
Work : +1 619 450 6518                 |  Control Data Corporation (CIM)
                                       |  4455 Eastgate Mall, 
Insert standard disclaimers here.      |  San Diego, CA  92121

chris@mimsy.UUCP (04/05/87)

In article <563@sdiris1.UUCP> rgs@sdiris1.UUCP (Rusty Sanders) writes:
>... In fact, I know of at least one 32-bit mini computer that has
>a 64-bit cache to memory bus (Data General).

The Vax 11/780 has a 64 bit backplane (the SBI) between its cache
and its memory.

>This does add an interesting twist to optimizing compilers. It would
>improve program performance to have code segments start on a [superword]
>boundary.  An obvious thing would be to place all subroutine entries
>at a boundary.

The Unix Vax assembler has a `.align' directive for such purposes,
but the compiler emits only `.align 1's, which align to 2**1 bytes or
16 bit boundaries---probably because the first thing at each routine
is a short word containing a register save mask (and a few other bits
that are essentially never set anyway).

>The use of swords has the biggest benefit with "cache buster" types of
>programs.

... provided such programs were written carefully.  Similar to the
`cache buster' is the `VM buster': a program with multidimensional
arrays where the fastest-varying subscript is varied the slowest
(if that makes sense: if not, there is an example below).

>... Recoding as follows:
>vadd(size,a,b)
>   int size;
>   int a[2][],c[];
>{
>   while (--size)
>      c[size] = a[0][size] + a[1][size];
>}

This looks like the coder was `thinking FORTRAN and writing C', which
is often a performance disaster (as is `thinking C and writing FORTRAN').
Aside from the nits:

>As soon as a[0][size] is accessed, a[1] is loaded into cache.

Since the last subscript varies fastest in C, as soon as a[0][size]
is accessed, a[0][size+1] is cached.

A C matrix add loop should read

	for (i = 0; i < size1; i++)
		for (j = 0; j < size2; j++)
			c[i][j] = a[i][j] + b[i][j];
	/* and of course you can optimise with */
	/* pointers, if it really comes to that. */

while the Ratfor loop should read

	for (j = 1; j <= size; j = j + 1)
		for (i = 1; i <= size; i = i + 1)
			c(i, j) = a(i, j) + b(i, j)

(subscripts *do* start at one in FORTRAN?).  Reversing the loops
can have terrible effects on performance, due to cache effects (as
described above) and due to `unexpected' VM behaviour (the scattered
array references cause excessive page faults).

(For the nit-p..., er, record, most likely what was meant was

	vadd(size, a, c)
		register int size;
		register int a[][2], c[];
	/* or	register int (*a)[2], *c; */
	{

		while (--size >= 0)
			c[size] = a[size][0] + a[size][1];
	}
)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
UUCP:	seismo!mimsy!chris	ARPA/CSNet:	chris@mimsy.umd.edu

baum@apple.UUCP (04/06/87)

--------
[]

>> ...If you take an IBM 3090 and make it handle all the interrupts
>> itself I suspect it would run about as fast as a 32 micro.
>
>More like an 8-bit micro!  Unless IBM has smuggled in some improvements
>in their newer machines, the 360/370/... interrupt system is dreadful.
>("Vectors?  What are they?")

Interestingly enough, the channel processors on a 3090 are reputed to be the
infamous 801 processor. It would not be out of the question to see LARGE pieces
of the interrupt handling code 'migrate' in that direction, making it yet
harder for the clones to follow suit.

--
{decwrl,hplabs,ihnp4}!nsc!apple!baum		(408)973-3385

henry@utzoo.UUCP (Henry Spencer) (04/12/87)

> What you would have would be a hybrid machine. One with a 32 bit cpu
> and a 64-bit memory bus. This would be fairly simple to build if you
> already have an outboard cache...

My recollection is that this is exactly what the Sun-3 "200 series" does.
-- 
"We must choose: the stars or	Henry Spencer @ U of Toronto Zoology
the dust.  Which shall it be?"	{allegra,ihnp4,decvax,pyramid}!utzoo!henry

rbl@nitrex.UUCP ( Dr. Robin Lake ) (04/13/87)

In article <731@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>In article <1498@dg_rtp.UUCP> meissner@dg_rtp.UUCP (Michael Meissner) writes:
>[ lots of recursive excerpting omitted ]
>> ...
>
>
>For the real data hogs, RCA or Phillips, I forget which, was working on a 100
>video disk "jukebox" for defense data storage back in the late 1970's.  I
>don't know what became of it, but with a times 4 density increase from current
>best, that would be 5 Tbytes in the space of a large disk drive.  Yum.  ;-)
>--

Integrated Automation in Alameda, CA has a 100-disk/200 Gb optical disk juke
box.  I saw it working 2 years ago.  Don't know where they're at with it now.

Rob Lake

root@hobbes.UUCP (John Plocher) (04/18/87)

+---- Robin Lake writes the following in article <449@nitrex.UUCP> ----
| >best, that would be 5 Tbytes in the space of a large disk drive.  Yum.  ;-)
| 
| Integrated Automation in Alameda, CA has a 100-disk/200 Gb optical disk juke
+----

( Paraphrased from an article in the Capital Times, March 1987)

At the Physical Sciences Lab at the Univ of Wisconsin there is a system
which is called the  Optical Archival System.  Billed as the largest 
computer 'memory' in the world, the system uses more than 1,000 optical disks
and a robot disk handler to give 2 Tera Bytes of storage.  (For comparison
shopping, note that this is equivalent to 40,000,000 9-Track tapes,
4,000,000,000 Floppy Disks, a stack of paper 3 times the height of Mt. Everest,
or 6 (six) copies of the 1980 U.S. Census.  Your mileage may vary. Customer
liable for all state and local sales tax.)  Access time is less than 1 minute
maximum for any platter.   The system should be able to be expanded by as
much as eight times, if they can find out what to do with all that data...


-- 
 John Plocher		UUCP: <backbone>!uwvax!uwmacc!hobbes!plocher
==============      Internet: plocher%hobbes.UUCP@uwvax.WISC.EDU
FidoNet: 121/0	      BITNET: uwvax!uwmacc!hobbes!plocher@psuvax1

papowell@umn-cs.UUCP (Patrick Powell) (04/19/87)

In article <109@hobbes.UUCP> root@hobbes.UUCP (John Plocher) writes:
>+---- Robin Lake writes the following in article <449@nitrex.UUCP> ----
>| >best, that would be 5 Tbytes in the space of a large disk drive.  Yum.  ;-)
>+----
>( Paraphrased from an article in the Capital Times, March 1987)
>At the Physical Sciences Lab at the Univ of Wisconsin there is a system
>... uses more than 1,000 optical disks and a robot disk handler to give
>2 Tera Bytes of storage. ...Note that this is equivalent to 40,000,000
>9-Track tapes
>  Access time is less than 1 minute maximum for any platter.
>  The system should be able to be expanded by as
> much as eight times, if they can find out what to do with all that data...
> John Plocher		UUCP: <backbone>!uwvax!uwmacc!hobbes!plocher
>==============      Internet: plocher%hobbes.UUCP@uwvax.WISC.EDU
>FidoNet: 121/0	      BITNET: uwvax!uwmacc!hobbes!plocher@psuvax1

This is not much data.  A quick back-of-the-terminal calculation (no keypunch
cards left) indicates that this is about the amount of information needed to record
a year's transactions on the stock market.  Anybody want to try and estimate
the number of Interbank transfers in a week?  I know a small bank that uses
over 10,000 tapes a year just for transaction logging alone.  And as I recall,
there are over 12,000 banking centers in the U.S.A.

Patrick ("Million?  Just a 15 meter by 15 meter square of M and M's") Powell
P.S.= Kid set this up for a Science Fair.  He had a hell of a good time
giving them away.  Sign over the exhibit was:  can you eat a million?
-- 
Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

desj@lime.BERKELEY.EDU (David desJardins) (04/20/87)

In article <109@hobbes.UUCP> root@hobbes.UUCP (John Plocher) writes:
>Billed as the largest computer 'memory' in the world, the system uses more
>than 1,000 optical disks and a robot disk handler to give 2 Tera Bytes of
>storage.  (For comparison shopping, note that this is equivalent to ...
>4,000,000,000 Floppy Disks ...

   Come on now.  Please at least think about what you are saying before
you post it to the whole world.  If 2 x 10^12 bytes were the equal of 4 x 10^9
floppy disks, then that would mean 500 bytes on each disk.  Somehow I think
our technology has advanced slightly beyond that.

   -- David desJardins

phil@osiris.UUCP (Philip Kos) (04/20/87)

In article <109@hobbes.UUCP>, root@hobbes.UUCP (John Plocher) writes:
> ( Paraphrased from an article in the Capital Times, March 1987)
> .... 
> 2 Tera Bytes of storage.  (For comparison
> shopping, note that this is equivalent to 40,000,000 9-Track tapes,
> 4,000,000,000 Floppy Disks, a stack of paper ....

I can't speak for the stack of paper, but according to my calculations,
2T/40,000,000 is about 54976 and 2T/4,000,000,000 is about 550.  Now, if
they're talking *paper* tape, then maybe I can figure the 54976, but hell,
even an RX01 holds more than 550 bytes!!  Either I'm off by an order of
magnitude (or three) or somebody else is.

John, did you seriously mistype these numbers or is this really what was
in the article?  I could say something here about using marketing numbers
without independent verification, but I had probably better shut up... :-)

-- 
                              ...!decvax!decuac -
Phil Kos                                          \
The Johns Hopkins Hospital    ...!seismo!mimsy  - -> !aplcen!osiris!phil
Baltimore, MD                                     /
                              ...!allegra!mimsy -

"And if you want to put your feet in a toilet, remember, what's a
democracy for?"  - Paul Krassner

lamaster@pioneer.arpa (Hugh LaMaster) (04/20/87)

Two items:

The April 1987 Anderson Report describes IBM's new announcement of production
4 Mbit chips.  It quotes a report by the California Technology Stock Letter
which claims that by 1994 small and medium magnetic disk drives will be
obsolete.  I know that we have heard it before, but in any case, the progress
in high density memory is not slowing down yet (I'm sure everyone has heard
that 16Mbit prototypes now exist also).  The technology necessary for much
larger memory systems is now in the pipeline.  My guess is:  you can
expect 64MB workstations within 4 years and 256MB systems within 8 years.

The second item:  National Advanced Systems has now announced 2Gigabyte
physical memory systems (the limit of 32 bit 370/XA).  This amount of
memory is consistent with the systems now being shipped with performance on the
order of 100MIPS (whatever a MIP is).  Amdahl has 1GB systems and IBM 512 MB
systems today.  If a 1 MIP system a few years ago needed 16MBytes of virtual
and 4 MBytes of physical memory (my own figures), a 100MIP system needs about
2GB of virtual and 512 MBytes of physical memory.  Therefore, a 4 processor
system with 4 100MIP CPU's (this is my estimate for the coming generation of
high end systems) will need 8 GB of virtual and 2GB physical memory; this is
more than the limit of 32 bit addressing processors.

I think we have settled the question earlier of whether everyday applications
exist for lots of memory and address space (high quality graphics alone is
sufficient in many cases, even without another application to display).  Now,
we see that the technology is here to exceed 32 bit addressing.  Recent
experience has shown that architecturally, "small" systems like workstations
have come to resemble their larger cousins more and more (real operating
systems, virtual memory, multiprogramming, etc. etc.)

Therefore,  I expect to see the first 64 bit linear addressed microprocessor
about 4 years from now.  I hope I'm not disappointed :-)  Naturally, the
machine will have 64 bit registers, with both 32 bit and 64 bit integer and
floating point formats supported with hardware on chip. The first versions
will probably have a 64 bit bus/data path, but within a few years expect to
see 256 bit data paths.  Within 8 years from now, we should expect to see a 1
million gate load/store version with a vector instruction set and segmented
arithmetic and logic functional units.  On chip instruction cache, of course.
In other words, a current supercomputer on a chip.  This estimate is based on
current or near term projected technology.  How about it- think it can be
done?



  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

phil@osiris.UUCP (Philip Kos) (04/21/87)

In article <1316@ames.UUCP>, lamaster@pioneer.arpa (Hugh LaMaster) writes:
>   If a 1 MIP system a few years ago needed 16MBytes of virtual
> and 4 MBytes of physical memory (my own figures), a 100MIP system needs about
> 2GB of virtual and 512 MBytes of physical memory....

That's assuming that MIPS and physical/virtual memory scale linearly.  I
have my doubts about this.

While MIPS (for the sake of argument, let's use the "wise man's definition",
or "Meaningless Index of Processor Speed") and virtual memory size may have
a more or less linear interrelationship, I've noticed more of a logarithmic
relationship between virtual and real memory.

Am I the only person out here who doesn't agree?

-- 
                              ...!decvax!decuac -
Phil Kos                                          \
The Johns Hopkins Hospital    ...!seismo!mimsy  - -> !aplcen!osiris!phil
Baltimore, MD                                     /
                              ...!allegra!mimsy -

"People say I'm lazy, dreaming my life away..."  - J. Lennon

amos@instable.UUCP (04/22/87)

To increase  flexibility, processors that  can access 64  bits of
addressing will  probably do it  the old fashioned way:  by using
one word (32  bits) as a base address, and  another as an offset.
There's no point  in dragging all 64  bits everywhere, especially
since  processing   is  usually   localized  among   clusters  of
addresses.
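
For concreteness, one way such a base/offset scheme could form a full address,
sketched in C: a 32-bit base register selects a 4 GB region and a 32-bit
offset gives the position within it.  This is just one possible reading of the
suggestion, not a description of any announced processor.

/* one conceivable base+offset form of a 64-bit effective address */
unsigned long long effective(unsigned long base, unsigned long offset)
{
    return ((unsigned long long)base << 32) | (unsigned long long)offset;
}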

Also  keep  in  mind  that  future processors  will  have  to  be
compatible with older programs -  the programs we are writing now
on 32-bit address machines!
-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. (972)52-522261
amos%nsta@nsc.com {hplabs,pyramid,sun,decwrl} 34.48'E 32.10'N

lamaster@pioneer.UUCP (04/23/87)

In article <751@instable.UUCP> amos%nsta@nsc.com (Amos Shapir) writes:

>To increase  flexibility, processors that  can access 64  bits of
>addressing will  probably do it  the old fashioned way:  by using
>one word (32  bits) as a base address, and  another as an offset.
>There's no point  in dragging all 64  bits everywhere, especially
>since  processing   is  usually   localized  among   clusters  of
>addresses.
>
>Also  keep  in  mind  that  future processors  will  have  to  be
>compatible with older programs -  the programs we are writing now
>on 32-bit address machines!
>-- 


I disagree with this completely.  If you need more than 32 bits, you need real
linear addressing on the more than 32 bits.  I take the history of attempts at
such extensions as my evidence: the PDP-11 was probably the most successful of
these extended architectures, but even it increased the usable address size
only modestly.  This style of extending the addresses is also used on the
80386, but people seem to be underwhelmed. 

The installed customer base argument could always be used to justify never
changing the architecture.  If we took it seriously, all current machines
would be upwards compatible with the IBM 650.  Only IBM really has the kind of
customer base that will accept half a loaf for the sake of upward
compatibility.  

The real question is: when will enough customers need this capability that you
can afford to build a product to meet their needs?  Once you decide when that
is, you need a product ready or you will never be able to get your share of
the market.  

I argue that we will need more than 32 bits within 4 years on
minis/mainframes.  Since within 4 years micros should be able to support a
mainframe architecture, I would argue that you should be planning a 64 bit
micro-to-mainframe architecture now, so that you can begin implementation 3
years from now.  

One might argue that at least one more generation of specifically
microprocessor architectures will be necessary before microprocessors 
can support mainframe architectures.  However, I understand from friends that
there may indeed be machines out there with half a million gates.  If
true, it is enough gates to implement a mainframe, or even a supercomputer.

If you plan the next micro architecture a little more ambitiously,
you will be able to address customer needs further into the future without
either kludge extensions or yet another architecture switch.


  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov

"In order to promise genuine progress, the acronym RISC should stand 
for REGULAR (not reduced) instruction set computer." - Wirth

("Any opinions expressed herein are solely the responsibility of the
author and do not represent the opinions of NASA or the U.S. Government")

firth@sei.cmu.edu.UUCP (04/23/87)

In article <751@instable.UUCP> amos%nsta@nsc.com (Amos Shapir) writes:
>To increase  flexibility, processors that  can access 64  bits of
>addressing will  probably do it  the old fashioned way:  by using
>one word (32  bits) as a base address, and  another as an offset.
>There's no point  in dragging all 64  bits everywhere, especially
>since  processing   is  usually   localized  among   clusters  of
>addresses.

Sigh!  I remember people saying exactly the same thing 15 years ago
when we were moving from 16-bit to 32-bit machines.  To repeat the
same old arguments:

(a) You mean I can't declare

	integer array huge[0:1048575] ?

(b) How do I pass a by-reference parameter?  Aren't I going to have
    to transmit 64 bits somehow, and isn't that easier as one thing
    than as two things?

(c) How do I pass an array slice by reference?  I might be able to
    live with having all arrays start at a segment boundary, but
    what about a call

	sort(a[low:pivot])

    which surely MUST pass a 64-bit address as the base of the slice.
    And since ANY actual array might be a slice, EVERY formal array
    parameter must be represented this way.

(d) How do I do

	integer a(100),b
	equivalence a(10),b

    and then access b?

And so on...

eugene@pioneer.UUCP (04/23/87)

>
>(a) You mean I can't declare
>
>	integer array huge[0:1048575] ?
>
Here's one from the "real world" [term I was given when asked to give an
ACM talk to one of the CSU schools].

One of my assignments at JPL was to get SPICE running on a VAX [VMS]
(back in 1980).  I was given the assignment because there were few VAXen
at JPL then and it would not run on the Univac [1100/81].  The reason
why it would not run on the Univac was the big array SPICE uses for
temporary storage:
	DIMENSION TEMP(80000)
Univac addressing only allows 65K words of data space.  I also have
this program waiting for some standalone time:

	DIMENSION IA(256000000)
	t1 = second()
	do 1 i = 1,256000000
	 ia(i) = 0.0
1	continue
	t2 = second()
	time = t2-t1
	write(*,*) time
	stop
	end

So as VLSI goes, I see its simulation as a minimum O(n^2) problem. Just
a matter of time before 64-bits is needed.

From the Rock of Ages Home for Retired Hackers:

--eugene miya
  NASA Ames Research Center
  eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {hplabs,hao,ihnp4,decwrl,allegra,tektronix,menlo70}!ames!aurora!eugene

amos@instable.UUCP (Amos Shapir) (04/24/87)

I guess I didn't make myself clear: I didn't mean to describe another
@#$% 86! All I wanted was to point out that on a 64-bit machine, there will
be some use of the base/offset mechanism. The full virtual 64-bit address
space can remain contiguous if you like, but you can also have a 'small'
32-bit model if you like. The 4Gb limit, unlike the 64K limit, is
not something everybody will bump their heads into.

All that will of course depend on the implementation, but remember
that compatibility is going to stay with us always.
-- 
	Amos Shapir
National Semiconductor (Israel)
6 Maskit st. P.O.B. 3007, Herzlia 46104, Israel  Tel. (972)52-522261
amos%nsta@nsc.com {hplabs,pyramid,sun,decwrl} 34.48'E 32.10'N

franka@mmintl.UUCP (Frank Adams) (04/29/87)

In article <751@instable.UUCP> amos%nsta@nsc.com (Amos Shapir) writes:
>To increase  flexibility, processors that  can access 64  bits of
>addressing will  probably do it  the old fashioned way:  by using
>one word (32  bits) as a base address, and  another as an offset.

Unfortunately, some of those processors undoubtedly will do this.  I am
convinced that there is no *good* way to do this; any scheme for addressing
which uses addresses shorter than the address space size is a horrible
kludge.

>Also  keep  in  mind  that  future processors  will  have  to  be
>compatible with older programs -  the programs we are writing now
>on 32-bit address machines!

Which is why machines being designed now should have 64 bit addressing as
the architectural limit.  One of the best things IBM did was to use a large
address size in the 360 (at the time, 24 bits was regarded as ridiculously
large) -- this enabled them to go 20 years with the same architecture.

Don't design the architecture for today's hardware.  Design it for next
decade's.

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

steve@edm.UUCP (Stephen Samuel) (04/30/87)

Consider the history of the 8086 family: Many people claimed that the segmented
memory was just fine until people actually started to USE (and need) more than
256K of memory for a program.  Suddenly: the fact that you couldn't easily have
arrays >64k started to irk people.

 If you have programs that actually USE >2G memory, then I can assure you that
most of that memory is gonna be in the form of HUGE arrays and people will be
rather pissed off to find that they can't be kept in one piece without a
performance loss (with arrays THAT big, you'll want all the performane you can
GET!

In a similar vein: Although I can ALWAYS use the extra MIPS, for the time being:
(I just HATE having to wait 5 seconds for a compile!!! :-] -- actually-- when
there's 2meg of code to compile, it all adds up) It's a slightly different
case for memory: With the exception of graphics, I have very few programs where
I actually USE >1meg, or so of memory.  For home use, 2G real is gonna be lots
of room for a long time coming. Even current state of the art uniprocessors
would take a couple of minutes just to INITIALIZE 2G, much less USE it.


-- 
-------------
 Stephen Samuel 
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve

rb@cci632.UUCP (05/01/87)

In article <1071@osiris.UUCP> phil@osiris.UUCP (Philip Kos) writes:
>In article <1316@ames.UUCP>, lamaster@pioneer.arpa (Hugh LaMaster) writes:
>>   If a 1 MIP system a few years ago needed 16MBytes of virtual
>> and 4 MBytes of physical memory (my own figures), a 100MIP system needs about
>> 2GB of virtual and 512 MBytes of physical memory....

>That's assuming that MIPS and physical/virtual memory scale linearly.  I
>have my doubts about this.

>While MIPS (for the sake of argument, let's use the "wise man's definition",
>or "Meaningless Index of Processor Speed") and virtual memory size may have
>a more or less linear interrelationship, I've noticed more of a logarithmic
>relationship between virtual and real memory.

>Am I the only person out here who doesn't agree?

There is actually a possibility that as speed increases, memory requirements
might even go down.

It mostly depends on what the processor is being used for.  If the purpose
for using the higher speed processor is to service more users on relatively
dumb terminals, there is a good chance that replacing an 8 mips machine
with an 80 mips machine and going from 20 users to 200 users would result
in requirements for more memory and disk space.

On the flip side, if one is simply placing the faster processor into an
"intelligent terminal" or one that is primarily involved with manipulating
bit maps, there is a good chance that the 10 fold increase in speed would
not necessarily imply a 10 fold increase in memory requirements.

There are some applications such as special purpose communications, where
the additional processing speed can be very important, yet the actual
memory requirements change very little.

Rex B.

mmp@cuba.UUCP (05/05/87)

In article <751@instable.UUCP>, amos@instable.UUCP (Amos Shapir) writes:
> To increase  flexibility, processors that  can access 64  bits of
> addressing will  probably do it  the old fashioned way:  by using
> one word (32  bits) as a base address, and  another as an offset.
> There's no point  in dragging all 64  bits everywhere, especially
> since  processing   is  usually   localized  among   clusters  of
> addresses.
> 
> Also  keep  in  mind  that  future processors  will  have  to  be
> compatible with older programs -  the programs we are writing now
> on 32-bit address machines!
>
If you replace 32 for 64, and 16 for 32 in the argument above,
you would be very close to the argument that Zilog put up for
years to justify their going with a base+offset addressing
scheme, rather than the linear addressing scheme that the
MC68000 designers chose.

In case you haven't noticed: Zilog lost.  They lost the argument
(they've since incorporated 32-bit addresses into their
architecture), and they lost the market opportunity (they were
first with a working 16/32 micro, which was also incorporated in
the first non-DEC Unix machine, the Onyx something-or-other and
later in a couple of Plexus models).

You have to be very careful in optimizing the important thing
and not the "obvious" thing.


____________________________________________________
* Matt Perez *   sun!cuba!mmp  (415) 691-7544
DISCLAIMER: beisbole has bean bery, bery guud too me