[comp.arch] Extremely Fast Filesystems

davecb@yunexus.YorkU.CA (David Collier-Brown) (07/31/90)

In <5465@darkstar.ucsc.edu> Craig Partridge writes:
|  I'm curious.  Has anyone done research on building extremely
|  fast file systems, capable of delivering 1 gigabit or more of data
|  per second from disk into main memory?  I've heard rumors, but no
|  concrete citations.

puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
| Tanenbaum (ast@cs.vu.nl) has developed a distributed system called
| AMOEBA. Along with the OS-kernel there is a "bullet file server".
| (BULLET because it is supposed to be pretty fast).
| 
| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
| so why not load the complete file into memory? This makes the server
| extremely efficient. Operations like OPEN or CLOSE on files are no
| longer needed (i.e. the complete file is loaded for each update).

  Er, sorta...  You could easily write an interface that did writes
or reads without opens or closes, for some specific subset of uses.

| 
| The benchmarks are quite impressive although I doubt that this
| concept is useful (especially when thinking about transaction
| systems in databases).

  Well, I have something of the opposite view: a system like Bullet makes a
very good substrate for a database system.  

  The applicable evidence is in the article "Performance of an OLTP
Application on Symmetry Multiprocessor System", in the 17th Annual
International Symposium on Computer Architecture, ACM SIGARCH Vol. 18 Number
2, June 1990. (see, a reference (:-))
  The article uses all-in-memory databases in the TP1 benchmark as a
limiting case while investigating the OS and architectural support
necessary for good Transaction Processing speeds, and the speeds
are up in the range that Craig may find interesting...

  My speculation is that a bullet-like file system with a relation-
allocating layer (call it the Uzi filesystem? the speedloader filesystem??)
on top would make a very good platform for a relational database.  Certainly
the behavior patterns of an in-memory, load-whole-relation database would be
easy to reason about, and would be easy and interesting to investigate.

| You can download Tanenbaum's original paper (along with a "complete"
| description about AMOEBA) via anonymous ftp from midgard.ucsc.edu
| in ftp/pub/amoeba.


--dave
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

rminnich@super.ORG (Ronald G Minnich) (08/05/90)

>puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
>| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
>| so why not load the complete file into memory? This makes the server
>| extremely efficient. Operations like OPEN or CLOSE on files are no
>| longer needed (i.e. the complete file is loaded for each update).

Yes, note that BULLET does not support a write command. You create, 
delete, and read files. If memory serves, that's it. 
This is very elegant, but there is 
a problem. We're running out of address bits again. 

I gave the standard "why shared memory is a nice way to do a high speed
network interface" talk the other day and someone pointed out that on 
Multics, with memory-mapped files, you always had to support the read-write 
interface for any program because the address space of the machine was too 
small for the memory-file abstraction to cover all files, and if your 
program couldn't deal with all files, it was useless. So you hacked your
program: if the file is too big, do it read/write; otherwise use memory
files. And people realized that it was easier just to do read/write,
and stopped bothering with memory files. Obviously this applies
to architectures we have now: lots of files are bigger than the 4 GB address
space of my Sun, and things are not getting any better. And of course on 
Crays you don't get memory-mapped files at all. So the programs I now write
that use memory-mapped files on SunOS always have an out in the event that the 
mmap fails or the system I am on does not support it. Conclusion: Bullet is 
really cool, as are memory-mapped files, but their eventual utility is 
limited by computer architecture questions. Since read-write is more general,
maybe it is the wave of the future. Gee, i don't like that! I have several
programs that got lots faster because calls to read() were replaced by 
an address computation. But my architectures have left me in a bind.
For your example gigabyte/second file system I can run through my Sun's
address space in 4 seconds. 
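
   (For the curious, the "out" looks something like the sketch below: try
to map the whole file, and fall back to an ordinary read() loop when the
mapping fails or isn't there.  This is only a guess at the shape of it,
with invented names, not code from any of the systems being discussed.)

#include <stddef.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

extern void consume(const char *buf, size_t len);  /* whatever the program does */

int process_file(const char *path)
{
    struct stat st;
    char *base;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    base = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (base != (char *)MAP_FAILED) {
        /* memory-file case: the file is just an address computation */
        consume(base, (size_t)st.st_size);
        munmap(base, (size_t)st.st_size);
    } else {
        /* read-write case: stream it through a small buffer instead */
        char buf[64 * 1024];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0)
            consume(buf, (size_t)n);
    }
    close(fd);
    return 0;
}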

   Now what do we do? Maybe the next round of address spaces should be 
large enough to address all the atoms on the planet- that should cover 
us for a while. 
ron
-- 
1987: We set standards, not Them. Your standard windowing system is NeUWS.
1989: We set standards, not Them. You can have X, but the UI is OpenLock.
1990: Why are you buying all those workstations from Them running Motif?

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/06/90)

puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
| so why not load the complete file into memory? This makes the server
| extremely efficient. Operations like OPEN or CLOSE on files are no
| longer needed (i.e. the complete file is loaded for each update).

rminnich@super.ORG (Ronald G Minnich) writes:
| This is very elegant, but there is 
| a problem. We're running out of address bits again. 
|
| I gave the standard "why shared memory is a nice way to do a high speed
| network interface" talk the other day and someone pointed out that on 
| Multics, with memory-mapped files, you always had to support the read-write 
| interface for any program because the address space of the machine was too 
| small for the memory-file abstraction to cover all files [...]

  To again misquote Morven's Metatheorem, ``any problem in computer
science can be solved with one more level of indirection...''
  This is dealt with by doing a transparent interface on top of the
large files, something like Multics MSFs, but done so the ill-advised
applications programmer (me --dave) won't depend on knowing how it was
implemented.
  I specifically considered large, relational databases built on a bullet
fileserver!  The primitives provided to the DBMS would be read, write,
commit and abort on a pre-initiated relation.  Many relations would be
small enough to load into a properly-configured fileserver; some would not.
The overlarge ones should be split two ways: transversely or longitudinally.
Transversely would be transparent to the application, if not to the human
DBM (he'd detect a performance loss, probably).  Longitudinally would
be visible to the applications (in a two-schema DBMS), because the DBM
would have to split them based on field-usage statistics. It wouldn't be
a problem in a three-schema architecture (modulo fiascos).
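
  (Purely as a sketch of what those primitives might look like -- the names
and types below are invented for illustration, and are not Amoeba's actual
Bullet interface, which deals in whole immutable files via capabilities:)

#include <stddef.h>

typedef struct relation relation_t;   /* opaque handle to a loaded relation */

int rel_read  (relation_t *r, long tuple, void *buf, size_t len);
int rel_write (relation_t *r, long tuple, const void *buf, size_t len);
int rel_commit(relation_t *r);   /* server writes a fresh immutable file and
                                    atomically swaps it for the old version */
int rel_abort (relation_t *r);   /* server discards the modified copy       */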

  Note that this is not a general answer to the problem, though.  Full
generality does require some form of ``extra-long address'', whether
implemented as a segment number sequence, a ``special large address'' in
either hardware or software, or a stdio FILE emulation library that only
provided it for seek/tell operations and hid it otherwise.

  I wouldn't mind the latter too much: it's a nice interface for 90% of
the programs I've ever written, since they mostly read and wrote small
sequential files...  The other 10% took the other 90% of my time (:-)).

--dave
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

jesup@cbmvax.commodore.com (Randell Jesup) (08/07/90)

In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
>>puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
>>| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
>>| so why not load the complete file into memory? This makes the server
>>| extremely efficient. Operations like OPEN or CLOSE on files are no
>>| longer needed (i.e. the complete file is loaded for each update).
...
>This is very elegant, but there is 
>a problem. We're running out of address bits again. 
...
>and stopped bothering with memory files. Obviously this applies
>to architectures we have now: lots of files are bigger than the 4 Gb address
>space of my Sun, and things are not getting any better. And of course on 
>Crays you don't get memory-mapped files at all. So the programs I now write
>that use memory-mapped files on SunOS always have an out in the event that the 
>mmmap fails or the system I am on does not support it. Conclusion: Bullet is 
>really cool, as are memory-mapped files, but their eventual utility is 
>limited by computer architecture questions. Since read-write is more general,
>maybe it is the wave of the future. Gee, i don't like that!

	I submit that your situation is something of an unusual case, and is
likely to remain unusual for at least a decade, perhaps 2.  Few machines
(percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
(I've never even seen a file larger than 100MB, even on mainframes).

	Eventually, perhaps, but not in the near future.  There are people
who have greater needs, that's the whole justification for the selling of
supercomputers, and the vastly expensive (read fast & large) IO systems that
support them.  But they're a tiny minority, numbers-wise.  Until the
number of people that require such things increases sufficiently, the only
architectures to support the extra address bits will be the super-(and maybe
mini-super-)computers.  Those extra address bits are _not_ free, in silicon,
memory, etc.  (I hope we haven't started the 32+ addr bit wars again...)

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.cbm.commodore.com  BIX: rjesup  
Common phrase heard at Amiga Devcon '89: "It's in there!"

dave@fps.com (Dave Smith) (08/07/90)

In article <13526@yunexus.YorkU.CA> davecb@yunexus.YorkU.CA (David Collier-Brown) writes:
>puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
>| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
>| so why not load the complete file into memory? This makes the server
>| extremely efficient. Operations like OPEN or CLOSE on files are no
>| longer needed (i.e. the complete file is loaded for each update).
>

I thought the Bullet file server was neat, but...I showed the paper to
one of our customers.  He read through it and laughed.  He said that
he wanted to have files bigger than his main memory, that was the
whole point of disks.

The Bullet server doesn't address the problems of very large files.  I also
see problems with the Bullet server when two large files (~ as large as the
memory size) are needed at the same time.

--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/07/90)

 In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
[in a discussion of Tanenbaum's Bullet fileserver]
| >This is very elegant, but there is 
| >a problem. We're running out of address bits again. 

jesup@cbmvax.commodore.com (Randell Jesup) writes:
| 	I submit that your situation is something of an unusual case, and is
| likely to remain unusual for at least a decade, perhaps 2.  Few machines
| (percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
| (I've never even seen a file larger than 100MB, even on mainframes).

	Alas, I worked on a library system under Unix... you wouldn't believe
the space costs of describing one book (:-)).  I kept having to check that
expected file maximums wouldn't exceed disk-drive sizes each time the DBMS
vendor postponed using raw partitions again. 100MB was perfectly plausible,
and we had to plan for at least three existing sites well over that size.
Obviously we split this across many "files".

| 	Eventually, perhaps, but not in the near future.  There are people
| who have greater needs, that's the whole justification for the selling of
| supercomputers, and the vastly expensive (read fast & large) IO systems that
| support them.  But they're a tiny minority, numbers-wise.  Until the
| number of people that require such things increases sufficiently, the only
| architectures to support the extra address bits will be the super-(and maybe
| mini-super-)computers.  Those extra address bits are _not_ free, in silicon,
| memory, etc.  (I hope we haven't started the 32+ addr bit wars again...)

	It's always a bad idea to put a hard addressing limit on things:
Intel, based on their past needs, octupled the addressable memory available
when they introduced the 8086, even though they expected 16k was adequate.
Experience has shown them wrong (;-)).  They needed to increase it by a
somewhat larger factor [see, we're back to computer architecture again].

   	I claim that this applies to files, too, and eventually to disks.
If you put hard limits in, people crash up against them.
	In the narrow case of Bullet, you can reinvent multi-segment files[1],
improve the addressing capabilities of the hardware or a combination of the
two.  Or other ideas I haven't even dreamed of yet. 
	I confess I'd have **real** trouble selling a raw bullet file system
to a customer doing anything but cad/cam, software development[2] or small
databases. 


--dave 

[1] A file which has multiple parts (segments), and which can be manipulated
transparently as if it had only one part. Multics MSFs were the first, but
weren't transparent enough. If you substitute segment for page in the above,
the mechanism used to implement it becomes almost obvious.  Alas, it still
requires a loooooooong integral variable somewhere user-accessible for
positioning oneself.

[2] Which includes academic computing, you understand: that's what I do
these days.
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/07/90)

dave@fps.com (Dave Smith) writes:
| I thought the Bullet file server was neat, but...I showed the paper to
| one of our customers.  He read through it and laughed.  He said that
| he wanted to have files bigger than his main memory, that was the
| whole point of disks.

	Agreed.  I think I'd like your customer (:-)).

| The Bullet server doesn't address the problems of very large files.  I also
| see problems with the Bullet server when two large files (~ as large as the
| memory size) are needed at the same time.

	If one restricts the problem to the fileserver, and agrees that the
data will appear on the compute engine as a series of pages when needed,
then one need merely (ahem) ensure that the fileserver has enough memory for
a number of complete files and can feed the required pages to the compute
machine when asked.  This kind of fileserver is only slightly different
from what we have now, but it's a difference in **nature**, so it won't be
easy to do "right".
	Right now, I'd have to restrict Bullet to easily decomposable data
problems, like software development and teaching (lots of little files),
CAD/CAM (moderately monstrous files), and with a bit of arm-waving,
transaction processing.


	In my next rant[tm], I'll touch on architectural support for VERY
LARGE FILES (:-)).

--dave
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

mash@mips.COM (John Mashey) (08/07/90)

In article <13667@cbmvax.commodore.com> jesup@cbmvax (Randell Jesup) writes:
>...
>>This is very elegant, but there is 
>>a problem. We're running out of address bits again. 
>...

>	I submit that your situation is something of an unusual case, and is
>likely to remain unusual for at least a decade, perhaps 2.  Few machines
>(percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
>(I've never even seen a file larger than 100MB, even on mainframes).
>
>	Eventually, perhaps, but not in the near future.  There are people
>who have greater needs, that's the whole justification for the selling of
>supercomputers, and the vastly expensive (read fast & large) IO systems that
>support them.  But they're a tiny minority, numbers-wise.  Until the
>number of people that require such things increases sufficiently, the only
>architectures to support the extra address bits will be the super-(and maybe
>mini-super-)computers.  Those extra address bits are _not_ free, in silicon,
>memory, etc.  (I hope we haven't started the 32+ addr bit wars again...)

Well, there are always less higher-end things than lower-end ones.
However, I'd STRONGLY disagree with the idea that 64-bit machines will
remain confined to the super- & minisuper world for 10-20 more years.
I propose instead:
	a) We are currently consuming address space at the rate of 1 bit per year.
	b) Plenty of applications already exist for workstation-class machines,
	for which the developers bitterly complain that they only have
	31 or 32 bits of virtual address space, regardless of how much
	physical address space they have.  Note that they want bigger
	physical memories, also, of course.  However, the real issue is being
	able to structure applications conveniently, and then slide
	various amounts of real memory underneath.
	I've participated in customer meetings (commercial, not even
	scientific), in which people complained seriously that some
	microprocessor-based machine of ours started with 256MB as
	maximum memory.  They were happier to know we'd get 1GB soon,
	but they still grumbled that it should be higher....
	c) Observe that there already exist desktop workstations that
	support max physical memories in the 128MB - 512MB range,
	using 4Mb DRAMs. Hence, by the 64Mb DRAM generation, one can expect
	2GB - 8GB maxes.  After all, at that point, you can get 4GB or
	so within a 1-ft cube.
	Of course, such things will not be on every desktop.
	However, people will certainly expect the servers to be able to
	do such things, and they'll certainly want workstations and servers
	that run the same code, especially since the economics of this
	business mandate that a company's smaller servers be derived from
	the workstations. 

So, here's my counter-prediction to the idea that it will be 10-20 years:

No later than 1995:
	1) There will be, in production, 64-bit microprocessors
	(and I mean 64-bit integers & pointers, not just 64-bit
	datapaths, which micros have had for years in FP).
	They'll cost < $500 apiece, i.e., less than a 486 does today.
	They'll either be new architectures, or derivations of existing
	RISCs.
	2) In fact, they'll be shipping in systems, in reasonable quantities.
	Let me try a market-analyst prediction and claim that there
	will be at least 50,000 such machines out there by YE1995,
	and 150,000 by YE1996.
Now, 150,000 machines is not a huge number .... but it's rather larger
than the number of supers and minisupers....

So, here's a thought to stimulate discussion:
	What applications (outside the scientific / MCAD ones that
	can obviously consume the space) would benefit from 64-bit
	machines?

	Why?  (for example, here are some low-level reasons why
		a particular one might benefit):

		a) Need more physical memory, and thus more virtual address
		to deal with it conveniently.
		b) Need more virtual memory, to address a lot of data at once,
		and so probably need more physical memory also.
		c) Need more virtual memory, sometimes sparsely addressed,
		to use algorithms and design approaches to make the software
		reasonable, but possibly with less physical memory than b).
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/08/90)

mash@mips.COM (John Mashey) writes:
|So, here's a thought to stimulate discussion:
|	What applications (outside the scientific / MCAD ones that
|	can obviously consume the space) would benefit from 64-bit
|	machines?

|	Why?  (for example, here are some low-level reasons why
|		a particular one might benefit):

|		a) Need more physical memory, and thus more virtual address
|		to deal with it conveniently.
|		b) Need more virtual memory, to address a lot of data at once,
|		and so probably need more physical memory also.
|		c) Need more virtual memory, sometimes sparsely addressed,
|		to use algorithms and design approaches to make the software
|		reasonable, but possibly with less physical memory than b).

  Well, transaction processing machines with many large (effectively sparse
(:-)) databases to manipulate might well require fileservers with large
sparse address spaces, even if the machines doing the computing didn't 
need all of the data at once.
  Note that TP, like real-time, is an environment where the implausible is
done regularly, and the inelegant daily (:-)), all in the name of
performance. 

--dave (rant[tm] coming: get your K key ready) c-b
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

pha@caen.engin.umich.edu (Paul H. Anderson) (08/08/90)

In article <13667@cbmvax.commodore.com> jesup@cbmvax (Randell Jesup) writes:
>
> (discussion about mapping 4G+ files in workstations deleted)
>
>	I submit that your situation is something of an unusual case, and is
>likely to remain unusual for at least a decade, perhaps 2.  Few machines
>(percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
>(I've never even seen a file larger than 100MB, even on mainframes).
>
>	Eventually, perhaps, but not in the near future.  There are people
>who have greater needs, that's the whole justification for the selling of
>supercomputers, and the vastly expensive (read fast & large) IO systems that
>support them.  But they're a tiny minority, numbers-wise.  Until the
>number of people that require such things increases sufficiently, the only
>architectures to support the extra address bits will be the super-(and maybe
>mini-super-)computers.  Those extra address bits are _not_ free, in silicon,
>memory, etc.  (I hope we haven't started the 32+ addr bit wars again...)
>

In order to make computers very useful to social scientists, for studies
of econometric or populations data, large data sets will be the norm.

Populations Studies Center, for example, would like nothing better than to
quickly analyze 5 gigabyte datasets (hence my earlier request for large
RAM systems).  Furthermore, many such datasets exist.  The 1990 census
is just one 5 gigabyte file - there are similar files for the last
100 years or more.  Likewise for China, Russia, Europe, and more.

Analyzing these things quickly is not currently very easy, but that
doesn't mean that people don't want to do it.

The demand is there now for computer systems that can deal with these
problems.  It may be some time before the demand and the cost for
meeting that demand meet, but make no mistake, the demand is there
right now!

Paul Anderson
University of Michigan

cliffc@sicilia.rice.edu (Cliff Click) (08/08/90)

In article <1990Aug7.190719.7907@caen.engin.umich.edu> pha@caen.engin.umich.edu (Paul H. Anderson) writes:
[ ...stuff about huge files... ]

Seems a step in the right direction would be to include transparent "bignums" 
as a standard part of a programming language.  Thus the applications 
programmer writes programs that don't care how big the file gets (the
system read/write/seek calls must handle "bignums").  The smart compiler
can figure out when everything's going to remain as "small integers" and
skip the expensive run-time check code when it can be avoided.  

How the OS maps the "bignum" to physical devices is a less difficult nut 
to crack: one can always shoot for an easy & slow solution (bank swapping,
paging, etc...).  

The problems are 1) getting applications into a world/language which has
no size restrictions on integers, and 2) getting compilers which can prevent
the grotesque performance hits from generic "bignum" handling (a topic
which is somewhat close to my heart ;-).

Cliff Bignum Click
-- 
Cliff Click                
cliffc@owlnet.rice.edu       

khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) (08/08/90)

In article <1990Aug7.190719.7907@caen.engin.umich.edu> pha@caen.engin.umich.edu (Paul H. Anderson) writes:

...

   Populations Studies Center, for example, would like nothing better than to
   quickly analyze 5 gigabyte datasets (hence my earlier request for large
   RAM systems).  Furthermore, many such datasets exist.  The 1990 census
   is just one 5 gigabyte file - there are similar files for the last
   100 years or more.  Likewise for China, Russia, Europe, and more.

   Analyzing these things quickly is not currently very easy, but that
   doesn't mean that people don't want to do it.
...

Humm. In estimation problems there are lots of ways to skin cats.
Algorithms which have huge datasets, but "small" models do not require
huge "core" storage.

In the satellite tracking biz, some experiments (like GPS baselines)
go on for years, and Tb of data could be necessary if one formed the
obvious A^T A and proceeded to use elimination from there.

Back when I did that sort of work, we employed Square-Root Information
Filters, and/or UDU**T decomposition techniques. If, for the sake of
argument, your model has 70 independent variables, the bulk of the
"core" needed is

	70*71/2 = 2485 words of storage

_independent_ of the size of the dataset. Of course, one also gets
estimates in "real time" (viz as fast as the data are available).

The "naive" approach would require that the entire dataset fit in
"core". 

I am sure that there are many problems which require really huge
memories ... but I am certain that use of appropriate algorithms can
limit the number of such "hogs" considerably.

Those interested in SRIF and UD techniques might wish to peruse

	Factorization Methods for Discrete Sequential Estimation
	ISBN 0-12-097350-2


--
----------------------------------------------------------------
Keith H. Bierman    kbierman@Eng.Sun.COM | khb@chiba.Eng.Sun.COM
SMI 2550 Garcia 12-33			 | (415 336 2648)   
    Mountain View, CA 94043

pha@caen.engin.umich.edu (Paul H. Anderson) (08/08/90)

In article <KHB.90Aug7132932@chiba.Eng.Sun.COM> khb@chiba.Eng.Sun.COM (Keith Bierman - SPD Advanced Languages) writes:
>
>Humm. In estimation problems there are lots of ways to skin cats.
>Algorithms which have huge datasets, but "small" models do not require
>huge "core" storage.
>
>In the satellite tracking biz, some experiments (like GPS baselines)
>go on for years, and Tb of data could be necessary if one formed the
>obvious A^T A and proceeded to use elimination from there.
>

This is very true, but researchers benefit enormously from
interactive computation, where the type of one computation
may depend on the outcome of the preceding ones.

Ideally, students in classes would be able to investigate interactively,
in a matter of minutes, problems that currently take researchers
many months to accomplish.  The thing that prevents this from
taking place currently is that the datasets are on magtapes, and
therefore any computation using the entire set of data is forced to
sequentially access a very slow medium.

The current technique is for a researcher to identify the smallest
possible subset that has all the information they think they need.
This is coded up in a program that is run on a 3090 with lots of
big fast expensive tape drives.  Eventually, the researcher gets
the information they asked for, and if they are lucky, it is actually
what they need.

The problem is that the process of exploration doesn't match this
kind of turnaround at all, so as a result, highly competent social
scientists sit around on their collective backsides, waiting for
data to show up on their desks.  Computers that can address this
problem are needed now, independently of whether or not they actually
exist.

We don't have a real problem optimizing use of the hardware we
have now, it is just that the available hardware has too little
RAM, or has filesystems that are too slow.

Paul Anderson
University of Michigan

davecb@yunexus.YorkU.CA (David Collier-Brown) (08/08/90)

cliffc@sicilia.rice.edu (Cliff Click) writes:
[about addressing large files with bignums]
| How the OS maps the "bignum" to physical devices is a less difficult nut 
| to crack- one can always shoot for an easy & slow solution (bank swapping,
| paging, etc...).  

| The problems are 1) getting applications into a world/language which has
| no size restrictions on integers, and 2) getting compilers which can prevent
| the grotesque performance hits from generic "bignum" handling (a topic
| which is somewhat close to my heart ;-).

   Well, I wouldn't go as far as to remove range types from integers, but I
would propose a combined compiler->os->hardware solution along those general
lines:

[Caution: architectural speculation from a non-architect (a philosopher)
 follows: press "n" if not interested, "k" if you don't like rants.]

  Let us imagine
	0) a language with different length integral types, at least one of
	   which is longer than ``usual'', and specifically long enough to
	   describe more memory than the biggest virtual address used by the
	   target machine(s),
	1) a library declaring a large integral type we will call a vaddr_t,
	   which maps to some language-supported first-class construct, if
	   only a struct vaddr_t { long foo[some number]; };
	2) an OS that uses vaddr_t's as the parameters to its 
		a) file-positioning functions (like seek and tell),
		b) dereference operators, iff typing is preserved thereby,
	3) hardware that can support either one (initially) or more
	   lengths of virtual address.
  One can then write library functions which manipulate memory-mapped files
in a larger memory ``model'' than the hardware supports, and later upgrade
hardware, compilers and libraries as the supported types get closer to what
the applications writers are using.
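
  A minimal sketch of what (1) and (2a) might look like in C -- every name
here is invented for illustration, and the two-word split is only one
possible encoding:

#include <stdio.h>

typedef struct vaddr_t {
    unsigned long hi;    /* segment (most significant) part          */
    unsigned long lo;    /* byte offset within that segment, < 2^31  */
} vaddr_t;

/* Position a (possibly multi-segment) file at a vaddr_t offset.  The
 * library folds the oversize address down to what the OS supports:
 * here, one underlying stream per segment, each small enough that the
 * low part always fits fseek()'s long. */
int v_seek(FILE *segments[], vaddr_t pos)
{
    FILE *seg = segments[pos.hi];     /* pick the underlying file */
    if (seg == NULL)
        return -1;
    return fseek(seg, (long)pos.lo, SEEK_SET);
}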

 Of course, our imaginary architect and his friend the compiler-writer have
just been handed an interesting task (:-)):
	2) making a non-standard-length pointer usable, generating
	   good code to access it, dereference it and compare it.
	   (there are some classical tricks: bignum'rs probably can
	   comment here)
	2.5) folding the overlength construct down into usable chunks.
	   (just as seeking and then referencing parts of the stdio
	   buffer is done by, what else, stdio) in both the standard
	   i/o and the memory-mapped i/o libraries, and
	3) giving the above adequate, elegant hardware support.

  My personal speculation on (3) is that someone will provide an LAA
and SAA instruction-pair in a RISC machine: LAA stands for Load Absurd
Address, and really means ``load less significant part of bloody oversize
number into register, discarding the rest''.

--dave (this has been a rant[tm] by...) c-b
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (08/08/90)

davecb@yunexus.YorkU.CA (David Collier-Brown) wrote:

> In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
> [in a discussion of Tanenbaum's Bullet fileserver]
> | >This is very elegant, but there is
> | >a problem. We're running out of address bits again.
>
    ... stuff deleted

> It's always a bad idea to put a hard addressing limit on things:
> Intel, based on their past needs, octupled the addressable memory available
> when they introduced the 8086, even though they expected 16k was adequate.
> Experience has shown them wrong (;-)).  They needed to increase it by a
> somewhat larger factor [see, we're back to computer architecture again].

  ... more deleted.

>--dave


Indeed. I began administering a database of 1,600 MB in 1974 for Kaiser
Medical of Northern California. I think their membership has grown from
the 1,000,000 active and 2,000,000 inactive of that time and Kaiser very
likely keeps much more data about each member. This was done with an IBM
370/158 computer having 1 MB of main memory. All of the performance parameters
of that hardware system can be easily exceeded by today's high-end PCs.
I am certainly willing to bet that they've exceeded the 4,300 MB addressing
limit of IBM's VSAM by now (for them there are ways around this).
We always seem to exceed any addressing limit we can imagine.

I still think it is time to stop trying to use integers for addressing.
They always break down and probably always will. Many computers today have
floating point units. I would like to see floating point used for addressing.
It would help a great deal if addresses were not dense. What I have in mind
is for data objects to have addresses but these would be floating point and
between any two objects we could *always* fit another new object. In this way
I could expand a file from a hundred bytes to a hundred gigabytes without
changing the addresses of any stored objects. An index pointing to objects
in such a file would never need to be adjusted as the file grew or shrank.
Plastic addressing. This could still be efficient since addresses are almost
never real anymore anyway. Addressing is used to locate things in high speed
processor caches, in virtual memory, and on disk drives
(and in disk controller caches). Integer addressing is unsuited to all these
different tasks. Fractional addressing could be flexible enough to allow for
all these locational needs.

Some things are nicely stored by hashing instead of by b*tree organization
(e.g. person records by driver license number); it minimizes update locking
problems prevalent in b*trees as well as saving one or more extra levels of
access. This is hard to do as a file grows but would be simple with a file
addressed by fractions (0.5 is *logically* half way through the file). I
think this was used by one of the graphics systems for describing picture
objects (GKS?).

So when will I see fractional addressing?
-- 
  Dick Wilmot  | I declaim that Amdahl might disclaim any of my claims.
                 (408) 746-6108

ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) (08/08/90)

In <10606@celit.fps.com>, dave@fps.com (Dave Smith) says "I thought the
Bullet file server was neat, but...I showed the paper to one of our
customers.  He read through it and laughed.  He said that he wanted
to have files bigger than his main memory, that was the whole point
of disks."

Are you sure your customer isn't confusing the size of main *physical*
memory with the size of *virtual* memory?

Lawrence D'Oliveiro                       fone: +64-71-562-889
Computer Services Dept                     fax: +64-71-384-066
University of Waikato            electric mail: ldo@waikato.ac.nz
Hamilton, New Zealand    37^ 47' 26" S, 175^ 19' 7" E, GMT+12:00

cliffc@sicilia.rice.edu (Cliff Click) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
>davecb@yunexus.YorkU.CA (David Collier-Brown) wrote:
>
>> In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
>> [in a discussion of Tanenbaum's Bullet fileserver]
>> | >This is very elegant, but there is
>> | >a problem. We're running out of address bits again.
>>
>    ... stuff deleted
>
> [ ...stuff about using fractional addressing deleted... ]
>
>So when will I see fractional addressing?

Humm... infinite precision fractional addressing... isn't this equivalent to
using "bignums"?  Suppose I take a fractional addressing system, and generate
a fraction digit-by-digit, 2 digits for each letter of a person's name.  
Suddenly I have a unique fraction which is some permutation of a name.  In
other words I have a system where names are addresses, and since names have 
no length limit, I have no limit to my file sizes.  I say a rose by any other 
addressing mode is a rose; what you're describing is nothing more than a 
key-access file system.

Cliff Infinite Fractions Click
-- 
Cliff Click                
cliffc@owlnet.rice.edu       

nvi@mace.cc.purdue.edu (Charles C. Allen) (08/08/90)

> 	I submit that your situation is something of an unusual case, and is
> likely to remain unusual for at least a decade, perhaps 2.  Few machines
> (percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
> (I've never even seen a file larger than 100MB, even on mainframes).

Until recently, the "standard" medium for transporting files has been
9-track 6250 tape, which holds around 200M.  Until recently, all our
data files were less than 200M (hmm... I hope you see the
correlation).  Now that we have some 8mm tape drives, we routinely
have 400M files.  We'd have bigger ones, but all our disks are little
SCSI 600-700M thingies (access time is not very critical), and we
can't easily have a single file span volumes.  This is for high energy
physics data analysis.

Charles Allen			Internet: cca@newton.physics.purdue.edu
Department of Physics			  nvi@mace.cc.purdue.edu
Purdue University		HEPnet:   purdnu::allen, fnal::cca
West Lafayette, IN  47907	talknet:  317/494-9776

dave@fps.com (Dave Smith) (08/08/90)

In article <1179.26bffdbf@waikato.ac.nz> ccc_ldo@waikato.ac.nz (Lawrence D'Oliveiro, Waikato University) writes:
>In <10606@celit.fps.com>, dave@fps.com (Dave Smith) says "I thought the
>>Bullet file server was neat, but...I showed the paper to one of our
>>customers.  He read through it and laughed.  He said that he wanted
>>to have files bigger than his main memory, that was the whole point
>>of disks."
>
>Are you sure your customer isn't confusing the size of main *physical*
>memory with the size of *virtual* memory?

Yes.  We support (in the current product) up to 1GB of physical memory.
Convex, Cray, etc. also support main memory sizes in that range.  The customer 
in question does seismic processing and routinely has data sets in the 10GB 
range.  They have a file server which pulls several smaller files together 
into a larger virtual file to get around our current limitations on maximum
file size.

Virtual memory for a Bullet-style server is kind of like using /dev/ram as
your swap device.  Why copy in from the disk just to copy it right back
out again?
--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

dave@fps.com (Dave Smith) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.
>It would help a great deal if addresses were not dense. What I have in mind
>is for data objects to have addresses but these would be floating point and
>between any two objects we could *always* fit another new object. 

This won't work.  There are only so many distinct numbers representable by
floating point.  There will be points where there is no "room" between
two numbers because the granularity of the floating point doesn't allow
a number to be represented between them.  As a simple example with a
three-digit-mantissa, one-digit-exponent decimal floating point format,
try to find the number between 1.01x10^1 and 1.02x10^1.  With a 64-bit (combined
mantissa and exponent) floating point number there are 2^64 distinct numbers
that can be represented.  The range is very large, but the numbers are sparse.
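
A small demonstration of the density argument, for anyone who wants to
see it happen (nextafter() is in the standard C math library; link with -lm):

#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 0.5;
    double b = nextafter(a, 1.0);    /* the next representable double up */

    /* the "midpoint" has to round back onto a or b: with 64 bits there
       are only 2^64 patterns, so you can't always squeeze a new address
       between two existing ones */
    printf("a        = %.17g\n", a);
    printf("b        = %.17g\n", b);
    printf("midpoint = %.17g\n", (a + b) / 2.0);
    return 0;
}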

I liked the idea of "tumblers" as put forth by the Xanadu project.  Variable
length indices, that's the only way to go.  I think I'll go over and hit
the hardware engineers over the head until they figure out a way to make
it fast :-).

--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

clj@ksr.com (Chris Jones) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com>, rbw00@ccc (  213  Richard Wilmot) writes:
>We always seem to exceed any addressing limit we can imagine.

This is very true, at least so far.

>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.
>It would help a great deal if addresses were not dense. What I have in mind
>is for data objects to have addresses but these would be floating point and
>between any two objects we could *always* fit another new object. In this way
>I could expand a file from a hundred bytes to a hundred gigabytes without
>changing the addresses of any stored objects.

Um, I think that between any two *real* numbers you can always find another
real number.  Real numbers are a mathematical concept, and what is called
floating point on computers merely implements a useful approximation of them.
On computers with floating point, it is most definitely not the case that you
can always fit another floating point number between two other such numbers.
These things take up a finite number of bits, right, and that means there's a
finite limit to their cardinality.
--
Chris Jones    clj@ksr.com    {world,uunet,harvard}!ksr!clj

usenet@nlm.nih.gov (usenet news poster) (08/08/90)

In article <40644@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>However, I'd STRONGLY disagree with the idea that 64-bit machines will
>remain confined to the super- & minisuper world for 10-20 more years.
>I propose instead:
>	a) We are currently consuming address space at the rate of 1 bit per year.
>	b) Plenty of applications already exist for workstation-class machines,
>	for which the developers bitterly complain that they only have
>	31 or 32 bits of virtual address space, ...
>
>So, here's my counter-prediction to the idea that it will be 10-20 years:
>
>No later than 1995:
>	1) There will be, in production, 64-bit microprocessors
>	(and I mean 64-bit integers & pointers, not just 64-bit
>	datapaths, which micros have had for years in FP).

Maybe, but aside from address generation and floating point, what are
people going to do with all those bits?  Setting aside address arithmetic,
most of the time you don't need 32 bit integers and lots of work involves
bytes or smaller (character strings etc.).

>[...]
>So, here's a thought to stimulate discussion:
>	What applications (outside the scientific / MCAD ones that
>	can obviously consume the space) would benefit from 64-bit
>	machines?

Using 64-bit chunks makes me wonder about non-numeric data representations.  
You can pack 8-12 characters of text in 64-bits, enough for most English 
words.  Or how about small images (8x8 or 7x9 fonts etc.)? Pattern recognition
using small neural nets operating on one or two registers of input data?  
An instruction set rich in bit manipulation could be a big help in 
exploiting these possibilities.
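
(A toy illustration of the text-packing case, with invented names: pack
up to 8 bytes into one 64-bit word so that equality of short identifiers
becomes a single integer compare.)

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* pack at most 8 characters, zero-padded, into a 64-bit word */
uint64_t pack8(const char *s)
{
    uint64_t w = 0;
    size_t n = strlen(s);
    memcpy(&w, s, n < 8 ? n : 8);
    return w;
}

int main(void)
{
    printf("%d\n", pack8("register") == pack8("register"));   /* 1 */
    printf("%d\n", pack8("register") == pack8("pointer"));    /* 0 */
    return 0;
}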

>-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>

David States

peter@ficc.ferranti.com (Peter da Silva) (08/08/90)

In article <1990Aug8.010203.18560@rice.edu> cliffc@sicilia.rice.edu (Cliff Click) writes:
> In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
> >So when will I see fractional addressing?

When someone invents a floating point unit that maps to the real number system.

> Humm... infinite precision fractional addressing... isn't this equivalent to
> using "bignums"?

Or Ted Nelson's "Tumblers"?

"Any problem in programming can be solved with another level of indirection"
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
<peter@ficc.ferranti.com>

wayne@dsndata.uucp (Wayne Schlitt) (08/08/90)

In article <40644@mips.mips.COM> mash@mips.COM (John Mashey) writes:
> In article <13667@cbmvax.commodore.com> jesup@cbmvax (Randell Jesup) writes:
> >...
> >[ ... ]       (I hope we haven't started the 32+ addr bit wars again...)
> 
> Well, there are always less higher-end things than lower-end ones.
> However, I'd STRONGLY disagree with the idea that 64-bit machines will
> remain confined to the super- & minisuper world for 10-20 more years.
                                                      ^^^^^
yes, 20 years is probably too long but i think that 10 years isn't too
far off the mark.  my guess is that it will be 7-10 years before 64-bit
computers are making inroads into the 32-bit market.  (maybe by then we
will have finally gotten away from 16-bit computers.  1/2 :-)

> I propose instead:
> 	a) We are currently consuming address space at the rate of 1 bit per year.
i thought it was only 1 bit per 18 months...  new data?

>       b) [says that there are commercial people who already want
>       31-32 bits of virtual address space, and they want more.]
and they are putting these applications on everyone's desk...?  or are
these applications things that they would have run on mainframes or
super-mini's if your "killer-micro's" weren't chosen?

> 	c) Observe that there already exist desktop workstations that
> 	support max physical memories in the 128MB - 512MB range,
> 	using 4Mb DRAMs. Hence, by the 64Mb DRAM generation, one can expect
> 	2GB - 8GB maxes.  After all, at that point, you can get 4GB or
> 	so within a 1-ft cube.
using your own numbers, at one bit a year it will be 3-5 years before
physical memory _maximums_ will reach 4GB.  at one bit every 1.5
years, it will be more like 4.5-7.5 years.  how long do you think it
will be before 4GB is typical?

also, when do you expect 64Mb DRAMS to come out?  my guess would be
around 5-6 years or so, and then it will take at least a year before
they have ramped up production to the point that they are cheaper than
16Mb DRAMS.  (as a reference point, would you consider 4Mb DRAMS
"common" now?  when do you think that 4Mb DRAMS became or will become
"common"?) 

> [ .... ]

-wayne

rminnich@super.ORG (Ronald G Minnich) (08/08/90)

In article <13667@cbmvax.commodore.com> jesup@cbmvax (Randell Jesup) writes:
>	I submit that your situation is something of an unusual case, and is
>likely to remain unusual for at least a decade, perhaps 2.  Few machines
>(percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
>(I've never even seen a file larger than 100MB, even on mainframes).

I couldn't disagree with you more. Now that storage is about $5K/gigabyte
for Sun file servers (if you don't buy from Sun, that is) i would expect 
4 Gb to be common. We have no Sun file servers here with < 4 Gb any more. 
I am using a demo DECstation 5000 right now with this little itty bitty box
which contains a 1Gb disk.

But that is a side issue. More important issue: suppose I find that there
is a 6 Gb file at NCAR which shows a really neat ocean model. It is there,
my workstation is here, so what do i do? Nowadays you do the easy thing:
ftp it over the net. YYYYUUUUCCCCKKKK. No, wait, i forgot: buy plane 
tickets to Colorado. Now that is fun, but you have just left your 
entire environment behind in (my case) Bowie, Md. That is no good either: 
now i have to ftp my environment to Colorado!

What I *want* to do is say: "when this
program runs, please associate this 6Gb chunk of its address space with 
that file over there on NCAR". Problem solved. Only I don't have any 
architectures that will let me, because of architectural limitations. 
Well, maybe PA can do it, with its ability to address
billions and billions  of segments each of which can contain
billions and billions of bytes. 

There are good reasons to have more than 32 bits *now*. 
ron
-- 
1987: We set standards, not Them. Your standard windowing system is NeUWS.
1989: We set standards, not Them. You can have X, but the UI is OpenLock.
1990: Why are you buying all those workstations from Them running Motif?

tom@nw.stl.stc.co.uk (Tom Thomson) (08/08/90)

In article <40644@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>	using 4Mb DRAMs. Hence, by the 64Mb DRAM generation, one can expect
>	2GB - 8GB maxes.  After all, at that point, you can get 4GB or
>	so within a 1-ft cube.
I sure do hope that physical dimension is wrong!  If I can't do it in less
than half a cubic foot by 1996 I've got problems. Of course, I don't believe
I have any problem at all here.
 
Tom  [tom@nw.stl.stc.co.uk]

aglew@dual.crhc.uiuc.edu (Andy Glew) (08/08/90)

..> Who can use 64 bit machines?

As David States points out (and several comp.arch readers who have
worked on machines with large registers have testified in the past) 64
bit registers are very nice for manipulating strings in, since many
strings are shorter than 8 characters.
    Store partial operations (with or without an implied shift) are
useful.
    COBOL-like fixed field width operations are particularly well
suited to large register widths, although null-terminated C-style
strings can easily be handled by a number of operations that have been
added to RISCs like HP's.

By the way, can anyone recall the name of the PDP-11 retrospective
that advocated separate address and data registers?  If I remember
correctly, they figured that 32 bit addresses were needed, but 16 bit
integers were enough for most people's needs, so why provide things
like a 32x32 multiply?  (I may have misremembered, and they may have
advocated 16 bit addresses but 32 bit integers, but I don't think so).
I will admit that this paper influenced me for quite a while (in fact,
still does, in a mixed sense); I do not know for sure, but I would
also reckon that it influenced the design of two popular
microprocessor families.
    Any bets on whether people will take the same limiting strategy (I
won't call it a mistake, because it might be right for short-term
goals) by providing 32 bit data registers and 64 bit address
registers?

--
Andy Glew, andy-glew@uiuc.edu

Propaganda:
    
    UIUC runs the "ph" nameserver in conjunction with email. You can
    reach me at many reasonable combinations of my name and nicknames,
    including:

    	andrew-forsyth-glew@uiuc.edu
    	andy-glew@uiuc.edu
    	sticky-glue@uiuc.edu

    and a few others. "ph" is a very nice thing which more USEnet
    sites should use.  UIUC has ph wired into email and whois (-h
    garcon.cso.uiuc.edu).  The nameserver and full documentation are
    available for anonymous ftp from uxc.cso.uiuc.edu, in the net/qi
    subdirectory.

rminnich@super.ORG (Ronald G Minnich) (08/09/90)

In article <1990Aug7.205747.14206@caen.engin.umich.edu> pha@caen.engin.umich.edu (Paul H. Anderson) writes:
>We don't have a real problem optimizing use of the hardware we
>have now, it is just that the available hardware has too little
>RAM, or has filesystems that are too slow.
And once it gets enough ram we will run out of address bits again! this 
ds 5000 on my desk has 128 MB of memory, and can have 512 MB. That is getting
uncomfortably close to running out of address bits. I figure we will be there 
in two years, at a bit a year. I guess there was a 32-bit war here before,
judging by earlier comments,  but fact is we are about to run out.
ron
-- 
1987: We set standards, not Them. Your standard windowing system is NeUWS.
1989: We set standards, not Them. You can have X, but the UI is OpenLock.
1990: Why are you buying all those workstations from Them running Motif?

seibel@cgl.ucsf.edu (George Seibel) (08/09/90)

In article <5286@mace.cc.purdue.edu> nvi@mace.cc.purdue.edu (Charles C. Allen) writes:
>> 	I submit that your situation is something of an unusual case, and is
>> likely to remain unusual for at least a decade, perhaps 2.  Few machines
>> (percentage-wise) even have 4 GB of storage, let alone files larger than 4GB
>> (I've never even seen a file larger than 100MB, even on mainframes).
>
>Until recently, the "standard" medium for transporting files has been
>9-track 6250 tape, which holds around 200M.  Until recently, all our
>data files were less than 200M (hmm... I hope you see the
>correlation).  Now that we have some 8mm tape drives, we routinely
>have 400M files.  We'd have bigger ones, but all our disks are little
>SCSI 600-700M thingies (access time is not very critical), and we
>can't easily have a single file span volumes.  This is for high energy
>physics data analysis.

The important question here is: "what are these large files worth to you?"
It sounds as though you've always had datasets larger than the limits
imposed on you by hardware/software, and that you likely got by in the
past (and present) by splitting data into multiple files.   I generate
a lot of data from MD simulations, but find that it's more convenient
to split it into manageable chunks that are far smaller than 4GB.  The
size of "manageable" is of course determined by a variety of hardware/
software performance/capacity issues, plus economics and politics.
At any rate, I've been splitting data files up for years, and I bet
everyone else has been as well.   I already have the software in place 
to deal with multiple files, and don't expect that the ability to have
a gigantic single file will make a vast improvement in my life.   I'm
sure that someone out there needs huge files, but I also suspect there
is a price to be paid for going to the next higher increment of address
size.   I would rather not pay that price until the performance level
of network, cpu, memory, mass storage, etc has come to such a level
that my "manageable" chunks of data are approaching the GB range.  I
guess it's up to market analysis to decide when "enough" people have
reached the point where the benefits of a larger address space are
worth the cost.   This will of course depend on the good work of you
designers and engineers.   It's a balancing act.

George Seibel, UCSF
seibel@cgl.ucsf.edu

jonah@dgp.toronto.edu (Jeff Lee) (08/09/90)

rminnich@super.ORG (Ronald G Minnich) writes:

> [...] More important issue: suppose I find that there
>is a 6 Gb file at NCAR which shows a really neat ocean model. It is there,
>my workstation is here, so what do i do? Nowadays you do the easy thing:
>ftp it over the net. YYYYUUUUCCCCKKKK. No, wait, i forgot: buy plane 
>tickets to Colorado. Now that is fun, but you have just left your 
>entire environment behind in (my case) Bowie, Md. That is no good either: 
>now i have to ftp my environment to Colorado!

>What I *want* to do is say: "when this
>program runs, please associate this 6Gb chunk of its address space with 
>that file over there on NCAR". Problem solved. [...]

Given the current data+program+interface modularization, there are at
least four options:

1) hire a station-wagon full of mag-tapes (or send a DAT by over-night
courier) [6GB would saturate a T1 line (1.5Mbit/sec) for 9.1 hours.]

2) split between the data and program (e.g. with distributed shared memory)

3) split between the program and interface (e.g. with Plan 9, X, or NeWS)

4) plan a quick holiday in Colorado (with Plan 9, your environment
follows you automatically)

It depends on where the data-flow volumes are and what the cost
breaks are.  If you plan to use all of the data more than once, grabbing
your own copy is not a bad idea.  If you are going to analyse some or
all of the data set and plot summary results, do it remotely and
ship the plot back (batch or real-time).  Only if you are planning to
randomly access a *small* portion of this database does mapping all 6GB
into your address space make sense.  [And if you are randomly
accessing a small part, then a remote file system might work almost as
well as mapping it into memory.]
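
To put numbers on those cost breaks, a rough sketch (Python; the link rates
are nominal, and the courier time is a guessed 18 hours door to door):

    # Hours to move a 6 GB dataset over various "links".
    size_bits = 6 * 2**30 * 8

    links = {                              # nominal rates, bits/second
        "T1 (1.544 Mbit/s)":       1.544e6,
        "Ethernet (10 Mbit/s)":    10e6,
        "gigabit NREN (1 Gbit/s)": 1e9,
    }
    for name, rate in links.items():
        print(f"{name:24s} {size_bits / rate / 3600:8.2f} hours")
    print(f"{'overnight DAT courier':24s} {18.0:8.2f} hours (assumed)")
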

I will agree though that most present operating systems will choke on
the idea of a single 6GB random-access file -- or a 6GB virtual memory
image.

j.

mash@mips.COM (John Mashey) (08/09/90)

In article <3293@stl.stc.co.uk> "Tom Thomson" <tom@stl.stc.co.uk> writes:
>In article <40644@mips.mips.COM> mash@mips.COM (John Mashey) writes:
>>	using 4Mb DRAMs. Hence, by the 64Mb DRAM generation, one can expect
>>	2GB - 8GB maxes.  After all, at that point, you can get 4GB or
>>	so within a 1-ft cube.
>I sure do hope that physical dimension is wrong!  If I can't do it in less
>than half a cubic foot by 1996 I've got problems. Of course, I don't believe
>I have any problem at all here.

Of course.  The comment wasn't intended to be a close-order estimate of
the space, merely to note that it would be easy to get a lot of memory in
a small box. :-)  Note that if you use 64-bit wide memories (+ byte parity,
ending up 72 bits wide), and if you assume 64Mb-by-1 parts (which may or
may not be the best assumption), then the "natural" memory increment
is 512MB (+ parity bits), using 72 DRAMs (8 SIMMs).  Now, a MIPS Magnum, a
desktop workstation, has 32 SIMM slots (to get 32MB with 1Mb, 128MB with
4Mb DRAMs), arranged in two rows of 16.  Looks like the 1996 version
could have 4X512MB = 2GB in that same space, and in fact, one
would really need to get up around 8GB - 16GB to get close to a
cubic foot.  Of course, you'd certainly want ECC memory instead at
such sizes, so some additional space would get chewed up.
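
The arithmetic behind that "natural increment" (a sketch in Python; the
72-bit-wide, by-1-chip organization is the one described above):

    # Memory increment for a 64-data-bit + 8-parity-bit wide array built
    # from N-by-1 DRAM chips: one chip per bit of width, 72 chips total.
    def natural_increment(chip_megabits, data_width=64, parity_width=8):
        chips = data_width + parity_width
        data_mb = chip_megabits * data_width // 8      # parity not counted
        return chips, data_mb

    for megabits in (4, 16, 64):
        chips, mb = natural_increment(megabits)
        print(f"{megabits:3d}Mb-by-1 parts: {chips} DRAMs -> {mb:4d} MB increment")
    # ->  4Mb: 32 MB;  16Mb: 128 MB;  64Mb: 512 MB
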

Although the rate of improvement in DRAM cost/bit seems to be slowing a bit,
it's still OK.  Of course, even if all of this is off a year or two,
it still means that one will fairly soon (1995 is about as far away
in one direction as the early commercial RISCs were in the other....)
be able to easily build desktop/deskside computers whose memories
are in current-supercomputer-or-bigger ranges....

Exercise 1: using the chart on page 55 of Patterson&Hennessy,
predict the cost of the memory in the 512MB "entry system" described
above (assuming it was parity, for simplicity), using 64Mb DRAMs.
Hint: the cost certainly depends on the date!
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.COM (John Mashey) (08/09/90)

In article <WAYNE.90Aug8085346@dsndata.uucp> wayne@dsndata.uucp (Wayne Schlitt) writes:
>> Well, there are always less higher-end things than lower-end ones.
>> However, I'd STRONGLY disagree with the idea that 64-bit machines will
>> remain confined to the super- & minisuper world for 10-20 more years.
                                                      ^^^^^
>yes, 20 years is probably too long but i think that 10 years isnt too
>far off the mark.  my guess is that it will be 7-10 years before 64bit
>computers are making inroads into the 32bit market.  (maybe by then we
>will have finally gotten away from 16bit computers.  1/2 :-)
Well, 7 isn't too far from 5.

>> I propose instead:
>> 	a) We are currently consuming address space at the rate of 1 bit year.
>i thought it was only 1 bit per 18 months...  new data?
Rough rule in any case.  Hennessy&Patterson claim (page 16):
	"This translates to a consumption of address bits at a rate of
	1/2 bit to 1 bit per year."
Hennessy always says informally that the old rule of 1/2 bit per year
has tended to shift more towards 1 bit per year with MOS memories.
In any case, this is a vague enough metric that these are in the same
ballpark. :-)
>
>>       b) [says that there are commercial people who already want
>>       31-32 bits of virtual address space, and they want more.]
>and they are putting these applications on everyone's desk...?  or are
>these applications things that they would have run on mainframes or
>super-mini's if your "killer-micro's" werent chosen?
Well, the ones I've heard of weren't for everybody's desk, but
they would have liked them to be on some people's desks, and they were
running on architectures suitable for the desktop.
>
>> 	c) Observe that there already exist desktop workstations that
>> 	support max physical memories in the 128MB - 512MB range,
>> 	using 4Mb DRAMs. Hence, by the 64Mb DRAM generation, one can expect
>> 	2GB - 8GB maxes.  After all, at that point, you can get 4GB or
>> 	so within a 1-ft cube.
>using your own numbers, at one bit a year it will be 3-5 years before
>physical memory _maximums_ will reach 4GB.  at one bit every 1.5
>years, it will be more like 4.5-7.5 years.  how long do you think it
>will be before 4GB is typical?
Well, I don't think 4GB/desktop will be "typical" for a long time, if ever.
I do think that there will be reasonable numbers of machines whose
maximum memories are in this range, for a whole bunch of the typical
reasons that cause systems to be built in certain ways.
a) Note that the "bit/year" rule is really applicable to virtual memory,
not necessarily physical memory, although the latter certainly correlates
with the former.  (Old saw: virtual memory is a way of selling more physical
memory.)
b) Anyone building a system will typically design it for at least 2
generations of DRAMs.  Right now, at least DEC, MIPS, and Sun build
desktops that use either 1Mb or 4Mb chips to cover various ranges.
I'm sure most everybody else does, also.
>
>also, when do you expect 64Mb DRAMS to come out?  my guess would be
>around 5-6 years or so, and then it will take at least a year before
>they have ramped up production to the point that they are cheaper than
>16Mb DRAMS.  (as a reference point, would you consider 4Mb DRAMS
>"common" now?  when do you think that 4Mb DRAMS became or will become
>"common"?) 

I'd consider 4Mb DRAMs "common" (not "prevalent"): multiple vendors
have been delivering them in desktop systems already. (Some of the
very first MIPS Magnums that got sold had 128MB maxed-out memories in
them :-)
Suppose you
get 16Mb chips in the same state in 1993-1994, and 64Mb chips in
1996-1997, or 1995 if you're really lucky.

Certainly, people who design systems tend to allow for at least 2 DRAM
sizes in boards, so things designed 1 year before 64Mb chips become practical
will allow for them.  All of this says that a Magnum-like design
appearing in 1995 would likely come out the door using 16Mb chips,
which would give a max memory of 512MB, and then upgrade to a 2GB max.
A DECstation 5100-like design has space for 4X more memory.
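
Projected out for a Magnum-like box (a sketch; the 32 SIMM slots and 8 data
chips per SIMM are inferred from the configurations above, and the generation
dates are the ones given here):

    # Max memory per DRAM generation for a board with 32 SIMM slots,
    # 8 data chips (one byte wide) per SIMM; parity ignored for capacity.
    SLOTS, CHIPS_PER_SIMM = 32, 8

    generations = ((1, "current"), (4, "current"), (16, "1993-94"), (64, "1996-97"))
    for megabits, when in generations:
        mb_per_simm = megabits * CHIPS_PER_SIMM // 8
        print(f"{megabits:3d}Mb DRAMs ({when}): {SLOTS * mb_per_simm:5d} MB max")
    # -> 32 MB, 128 MB, 512 MB, 2048 MB
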

Again, I make no claims that such would be "typical" (whatever that is).
However, people like to buy systems whose max memories are bigger than
typical to leave them room for growth.

Application areas that will tend to want this stuff quickly include:
ECAD, MCAD, image applications, geographic information systems,
financial modeling, as well as databases.  Observe that large memories
are one of the few obvious ways to help DBMS read performance,
so you'll see it in the commercial world as well.

(back to virtual-address space)
Finally, all of the economics of the business say that people like to have
ranges of machines that can run the same software.
It may well be that you have servers with massive amounts
of memory (sorry, lots of people WILL have servers with massive
amounts of memory) and smaller desktops/desksides, but you'd certainly
like to run the same applications on both at least some of the time,
even if you back the desktop with less physical memory.
Again, that's why I'd claim that 1995 micros will want to either be
64-bit ones, or at least have 64-bit modes.  Note that with the number of
transistors likely to be available, you can probably stuff a 32-bit
CPU in the corner of your 64-bit one to handle backward compatibility.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (08/09/90)

In article <1990Aug8.042631.7093@nlm.nih.gov>, usenet@nlm.nih.gov (usenet news poster) writes:
> Setting aside address arithmatic,
> most of the time you don't need 32 bit integers and lots of work involves
> bytes or smaller (character strings etc.).

For character strings, note that ANSI C's wchar_t type is already bigger
than a byte, and that this is driven by user demand (for Kanji), and that
the ISO 10646 standard which is currently under development encodes
characters in *32* bits.  (I do hope that ISO 10646 will include the
cuneiform characters, they've room for them, after all...)

Given that COBOL requires support for a minimum of 18 decimal digits,
64 bits sounds just about right for "ordinary" programs.
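
A quick check that the 18-digit requirement does land just inside 64 bits
(Python):

    # Decimal digits always representable in a signed two's-complement word.
    for bits in (32, 48, 64):
        digits = len(str(2**(bits - 1) - 1)) - 1
        print(f"{bits}-bit signed: every {digits}-digit decimal fits")
    # -> 32: 9 digits, 48: 14 digits, 64: 18 digits
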

For several years I had the pleasure of using a machine with 40 bit
integers (39 bits of magnitude, 1 bit of sign, in a 48-bit word) and
somehow it was never _quite_ enough.  Give people 64 bits and they'll
use them, don't you worry about that.
-- 
Taphonomy begins at death.

rminnich@super.ORG (Ronald G Minnich) (08/09/90)

In article <1990Aug8.195229.23544@jarvis.csri.toronto.edu> jonah@dgp.toronto.edu (Jeff Lee) writes:
>1) hire a station-wagon full of mag-tapes (or send a DAT by over-night
>courier) [6GB would saturate a T1 line (1.5Mbit/sec) for 9.1 hours.]
T1 is now slow.
6 GB would saturate an NREN for about 50 seconds. If you think I am saying
that mapping the 6GB NCAR file in is equivalent to copying the whole thing
over when your program starts up, I am not. That is just another form 
of FTP. 
>2) split between the data and program (e.g. with distributed shared memory)
Well, I am partial to that solution, not least because hardware
implementations of such things have been done at least once, and I can't
see using read and write calls to drive an NREN.
>4) plan a quick holiday in Colorado (with Plan 9, your environment
>follows you automatically)
How many megabytes have to follow me? Sounds like the same problem to me.

>Only if you are planning to
>randomly access a *small* portion of this database does mapping all 6GB
>into your address space make sense.  
That depends to some extent on what you think your network is. 
For T1 I would agree with you.  On a hypothetical 1Gb network I am no longer
so sure. 
>I will agree though that most present operating systems will choke on
>the idea of a single 6GB random-access file -- or a 6GB virtual memory
>image.
More important, few architectures can accommodate it well.  For instance,
even 8KB pages are silly in this case, but they are too big for most
other cases.  Does the small-page/large-page split deserve another look?
I have seen one VM architecture recently in which the idea of
paging is abandoned completely because it makes no sense in a large-
memory environment.  Note that VM was NOT abandoned on this machine.
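
To put numbers on the page-size point: mapping one 6GB object with small
pages means a lot of translation state.  A sketch (Python; a flat table with
4-byte PTEs is assumed purely for illustration):

    # Page count and flat-page-table size for a single 6 GB mapping.
    PTE_BYTES = 4
    mapping = 6 * 2**30

    for page in (4 << 10, 8 << 10, 64 << 10, 4 << 20):   # 4KB .. 4MB pages
        entries = mapping // page
        print(f"{page >> 10:5d} KB pages: {entries:8d} PTEs, "
              f"{entries * PTE_BYTES >> 10:5d} KB of page table")
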
ron
-- 
1987: We set standards, not Them. Your standard windowing system is NeUWS.
1989: We set standards, not Them. You can have X, but the UI is OpenLock.
1990: Why are you buying all those workstations from Them running Motif?

djh@osc.edu (David Heisterberg) (08/09/90)

In article <13667@cbmvax.commodore.com>, jesup@cbmvax.commodore.com (Randell Jesup) writes:
> 	I submit that your situation is something of an unusual case, and is
> likely to remain unusual for at least a decade, perhaps 2.  Few machines
> (percentage-wise) even have 4 GB of storage, let alone files larger that 4GB
> (I've never even seen a file larger than 100MB, even on mainframes).

Hang around with some quantum chemists sometime.  Files larger than 1GB are
routine.  In recent years so-called direct SCF methods have become popular
(again?) because the study of large molecules results in enormous files for
the two-electron integrals.  The direct methods simply recalculate the
integrals whenever needed.  This is OK for simple SCF calculations, and
Gaussian 90 will have direct MP2, but direct CI and CC are going to be
tough.  Realistic calculations could benefit from address spaces (and real
memory) of 16 GB or more - there are folks who could use that capability
right now.
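
For scale: with the usual 8-fold symmetry there are roughly N**4/8 unique
two-electron integrals for N basis functions.  A rough sketch (Python; the
16 bytes per stored integral -- value plus packed label -- is an assumption):

    # Unique two-electron integrals and a rough file size estimate.
    def integral_file(n_basis, bytes_per_integral=16):
        pairs = n_basis * (n_basis + 1) // 2
        count = pairs * (pairs + 1) // 2          # ~ n_basis**4 / 8
        return count, count * bytes_per_integral / 2**30

    for n in (100, 200, 300):
        count, gigabytes = integral_file(n)
        print(f"{n:4d} basis functions: {count:11d} integrals, ~{gigabytes:5.1f} GB")
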
-- 
David J. Heisterberg		djh@osc.edu		And you all know
The Ohio Supercomputer Center	djh@ohstpy.bitnet	security Is mortals'
Columbus, Ohio  43212		ohstpy::djh		chiefest enemy.

kitchel@iuvax.cs.indiana.edu (Sid Kitchel) (08/09/90)

In article <5286@mace.cc.purdue.edu> nvi@mace.cc.purdue.edu (Charles C. Allen) writes:
|> 	I submit that your situation is something of an unusual case, and is
|> likely to remain unusual for at least a decade, perhaps 2.  Few machines
|> (percentage-wise) even have 4 GB of storage, let alone files larger that 4GB
|> (I've never even seen a file larger than 100MB, even on mainframes).
|
|Until recently, the "standard" media for transporting files has been
|9-track 6250 tape, which holds around 200M.  Until recently, all our
|data files were less than 200M (hmm... I hope you see the
|correlation).  Now that we have some 8mm tape drives, we routinely
|have 400M files.  We'd have bigger ones, but all our disks are little
|SCSI 600-700M thingies (access time is not very critical), and we
|can't easily have a single file span volumes.  This is for high energy
|physics data analysis.

	Ah, the joy of isolation at Purdue!!  Here at Indiana University
we have something called Sociology. Some sociologists have developed the
nasty habit of investigating the U.S. Census data. Currently I'm working
with a group studying a fairly restricted set of county data from the
1970 Census that is 6 tapes long. These are 9-track 6250 bpi tapes. The
Census bureau makes standard extracts each census year that are often
20 to 30 tapes long.
	Yes, Virginia, big files do exist!  And our best guess is that
some of them will only get bigger.
					Now if VMS only had tape handling...
						--Sid

-- 
Sid Kitchel...............WARNING: allergic to smileys and hearts....
Computer Science Dept.                         kitchel@cs.indiana.edu
Indiana University                              kitchel@iubacs.BITNET
Bloomington, Indiana  47405-4101........................(812)855-9226

jr@oglvee.UUCP (Jim Rosenberg) (08/11/90)

In <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@ccc.amdahl.com (  213  Richard Wilmot) writes:

>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.

Excuse me?  You're proposing that addressing be based on *INEXACT*
arithmetic??  Sure sounds like a can of worms to me!  In scientific
programming one has to be careful to test floating point numbers for
difference within some epsilon rather than for absolute equality.  Not being
able to test addresses for exact equality seems like a fatal weakness, IMHO.
You could get around this problem by just using the fraction and exponent as
"keys" to the address, stripped of their floating point semantic content.  But
all that does is give you an integer with as many bits as the fraction and
exponent combined.  To be assured that you could always interpose something
between two addressed entities, you *need* the floating point semantics.  And
in fact you obviously want them.
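
Both hazards are one-liners to demonstrate (Python, whose floats are IEEE
doubles; the values are of course arbitrary):

    # Floating-point "addresses": equality breaks, and big ones collide.
    print(0.1 + 0.2 == 0.3)          # False -- exact comparison fails
    big = 2.0 ** 53
    print(big + 1 == big)            # True  -- two distinct "addresses" coincide
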

How are you proposing this would work???

(Alas, this discussion may not belong in comp.databases ...)
-- 
Jim Rosenberg             #include <disclaimer.h>      --cgh!amanue!oglvee!jr
Oglevee Computer Systems                                        /      /
151 Oglevee Lane, Connellsville, PA 15425                    pitt!  ditka!
INTERNET:  cgh!amanue!oglvee!jr@dsi.com                      /      /

jkrueger@alxfac.UUCP (Jon Krueger) (08/11/90)

dave@fps.com (Dave Smith) writes:

>In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
>>I still think it is time to stop trying to use integers for addressing.
>>They always break down and probably always will. Many computers today have
>>floating point units. I would like to see floating point used for addressing.

>This won't work.  There are only so many distinct numbers representable by
>floating point.

Richard implied the usual exponent and mantissa
representation of floats.  One might use two bignums instead.
Their ratio represents all rational numbers exactly: arbitrary
precision, no overflow, no underflow, no loss of precision.
High cost?  TANSTAAFL.  Consider associative arrays.
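
Exact rationals do buy the "always room in between" property, at a cost -- a
sketch with Python's arbitrary-precision rationals:

    # Rational "addresses": exact comparison, and always interposable.
    from fractions import Fraction

    a = Fraction(1, 3)
    b = a + Fraction(1, 10**40)      # an arbitrarily close neighbour
    mid = (a + b) / 2                # exactly representable, a < mid < b
    print(a < mid < b)               # True
    # The catch: numerators and denominators grow without bound, so
    # storage and comparison are no longer constant-time -- TANSTAAFL.
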

-- Jon

ge@phoibos.cs.kun.nl (Ge Weijers) (08/13/90)

davecb@yunexus.YorkU.CA (David Collier-Brown) writes:

>	I confess I'd have **real** trouble selling a raw bullet file system
>to a customer doing anything but cad/cam, software development[2] or small
>databases. 

In the context of Amoeba, no limit is placed on the number of differently
implemented file systems.  Bullet is certainly not the solution to all
storage problems.  It is supposed to support a limited set of operations
(e.g. loading code images from disk) very quickly.

So you'd sell him a DB file system AND Bullet for the 'normal' files.
Directories are stored elsewhere, so the DB files and the other files
can still be stored in the same directory.

Ge'
Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         tel. +3180612483 (UTC+1,
6525 ED Nijmegen, the Netherlands               UTC+2 march/september)

ge@phoibos.cs.kun.nl (Ge Weijers) (08/13/90)

rminnich@super.ORG (Ronald G Minnich) writes:

]And once it gets enough RAM we will run out of address bits again! This
]DS5000 on my desk has 128 MB of memory, and can have 512 MB. That is getting
]uncomfortably close to running out of address bits. I figure we will be there
]in two years, at a bit a year. I guess there was a 32-bit war here before,
]judging by earlier comments, but the fact is we are about to run out.

I propose using 256 bits. With one atom/bit storage the universe
will not support more than a few thousand PCs. Of course there should
be an option to use segments (max 2^256 for uniformity's sake).
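
For the record, the joke survives a back-of-the-envelope check (a sketch;
10**80 atoms in the observable universe is the usual rough estimate):

    # One atom per bit: how many fully-populated 256-bit machines fit?
    ATOMS = 10**80
    print(ATOMS // 2**256)           # 2**256 bits each   -> 863 machines
    print(ATOMS // (8 * 2**256))     # 2**256 bytes each  -> 107 machines
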

Ge'

Ge' Weijers                                    Internet/UUCP: ge@cs.kun.nl
Faculty of Mathematics and Computer Science,   (uunet.uu.net!cs.kun.nl!ge)
University of Nijmegen, Toernooiveld 1         tel. +3180612483 (UTC+1,
6525 ED Nijmegen, the Netherlands               UTC+2 march/september)