[comp.databases] Extremely Fast Filesystems

davecb@yunexus.YorkU.CA (David Collier-Brown) (07/31/90)

In <5465@darkstar.ucsc.edu> Craig Partridge writes:
|  I'm curious.  Has anyone done research on building extremely
|  fast file systems, capable of delivering 1 gigabit or more of data
|  per second from disk into main memory?  I've heard rumors, but no
|  concrete citations.

puder@zeno.informatik.uni-kl.de (Arno Puder) writes:
| Tanenbaum (ast@cs.vu.nl) has developed a distributed system called
| AMOEBA. Along with the OS-kernel there is a "bullet file server".
| (BULLET because it is supposed to be pretty fast).
| 
| Tanenbaum's philosophy is that memory is getting cheaper and cheaper,
| so why not load the complete file into memory? This makes the server
| extremely efficient. Operations like OPEN or CLOSE on files are no
| longer needed (i.e. the complete file is loaded for each update).

  Er, sorta...  You could easily write an interface that did writes
or reads without opens or closes, for some specific subset of uses.
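A minimal sketch of what such an open/close-free interface might look like. The names `bullet_read` and `bullet_write` are hypothetical, not Amoeba's actual API; the point is just that the whole file moves in one operation and no handle is held between calls.

```python
# Sketch of an open/close-free, whole-file interface in the spirit of
# the Bullet server.  Function names are invented for illustration.

def bullet_read(path):
    """Return the entire file contents in one operation; no file
    handle survives the call."""
    with open(path, "rb") as f:    # open/close hidden inside one call
        return f.read()

def bullet_write(path, data):
    """Replace the whole file: the complete new contents are
    supplied on every update."""
    with open(path, "wb") as f:
        f.write(data)
```

Each call is self-contained, which is what makes OPEN and CLOSE unnecessary as separate operations for this subset of uses.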

| 
| The benchmarks are quite impressive although I doubt that this
| concept is useful (especially when thinking about transaction
| systems in databases).

  Well, I have something of the opposite view: a system like Bullet makes a
very good substrate for a database system.  

  The applicable evidence is in the article "Performance of an OLTP
Application on Symmetry Multiprocessor System", in the 17th Annual
International Symposium on Computer Architecture, ACM SIGARCH Vol 18 Number
2, June 1990. (see, a reference (:-))
  The article uses all-in-memory databases in the TP1 benchmark as a
limiting case while investigating the OS and architectural support that
is necessary for good Transaction Processing speeds, and the speeds
are up in the range that Craig may find interesting...

  My speculation is that a bullet-like file system with a relation-
allocating layer (call it the Uzi filesystem? the speedloader filesystem??)
on top would make a very good platform for a relational database.  Certainly
the behavior patterns of an in-memory, load-whole-relation database would be
easy to reason about, and would be easy and interesting to investigate.

| You can download Tanenbaum's original paper (along with a "complete"
| description about AMOEBA) via anonymous ftp from midgard.ucsc.edu
| in ftp/pub/amoeba.


--dave
-- 
David Collier-Brown,  | davecb@Nexus.YorkU.CA, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave 
Willowdale, Ontario,  | "And the next 8 man-months came up like
CANADA. 416-223-8968  |   thunder across the bay" --david kipling

rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (08/08/90)

davecb@yunexus.YorkU.CA (David Collier-Brown) wrote:

> In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
> [in a discussion of Tanenbaum's Bullet fileserver]
> | >This is very elegant, but there is
> | >a problem. We're running out of address bits again.
>
    ... stuff deleted

> It's always a bad idea to put a hard addressing limit on things:
> Intel, based on their past needs, octupled the addressable memory available
> when they introduced the 8086, even though they expected 16k was adequate.
> Experience has shown them wrong (;-)).  They needed to increase it by a
> somewhat larger factor [see, we're back to computer architecture again].

  ... more deleted.

>--dave


Indeed. I began administering a database of 1,600 MB in 1974 for Kaiser
Medical of Northern California. I think their membership has grown from
the 1,000,000 active and 2,000,000 inactive of that time and Kaiser very
likely keeps much more data about each member. This was done with an IBM
370/158 computer having 1 MB of main memory. All of the performance parameters
of that hardware system can be easily exceeded by today's high-end PCs.
I am certainly willing to bet that they've exceeded the 4,300 MB addressing
limit of IBM's VSAM by now (for them there are ways around this).
We always seem to exceed any addressing limit we can imagine.

I still think it is time to stop trying to use integers for addressing.
They always break down and probably always will. Many computers today have
floating point units. I would like to see floating point used for addressing.
It would help a great deal if addresses were not dense. What I have in mind
is for data objects to have addresses but these would be floating point and
between any two objects we could *always* fit another new object. In this way
I could expand a file from a hundred bytes to a hundred gigabytes without
changing the addresses of any stored objects. An index pointing to objects
in such a file would never need to be adjusted as the file grew or shrank.
Plastic addressing. This could still be efficient since addresses are almost
never real anymore anyway. Addressing is used to locate things in high speed
processor caches, in virtual memory, and on disk drives
(and in disk controller caches). Integer addressing is unsuited to all these
different tasks. Fractional addressing could be flexible enough to allow for
all these locational needs.
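A toy sketch of this "plastic addressing" idea, using exact rationals as a stand-in for the ideal (the `PlasticFile` class and its method names are invented for illustration): between any two rational addresses an exact midpoint always exists, so a new object can be placed without renumbering any existing one.

```python
from fractions import Fraction

# Toy sketch of plastic (fractional) addressing: every object gets an
# exact rational address, and a new object can always go between any
# two existing ones without disturbing their addresses.

class PlasticFile:
    def __init__(self):
        self.objects = {}            # Fraction address -> object

    def insert_between(self, lo, hi, obj):
        addr = (lo + hi) / 2         # an exact midpoint always exists
        self.objects[addr] = obj
        return addr

f = PlasticFile()
a = f.insert_between(Fraction(0), Fraction(1), "first")   # addr 1/2
b = f.insert_between(a, Fraction(1), "second")            # addr 3/4
# "first" still lives at 1/2; nothing was re-addressed.
```

Note this relies on exact rational arithmetic, not hardware floating point, which is where the objections in the follow-ups bite.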

Some things are nicely stored by hashing instead of by b*tree organization
(e.g. person records by driver license number): hashing minimizes the update
locking problems prevalent in b*trees, and saves one or more extra levels of
access. Hashing is hard to do as a file grows, but would be simple with a file
addressed by fractions (0.5 is *logically* half way through the file). I
think this was used by one of the graphics systems for describing picture
objects (GKS?).
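One way the hashing idea might work, sketched below (the hash function and names are purely illustrative, not from any real system): a key hashes to a fixed fraction of the logical address space, and only the mapping from fraction to physical block changes as the file grows, so stored fractional positions never need adjustment.

```python
from fractions import Fraction

# Sketch: hash a key to a fixed fraction in [0, 1).  The logical
# position never changes as the file grows; only the rescaling to a
# physical block does.  The hash below is illustrative only.

def fractional_position(key, buckets=2**32):
    h = sum(ord(c) * 131**i for i, c in enumerate(key)) % buckets
    return Fraction(h, buckets)      # e.g. 1/2 is logically mid-file

def physical_block(pos, nblocks):
    return int(pos * nblocks)        # rescale to the file's current size

pos = fractional_position("DL-1234567")
small = physical_block(pos, 100)     # same key, 100-block file...
big = physical_block(pos, 10**6)     # ...or a million-block file
```

The stored index entry (the fraction) is stable; growth only changes the rescaling step.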

So when will I see fractional addressing?
-- 
  Dick Wilmot  | I declaim that Amdahl might disclaim any of my claims.
                 (408) 746-6108

cliffc@sicilia.rice.edu (Cliff Click) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
>davecb@yunexus.YorkU.CA (David Collier-Brown) wrote:
>
>> In article <30728@super.ORG> rminnich@super.UUCP (Ronald G Minnich) writes:
>> [in a discussion of Tanenbaum's Bullet fileserver]
>> | >This is very elegant, but there is
>> | >a problem. We're running out of address bits again.
>>
>    ... stuff deleted
>
> [ ...stuff about using fractional addressing deleted... ]
>
>So when will I see fractional addressing?

Humm... infinite precision fractional addressing... isn't this equivalent to
using "bignums"?  Suppose I take a fractional addressing system, and generate
a fraction digit-by-digit, 2 digits for each letter of a person's name.  
Suddenly I have a unique fraction which is some permutation of a name.  In
other words I have a system where names are addresses, and since names have 
no length limit, I have no limit to my file sizes.  I say a rose by any other 
addressing mode is a rose; what you're describing is nothing more than a 
key-access file system.
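Cliff's digit-by-digit construction can be made concrete (the helper name is invented): two digits per letter build a decimal fraction that is just the name in disguise, so an unbounded-precision "fractional address" is really a key.

```python
# Sketch of the name-as-fraction construction: two decimal digits per
# letter yield a "fraction" of unbounded precision that is nothing but
# the name itself -- i.e. a key, not a number.  Letters a-z only, for
# illustration.

def name_to_fraction(name):
    # 'a' -> "00", 'b' -> "01", ..., 'z' -> "25"
    digits = "".join("%02d" % (ord(c) - ord("a")) for c in name.lower())
    return "0." + digits             # kept as a string: a bignum key

addr = name_to_fraction("rose")      # -> "0.17141804"
```

Since names have no length limit, neither do these "addresses", which is exactly the key-access file system in different clothing.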

Cliff Infinite Fractions Click
-- 
Cliff Click                
cliffc@owlnet.rice.edu       

dave@fps.com (Dave Smith) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.
>It would help a great deal if addresses were not dense. What I have in mind
>is for data objects to have addresses but these would be floating point and
>between any two objects we could *always* fit another new object. 

This won't work.  There are only so many distinct numbers representable by
floating point.  There will be points where there is no "room" between
two numbers because the granularity of the floating point doesn't allow
a number to be represented between them.  As a simple example with three-digit
decimal floating point (three digits of mantissa, one digit of exponent),
find the number between 1.01x10^1 and 1.02x10^1.  With a 64-bit (combined
mantissa and exponent) floating point number there are 2^64 distinct numbers
that can be represented.  The range is very large, but the numbers are sparse.

I liked the idea of "tumblers" as put forth by the Xanadu project.  Variable
length indices, that's the only way to go.  I think I'll go over and hit
the hardware engineers over the head until they figure out a way to make
it fast :-).

--
David L. Smith
FPS Computing, San Diego        
ucsd!celerity!dave or dave@fps.com

clj@ksr.com (Chris Jones) (08/08/90)

In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com>, rbw00@ccc (  213  Richard Wilmot) writes:
>We always seem to exceed any addressing limit we can imagine.

This is very true, at least so far.

>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.
>It would help a great deal if addresses were not dense. What I have in mind
>is for data objects to have addresses but these would be floating point and
>between any two objects we could *always* fit another new object. In this way
>I could expand a file from a hundred bytes to a hundred gigabytes without
>changing the addresses of any stored objects.

Um, I think that between any two *real* numbers you can always find another
real number.  Real numbers are a mathematical concept, and what is called
floating point on computers merely implements a useful approximation of them.
On computers with floating point, it is most definitely not the case that you
can always fit another floating point number between two other such numbers.
These things take up a finite number of bits, right, and that means there's a
finite limit to their cardinality.
--
Chris Jones    clj@ksr.com    {world,uunet,harvard}!ksr!clj

peter@ficc.ferranti.com (Peter da Silva) (08/08/90)

In article <1990Aug8.010203.18560@rice.edu> cliffc@sicilia.rice.edu (Cliff Click) writes:
> In article <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@JUTS.ccc.amdahl.com (  213  Richard Wilmot) writes:
> >So when will I see fractional addressing?

When someone invents a floating point unit that maps to the real number system.

> Humm... infinite precision fractional addressing... isn't this equivalent to
> using "bignums"?

Or Ted Nelson's "Tumblers"?

"Any problem in programming can be solved with another level of indirection"
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
<peter@ficc.ferranti.com>

jr@oglvee.UUCP (Jim Rosenberg) (08/11/90)

In <dczq02zP01Hc01@JUTS.ccc.amdahl.com> rbw00@ccc.amdahl.com (  213  Richard Wilmot) writes:

>I still think it is time to stop trying to use integers for addressing.
>They always break down and probably always will. Many computers today have
>floating point units. I would like to see floating point used for addressing.

Excuse me?  You're proposing that addressing be based on *INEXACT*
arithmetic??  Sure sounds like a can of worms to me!  In scientific
programming one has to be careful to test floating point numbers for
difference within some epsilon rather than for absolute equality.  Not being
able to test addresses for exact equality seems like a fatal weakness, IMHO.
You could get around this problem by just using the fraction and exponent as
"keys" to the address, stripped of their floating point semantic content.  But
all this does is give you an integer with as many bits as the fraction + the
exponent could be.  To be assured you could always interpose something between
two addressed entities you *need* floating point semantics.  And in fact you
obviously want them.
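Jim's objection about inexactness is easy to make concrete: hardware floats fail exact-equality tests that exact rationals (stripped of rounding, as he suggests) would pass, which matters a great deal if the values are addresses.

```python
from fractions import Fraction

# The exact-equality problem in miniature: binary floats round,
# exact rationals do not.

inexact = 0.1 + 0.2                          # rounds in binary
exact = Fraction(1, 10) + Fraction(2, 10)    # exact arithmetic

# inexact == 0.3 is False; exact == Fraction(3, 10) is True.
```

An address you cannot safely compare for equality is no address at all, so any workable scheme ends up with exact (key-like) semantics, as noted above.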

How are you proposing this would work???

(Alas, this discussion may not belong in comp.databases ...)
-- 
Jim Rosenberg             #include <disclaimer.h>      --cgh!amanue!oglvee!jr
Oglevee Computer Systems                                        /      /
151 Oglevee Lane, Connellsville, PA 15425                    pitt!  ditka!
INTERNET:  cgh!amanue!oglvee!jr@dsi.com                      /      /