[comp.databases] Databases: separate-file vs. monolithic file structure

tim@linus.sybase.com (Tim Wood) (09/02/88)

In article <6178@dasys1.UUCP> mtxinu!uunet!dasys1!alexis writes:

>
>There has been some discussion recently in comp.sys.mac concerning the relative
>merits of two different methods of storing database files: the monolithic 
>(all-in-one-file) way and the distributed (files-all-over-the-place) method. 
>...
>
>The benefits of monolithic structure are few. 

It's dangerous to lead with your conclusion.  Your position is neither
100% true nor 100% false.  Mono vs. separate-file performance is very
dependent on the underlying software platform (i.e. OS) and
its implementation.  I object to the use of "distributed" also;
this term as used with DBMS's refers to a database that is
physically fragmented across > 1 machine, but allows the 
user transparent access to it, i.e. access not requiring knowledge of
or even showing the physical dispersion.

I should mention that I speak from the POV of a transaction-processing 
relational DBMS vendor.  Many users, much traffic, much multi-user 
contention.  Issues are different from the PC/Mac DBMS ones.

>With the Monolithic structure, it is possible (or guaranteed, depending on 
>your particular choice of DBMS) that you will corrupt some other portion 
>of your database. 

Again, you are assuming your conclusion.  Properly designed databases
decouple their indexes sufficiently so that a corrupt one may be
deleted and a new one created.  Now, an error can happen with a data
structure that the index and its table (or d.b.) share; that can lead 
to serious corruption.  But if such sharing is designed into 
a multi-file system, then the same vulnerability exists.  Also, the OS
could crash, leaving several files inconsistent.

>Good luck 
>finding out what got damaged- it may take you weeks to find it, 

Unless your system is properly instrumented, with automatic checkpointing,
and properly managed with regular backups.

>by which time all your backups may have the damage as well. 
>In rare cases you could trash your entire database....
>The danger, however, never goes away entirely. This reason alone should 
>convince most people.

Our customers don't seem to be convinced of the danger.  
Of course, most software is not bug-free, and that's where customer
support comes in.  The idea is to make the catastrophe scenario very
unlikely, with good software, fault-tolerance features, regular
checkpoints and backups.  Reducing the likelihood of catastrophe makes
the optimization gained from a single, specially-managed file worth
it.

>
>The other overriding reason to use a distributed structure is performance. If 
>your DBMS has to go through its own file-management code as well as the OS's, 
>it will always be slower than if it only needed to go through the OS. 

Unless you don't go thru the OS code.  Know what a "raw disk" (in UNIX)
or a "foreign mounted" disk (in VMS) is?    They are physical-block
interfaces to disks and disk partitions.  They allow I/O to/from the
user's own buffer rather than the disk cache's, and immediate write-thru
instead of being subject to the OS's caching policies.  Why use OS
services for a DBMS when they weren't written for a DBMS?  I think 
you are also exaggerating the trade-off.  Why would 2 directory lookups
(to reach table T in database D) be faster than two file reads (one for the
data dictionary to get the table location and the other for the table page)?
Also with an OS approach, to get decent caching, you'd use up file 
descriptors very quickly keeping previously-used tables open.  
Not so if you have your own table handles within the DBMS, especially
if they are a configurable resource.
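
To make the raw-disk point concrete, here's a minimal C sketch of fetching
a DBMS page straight off a raw partition into the DBMS's own buffer.  The
device name, page size, and layout are assumptions for illustration, not
anyone's actual product:

    /* Sketch: fetch DBMS page N from a raw partition into our own
     * buffer, bypassing the file system and its block cache.
     * "/dev/rsd0d" is a made-up raw-device name; real raw I/O also
     * wants sector-aligned buffers, glossed over here. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    #define PAGESIZE 2048L            /* DBMS page size: an assumption */

    int main(void)
    {
        char buf[PAGESIZE];           /* DBMS-managed buffer, not OS cache */
        long pageno = 42;
        int  fd = open("/dev/rsd0d", O_RDWR);

        if (fd < 0) { perror("open raw device"); exit(1); }

        /* The page number maps straight to a byte offset: no directory
         * lookup, no inode, no second layer of file management. */
        if (lseek(fd, pageno * PAGESIZE, SEEK_SET) == (off_t)-1 ||
            read(fd, buf, PAGESIZE) != PAGESIZE) {
            perror("raw read");
            exit(1);
        }
        printf("page %ld in our buffer, write-thru on our terms\n", pageno);
        close(fd);
        return 0;
    }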

>...With a distributed structure, the middle step doesn't 
>exist. This time savings is not immense if the DBMS is very well-written, but 
>the vast majority are not (at least when it comes to speed optimization).

Herewith, I toot my own horn: Sybase is very well-written (having written 
about 8-10% myself and reviewed about 90%).  The system was designed
with data integrity and performance as the primary goals.

>
>There is a much bigger performance gain for distributed structures in a 
>multiple-machine or multiple-hard-disk environment. ...
>	[ discussion of multiple-disk interleaving, RAM-disks,
>	  arm-waving around the problem of multi-user access management:
>...Of course there are 
>important logistics to consider, such as how to lock out access to data which 
>is temporarily invalid because it is being updated privately by that node, but 
>often this is not a problem at all. Even when it is, it's better than not being
>able to do the job at all. ]

Unless you don't mind inconsistent transactions and corrupt data.

As for the disk strategies, they're fine.  I should mention that a
mono organization should not prohibit you from using other disk 
devices (or files).  If you have two disks, you want
to lay things out so that they carry about the same load at any
given time.  A good mono organization allows this interleaving 
without forcing dependency on OS mechanisms for implementing it.
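
A sketch of the kind of page-to-spindle mapping I mean, with the device
count and round-robin policy purely illustrative:

    /* Sketch: a "monolithic" DBMS that still interleaves pages across
     * spindles.  The DBMS, not the OS, maps a logical page number to
     * (device, byte offset); round-robin here, but any layout the DBA
     * configures would do.  Two disks is an assumption. */
    #include <stdio.h>

    #define NDEV     2
    #define PAGESIZE 2048L

    static void page_to_extent(long pageno, int *dev, long *offset)
    {
        *dev    = (int)(pageno % NDEV);        /* alternate spindles */
        *offset = (pageno / NDEV) * PAGESIZE;  /* position on device */
    }

    int main(void)
    {
        for (long p = 0; p < 6; p++) {
            int dev; long off;
            page_to_extent(p, &dev, &off);
            printf("page %ld -> disk %d, offset %ld\n", p, dev, off);
        }
        return 0;
    }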

>
>There is one other very important reason to use a distributed structure that 
>comes to mind. Any monolithic structure will impose arbitrary restrictions on 
>the number of data files or fields (tables and columns) allowed in the 
>database. Sometimes these restrictions are very severe. Furthermore, if you 
>have a very large database, it may not fit on one physical disk, and with the 
>monolithic structure you are limited (generally) to one device. With the 
>distributed structure, these limitations just go away.

Continued simplistic tone.  OS's are software.  DBMS's are software.
Why can't a DBMS be written with the same liberal (or practically nonexistent)
limits that the magical OS managing the separate-file structure has?

>
>....For example, one of the big DBMSs 
>that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write 
>directly to the disk (let's skip the technical details...). While this is like 
>the monolithic system in some ways, it still allows for some of the benefits of
>the distributed structure. UNIX wizards may have a lot to say about this...

That's us (at least).  We went so far as to modify the good ol' OS (Sun UNIX)
to support async I/O to raw disks, so that our (UNIX) process can
compute while the hardware deals with our last request(s).  This
feature is native to VMS, and we use it there too.
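
For the curious, the shape of the idea looks roughly like this in C, using
a POSIX-style aio interface as a stand-in for our kernel modification (the
device name is hypothetical, and this is not our actual interface):

    /* Sketch: issue a read on the raw disk, keep computing, pick up
     * the result later.  POSIX <aio.h> stands in for the kernel mod
     * described above; "/dev/rsd0d" is hypothetical.  (May need -lrt
     * to link on some systems.) */
    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    #define PAGESIZE 2048

    int main(void)
    {
        static char buf[PAGESIZE];
        struct aiocb cb;
        int fd = open("/dev/rsd0d", O_RDONLY);
        if (fd < 0) { perror("open"); exit(1); }

        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf    = buf;
        cb.aio_nbytes = PAGESIZE;
        cb.aio_offset = 0;
        if (aio_read(&cb) < 0) { perror("aio_read"); exit(1); }

        /* ... the server keeps evaluating queries here while the
         * controller services the request ... */

        while (aio_error(&cb) == EINPROGRESS)
            ;   /* in real life: do useful work, then reap */
        if (aio_return(&cb) != PAGESIZE) { perror("aio"); exit(1); }

        printf("page arrived while we computed\n");
        close(fd);
        return 0;
    }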

Really the issue here is OS independence, not how many files you use.
Our experience is that we know best how to deploy system resources
to run our software, not the writers of the host operating system,
fine as that OS may be for other purposes.  

-TW

Disclaimer: I am responsible for these opinions.
{ihnp4!pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
..not an @ in the bunch...

jkrueger@daitc.daitc.mil (Jonathan Krueger) (09/04/88)

In article <861@sybase.sybase.com> tim@linus.sybase.com (Tim Wood) writes:
>Herewith, I toot my own horn: Sybase is very well-written (having written 
>about 8-10% myself and reviewed about 90%).  The system was designed
>with data integrity and performance as the primary goals.

And have you met those goals?  How do you measure your success?

-- Jon

-- 
Jonathan Krueger  uunet!daitc!jkrueger  jkrueger@daitc.arpa  (703) 998-4777

Inspected by: No. 15

alexis@dasys1.UUCP (Alexis Rosen) (09/07/88)

Recently, Tim Wood wrote a response to my article on database file structures
that was, I feel, much more on-target than the other response by Jon Krueger.
He missed the same thing as Jon, though- that I was really addressing smaller
machines. At any rate, I specifically excluded DBMSs which did raw I/O from my
analysis.

BTW, the raw I/O stuff looks like distributed structure to me. In fact, the 
real breakdown, I guess, is between products that layer two levels of file 
access on top of each other vs. the ones that only have one layer (i.e. they 
either use the native file system for each of their logical files, or do raw 
I/O and use their own file system).

In article <861@sybase.sybase.com>, Tim Wood (tim@linus.sybase.com) writes:
>In article <6178@dasys1.UUCP> mtxinu!uunet!dasys1!alexis writes:
>>There has been some discussion recently in comp.sys.mac concerning the
>>relative merits of two different methods of storing database files: the
>>monolithic (all-in-one-file) way and the distributed 
>>(files-all-over-the-place) method. [etc.]
>>The benefits of monolithic structure are few.
>
>It's dangerous to lead with your conclusion.  Your position is neither
>100% true nor 100% false.  Mono vs. separate-file performance is very
>dependent on the underlying software platform (i.e. OS) and
>its implementation.

Very true. But note my previous comments- for the general PC market, which is 
(right now) MS-DOS and MacOS, my original conclusions are true...

>                    I object to the use of "distributed" also;
>this term as used with DBMS's refers to a database that is
>physically fragmented across > 1 machine, but allows the
>user transparent access to it, i.e. access not requiring knowledge of
>or even showing the physical dispersion.

Also true. I was aware of this use of the term, which is why I was careful to 
describe exactly what I meant. Still, if you've got a clearer term, I'd be glad 
to adopt it. I'm not sure that just 'separate' fits the bill...

>I should mention that I speak from the POV of a transaction-processing
>relational DBMS vendor.  Many users, much traffic, much multi-user
>contention.  Issues are different from the PC/Mac DBMS ones.

I wish they weren't... I'd give an arm and a leg for a good fast transaction 
processing DBMS for the Mac. FoxBase has some promise, but it ain't there yet, 
not by a long shot. So when will Sybase run under Mac's OS, Tim?

> [Makes a good point about invulnerability of large-system DBMSs which has,
>  unfortunately for me, little to do with PCs]
>
>>Good luck
>>finding out what got damaged- it may take you weeks to find it,
>
>Unless your system is properly instrumented, with automatic checkpointing,
>and properly managed with regular backups.

Again, I just wish this kind of thing were available on smaller systems. I have 
a Mac II with 5 MB of RAM and 300 MB of disk. This is considerably more 
horsepower than a VAX 11/750 which I used to use. Why aren't such tools 
available?  The same could be said for the 25MHz '386 systems out there.

>>The other overriding reason to use a distributed structure is performance. If
>>your DBMS has to go through its own file-management code as well as the OS's,
>>it will always be slower than if it only needed to go through the OS.
>
>Unless you don't go thru the OS code.  Know what a "raw disk" (in UNIX)
>or a "foreign mounted" disk (in VMS) is/are?
> [followed by fairly good discussion of why raw I/O is Good.]

I specifically mentioned that raw I/O was excepted from my analysis.

>>There is a much bigger performance gain for distributed structures in a
>>multiple-machine or multiple-hard-disk environment. ...
>>	[ discussion of multiple-disk interleaving, RAM-disks,
>>	  arm-waving around the problem of multi-user access management:
>>...Of course there are
>>important logistics to consider, such as how to lock out access to data which
>>is temporarily invalid because it is being updated privately by that node,
>>but often this is not a problem at all. Even when it is, it's better than not
>>being able to do the job at all. ]
>
>Unless you don't mind inconsistent transactions and corrupt data.

Since Jon misunderstood this remark also, I guess I was very unclear here. 
What I was saying was that in many cases, with canned applications, you have 
knowledge that certain tables will never be written to except by one user 
(except on weekends, on Feb. 29, and during a solar eclipse. Whatever...)
In this case, you need not establish a lock and can just do whatever you like 
without worries. This is what I meant by it being 'not a problem'.

>As for the disk strategies, they're fine.  I should mention that a
>mono organization should not prohibit you from using other disk
>devices (or files).  [etc.]

Well, it's easy enough to do disk striping, but I'm not convinced that the DBMS 
is always going to be smarter than I am. What about prioritization? I want 
certain users of table A, who don't need indices I, J, K, and L, to have very 
fast response time. Other users of A, who need I J K and L, can go hang because 
they aren't upper management. :-) In this case I want to put A on one spindle 
and everything else on another. I am not saying that it's impossible to write a 
DBMS which can handle things like this. I am saying that that would involve a 
fair amount of AI, and I haven't seen anything like it yet.

>>There is one other very important reason to use a distributed structure that
>>comes to mind. Any monolithic structure will impose arbitrary restrictions on
>>the number of data files or fields [etc.]
>
>Continued simplistic tone.  OS's are software.  DBMS's are software.
>Why can't a DBMS be written with the same liberal (or practically nonexistent)
>limits that the magical OS managing the separate-file structure has?

Very true. I got carried away bitching about the current state of the art in
PCs. This is, of course, not necessarily the case with all systems. My mistake.

I wonder, though, how many large DBMSs do have (virtually) unbounded file size?
Surely Sybase does (;-) but what of the others? Years ago I used a DBMS on a
VAX which had precisely this problem. Don't remember which one it was, though.

>>....For example, one of the big DBMSs
>>that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write
>>directly to the disk (let's skip the technical details...). While this is
>>like the monolithic system in some ways, it still allows for some of the
>>benefits of the distributed structure. UNIX wizards may have a lot to say 
>>about this...
>
>That's us (at least).  [Toots own horn, has probably earned that right]
>Really the issue here is OS independence, not how many files you use.
>Our experience is that we know best how to deploy system resources
>to run our software, not the writers of the host operating system,
>fine as that OS may be for other purposes.
>
>-TW

Yes. As I wrote at the beginning of this article, raw I/O fits my 'distributed' 
model more closely than it does the 'monolithic' model. Excluding this, though, 
does anyone think that mono structures have any big advantages?

Anyway I am glad that you have taken the time to write the perfect DBMS 
back-end. Now all you need to do is sell it for Macs and PCs (not under unix, 
either) and I'll be very happy.

---
Whew. Now, let me ask a question without making any sweeping statements:

One point we all missed is the possibility of combining multiple tightly- 
related tables on the same physical portion of the disk. The speed advantage 
here might be worth a mono structure (but I would guess not). Some SQLs allow 
you to create "clusters". Now, what exactly is the DBMS writing to the disk? 
The only way I can think of for doing this is to actually store a join of the 
two files. This could chew up an enormous amount of disk space. What is it 
doing?


----
Alexis Rosen                       {allegra,philabs,cmcl2}!phri\
Writing from                                {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat                  {portal,well,sun}!hoptoad/
Public UNIX                         if mail fails: ...cmcl2!cucard!cunixc!abr1
Best path: uunet!dasys1!alexis

tim@linus.sybase.com (Tim Wood) (09/10/88)

In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>
>Recently, Tim Wood wrote a response to my article on database file structures
>...
>He missed ... that I was really addressing smaller
>machines. At any rate, I specifically excluded DBMSs which did raw I/O from my
>analysis.

I don't believe the PC focus was clear.  The article made rather
sweeping statements about different storage structures in general, but
the conclusions did not generalize beyond the single-user environment,
and maybe not within that.

As for raw I/O, if your DBMS does not take advantage of a mechanism in
the environment that offers potentially higher throughput, then it is
not the DBMS to use in your environment.  It's akin to saying "I'm
going to compare vendor X's bubble-sort vs. vendor Y's selection sort routines 
but ignore the quicksort from vendor W."

>
>BTW, the raw I/O stuff looks like distributed structure to me. 

No.  A single raw disk partition (of a certain minimum size) can hold
the entire set of databases and tables the DBMS sees.  This is likely
to give suboptimal performance (because there is no interleaving) and impose
space constraints (i.e., you may need to grow beyond the confines of the
partition).
It's the most restrictive case of a scalable system (1 CPU, 1 disk).
But a flexible mono system will let you scale upward and choose your physical
layout.

>In fact, the 
>real breakdown, I guess, is between products that layer two levels of file 
>access on top of each other vs. the ones that only have one layer (i.e. they 
>either use the native file system for each of their logical files, or do raw 
>I/O and use their own file system).

That's a lot closer to the real issue.

>>>...Of course there are
>>>important logistics to consider, such as how to lock out access to data which
>>>is temporarily invalid because it is being updated privately by that node,
>>>but often this is not a problem at all. Even when it is, it's better than not
>>>being able to do the job at all. ]
>>
>>Unless you don't [like] inconsistent transactions and corrupt data.
>
>... I guess I was very unclear here. 
>What I was saying was that in many cases, with canned applications, you have 
>knowledge that certain tables will never be written to except by one user 
>(except on weekends, on Feb. 29, and during a solar eclipse. Whatever...)
>In this case, you need not establish a lock and can just do whatever you like 
>without worries. This is what I meant by it being 'not a problem'.

That approach only works for trivial applications.  Inability to support general
concurrency with consistency severely limits the ability of your system to
model the real-world problems you are trying to solve.   If you know
that your business is going to continue to operate from your garage 
with 3 employees for the foreseeable future, you might go with manual control.
If your workload is going to grow, though, that will quickly become
untenable.  It's much better to adopt a transaction-oriented DB design
from the start.  Then your applications can evolve at the level of 
your business problem rather than "well, I guess it's time to find a DBMS
that supports locking and revamp my applications for it".  

Moreover, with an active data dictionary, there will be concurrent
updates to the master object directory (e.g. "sysobjects"), even if no
user ever accesses an object in use by another, because the data
dictionary must maintain changing information about all the objects.

>
>>As for the disk strategies, they're fine.  I should mention that a
>>mono organization should not prohibit you from using other disk
>>devices (or files).  [etc.]
>
>Well, it's easy enough to do disk striping, but I'm not convinced that the DBMS
>is always going to be smarter than I am. What about prioritization? I want 
>certain users of table A, who don't need indices I, J, K, and L, to have very 
>fast response time. Other users of A, who need I J K and L, can go hang because
>they aren't upper management. :-) In this case I want to put A on one spindle 
>and everything else on another. I am not saying that it's impossible to write a
>DBMS which can handle things like this. I am saying that that would involve a 
>fair amount of AI, and I haven't seen anything like it yet.

That's not AI, that's database design.
The DBMS needs to offer the facilities for the DBA to specify physical
storage usage at various levels.  Good DB design and layout are powerful means
of obtaining performance.

Your example is puzzling, since indexes are supposed to make table access
faster.  You want the people who are hammering on A to use whichever of
I, J, K & L will give them the fastest response.  To do much besides 
adding a row at the end of A or truncating it, you should use the
indexes.  The partitioning you suggest is a worthwhile one.

>As I wrote at the beginning of this article, raw I/O fits my 'distributed'
>model more closely than it does the 'monolithic' model. 

Um, well, you're taking an argument against your conclusion and trying
to make it one in favor.  The raw partition is used because one desires
to use only the DBMS storage management structures, not those of the
OS.  That fits the mono definition much more closely than the separate-
file definition.

>Excluding this, though,
>does anyone think that mono structures have any big advantages?

Hard to say, since you've qualified the monolithic idea so as to make it
nearly meaningless.

>
>Anyway I am glad that you have taken the time to write the perfect DBMS 
>back-end. Now all you need to do is sell it for Macs and PCs (not under unix, 
>either) and I'll be very happy.

Oh, I see.  UNIX (& OS/2?) non grata.
You are very attached then to underpowered, facility-poor "OS"s like 
MS-DOS and Mac?   I suggest you stick to toy applications to match those
environments, then.

>Whew. Now, let me ask a question without making any sweeping statements:

For a change.

>One point we all missed is the possibility of combining multiple tightly- 
>related tables on the same physical portion of the disk. 

This is known generically as "clustering".  I believe Oracle does it.

>The speed advantage 
>here might be worth a mono structure (but I would guess not). Some SQLs allow 
>you to create "clusters". Now, what exactly is the DBMS writing to the disk? 

One idea is that you try to place pages for joining tables contiguously on
the same disk track.  Then, when a join query comes along, you can suck 
the first set of (potentially) joining rows into memory with one disk
read.  Then your join becomes an in-memory operation.   This is only
really effective if you are joining on the primary (i.e. sort) keys
of the tables, because that is the basis for the clustering.  It
is not very amenable to schema changes because it reduces independence
of the data from the storage structure.  
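
A toy C sketch of what a cluster page might hold; the table layouts are
invented for illustration, not Oracle's or anyone else's actual format:

    /* Toy sketch: a cluster page holding a customer row and its order
     * rows side by side, because they share the clustering key.  One
     * disk read of this page feeds the whole join. */
    #include <stdio.h>

    struct cust  { int custno; char name[16]; };
    struct order { int custno; int orderno; };

    struct cluster_page {
        struct cust  c;           /* parent row */
        int          norders;
        struct order o[8];        /* its children, physically adjacent */
    };

    /* The join is a walk over rows already in memory. */
    static void join_in_memory(const struct cluster_page *p)
    {
        for (int i = 0; i < p->norders; i++)
            printf("%s : order %d\n", p->c.name, p->o[i].orderno);
    }

    int main(void)
    {
        struct cluster_page p = { {7, "Acme"}, 2, { {7, 101}, {7, 102} } };
        join_in_memory(&p);       /* no second read, no stored join */
        return 0;
    }

Note that no join result is materialized: each row is stored once, merely
placed next to its join partners.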

>The only way I can think of for doing this is to actually store a join of the 
>two files. This could chew up an enormous amount of disk space. What is it 
>doing?

Fortunately, it's not done that way.  I suggest that you read some 
textbooks on databases, such as C.J. Date's _Intro to Database Systems_,
2nd. Ed. before making many more declarative postings.

-TW
{ihnp4!pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
..not an @ in the bunch...

daveb@geac.UUCP (David Collier-Brown) (09/11/88)

In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
| Recently, Tim Wood wrote a response to my article on database file structures
| ... I guess I was very unclear here. 
| What I was saying was that in many cases, with canned applications, you have 
| knowledge that certain tables will never be written to except by one user 
| (exceept on weekends, on Feb. 29, and during a Solar eclipse. Whatever...)
| In this case, you need not establish a lock and can just do whateer you like 
| without worries. This is what I meant by it being 'not a problem'.

From article <976@sybase.sybase.com>, by tim@linus.sybase.com (Tim Wood):
|  The only problem is trivial applications.  Inability to support general
|  concurrency with consistency severely limits the ability of your system to
|  model the real-world problems you are trying to solve.   

  Actually, one wants to use a system with locks even in the trivial
case: what you're avoiding is the cost of dealing with collisions 
as opposed to the cost of locking.
  In a good system, the cost of establishing a lock can be small.
For example, on Unix, one can write one's lock-manager critical code
as a dummy "device" driver[1], and ensure the software path for a
successful lock is relatively short, on the order of magnitude of a
single system call.
  This is not necessarily true of a collision-resolution system:
there is a fair bit more work involved here, often (but not always)
on the order of a process switch.
  (Yes, there are PC/Mac databases in which locking is frighteningly
expensive... but they don't **have** to be that way: see also [1])
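
  To sketch the idea in C: acquiring a lock is one trap into driver code
via ioctl().  The device name and request codes below are invented for
illustration; no real driver exports this interface:

    /* Sketch: lock acquisition as one system call into a dummy
     * "device" driver.  /dev/lockmgr and the request codes are
     * hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    #define LOCKMGR_ACQUIRE 1     /* hypothetical ioctl requests */
    #define LOCKMGR_RELEASE 2

    struct lockreq { long resource_id; int exclusive; };

    int main(void)
    {
        struct lockreq r = { 42L, 1 };
        int fd = open("/dev/lockmgr", O_RDWR);
        if (fd < 0) { perror("open lock manager"); exit(1); }

        /* Success path: one trap into the kernel, the driver marks
         * the lock, we return.  Only on collision does the expensive
         * machinery (queueing, a process switch) come into play. */
        if (ioctl(fd, LOCKMGR_ACQUIRE, &r) < 0) { perror("lock"); exit(1); }
        /* ... touch the protected tuple ... */
        ioctl(fd, LOCKMGR_RELEASE, &r);
        close(fd);
        return 0;
    }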

  In a large library system my employer was (re)building on top of
Ingres, we noted that the nature of the real-world construct being
modeled precluded almost all possible collisions (No two people can
borrow the same copy of a book at the same time, unless they tear it
in half).
  This allowed us to run prototypes with no locking (because they
had no real database!) and still get self-consistent results.  When
we did the next prototype, though, we used the database with locking
turned on and saw a moderate speed improvement (The DBMS was smarter
than the proto-kludge).  We kept the locking, arguing that we could
not preclude administrative-correction transactions which would
require locking to function (ie, we found out the patron was using
someone else's card, or that the book was mis-labeled, and a
librarian was unmunging the now-incorrect model).

  The system ran "far faster"[2] than one which had collisions on the
patron tuple when it tried to parallelize charge-out transactions...

--dave (once upon a time a boring programmer-type who
        worked with extra-boring librarian types) c-b

[1] This can be done on Macs and probably OS/2: see the last issue
    of MacTutor for an example. You can also cheat and make it always
    return YES in the single-machine single-user case.
[2] Purely estimation, alas.
-- 
 David Collier-Brown.  | yunexus!lethe!dave
 78 Hillcrest Ave,.    | He's so smart he's dumb.
 Willowdale, Ontario.  |        --Joyce C-B

alexis@dasys1.UUCP (Alexis Rosen) (09/13/88)

In article <976@sybase.sybase.com> tim@linus.sybase.com (Tim Wood) writes:
>In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>>Recently, Tim Wood wrote a response to my article on database file structures
>>...He missed ... that I was really addressing smaller machines. At any rate,
>>I specifically excluded DBMSs which did raw I/O from my analysis.
>
>I don't believe the PC focus was clear.  The article made rather
>sweeping statements about different storage structures in general, but
>the conclusions did not generalize beyond the single-user environment,
>and maybe not within that.

No. They don't generalize to OLTP environments on large systems. They DO
generalize to most multi-user DBMSs running on micros.

>As for raw I/O, if your DBMS does not take advantage of a mechanism in
>the environment that offers potentially higher throughput, then it is
>not the DBMS to use in your environment. [etc.]

That's not the situation for micro users. In general that environment doesn't
support raw I/O, which is why I excluded it.

>>BTW, the raw I/O stuff looks like distributed structure to me.
>No.  A single raw disk partition [is a rotten idea...]

You implied the ability to use multiple spindles with raw I/O. Obviously,
limiting access to one partition will always be suboptimal.

>But a flexible mono system will let you scale upward and choose your physical
>layout.

Then it's not a mono system. It's a hybrid, which is probably better than a
pure form of either structure. (I should have mentioned this possibility
originally, but again, there are no major DBMSs for PCs yet which offer this
ability).

>>In fact, the
>>real breakdown, I guess, is between products that layer two levels of file
>>access on top of each other vs. the ones that only have one layer [etc.]
>
>That's a lot closer to the real issue.
>
>> [discussion about various strategies to minimize record & table locks]
>                                         ... Inability to support general
>concurrency with consistency severely limits the ability of your system to
>model the real-world problems you are trying to solve.   If you know
>that your business is going to continue to operate from your garage
>with 3 employees for the foreseeable future, you might go with manual control.
>If your workload is going to grow, though, that will quickly become
>untenable.  It's much better to adopt a transaction-oriented DB design
>from the start. [etc.]

You're missing the point here. I agree that OLTP is a great thing. I'd love to
be able to use it. But I can do things on a PC network without OLTP that would
take five times the money to do on a system powerful enough to use OLTP,
WITHOUT increasing the chances of blowing away important data. It won't take
enormous effort on my part, either. Of course, in five years or so OLTP will be
a lot easier (with '786 or '080 CPUs and 32 MB of RAM in the average micro).
For now, though, even large companies don't always have the extra money to
throw at a problem that OLTP demands. They want it done on a micro network,
next month, with industry-standard tools which hundreds of local programmers
are already familiar with.

>Moreover, with an active data dictionary, there will be concurrent
>updates to the master object directory (e.g. "sysobjects"), even if no
>user ever accesses an object in use by another, because the data
>dictionary must maintain changing information about all the objects.

So? Not too many micro DBMSs out there with any kind of data dictionary. That
sucks, I know, but I can still do useful work in them.

>>>As for the disk strategies, they're fine.  I should mention that a
>>>mono organization should not prohibit you from using other disk
>>>devices (or files).

Then it's not a monolithic structure, is it?

>>Well, it's easy enough to do disk striping, but I'm not convinced that the
>>DBMS is always going to be smarter than I am. What about prioritization?
>>[example of prioritization/partitioning] I am not saying that it's impossible
>>to write a DBMS which can handle things like this. I am saying that that
>>would involve a fair amount of AI, and I haven't seen anything like it yet.
>
>That's not AI, that's database design.
>The DBMS needs to offer the facilities for the DBA to specify physical
>storage usage at various levels.  Good DB design and layout are powerful means
>of obtaining performance. [etc.]
>The partitioning you suggest is a worthwhile one.

That's AI when the DBMS figures out the partitioning itself. Without that, the
DBA has to put various files in various different physical places, and that's
not a mono structure anymore...

>>As I wrote at the beginning of this article, raw I/O fits my 'distributed'
>>model more closely than it does the 'monolithic' model.
>
>Um, well, you're taking an argument against your conclusion and trying
>to make it one in favor.  The raw partition is used because one desires
>to use only the DBMS storage management structures, not those of the
>OS.  That fits the mono definition much more closely than the separate-
>file definition.

It's hardly an argument against my conclusion when I specifically excepted it.

Regardless, monolithic (by my definition) means 'one physical file' (this does
not prohibit disk striping). The DBMS you describe allows raw I/O on several
different volumes, which may live on different machines. They are separate
devices and separate physical files. It also allows the DBA to assign specific
logical files (tables) to specific physical files. This is the outstanding
characteristic of distributed structures. In fact, you're describing a DBMS
which is a hybrid. As I said before, the hybrid is probably the best way to do
things.

If you disagree with my definition, fine. I'll agree to disagree...

>>Excluding this, though,
>>does anyone think that mono structures have any big advantages?
>
>Hard to say, since you've qualified the monolithic idea so as to make it
>nearly meaningless.

I haven't. Actually there is one advantage which we both forgot (thanks to
Dennis Cohen who reminded me). Under certain OSs (especially Micro OSs) opening
a file can take a great deal of time. There may also be serious limits on the
number of files you can keep open at any one time. These are both problems for
a distributed file structure. I don't believe they outweigh the advantages,
though- at least not with MS-DOS or Mac OS.
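
The usual workaround, for what it's worth, is to cache open files inside the
DBMS so you pay the open() cost once per table.  A minimal C sketch, with the
cache size, eviction policy, and file names invented for illustration:

    /* Sketch: a tiny table-handle cache, so a separate-file DBMS pays
     * the open() cost once per table, not per access, and stays under
     * the OS open-file limit. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define CACHE_SLOTS 8         /* well under the descriptor limit */

    static struct { char name[64]; int fd; } cache[CACHE_SLOTS];
    static int victim;

    static int table_open(const char *path)
    {
        for (int i = 0; i < CACHE_SLOTS; i++)      /* hit: reuse handle */
            if (cache[i].fd > 0 && strcmp(cache[i].name, path) == 0)
                return cache[i].fd;

        int i = victim++ % CACHE_SLOTS;            /* miss: evict a slot */
        if (cache[i].fd > 0)
            close(cache[i].fd);
        cache[i].fd = open(path, O_RDWR | O_CREAT, 0600);
        strncpy(cache[i].name, path, sizeof cache[i].name - 1);
        return cache[i].fd;
    }

    int main(void)
    {
        int a = table_open("/tmp/t1");   /* slow path: real open()   */
        int b = table_open("/tmp/t1");   /* fast path: cached handle */
        printf("same handle twice: %d %d\n", a, b);
        return 0;
    }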

>>Anyway I am glad that you have taken the time to write the perfect DBMS
>>back-end. Now all you need to do is sell it for Macs and PCs (not under unix,
>>either) and I'll be very happy.
>
>Oh, I see.  UNIX (& OS/2?) non grata.
>You are very attached then to underpowered, facility-poor "OS"s like
>MS-DOS and Mac?   I suggest you stick to toy applications to match those
>environments, then.

I suggest you stick to what you know about (OLTP). These OSs, whatever their
faults (and they are legion), dominate the PC market. Nevertheless I and many
other people have developed many non-trivial applications in these
environments.

>>Whew. Now, let me ask a question without making any sweeping statements:
>
>For a change.
>...I suggest that you read some
>textbooks on databases, such as C.J. Date's _Intro to Database Systems_,
>2nd. Ed. before making many more declarative postings.

Don't be nasty. It doesn't advance your position. Should I suggest to you that
you read an introductory book on microcomputers?


All of the foregoing discusses whether or not various benefits I attributed to
distributed-file DBMSs are applicable to mono DBMSs as well. Except for what I
wrote about file-opening overhead, I have yet to hear of any advantages the mono
structure has. (This is _specifically_ a mono structure managed by the OS). Are
there any?

----
Alexis Rosen                       {allegra,philabs,cmcl2}!phri\
Writing from                                {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat                  {portal,well,sun}!hoptoad/
Public UNIX                         if mail fails: ...cmcl2!cucard!cunixc!abr1
Best path: uunet!dasys1!alexis

jkrueger@daitc.daitc.mil (Jonathan Krueger) (09/15/88)

In article <6299@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>The response by Jon Krueger wasn't on-target

I don't think we're aiming for the same target.  I'm interested in
defining and solving database problems.  By trade I'm expected to
identify classes of machines capable of implementing a given solution.
Articles listing capabilities of machines of a specified size, shape,
or price tag are relevant to this newsgroup and helpful to me.
Articles that present current limitations of unspecified machines as
inherent characteristics of database management systems waste my time
and mislead the uninformed.

For instance:
	>Any monolithic structure will impose arbitrary restrictions on
	>the number of tables and columns allowed in the database.
is false and misleading.  If it were qualified, for instance:
	As of this writing, I know of no commercially available
	software running under MS-DOS that uses a monolithic
	structure that does not also impose arbitrary
	restrictions on the number of tables and columns allowed
	in the database.
it might be correct.  If it were upgraded to market research:
	Surveying current limits of some commercially available
	database management systems, I find:
		Name	OS	Type	MaxTables	MaxCols
		Foobase	MS-DOS	M	20		50
		Barbase	MS-DOS	D	Unlimited	255
it might even be helpful.  And if you stated and analyzed the trend:
	Total costs (hardware, operating system, database) are currently:
				arbitrary restrictions	large or unlimited
		monolithic	$10,000 and up		$50,000 and up
		distributed	$2000 to $10,000	$10,000 and up
	Most of the difference in cost results from moving from
	single-user machines to shared systems.  Cost per user
	may be calculated (etc.)
your contribution to this group would be welcome and appreciated.

-- Jon

-- 
Jonathan Krueger  uunet!daitc!jkrueger  jkrueger@daitc.arpa  (703) 998-4777

Inspected by: No. 15