alexis@dasys1.UUCP (Alexis Rosen) (08/31/88)
There has been some discussion recently in comp.sys.mac concerning the relative merits of two different methods of storing database files: the monolithic (all-in-one-file) way and the distributed (files-all-over-the-place) method. Since this issue is very important (to those of us who use databases), and apparently not too well understood, I will try to explain why the distributed method is far superior to the monolithic method.

The benefits of monolithic structure are few. First, you don't have to look at all those files filling up your disk. This is especially important to some consultants who want to present a neat appearance to their clients (no joke- I've seen people choose not to buy a product for this reason alone). Also, there is less chance of losing one file and not noticing it. I do not know of any other reasons for using the monolithic structure. Both of the reasons listed above are purely aesthetic. As far as I'm concerned, just put everything in one folder (subdirectory) and there's no more problem.

Some of the monolithic-structure databases are:
    Mac- Omnis 3+, Helix, Fourth Dimension (4D)
    PC-  R:Base

The benefits to a distributed structure are far more concrete and important. First of all, in today's imperfect (i.e., non-fault-tolerant) world, keeping all your data files (relations, tables) and indexes in separate files isolates any corrupt structures. If you lose power while creating an index on file A, that's what you have to fix: the index, and only the index. With the monolithic structure, it is possible (or guaranteed, depending on your particular choice of DBMS) that you will corrupt some other portion of your database. Good luck finding out what got damaged- it may take you weeks to find it, by which time all your backups may have the damage as well. In rare cases you could trash your entire database. Fortunately, most DBMSs today are more robust than that. The danger, however, never goes away entirely.
This reason alone should convince most people.

The other overriding reason to use a distributed structure is performance. If your DBMS has to go through its own file-management code as well as the OS's, it will always be slower than if it only needed to go through the OS. In the first case, to write file A the DBMS must first determine where in the logical data file (table) the data goes. Then it must find the location of the table in the physical database file. Then it can tell the OS to find that sector on the disk and update it. With a distributed structure, the middle step doesn't exist. This time savings is not immense if the DBMS is very well written, but the vast majority are not (at least when it comes to speed optimization).

There is a much bigger performance gain for distributed structures in a multiple-machine or multiple-hard-disk environment. While this is a very complex subject, a few examples will make my point. All of these techniques are used in systems I have built, and they can produce gains of one hundred to ten thousand percent!

Case 1: There is a very large file with several indices associated with it. Many people use this file, and some of those use the indices. Even with a very fast disk, access is slow, and the file is too large for a cache to provide significant help. The solution is to put the indices on a separate hard disk (same server). This results in immense speed improvements. The reasons are as follows:
1) The CPU is inherently faster than the disk. The bottleneck is in getting data off of the disk. By pulling information from two disks at once, data can be retrieved twice as fast.
2) Even more significant is the fact that the disk with the data file is doing far fewer seeks than before. Seeking is a disproportionately slow operation for computers.
The result is a massive performance gain for those using the indices. Those who are not using the indices (and are generally going through the data file sequentially) will gain even more.
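Sketched as code (a toy layout with fixed-width rows and a hypothetical table-offset map; no real DBMS is this simple), the difference in the write path looks like this:

```python
ROW_SIZE = 64  # assume fixed-width rows to keep the sketch simple

def os_write(path, offset, data):
    """The last step is the same either way: ask the OS to seek and write."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)

def write_row_distributed(table_path, row_num, row):
    # One file per table: the logical row position IS the file offset.
    os_write(table_path, row_num * ROW_SIZE, row)

def write_row_monolithic(db_path, table_offsets, table, row_num, row):
    # All tables in one file: the DBMS must first look up where the
    # table lives inside the physical database file -- the middle step.
    base = table_offsets[table]
    os_write(db_path, base + row_num * ROW_SIZE, row)
```

The distributed write goes straight from row number to an OS seek; the monolithic write pays for the table-offset lookup on every single access.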
There are some situations where you can do even better than that. If you have a group of files which are generally used for look-up information, and usually not written to, you can put them all, along with their indices, on a RAMdisk. In one extreme case this resulted in a program performing *100 TIMES* better than it had before.

There are all sorts of creative ways to distribute your files for maximum gain. Because every situation is different, it is impossible to tell just how much faster things will be, but I've seen improvements from a minimum of 100%, to a more average 300%, up to 10,000% (!) in extreme cases.

Case 2: There are many nodes on a LAN all accessing a database, doing fundamentally different jobs with the same data. If all of the files (tables) in the database are left on the server, you have an especially bad bottleneck problem if any node other than the server wants to do some large-scale data manipulation. In particular, on slower networks like AppleTalk or StarLAN, or (even worse) on leased-line dial-in access, some tasks become impossible, either because they will bring the network to its knees or because they cannot be completed before it's time to start them again (i.e., a task which must be performed every hour taking 90 minutes to complete). With a monolithic database you are S.O.L., but with a distributed structure you can just send the relevant fields (columns) of the relevant records (rows) to a hard disk local to the node which needs to manipulate the data. When the task is finished, the data is reloaded into the main file(s). Of course there are important logistics to consider, such as how to lock out access to data which is temporarily invalid because it is being updated privately by that node, but often this is not a problem at all. Even when it is, it's better than not being able to do the job at all.
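The ship-out/work/reload cycle can be sketched in a few lines (toy Python, with a dict-of-dicts standing in for a table; the names are illustrative, and a real system would be moving files across the LAN, not dicts):

```python
def extract_subset(table, row_keys, columns):
    """Copy only the relevant fields of the relevant records, to be
    shipped to a disk local to the node doing the heavy work."""
    return {k: {c: table[k][c] for c in columns} for k in row_keys}

def reload_subset(table, subset):
    """When the task is finished, fold the updated values back into
    the main table.  (Locking out other writers to these rows/columns
    in the meantime is the caller's problem, as noted above.)"""
    for k, updated in subset.items():
        table[k].update(updated)
```

Everything between extract and reload runs against the local copy, so the network only carries the subset twice instead of every intermediate read and write.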
This technique, which is actually a form of coarse-grained distributed processing, can make possible tasks that are simply beyond the powers of a monolithic system (in one case I had to deal with), and can speed up other jobs by tens or hundreds of times. Just as importantly, it can lighten the load on your network considerably.

There is one other very important reason to use a distributed structure that comes to mind. Any monolithic structure will impose arbitrary restrictions on the number of data files or fields (tables and columns) allowed in the database. Sometimes these restrictions are very severe. Furthermore, if you have a very large database, it may not fit on one physical disk, and with the monolithic structure you are limited (generally) to one device. With the distributed structure, these limitations just go away.

Some of the distributed-structure databases are:
    Mac-  FoxBase+/Mac, McMax
    PC-   dBase III+, PowerBase, FoxBase, Clipper, dbXL, etc.
    UNIX- Most systems, I think (it's been a long time...)

-----

There are many more things to consider, but I've tried to hit the most important ones. There are also exceptions. For example, one of the big DBMSs that runs under UNIX (RTI Ingres? Oracle?) bypasses the file system to write directly to the disk (let's skip the technical details...). While this is like the monolithic system in some ways, it still allows for some of the benefits of the distributed structure. UNIX wizards may have a lot to say about this...

If you think I've missed any major points here, or that I've short-changed the monolithic-style DBMSs, please let me know why, and give an example.

PS to Mac people- this is another good reason to like FoxBase... The only other Mac DBMS that I'm aware of which uses the distributed file structure is McMax, which is just a failed FoxBase.
----
Alexis Rosen                {allegra,philabs,cmcl2}!phri\
Writing from                {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat        {portal,well,sun}!hoptoad/
Public UNIX                 if mail fails: ...cmcl2!cucard!cunixc!abr1
                            Best path: uunet!dasys1!alexis
jkrueger@daitc.daitc.mil (Jonathan Krueger) (09/02/88)
In article <6178@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>If your DBMS has to go through its own file-management code as well as
>the OS's, it will always be slower than if it only needed to go
>through the OS.

This is incorrect. The cost of operations like (create, destroy, find, append, delete, replace) is a function of the data structure used, not where the code resides. If you implement a relational database by putting each table in a file, you use the data structure defined by the filesystem. If you put all tables into one large file, you define your own data structure. Either way, the data structure will optimize for certain operations and against others.

>In the first case, to write file A the DBMS must first determine where
>in the logical data file (table) the data goes. Then it must find the
>location of the table in the physical database file. Then it can tell
>the OS to find that sector on the disk and update it. With a
>distributed structure, the middle step doesn't exist.

This is also incorrect.

             One Large File       One File per Table
             ==============       ==================
find table   seek through file    open file
find row     seek through file    seek through file
update row   write location       write location

>There is a much bigger performance gain for distributed structures in
>a multiple-machine or multiple-hard-disk environment. There is a very
>large file with several indices associated with it. Many people use
>this file, and some of those use the indices. Even with a very fast
>disk, access is slow, and the file is too large for a cache to provide
>significant help. The solution is to put the indices on a separate
>hard disk (same server). This results in immense speed improvements.

The assumption seems to be that you can't do this with a single big file. People do this all the time: it's called disk striping. Breaking data into smaller files has little to do with spreading data across multiple spindles.
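The mapping striping performs is easy to sketch (simple round-robin, one block per stripe unit; real striping drivers have configurable chunk sizes, but the idea is the same):

```python
def stripe_location(block_num, n_disks):
    """Map a logical block of one large file to (disk, block_on_disk)
    under round-robin striping: consecutive blocks land on different
    spindles, so sequential reads pull from several disks at once."""
    return block_num % n_disks, block_num // n_disks
```

Blocks 0, 1, 2, 3 of one monolithic file land on disks 0, 1, 0, 1 -- the multi-spindle parallelism comes for free, without splitting the file into pieces.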
>If you have a group of files which are generally used for look-up
>information, and usually not written to, you can put them all, along
>with their indices, on a RAMdisk.

Or you can use your RAM and your time more profitably: disk caching, virtual memory, code profiling, and dare we say it, database design.

>There are many nodes on a LAN all accessing a database, doing
>fundamentally different jobs with the same data. If all of the files
>(tables) in the database are left on the server, you have an
>especially bad bottleneck problem if any node other than the server
>wants to do some large-scale data manipulation. In particular, on
>slower [paths] With a monolithic database you are S.O.L.

This is incorrect. Again, you can distribute pieces of a file to multiple nodes just as you can spread it across multiple disks.

>but with a distributed structure you can just send the relevant fields
>(columns) of the relevant records (rows) to a hard disk local to the
>node which needs to manipulate the data. When the task is finished,
>the data is reloaded into the main file(s).

If you can't update tables without regard to physical location, it's not a distributed structure. If you trade transparent access for application-specific speed, you're behind on the deal. For instance:

>Of course there are important logistics to consider, such as how to
>lock out access to data which is temporarily invalid because it is
>being updated privately by that node

If you have to do your own record locking, hardcode data location into applications, and retune performance for every new disk, why use a database manager at all?

>but often this is not a problem at all. Even when it is, it's better
>than not being able to do the job at all.

Wrong twice. It's better to get a slow answer than a wrong one. And in no sense is one prevented from doing the job; adequate tools exist.

>This technique...can lighten the load on your network considerably.
This depends entirely on how well you predict which pieces of data will be needed where. If you guess poorly, it will increase the load. Again, this has nothing to do with whether you partition your data by table or other unit. For instance, you could keep one table per file and split each file among multiple nodes.

>There is one other very important reason to use a distributed
>structure that comes to mind. Any monolithic structure will impose
>arbitrary restrictions on the number of data files or fields (tables
>and columns) allowed in the database.

This is incorrect. Again, the data structure determines whether you can implement fixed or flexible field sizes, field or row width limits, restrictions on number of fields or tables. Consider a monolithic tree.

>If you have a very large database, it may not fit on one physical
>disk, and with the monolithic structure you are limited (generally) to
>one device. With the distributed structure, these limitations just go
>away.

This is incorrect. Several commercially available operating systems support disk striping, bound volume sets, and the like. For those that don't, the limitations don't "just go away": what happens when a single table must grow larger than the disk?

No, the reason why it's convenient to put each table into a file is that we have a lot of tools that act on files. It's good software engineering to use them on tables. For instance, the directory listing program usually provides file size; in the monolithic structure, that functionality has to be provided elsewhere. For another instance, the backup/restore utility knows how to restore files. For the monolithic structure, that complicates its ability to recover from disasters.

-- Jon
--
Jonathan Krueger    uunet!daitc!jkrueger    jkrueger@daitc.arpa    (703) 998-4777
Inspected by: No. 15
alexis@dasys1.UUCP (Alexis Rosen) (09/02/88)
Recently, Jonathan Krueger (jkrueger@daitc.daitc.mil) wrote:
> [lots of stuff which I'll get to in a second]

I guess I made a mistake. When I originally wrote this article I was specifically looking at the Macintosh and DBMSs available for it. When I was done I saw that what I wrote was equally applicable to PCs. When I posted this article to comp.sys.databases, I forgot that nowhere in my article did I specifically say that I was focusing mainly on micros. So Jonathan is absolutely correct about some things, in a large-systems environment. I feel that what I originally wrote stands, in a microcomputer context. He does, however, make many statements which I think are wrong, regardless of context. Here goes:

>In article <6178@dasys1.UUCP> alexis@dasys1.UUCP (Alexis Rosen) writes:
>>If your DBMS has to go through its own file-management code as well as
>>the OS's, it will always be slower than if it only needed to go
>>through the OS.
>
>This is incorrect. The cost of operations like (create, destroy,
>find, append, delete, replace) is a function of the data structure
>used, not where the code resides. If you implement a relational
>database by putting each table in a file, you use the data structure
>defined by the filesystem. If you put all tables into one large
>file, you define your own data structure. Either way, the data
>structure will optimize for certain operations and against others.
> [etc]
>             One Large File       One File per Table
>             ==============       ==================
>find table   seek through file    open file
>find row     seek through file    seek through file
>update row   write location       write location

It is true that you are always working within the context of _somebody's_ data structure. However, it is likely that your file system is smarter than your database (on a micro, at least). Still, this is a weak point. However, as far as finding and updating rows go, I think that the table above cheats a lot.
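To see the cheat concretely, here is a toy sketch (fixed-width rows and an invented block map -- no shipping DBMS works exactly this way) of what each side must do just to locate a row:

```python
BLOCK_SIZE = 512
ROW_SIZE = 100  # assume fixed-width rows for the sketch

def locate_row_distributed(row_num):
    """One file per table: the row's file offset is pure arithmetic."""
    return row_num * ROW_SIZE

def locate_row_monolithic(block_map, table, row_num):
    """One big file: the tables' blocks are interleaved, so the DBMS
    must chase its own block pointers before it can ask the OS to seek."""
    byte_off = row_num * ROW_SIZE
    logical_block, within = divmod(byte_off, BLOCK_SIZE)
    physical_block = block_map[table][logical_block]  # the extra lookup
    return physical_block * BLOCK_SIZE + within
```

Every lookup on the monolithic side touches the block map -- exactly the sort of pointer structure that competes with your actual data for cache space.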
For example, to retrieve a row: Given that you've computed the absolute offset from the beginning of the file to the row you want, the distributed structure only requires the file system to seek to that offset. Very simple. The monolithic structure creates a huge headache by comparison. Remember, it's not as easy as saying "seek to (row offset + table offset from beginning of file)". You can't simply store multiple tables one after the other. You need to be able to mix blocks of one table with blocks of another. You need pointers to track the blocks of each file. You need master file pointers to track the file block pointers... Or you need a similar, equally convoluted scheme. So initially accessing a row (or even the index's pointer to that row) takes a lot more work. Similarly, linear searches through the file (which should be avoided like the plague, usually) are much worse.

Obviously, you can cache pointers to all these things, but there is always a limit on how much you can cache, and caching that stuff means that you are caching less of the important stuff: your data. In fact, using a monolithic structure implies that you are creating your own special-purpose file system that sits on top of the regular file system. Therefore, I think that Jon is wrong in all cases, EXCEPT in the case I noted before: DBMSs that bypass the file system entirely. Then they are on a fairly equal basis.

>>There is a much bigger performance gain for distributed structures in
>>a multiple-machine or multiple-hard-disk environment. There is a very
>>large file with several indices associated with it. Many people use
>>this file, and some of those use the indices. Even with a very fast
>>disk, access is slow, and the file is too large for a cache to provide
>>significant help. The solution is to put the indices on a separate
>>hard disk (same server). This results in immense speed improvements.
>
>The assumption seems to be that you can't do this with a single big
>file.
>People do this all the time: it's called disk striping.
>Breaking data into smaller files has little to do with spreading data
>across multiple spindles.

True, for large systems. Not true, generally, for micros. (There are some special drivers for MS-DOS that do this, for some hardware. They are not widespread, to the best of my knowledge.) Even given that it is available, how can you think that disk striping (which is not aware of the structure of your single large data file) can be more efficient than a human being intelligently deciding how to allocate storage? As an example, with the distributed structure I can decide that indices A, B, and C should sit on a separate disk. With disk striping and a monolithic file, I'll get random pieces of every table and index in my database on that disk.

>>If you have a group of files which are generally used for look-up
>>information, and usually not written to, you can put them all, along
>>with their indices, on a RAMdisk.
>
>Or you can use your RAM and your time more profitably: disk caching,
>virtual memory, code profiling, and dare we say it, database design.

Code profiling (and related analysis, such as data-flow modeling) should be independent of this. Likewise for virtual memory. Still, I see that we have a fundamentally different view of how smart DBMSs are (and are likely to get in the near future). Caches definitely have their place. The best solution in these instances would be the ability to tell your DBMS "cache this entire file". Unfortunately, I don't know of any DBMS that can do this. Barring that, I am much smarter than any cache algorithm I've met recently, so I just stuff critical look-ups in the RAM disk. In fact, I see this kind of decision as part of database design. More on that later...

>>There are many nodes on a LAN all accessing a database, doing
>>fundamentally different jobs with the same data.
>>If all of the files
>>(tables) in the database are left on the server, you have an
>>especially bad bottleneck problem if any node other than the server
>>wants to do some large-scale data manipulation. In particular, on
>>slower [paths] With a monolithic database you are S.O.L.
>
>This is incorrect. Again, you can distribute pieces of a file to
>multiple nodes just as you can spread it across multiple disks.

I guess this depends on what file system you are using. Still, what I said before stands: I (and you) can better determine where to allocate data than the file system can.

>>but with a distributed structure you can just send the relevant fields
>>(columns) of the relevant records (rows) to a hard disk local to the
>>node which needs to manipulate the data. When the task is finished,
>>the data is reloaded into the main file(s).
>
>If you can't update tables without regard to physical location, it's
>not a distributed structure. If you trade transparent access for
>application-specific speed, you're behind on the deal. For instance:
>
>>Of course there are important logistics to consider, such as how to
>>lock out access to data which is temporarily invalid because it is
>>being updated privately by that node
>
>If you have to do your own record locking, hardcode data location
>into applications, and retune performance for every new disk, why use
>a database manager at all?

Because it's better than writing my code in C. Seriously, I think Jon has a somewhat myopic view here. He seems to be working with the "true relational" model, but that's not always practical. Location independence is wonderful in theory, but I may need to sacrifice it for performance's sake without wanting to lose all the other conveniences of a database system. That's why I'm willing to worry about hand-locking those records, if necessary (and it's not always needed). In general, I am describing techniques to gain major performance improvements. They aren't free.
Some extra coding is going to be necessary. That's life. No DBMS I know of is smart enough to even begin doing this intelligently.

>>but often [locking out rows by hand] is not a problem at all. Even when it
>>is, it's better than not being able to do the job at all.
>
>Wrong twice. It's better to get a slow answer than a wrong one. And
>in no sense is one prevented from doing the job; adequate tools exist.

You misunderstand me. If it's a problem, then you need to write some code to prevent the problem. That may not be necessary, if the columns updated in private are never written to by other users. (This can often be assured.) Adequate tools don't exist if, for example, the tools can't do the job in less time than the cycle from one execution of the job to the next.

>>This technique...can lighten the load on your network considerably.
>
>This depends entirely on how well you predict which pieces of data
>will be needed where. If you guess poorly, it will increase the load.
>Again, this has nothing to do with whether you partition your data by
>table or other unit. For instance, you could keep one table per file
>and split each file among multiple nodes.

Clearly, I am predicting things well. If I couldn't, I wouldn't try to mess with the location of the files. A good database engineer should, generally, be able to make intelligent guesses along these lines, and careful study of a live system should give even better information. I don't see what this 'for instance' shows.

>>There is one other very important reason to use a distributed
>>structure that comes to mind. Any monolithic structure will impose
>>arbitrary restrictions on the number of data files or fields (tables
>>and columns) allowed in the database.
>
>This is incorrect. Again, the data structure determines whether you
>can implement fixed or flexible field sizes, field or row width
>limits, restrictions on number of fields or tables. Consider a
>monolithic tree.

Well, true.
This is a case where large and small systems differ. All of the monolithic-style systems on micros that I know of (I know of fairly many) have arbitrary limits that can cramp one's style rather severely.

>>If you have a very large database, it may not fit on one physical
>>disk, and with the monolithic structure you are limited (generally) to
>>one device. With the distributed structure, these limitations just go
>>away.
>
>This is incorrect. Several commercially available operating systems
>support disk striping, bound volume sets, and the like. For those
>that don't, the limitations don't "just go away": what happens when a
>single table must grow larger than the disk?

Your OS has some sort of limit on the logical size of volumes. Extend the argument from 'physical disk' to 'logical volume' and it's pretty strong. At worst, the distributed structure just pushes back the boundaries.

>No, the reason why it's convenient to put each table into a file is
>that we have a lot of tools that act on files. It's good software
>engineering to use them on tables. For instance, the directory
>listing program usually provides file size; in the monolithic
>structure, that functionality has to be provided elsewhere. For
>another instance, the backup/restore utility knows how to restore
>files. For the monolithic structure, that complicates its ability to
>recover from disasters.

Yes. This is a good reason for the distributed structure. There are many others as well.

----

Let's turn this whole subject on its head. Are there any significant benefits to a monolithic structure? I have mentioned the DBMSs that bypass the file system, and I'm sure that doing that right is more efficient, but that's really a lot like a distributed system in disguise. Anything else?
----
Alexis Rosen                {allegra,philabs,cmcl2}!phri\
Writing from                {harpo,cmcl2}!cucard!dasys1!alexis
The Big Electric Cat        {portal,well,sun}!hoptoad/
Public UNIX                 if mail fails: ...cmcl2!cucard!cunixc!abr1
                            Best path: uunet!dasys1!alexis
sysop@stech.UUCP (Jan Harrington) (09/03/88)
in article <6178@dasys1.UUCP>, alexis@dasys1.UUCP (Alexis Rosen) says:
>
> There has been some discussion recently in comp.sys.mac concerning the relative
> merits of two different methods of storing database files: the monolithic
> (all-in-one-file) way and the distributed (files-all-over-the-place) method.
> Since this issue is very important (to those of us who use databases), and
> apparently not too well understood, I will try to explain why the distributed
> method is far superior to the monolithic method.
>
> The benefits of monolithic structure are few. First, you don't have to look at
> all those files filling up your disk. This is especially important to some
> consultants who want to present a neat appearance to their clients (no joke-
> I've seen people choose not to buy a product for this reason alone). Also,
> there is less chance of losing one file and not noticing it. I do not know of
> any other reasons for using the monolithic structure. Both of the reasons
> listed above are purely aesthetic. As far as I'm concerned, just put everything
> in one folder (subdirectory) and there's no more problem.
>
> Some of the monolithic-structure databases are:
> Mac- Omnis 3+, Helix, Fourth Dimension (4D)

Sorry, but the only single-file Mac DBMS is Double Helix II. The others leave all sorts of files cluttering up your folder.

> PC- R:Base

No way - R:base uses THREE files for every database. And heaven help you if you rename any or lose one!

> The benefits to a distributed structure are far more concrete and important.
> First of all, in today's imperfect (i.e., non-fault-tolerant) world, keeping
> all your data files (relations, tables) and indexes in separate files isolates

There is, however, a major problem with multi-file storage. You dare not change a file name, index name, etc., etc., without goofing up all your application code. In other words, what you do logically with table names, etc., is dependent on what things are named physically.
If the two don't match, then you've got a problem. (Try renaming a dBase file - any kind of dBase file - from the desktop. Can dBase (any version) figure out that the file has been renamed and dynamically fix all your application code? No way. You'll get a "file not found" error!)

Jan Harrington, sysop
Scholastech Telecommunications
UUCP: husc6!amcad!stech!sysop or allegra!stech!sysop
BITNET: JHARRY@BENTLEY

********************************************************************************
Miscellaneous profundity:
    "No matter where you go, there you are."
                            Buckaroo Banzai
********************************************************************************
hoff@hp-sdd.HP.COM (Tom Hoff) (09/16/88)
>There is, however, a major problem with multi-file storage. You dare not
>change a file name, index name, etc., etc., without goofing up all your
>application code.

One point you forgot to mention is the number of files an OS or process is able to have open at one time. If a user process only gets 20 file handles, and 3 are used for each open table, it makes it pretty hard to join more than 6 tables.

Personally, I prefer having each table and all of its indices, etc., all in its own single file. This allows reasonable conservation of finite OS resources, and reasonable abstraction of the DBMS's internal structures (assuming the file has the same name as the table). Better yet, give the application programmer the ability to specify the storage method, but independently of the application code.

--Tom
--
Tom Hoff (...!hplabs!hp-sdd!hoff)
    "Dammit Jim, I'm a programmer not a spokesman!"