[comp.databases] Database Machines

jas@rtech.UUCP (05/29/87)

Thought I'd post a few thoughts on database machines to try to move the
discussion away from the meta-topic of what constitutes unacceptable
advertising.  In the following paragraphs, "vanilla" means "general
purpose" (not "unremarkable"), and "DBMS" means the DBMS proper, not
including user interfaces.

It seems to me that database machines try to achieve superior
performance in two different ways:  (1) by designing special-purpose
hardware that can do database-intensive things much faster than vanilla
hardware; and (2) by eliminating the vanilla operating system, allowing
the DBMS software to run directly on the hardware (a.k.a. writing a
special-purpose OS that will support only the DBMS software).

Personally, I'm a good deal more skeptical about (1) than about (2).
Special-purpose hardware has high leverage in some application areas:
4 x 4 matrix multiplication hardware is pretty handy for doing
real-time 3-D transformations, for example.  I'm not convinced that
DBMS software has similar, high-enough-leverage operations.  While
Britton-Lee tries to develop special-purpose hardware that does some
subset of the data manager's job so fast that the ENTIRE data manager
runs 10 times faster, the vanilla hardware vendors are building
computers that do EVERYTHING faster.  My money says the special-purpose
hardware people are going to have a hard time keeping up with Sun, DEC,
and the rest of the vanilla hardware vendors.

(2) seems more promising.  Lots of vanilla OS's are lousy platforms for
DBMS implementation (UNIX is certainly not an exception).  Strictly off
the cuff, I could imagine a 100% performance improvement to be had by
implementing a DBMS on a special-purpose OS (or maybe on a vanilla OS
that provided the right services, if such a beast existed), instead of
on UNIX.

Comments?

(Oh, yes:  these are my personal opinions only.)
-- 
Jim Shankland
 ..!ihnp4!cpsc6a!\
                  rtech!jas
..!ucbvax!mtxinu!/

mjr@well.UUCP (Matthew Rapaport) (05/29/87)

PICK by Pick Systems (So. Cal.) is a perfect example of the #2 approach
to database machines, and there is a very good discussion of the problems
and promises (and the promises are not great) of the #1 approach in
C.J. Date's "An Introduction to Database Systems" Vol. II (Addison-Wesley,
1983).

forrest@blia.BLI.COM (Jon Forrest) (05/29/87)

In article <863@rtech.UUCP>, jas@rtech.UUCP (Jim Shankland) writes:
> Thought I'd post a few thoughts on database machines to try to move the
> discussion away from the meta-topic of what constitutes unacceptable
> advertising.

Thanks. One of the reasons I posted my original message was to
stir up activity in this group.

> 
> It seems to me that database machines try to achieve superior
> performance in two different ways:

> (1) by designing special-purpose
> hardware that can do database-intensive things much faster than vanilla
> hardware; 

I'm skeptical about this one too, but for non-technical reasons.
As new technology comes out we will probably always be able to construct
a database machine that uses it.  This will reestablish the price/
performance ratio that we will probably have lost due to using what has
become old technology in the previous version of our database machine.
The problem here is that hardware development is terribly expensive so
it will be very difficult for us (or anyone else) to make any money
following this approach. The approach taken by Sybase will result
in machines with less performance than ours but also with lower costs.
This could result in a price/performance ratio as good as (or better than)
ours.  If their software is as good as ours then they will have
a very competitive product, all other things being equal.


> (2) by eliminating the vanilla operating system, allowing
> the DBMS software to run directly on the hardware (a.k.a. writing a
> special-purpose OS that will support only the DBMS software).
> 

This is probably the largest difference between us and Sybase.
I think your comments were exactly right on this. Our operating
system doesn't have to worry about lots of the stuff that Unix
or VMS has to look after. I don't know anything about the
kernel that runs on the IDM but, from what I've heard, it is a
very important reason why our performance is so good, especially
given the age of the IDM.

These opinions are mine only, which should be clear since the first
one contradicts our whole approach. I should also add that I make
no claim of being a database expert. I'm in charge of our VMS host
software.

Jon Forrest
ucbvax!mtxinu!blia!forrest

vollmer@manta.UUCP (Tom Vollmer) (05/31/87)

We went through a bit of agonizing a year or so ago.  There was a
heavily loaded VAX 11/780 also running a software RDBMS, and complaints
about performance were the soup du jour.  We were seriously looking
at some database machines for the application when the VAX was
upgraded to an 8600 for other reasons.  We piggybacked on the
upgrade, the performance complaints disappeared, and no, the 8600
has not bogged down with increasing load during this year.

I do understand there are dedicated applications that make sense
for database engines, but (1) sometimes it's nice to have your applications
on general purpose systems and get a 'free' ride on a hardware
upgrade, and (2) sometimes a faster CPU makes up for a lot of problems with
RDBMS and OS design.

Tom Vollmer (vollmer@nosc)
Computer Sciences Corporation

miket@blic.UUCP (06/08/87)

Jim Shankland (jas@rtech.UUCP) has suggested the two major differences
between a 'traditional' software RDBMS and a data base machine based
RDBMS are (1) the presence of special purpose hardware and (2) the use
of a special purpose O/S.  I agree.

He went on to speculate that "the special-purpose hardware people are going
to have a hard time keeping up with Sun, DEC, and the rest of the vanilla
hardware vendors."  My problem with this speculation is that it implies that
a vendor of database machine based RDBMSs must use a fixed mix of special
purpose hardware and software over time and at all levels of its product line.

For a counterexample, look at the three biggest U.S. 'database machine'
vendors today (Britton Lee, Teradata, and Sybase) and you will find a large
variation in special purpose hardware content.  Teradata has the most
specialized hardware (a group of up to 1024 micros interconnected by a
special bus called 'Y-net').  Sybase has no special hardware at all (it runs
on Suns and Vaxen).  Britton Lee's products are in between (a minicomputer-
like design that has been 'biased' in some ways to make it run an RDBMS
faster).  In Britton Lee's case the amount of special purpose hardware even
depends on the product line (the higher priced / higher performance BL 700
having more special purpose hardware than the lower priced / lower
performance BL 300).  Not surprisingly, the Teradata is the most expensive,
while Sybase is the least expensive, and Britton Lee is again in between.

This same pattern also occurs over time.  In 1980 'N typical foo type
transactions' might have required special hardware.  In 1987 that same
load might be achievable with normal hardware, but might be cheaper if
done with special purpose hardware.  By 1990 it may be cheaper if done
with vanilla hardware.  But even in 1990 there will be other loads (perhaps
2*N or 10*N, or 'foo bar' transactions rather than 'foo' transactions)
which do require special purpose hardware.  Does that restrict database
machines to the high end?  Perhaps; we'll just have to wait and see.  Does
that restrict vendors of database machine based relational systems to the
high end?  Only if they perceive themselves as "special-purpose hardware
people".  If we perceive ourselves as "high performance relational database"
people then we don't have to compete with DEC and Sun.  We buy from others
what works well and we make only what we can do better.  The exact mix of
what we make vs. buy depends on the price/performance target for the product
and the availability of appropriate products from other vendors.

In summary, perhaps Jim and I do agree about the low end.  In the midrange,
perhaps a mixture of vanilla hardware and some special hardware.  In the
high end, for some time to come, I expect to see special purpose hardware.

--miket  (Mike Tossy)

(I do work for Britton Lee, Inc.  However these are only my personal opinions
and do not necessarily represent the views of Britton Lee, Inc.)

larry@ingres.Berkeley.EDU (Larry Rowe) (06/17/87)

Several comments on the recent discussion of ``database machines.''

1. I too am skeptical that custom hardware can be made price/performance
competitive with software database systems.  While I agree with the folks 
from Britton-Lee that they can use new technology to build the next
generation hardware sooner than a vanilla hardware vendor, I don't think
they can sell enough boxes to make a very profitable business.  Britton-Lee
has had a rough time the past 12-18 months because they are selling products
based on 2-5 year old technology (Z8000 + custom processor).  The rumors
about their new machine are that it is a tightly-coupled, shared memory
processor.  You can buy the same hardware from a vanilla vendor today
and run a software DBMS on it.  Examples are: stratus, sequent, encore, 
mips, etc.  The software DBMS's will do the same thing that Britton-Lee
will do in terms of shared memory buffer managers, etc., so the solutions will
be roughly the same.  (Of course, at any given time one vendor's product will
be ahead or behind another vendor's -- Britton-Lee has done a good job
delivering DBMS software.)

   Now here's the rub.  The vanilla hardware vendors will sell several 
thousand of their boxes.  When DEC delivers their tightly-coupled, shared 
memory processor, they will deliver tens of thousands.  Britton-Lee will be 
lucky to sell a thousand.  The vanilla vendors will have more sales over 
which to amortize their costs.  They will drive the cost down on the boxes 
as they compete and Britton-Lee will have a harder time maintaining 
their margins.

   The key advantage that Britton-Lee has is their software.  The software
DBMS vendors have not directly attacked this market (e.g., by oem'ing 
hardware and doing more software customization) because the market for sales
is much, much greater in the ``run everywhere'' and distributed, heterogeneous
DBMS markets.  They have 50-100 man-years of development to do to be
competitive in that market.  The dbmachine market is too small.

2. The above analysis says that Sybase has a creditable strategy because they
are doing software customization on a few machines.  A good example is their
Unix kernel mods for the Sun.  They deliver improved performance at a 
specific cost -- running a nonstandard OS.  It remains to be seen if Sybase
can deliver a robust system that matches the advertised performance claims 
in a production environment.  Also, they will be pressured into ``running 
everywhere'' (they've announced a VAX product and their recent venture with
Microsoft suggests a lot of work on PC hardware) and they will quickly fall
into the morass of customizing the code for N environments (e.g., do you
run on the VAX cluster yet, how's your MAP network protocol support, ...).

3. Teradata sells custom hardware and software.  However, it is my opinion
that 90% of their advantage comes from the fact they are running a distributed
DBMS on multiple machines.  The only novel feature of their architecture
is the Y-Net that they claim gives them big performance improvements.  I'd
really love to see a benchmark with identical hardware/software except
a different network.  One advantage that I can see to the Y-Net is 
the parallel sorting capability.  However, Jim Gray wrote a parallel sort
package at Tandem that brought sort time down to roughly twice the time
needed to read the data; i.e., you must read and write the data at least
once, and the sorting itself is overlapped completely with that i/o (see
the arithmetic below).  So, how does a software parallel sort compare to
the Y-Net?  Another possible advantage of the Y-Net is response-time and
throughput under heavy loads.  A conventional LAN can get clogged when
many messages are on the net.  Remember that distributed DBMS's will ship
a lot of data around to answer ad hoc queries.  Another interesting
experiment: how fast an ethernet or token-ring is needed to achieve the
same performance.
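
   to make gray's ``twice the read time'' figure concrete (my numbers,
purely illustrative): suppose the table is 100 mbytes and the disks stream
at 1 mbyte/sec.

	read the data:   100 mb at 1 mb/sec = 100 secs
	write the data:  100 mb at 1 mb/sec = 100 secs

if the comparison work is overlapped completely with the i/o, elapsed time
is about 200 secs -- twice the read time -- so special sort hardware can
only attack work that a good software sort has already hidden under the i/o.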

4. The Tandem high transaction rates come from a vanilla distributed relational
DBMS running on vanilla hardware.  The big difference is that they have spent
10 years optimizing their storage system, buffer manager, logging system, etc.

5. Another thing.  When doing performance comparisons, it is important to
compare apples and apples.  Numbers ought to be $'s/xact (guess what, a
program on an ibm 3094 is faster than a vax!) and/or use identical hardware
(2 processors are better than 1).  i'm tired of seeing claims that a dbmachine
is faster than a loaded central machine.  of course it is, the central machine
has other things to do.  compare the performance/cost to buying a larger
central machine or buying a second general purpose processor.  
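
   a made-up example of the kind of comparison i mean:

	$600k dbmachine at 60 xacts/sec  = $10k per xact/sec
	$150k supermini at 25 xacts/sec  =  $6k per xact/sec

the dbmachine is ``faster,'' but the vanilla box wins on $'s/xact, and two
of them still cost half as much.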

-------
Bottom line:

1. Hardware is nice but software is cheaper and probably faster.
   (Larry's Lament:  Hardware companies build bigger valuations faster
   because of the size of the business (i.e., more revenues and more expenses).
   Software companies ought to produce higher profits....)

2. Distributed DBMS's are a big, big win.  Every DBMS vendor had better have
   one by 1990 or they will be seriously disadvantaged in the marketplace.  So,
   where are all these vendors going to find the capital to fund a 20 man-year
   project to build a distributed DBMS?

3. Benchmark wars will continue to be fought and they might tell you something,
   and then again, they might lie.

billc@blia.BLI.COM (Bill Coffin) (06/22/87)

>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
>Several comments on the recent discussion of ``database machines.''
>
>1. I too am skeptical that custom hardware can be made price/performance
>competitive with software database systems.  [ ... ]

I may be biting the hand that feeds, but I agree with this.  Britton
Lee's approach is partly historical:  when the first BLI box came out,
there was no off-the-shelf hardware that could easily be used.  That's
no longer the case.

>2. The above analysis says that Sybase has a creditable strategy [ ... ]
> [ ... ] they will quickly fall
>into the morass of customizing the code for N environments [ ... ]

This is a problem on all distributed and all server architectures.
Even on a one-machine server architecture you need host software.

> [ ... ]
>5. Another thing.  When doing performance comparisons, it is important to
>compare apples and apples.  Numbers ought to be $'s/xact (guess what, a
>program on an ibm 3094 is faster than a vax!) and/or use identical hardware
>(2 processors are better than 1).  i'm tired of seeing claims that a dbmachine
>is faster than a loaded central machine.  of course it is, the central machine
>has other things to do.  compare the performance/cost to buying a larger
>central machine or buying a second general purpose processor.  

This is odd.  When you compare a loaded front end vs. a dbmachine, you
are comparing real-life usages.  If you care about db speed, then you must
consider the typical work loads.  Secondly, buying a bigger central 
machine may solve a "raw" speed problem, but server architectures still
solve the problem of sensitivity to the host work-load.  Most people 
really do care if some host process causes db accesses to slow to a
crawl, or if db access causes other important (non-db) processes to
wimp out.  A faster machine may get things going faster, but the
sensitivity is still there. 

(Server architectures have other benefits as well.
I won't go into all the server-architecture arguments
here, but you can't ignore these factors.)

>2. Distributed DBMS's are a big, big win.  [ ... ]

Why?  I'm convinced that many of the people who THINK they want 
distributed DBMS's REALLY need server architectures.  See Jim Gray's
article in the May UNIX REVIEW.  I won't elaborate here -- but
I would like to see a distribution vs. server discussion on the net.
For the record, I'm not "opposed" to distributed DBMS architectures,
but I DO think they're being oversold.  There are job mixes that
make servers look bad, and there are job mixes that will make
distributed dbms's look bad.  Mr. Natural says, "get the right tool
for the right job."

>3. Benchmark wars will continue to be fought and they might tell you something,
>   and then again, they might lie.

Or to paraphrase an old quote, "There are lies, there are damned lies,
and then there are benchmarks."
-- 
W.H.Coffin.  billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
 >> the usual disclaimer about my employer and my wretched opinions. <<
 >> the usual witticisms that swell netnews to ridiculous proportions. <<

larry@ingres.Berkeley.EDU (Larry Rowe) (06/23/87)

In article <2861@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
>>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
>>Several comments on the recent discussion of ``database machines.''
>>
>>2. The above analysis says that Sybase has a creditable strategy [ ... ]
>> [ ... ] they will quickly fall
>>into the morass of customizing the code for N environments [ ... ]
>
>This is a problem on all distributed and all server architectures.
>Even on a one-machine server architecture you need host software.

Remember to distinguish between a hardware server and a software server.
again, you can build a software server dbms and run it on conventional
hardware.  btw, from what i can tell, most software vendors are implementing
(software) servers (e.g., rti, sybase, oracle, ...).
>
>> [ ... ]
>>5. Another thing.  When doing performance comparisons, it is important to
>>compare apples and apples.  Numbers ought to be $'s/xact (guess what, a
>>program on an ibm 3094 is faster than a vax!) and/or use identical hardware
>>(2 processors are better than 1).  i'm tired of seeing claims that a dbmachine
>>is faster than a loaded central machine.  of course it is, the central machine
>>has other things to do.  compare the performance/cost to buying a larger
>>central machine or buying a second general purpose processor.  
>
>This is odd.  When you compare a loaded front end vs. a dbmachine, you
>are comparing real-life usages.  If you care about db speed, then you must
>consider the typical work loads.  Secondly, buying a bigger central 
>machine may solve a "raw" speed problem, but server architectures still
>solve the problem of sensitivity to the host work-load.  Most people 
>really do care if some host process causes db accesses to slow to a
>crawl, or if db access causes other important (non-db) processes to
>wimp out.  A faster machine may get things going faster, but the
>sensitivity is still there. 
>

the point here is that without including cost/performance in your comparison
everyone would buy the biggest machine(s) that ran their desired software.
personally, i'd love to have a cray performance machine for my personal
workstation.  but, as we all know, it isn't practical.  i agree that a
``back-end'' dbmachine may be the most cost effective and may offer substantial
performance benefits.  what i don't agree with is that it's the only solution,
or that people should accept it without questioning what their problem really
is.  for example, an equally plausible solution to a heavily loaded central
machine is to off-load user-interface/application programs to a personal
workstation.  from what i can tell, people don't really do these comparisons.

btw, another major advantage of all back-end dbservers is the ability to
interconnect different host computers.  for example, i think 50% of teradata's
sales come from the fact that they are the only dbms available today that
allows applications on VM and MVS to share access to a database.  britton-lee
has a similar advantage with pdp-11's and vaxes.  over time, all software
dbms's will offer similar features.
>
>>2. Distributed DBMS's are a big, big win.  [ ... ]
>
>Why?  I'm convinced that many of the people who THINK they want 
>distributed DBMS's REALLY need server architectures.  See Jim Gray's
>article in the May UNIX REVIEW.  I won't elaborate here -- but
>I would like to see a distribution vs. server discussion on the net.
>For the record, I'm not "opposed" to distributed DBMS architectures,
>but I DO think they're being oversold.  There are job mixes that
>make servers look bad, and there are job mixes that will make
>distributed dbms's look bad.  Mr. Natural says, "get the right tool
>for the right job."
>
i haven't read jim's article, but knowing him and having read some tandem
tech reports on the topic, i think i have some additional insights.  first,
application design for distributed applications is very, very hard.  average
people don't have the experience and the vendors' products do not offer enough
help yet to make it easy to define them.  consequently, only pioneers and 
very brave people will attempt to build them.  btw, tandem has only recently
come out with an SQL interface to its distributed dbms offering.  i'll be
curious to see how much usage goes up now that end-users and mere humans
can access the distributed databases.

second, tandem's distributed dbms is a single-vendor hardware solution.  
when i visit companies and universities, senior managers say their number 
1 problem is managing the diversity of hardware/software that proliferates 
through the organization.  a distributed heterogeneous dbms can cover up
this diversity and give people control again of their corporate data.  the
complete solution to this will take years to achieve, but from my discussions,
people really want it.  also, some new application growth areas (e.g.,
factory automation) insist on distributed dbms's.  so, i stand by my statement.

btw bill, what happens to a britton-lee customer who's bought 5 machines
and now wants to query data that is spread across the machines?  do they have
to copy it by hand to one machine and run the query?  all a distributed
dbms does is make that operation easier.

eric@hippo.UUCP (Eric Bergan) (06/24/87)

In article <2918@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe) writes:
> In article <2861@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
> >>From larry@ingres.Berkeley.EDU (Larry Rowe) Wed Jun 17 09:19:02 1987
> >>2. Distributed DBMS's are a big, big win.  [ ... ]
> >
> >Why?  I'm convinced that many of the people who THINK they want 
> >distributed DBMS's REALLY need server architectures.  See Jim Gray's
> >article in the May UNIX REVIEW.  I won't elaborate here -- but
> >I would like to see a distribution vs. server discussion on the net.
> 
> i haven't read jim's article, but knowing him and having read some tandem
> tech reports on the topic, i think i have some additional insights.  first,
> application design for distributed applications is very, very hard.  average
> people don't have the experience and the vendors' products do not offer enough
> help yet to make it easy to define them.  consequently, only pioneers and 
> very brave people will attempt to build them.  btw, tandem has only recently
> come out with an SQL interface to its distributed dbms offering.  i'll be
> curious to see how much usage goes up now that end-users and mere humans
> can access the distributed databases.
> 
> second, tandem's distributed dbms is a single-vendor hardware solution.  
> when i visit companies and universities, senior managers say their number 
> 1 problem is managing the diversity of hardware/software that proliferates 
> through the organization.  a distributed heterogeneous dbms can cover up
> this diversity and give people control again of their corporate data.  the
> complete solution to this will take years to achieve, but from my discussions,
> people really want it.  also, some new application growth areas (e.g.,
> factory automation) insist on distributed dbms's.  so, i stand by my statement.

	I think for this argument (and particularly for Gray's papers,
both in Unix Review, and also in the June, 1986 issue of IEEE Transactions
on Software Engineering), it is important to distinguish two very different
uses of relational databases. The first is what most of the database products
have been initially used for - ad hoc queries against a database, where the
number of queries far exceeds the number of updates. Typically such
applications are characterized by a relatively low number of transactions
per second, but the transactions themselves are probably more complex -
joins, aggregates, etc. The second is a much more transaction-oriented
system, the classic case being airline reservations. Here, transactions
tend to be simple, but the transaction rate is much higher. I think
Gray's comments are much more addressed to the transaction oriented
distributed applications.

	In a transaction oriented system, transparency is much less important.
At the time the application is written, the queries are determined, and it
is possible to map out what servers have the data that is needed for a
given transaction. There are almost no "ad hoc" queries which would require
some kind of distributed optimizer to sort out at query time. Gray's point
(which I think Larry is talking about) is that there are enough other
headaches in a transaction oriented system just with the networking, and
understanding the design of the application, without having to worry about
how efficiently the database system decides to process the query. This is
especially true in hooking up heterogeneous machines and databases. The
chance of having an MVS VSAM file become a "transparent" part of
a distributed relational database is pretty small.  But it is feasible
to hook up a server to it, that can participate in a distributed
requester/server model.

	This does, of course, have the problem of having to change the
application(s) if you decide to move the data partitioning around. I 
like the Sybase approach to this - namely the transactions are stored
in the database itself, rather than in the applications. While you still
have to change the transactions if you change the schema, at least they
are all in one place, and the applications themselves do not have
to be rebuilt.

	I think the real challenge for the database vendors will be
how to interact with other database products - primarily non-relational
ones.  Surprisingly, many of the corporate databases are under VSAM files
or the like - very few are relational. Given that, trying to force 
relational semantics on these databases is going to be very difficult.
While I believe that it is likely that someday, most of these corporate
databases will convert to relational databases, I think that the transition
time will be at least 10 years, maybe longer. The transition will happen
as applications are replaced - not because they are converted.

	One final point in the distributed vs. single server discussion.
Very few applications live in a vacuum (or if they do, it was because
they were forced to). Almost all of them would like to be able to
share data with other related applications that already exist, have their
own database systems in place, and either work perfectly well, or would
be too expensive to convert to something else. A single server model
does not seem able to handle the economics (and sometimes politics) of
such a case. A distributed system (not necessarily "transparent") does
allow the new applications to share the data with a minimum (no?) 
impact on the existing applications.

	Bill - do you envision a single server approach also being
desirable in the case of a geographically distributed system, where the
sites are primarily autonomous, but some data replication and cross-site
queries are good? How would you design such a system?
-- 

					eric
					...!ptsfa!hippo!eric

whwb@cgcha.UUCP (Hans W. Barz) (06/25/87)

In article <2891@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe) writes:
> Several comments on the recent discussion of ``database machines.''
> 
>   THE OTHER LINES HAVE BEEN REMOVED FOR READABILITY OF THIS NEWS


Usually I do not post to this group, since I have the impression that
most of the participants are working only with small databases -- i.e.,
< 300 MB.

Some comments in a recent posting on database machines -- see above -- made
me write these lines, since database machines are useful for bigger
applications:

1) The Teradata system -- the name originates from TERABYTE -- is not a
   competitive system to the Britton-Lee; its performance and price range
   are much higher.

2) Distributed database systems usually have worse performance than
   non-distributed systems.  The only chance for a distributed database to
   gain performance is by partitioning a single job or query across multiple
   processors.  Currently this works only on the Teradata.

3) The Teradata is not really a distributed database, since the communication
   in the Y-NET is synchronous while different processors are working
   asynchronously on the data -- this concept is not trivial and I cannot
   explain it completely in a few lines.

4) I have done benchmarks on Teradata, DB2(3090), Britton-Lee(IDM) and
   Tandem-SQL.  It is rather difficult to summarize them, but Teradata has
   the best price/performance ratio and usually delivers the best
   performance of the four machines above.

H.W.Barz, WRZ, CIBA-GEIGY, CH

billc@blia.BLI.COM (Bill Coffin) (06/25/87)

>From larry@ingres.Berkeley.EDU (Larry Rowe) 
>>Even on a one-machine server architecture you need host software.
>
>Remember to distinguish between a hardware server and a software server.
>again, you can build a software server dbms and run it on conventional
>hardware.  

Yes, I noted that.  In fact, I think a s/w server will beat a hardware
server in price/performance except at the high end (the people who
will pay anything for the extra speed).  However, an effective server
needs to control its hardware.  I'm saying that a server should have
a whole machine to itself and should have a custom operating system
(or an off-the-shelf OS that has been pacified).

> btw, from what i can tell, most software vendors are implementing
>(software) servers (e.g., rti, sybase, oracle, ...).

I think there is a difference between a server (hardware or software) and
a distributed dbms.  Oracle and, I think, RTI are implementing 
distributed dbms's.  Sybase is implementing a server.

My point was that there is a proliferation problem
no matter which architecture.  Perhaps life is a bit simpler at BLI,
since we don't have to port the dbms internals, but we still have to 
port the host software.  There's no free lunch in this area.

>the point here is that without including cost/performance in your comparison
>everyone would buy the biggest machine(s) that ran their desired software.
>personally, i'd love to have a cray performance machine for my personal
>workstation.  but, as we all know, it isn't practical.  i agree that a
>``back-end'' dbmachine may be the most cost effective and may offer substantial
>performance benefits.  what i don't agree with is that it's the only solution,
>or that people should accept it without questioning what their problem really
>is.  for example, an equally plausible solution to a heavily loaded central
>machine is to off-load user-interface/application programs to a personal
>workstation.  from what i can tell, people don't really do these comparisons.

I guess that I'm in agreement here.  People need to analyze real-life
situations when they choose a dbms configuration.  Certainly a back-end isn't
the only solution, and frequently it's a poor solution.  btw, I think
your alternative solution enhances the server argument -- why not
offload the user-interfaces to bitmapped workstations (application
servers) AND offload the dbms to a dbms server?  And set up the old
mainframe as a number-crunching server, and so on.  It's nice to be
able to hook up servers based on specialized abilities rather than put apples
and pancakes in the same bag (i.e., mutually antagonistic processes in
the same machine).

Anyway, comparing a back-end with an unloaded front-end is certainly
not examining a real-life situation.

>btw, another major advantage of all back-end dbservers is the ability to
>interconnect different host computers.
> [ ... ] britton-lee
>has a similar advantage with pdp-11's and vaxes.  

Minor plug (here we go!): We also support data sharing between 
VM/CMS, Vax VMS, most major flavors of UNIX (SysV, BSD, Ultrix), MS-DOS, 
and a lot of other machines.

> [ ... ]
>second, tandem's distributed dbms is a single-vendor hardware solution.  
>when i visit companies and universities, senior managers say their number 
>1 problem is managing the diversity of hardware/software that proliferates 
>through the organization.  a distributed heterogenous dbms can cover up
>this diversity and give people control again of their corporate data.  the
>complete solution to this will take years to achieve, but from my discussions,
>people really want it.  also, some new application growth areas (e.g.,
>factory autmation) insist on distributed dbms's.  so, i stand by my statement.

OK, but it seems to me that heterogeneity and distribution are two different
issues.  I haven't yet seen any good solutions to the heterogeneity problem,
just some talk.  (Gray's article has some interesting observations on
this.)  Server architectures are well-understood and
currently available.  Mature heterogeneous/distributed dbms's are still 
in the future.  Even when they do mature, there will still be many
situations in which a server is a clear win.

>btw bill, what happens to a britton-lee customer who's bought 5 machines
>and now wants to query data that is spread across the machines?  do they have
>to copy it by hand to one machine and run the query?  all a distributed
>dbms does, is make that operation easier.

No, it's easier than that (now).  However, it is not location transparent.
OK, a distributed dbms is different from a server.  They solve some similar
problems, but they are not the same.  I think it's going to be a while
before distributed dbms's can match the performance, reliability,
and security of servers.  The very nature of a server is that you don't 
worry about distributing data over many of them.  If you do worry about 
that, then the server is probably not the solution you're looking for.
(And, if you lick the heterogeneity problem then you can have N servers
transparently within your distributed database.)
-- 
W.H.Coffin.  billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
 >> the usual disclaimer about my employer and my wretched opinions. <<
 >> the usual witticisms that swell netnews to ridiculous proportions. <<

billc@blia.BLI.COM (Bill Coffin) (06/25/87)

In article <131@hippo.UUCP>, eric@hippo.UUCP (Eric Bergan) writes:

> 	Bill - do you envision a single server approach also being
> desirable in the case of a geographically distributed system, where the
> sites are primarily autonomous, but some data replication and cross-site
> queries are good? How would you design such a system?

Of course not.  This is the classic case where a distributed dbms is
exactly what's wanted.  I think there are many cases where you can
change a few of those requirements and find that a server will do.
For instance, salesmen who carry laptops and occasionally dial in
updates or queries.  Or systems where security is critical.

Anyway, servers and distributed dbms's are not necessarily mutually
exclusive.  The classic distrib model has single machines in separate
cities (this is the model shown on the teacher's blackboard when
"distributed databases" is the day's topic).  This is probably
unrealistic; nodes on a distributed system could include several
dissimilar LANs connected by long-haul lines and/or gateways. 
A LAN is a great place for a server.  A distributed dbms could 
be built on top of this model, treating the whole LAN, via its server, 
as a single node in the distributed dbms.

-- 
W.H.Coffin.  billc@blia.BLI.COM (ucbvax!{mtxinu|ucsfcgl}!blia!billc)
 >> the usual disclaimer about my employer and my wretched opinions. <<
 >> the usual witticisms that swell netnews to ridiculous proportions. <<

larry@ingres.Berkeley.EDU (Larry Rowe) (06/26/87)

In article <2877@blia.BLI.COM> billc@blia.BLI.COM (Bill Coffin) writes:
>I think there is a difference between a server (hardware or software) and
>a distributed dbms.  Oracle and, I think, RTI are implementing 
>distributed dbms's.  Sybase is implementing a server.
>
[yet another minor plug (YAMP)] rti currently sells a distributed dbms
that runs on vax's under vms.  it probably runs on some of the other
systems too, but i can't keep up with them.  oracle announced a distributed
dbms at a big news conference in new york with projected delivery
dates in late 86 early 87.  in early 87 they ``withdrew the product.''
of course the press covered the first announcement but neglected to mention
the second.  so, it doesn't seem to matter whether a system actually works
as advertised... sigh!

on another point.  rti did a joint study with one of their larger customers
that built a database gateway to ibm/mvs databases.  the architecture is:

			ingres/star (dist-dbms)
		      /       |      \
		    /         |       \
		  /           |        \
		/             |         \
	     ingres	   ingres 	ibm-gateway
					    |
					    |
					    |
					db2/ims/vsam

the user enters arbitrary sql queries to ingres/star which optimizes 
and executes the query.  joins across machine boundaries worked.  
interestingly, the ibm-gateway used an ibm data extract product (DXT)
to get data out of the databases on mvs.  it worked faster than most folks
expected so that joins across machine boundaries actually ran credibly.
this system is not a product today, but i suspect rti and most relational
system vendors will be delivering similar products over the next couple
of years.

distributed databases and gateways to other data stores (file systems
or data managers) will be very useful tools when they are widely available.
i agree with bill that it will be a couple of years before these configurations
are widely available and as reliable as single-site relational systems are
today.  but, if your company/vendor isn't working on them today, you'll
be significantly behind and struggling to catch up.
	larry

erics@cognos.uucp (Eric Schurr) (07/02/87)

>	This does, of course, have the problem of having to change the
>application(s) if you decide to move the data partitioning around. I 
>like the Sybase approach to this - namely the transactions are stored
>in the database itself, rather than in the applications. While you still
>have to change the transactions if you change the schema, at least they
>are all in one place, and the applications themselves do not have
>to be rebuilt.
>

This statement intrigues me.  I don't know anything about SyBase--do
they allow you to model/define *transactions*?  Is this simply referring
to table (file/record) definitions or to the much broader--and more
complicated--notion of a transaction?  What mechanism do they use to
define and report this?


-- 

Eric Schurr           3755 Riverside Dr.
Cognos Incorporated   Ottawa, Ontario       decvax!utzoo!dciem!
(613) 738-1440        CANADA  K1G 3N3       nrcaer!cognos!erics

markh@rtech.UUCP (Mark Hanner) (07/06/87)

In article <1026@sirius.UUCP> erics@cognos.UUCP (Eric Schurr) writes:
>In article <131@hippo.UUCP> eric@hippo.UUCP (Eric Bergan) writes:
>>	This does, of course, have the problem of having to change the
>>application(s) if you decide to move the data partitioning around. I 
>>like the Sybase approach to this - namely the transactions are stored
>>in the database itself, rather than in the applications. While you still
>>have to change the transactions if you change the schema, at least they
>>are all in one place, and the applications themselves do not have
>>to be rebuilt.

you can go one step further and avoid having to alter your applications
at all when redistributing data in a distributed database: have the query
language use a distributed database catalog that defines "aliases" for the
data locations.  in ingres/star, there is a catalog which contains the
"alias" and the associated node/database/table name information required
to locate the data.  thus, the following statement in an application:

	select * from staff;

does not need to be changed just because the database administrator moved
the table:

	create temporary link oldstaff 
	with node = corp, database = personnel, table = staff;

	create table staff 
	as select * from oldstaff
	with node = newcorp, database = newpersonnel, table = staff;

this brings up the general problem of maintaining large systems with hundreds
or thousands of applications in various stages of the product life cycle. the
more difficult it is for the system administrators to tune their systems 
(including load balancing, where distributed database provides some unique
opportunities), the slower those systems will run. applications ideally 
should be built without respect to performance, but allow the system
administrator to tune the performance of applications to match real system 
loads through techniques such as changing storage structures, adding indexes,
moving data to different disks or nodes, etc. as networks get more complex
and users demand more distributed data capability ("i know it's on the ibm
mainframe, but i want to be able to use it with data on my department's vax"),
giving this flexibility to the system administrator will be essential.

>This statement intrigues me.  I don't know anything about SyBase--do
>they allow you to model/define *transactions*?  Is this simply referring
>to table (file/record) definitions or to the much broader--and more
>complicated--notion of a transaction?  What mechanism do they use to
>define and report this?
>

yes, sybase does store transactions in the database:

	define transaction getpayroll as
	select s.name,s.salary*t.hours
	from staff s, timecard t
	where s.name = t.name and s.type="HOURLY";

and then use:

	exec sql getpayroll;

in an application to run the query. 

the next step is to allow the definition of assertions (referential
integrity) and asynchronous actions ("whenever reactor_coolant < 100 then
update valves (coolant = coolant + 3)" -- see the sketch below).  making
these features work will depend greatly upon how much application
development support is provided to help manage the proliferation of
application objects stored in databases.  the world is just now beginning
to agree on sql as a standard for interaction with relational databases.
but CASE and other major advances in application development technology are
begging for standards in data dictionaries and display environments, where
there are multitudes of feuding factions.  this discussion may belong in
comp.case, but a database system is useless unless it's easy to build
applications with it...
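
to give the flavor of an asynchronous action, here is the reactor example
above written out in an invented syntax patterned on the define transaction
form (nothing shipping today accepts this; it is purely a sketch):

	-- invented syntax: fire automatically whenever the predicate holds
	define action coolant_makeup as
	whenever reactor_coolant < 100
	update valves set coolant = coolant + 3;

the hard part is not the syntax; it is deciding when the predicate gets
re-evaluated and which transaction the action runs in.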

cheers,
mark
-- 
markh@rtech.UUCP
ucbvax!mtxinu!rtech!markh
"someone else was using my login to express the above opinions..."

bradbury@oracle.UUCP (Robert Bradbury) (07/16/87)

In article <2943@zen.berkeley.edu>, larry@ingres.Berkeley.EDU (Larry Rowe)
 writes:

> [yet another minor plug (YAMP)] rti currently sells a distributed dbms
> that runs on vax's under vms.  it probably runs on some of the other
> systems too, but i can't keep up with them.  oracle announced a distributed
> dbms at a big news conference in new york with projected delivery
> dates in late 86 early 87.  in early 87 they ``withdrew the product.''
> of course the press covered the first announcement but neglected to mention
> the second.  so, it doesn't seem to matter whether a system actually works
> as advertised... sigh!
> 
Oracle version 5.1 does support distributed access to homogeneous and
heterogeneous machines.  The big stumbling block which caused the delay was
building enough network interfaces to make the product really useful.
The database stuff is relatively simple compared to the variety of network
interfaces required:  VMS: (DECNET,TCP/IP(Excelan,Wollongong),Async);
IBM: (3270,DECNET,VTAM,TCP/IP); PC: (3270,DECNET,TCP/IP,Async);
UNIX: (Async,TCP/IP).

The VMS production release with DECNET support has been around for months.

The UNIX production releases of 5.1 with TCP/IP support should be available
next month for: 3B2, 3B5, 3B20, Sun, Apollo, Ultrix, Sequent and Xenix.

The PC and mainframe products with network support should see the light of
day in the Sept-Oct time frame.

A simple calculation of the number of machines on which Oracle runs
and the number of network interfaces possible on those machines indicates
that there are 10's (perhaps 100's) of different machine/network combinations
for which code must be written.  (Opinion: vendors are going to fritter
away man-years interfacing to networks unless an IEEE/ANSI committee adopts a
standard interface).
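
(To make that arithmetic concrete with round numbers of my own: 20 machine
types times an average of 5 network interfaces each is already 100
machine/network combinations, every one of which needs code, testing, and
support.)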

>
>  Comments about Ingres to DB2 interface.
>
> this system is not a product today, but i suspect rti and most relational
> system vendors will be deliverying similar products over the next couple
> of years.
> 
Oracle's interface to DB2 (SQL*CONNECT) is currently in alpha testing at
a major customer site.  It should be generally available before the end
of the year.

> distributed databases and gateways to other data stores (file systems
> or data managers) will be very useful tools when they are widely available.
> i agree with bill that it will be a couple of years before these configs
> are widely available and as reliable as single-site relational systems are
> today.  but, if your company/vendor isn't working on them today, you'll
> be significantly behind and struggling to catch up.

While the interface between an RDBMS and DB2 is fairly straightforward,
interfaces to IMS and ISAM files are less so.  We estimate a good
interface to IMS (including 2-phase commit and transaction recovery)
to be a 10+ man-year project.  An interface to ISAM files is simpler
(perhaps 6 man-months) but requires a lot of user "interfacing" due
to the lack of a data dictionary.  It isn't clear that interfaces between
an RDBMS and hierarchical/flat files will ever be useful for anything
other than retrievals, due to the problems of matching transaction
and locking models.

As always none of the above should be construed as a commitment by Oracle.
The dates are however from the product managers and should be accurate.

-- 
Robert Bradbury
Oracle Corporation
(206) 364-1442                            hplabs!oracle!bradbury

mcclure@tut.cis.ohio-state.edu (James Edward McClure) (09/14/88)

	Could someone please tell me exactly what a data base machine is and
how it differs from a DBMS (besides being hardware based)?

Thanks for your help!!

andy@garfield (Andy Lowry) (09/15/88)

In article <21755@tut.cis.ohio-state.edu> mcclure@tut.cis.ohio-state.edu (James Edward McClure) writes:
>
>	Could someone please tell me exactly what a data base machine is and
>how it differs from a DBMS (besides being hardware based)?
>
>Thanks for your help!!

A database machine is not necessarily hardware based (in the sense of
containing hardware that was specially designed for database
processing).  The term has been used in many different ways, but
generally I think they all share the feature that there is some piece
of hardware that is dedicated to some or all of the task of database
processing.  That could be anything from a general purpose computer
that isn't used for anything but database processing (in which case
there is probably a severely stripped-down low-overhead operating
system supporting the database software) to a highly engineered
collection of special-purpose hardware like associative memories and
hardware sorters.  The system configuration could be such that the
database machine sits as a self-contained unit serving as a back-end
processor providing complete database services to one or more hosts,
or it could simply mean that there are some smart peripherals like
disks with on-the-fly filters on their heads to make the disk behave
associatively.  There have been over a hundred proposals since the
late 1960's for database machines of varying scope and complexity, and
many of these machines have been prototyped.  A few have even been
offered commercially.  A book by Stanley Su, just published this year,
gives the most comprehensive survey of the area that I have
encountered.  Here's the reference, a la Scribe:

@book(su88a,
	key="Su",
	author="Stanley Y.W. Su",
	title="Database Computers: Principles, Architectures, and
Techniques",
	publisher="McGraw-Hill",
	address="New York",
	year="1988")

If you'd like something a little less ambitious, I wrote a 30-page
survey this past spring titled "Synchronization, Communication and I/O
Factors in Database Machine Performance" that I would be glad to send
you (or anybody else).  It does not describe the machines it covers in
great detail, but rather explores the problems mentioned in the title
and the ways various designs have attempted to overcome them.  The
bibliography will also point you to many detailed papers and some
other good surveys.

-Andy Lowry

dberg@cod.NOSC.MIL (David I. Berg) (09/15/88)

In article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) writes:
> 
> 	Could someone please tell me exactly what a data base machine is....


A database machine (DBM) is a unit of hardware with a hard-coded DBMS in its
firmware and some amount of high-speed disk space to store the database.
It is usually used as a back-end processor to offload database query
processing and database I/O from the remote computer(s).
One or more remote computers can be connected to it directly or via a
network.  Queries are formulated remotely and sent to the DBM for processing;
results are then directed back to the source of the query.

-- 
David I. Berg (dberg@nosc.mil)
GENISYS Information Systems, Inc., 4250 Pacific Hwy #118, San Diego, CA 92110
MILNET: dberg@nosc.mil
UUCP:   {ihnp4 akgua decvax dcdwest ucbvax}!sdcsvax!noscvax!dberg

mike@blipyramid.BLI.COM (Mike Ubell) (09/16/88)

By my definition a database machine is a system that has been architected
specifically for database management tasks.  It may contain specialized
hardware, or general hardware components in a system with special
architectural features to support DBMS tasks.  The machine will include
software to perform the DBMS tasks.

The two companies who have been selling database machines the longest,
Teradata and Britton Lee, both use standard processors in their current
offerings, with some specialized hardware in a total system architected
for DBMS work.  Teradata has a special interconnection bus that connects
many specialized 80x86 processor boards (up to 1k, I believe).  The bus is
the patented Y-net, which actually has active components that can do data
merging and, I believe, some concurrency control.  Britton Lee provides a
family of systems with two or more specialized Z8000 processor boards and
an optional Data Base Accelerator, which is a custom-logic search engine.
Our newest product uses a custom processor plus 68020-based I/O processors
connected to a large shared memory.

Many very specialized DBMS processors have been proposed in the literature,
and some have been built.

(The foregoing is not intended as a sales pitch, sorry if it comes across
as such).

DMasterson@cup.portal.com (09/16/88)

>	Could someone please tell me exactly what a data base machine is and
>how it differs from a DBMS (besides being hardware based)?
>
     In general, there is little difference.  The concept is becoming blurred.
Typically, a database machine is a smart machine to which database requests
can be passed (usually in an SQL/QUEL-like language); it will process the
query and return the relevant rows from the database.  This assumes, of
course, a relational database machine which need not be the case.  There have
been textual database machines -- even a file server might be considered a
database machine!  Originally, database systems that implemented a front-end,
back-end mechanism on two different machines were considered database
machines.  Now, however, a lot of the database systems that grew up as
software-only database systems are going that route (Ingres/Star,
Informix-Turbo, Oracle(?), Sybase), so the distinction has definitely become
blurred in the commercial sense.

David Masterson
DMasterson@cup.portal.com

sysop@stech.UUCP (Jan Harrington) (09/21/88)

in article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) says:
> 
> 
> 	Could someone please tell me exactly what a data base machine is and
> how it differs from a DBMS (besides being hardware based)?
> 
> Thanks for your help!!
   
There are a number of definitions of a database machine, though the one most
commonly used is a computer (usually a mini), dedicated to database
processing, that is set up as a slave to a host computer.  The database
machine usually runs a standard DBMS.

The database machine offloads much of the database processing from the
host computer.  Since database processing is often CPU bound (a lot of work
is required to translate a user's logical requests for data into physical
storage locations), having a database machine can speed operations, since
the host computer can be doing other things besides address translations.
People seem to feel that a database machine increases throughput.
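
For example, take a simple logical request (the table and its index are
invented for illustration):

	-- hypothetical employee table with a B-tree index on badge
	select name from employee where badge = 1234;

To satisfy it, the DBMS must parse the query, choose an access path (scan
the whole file, or walk the index down to a page address), and only then
fetch the data.  With a database machine, all of that work happens in the
back end instead of on the host.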

Does that help?
 
Jan Harrington, sysop
Scholastech Telecommunications
UUCP: husc6!amcad!stech!sysop or allegra!stech!sysop
BITNET: JHARRY@BENTLEY

********************************************************************************
	Miscellaneous profundity:

		"No matter where you go, there you are."
				Buckaroo Banzai
********************************************************************************

roger@esquire.UUCP (Ro Reid) (09/22/88)

in article <21755@tut.cis.ohio-state.edu>, mcclure@tut.cis.ohio-state.edu (James Edward McClure) says:
>
>      Could someone please tell me exactly what a data base machine is and
> how it differs from a DBMS (besides being hardware based)?
>
> Thanks for your help!!

Speed (fast as hell, especially on complex queries)
Cost  (more than a software-based DBMS by quite a factor)
Number of things that can go wrong (2-3 times more than a software
				    DBMS resident on the host)

Elaboration:  At one time the ONLY way to get serious speed without
completely swamping your host was to have a separate box to offload
the database work to.  This box has its own hardware and software
specifically designed to be fast at RDB-type things, instead
of being general purpose.
     The drawbacks were always there: more hardware to fail,
a network (or some sort of interface) is involved and can fail,
and lack of portability: in a Unix shop, you can move your
software to the newest, latest, fastest box, regardless of
vendor.  A hardware database machine is about as proprietary
a beast as there is.
     Things have changed since we bought our first database machine
7 years ago.  The 11/70 is no longer the workhorse of the Unix
world, there are some fast, relatively cheap boxes out there and
they keep getting faster and faster.  So now many applications
can afford to use the more generalized hardware to take advantage of the
speed gains as they are developed, even if you maintain the separation
between the front end and the back end, and basically make yourself a
database machine by loading a software-based DBMS on one box and
using another box as host.
     My experience is that right now, there are some very good
software-based DBMS's out there that can negate the need for
a specialized database machine.  This holds until you start doing
things above a certain level of complexity.  For example, we
found a software DBMS that was faster than our database machine,
until we hit it with 5-, 6-, and 7-way joins.  And then the software
system went to hell in a bitbucket, while the hardware system
continued to be fast.  We also found that software systems
didn't do so well when the conditions for the retrieve got
real complex and nasty.
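     For flavor, here is a query of the shape that did the damage (the
tables are invented, but it is a genuine 5-way join with an ugly
qualification):

	-- invented tables; the shape of the query is what matters
	select o.onum, c.cname, p.pname, s.sname, w.city
	from orders o, customers c, parts p, salesmen s, warehouses w
	where o.cust = c.cnum
	and o.part = p.pnum
	and o.rep = s.snum
	and p.pnum = w.part
	and (w.qty < o.qty or s.region <> c.region);

Each table you add multiplies the join orders the optimizer must consider,
and one bad choice at five tables costs orders of magnitude.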
     These are by no means final benchmark figures on all software
systems, which is one reason I'm not naming names.  But if you
are not going to torture a database system the way we do, you
probably should stick to software.  It's when you find that there
is no software out there that can do the job for you that you
have to start considering database machines.
     We still have hopes that we can move to a software based
system, if we can find one that can get the work done in the
available CPU cycles for us.
     If you want architectural-type details, you might contact
Britton Lee; they might be glad to fill you in.  I've had it explained
to me before, and it's amazing the things they do to make that sucker
hum!
-- 
				   Ro Reid

			      {rutgers|phri|cucard}!cmcl2!esquire!roger
			      uunet!esquire!roger
			      roger@woof.columbia.edu

"...to understand is always an ascending movement; this is why
comprehension ought always to be concrete. (one is never got out
of the cave, one comes out of it.)"
	  -Simone Weil, First and Last Notebooks

kfw@ecrcvax.UUCP (Fai Wong) (10/09/89)

	Hi, everybody,

	I am seeking information on commercial database machines.
	I've read about the Britton-Lee IDM, the Intel iDBP and
	the Teradata DBC machines.  The information I have describing
	these machines is out of date (ca. 1985).  Could someone
	tell me if these companies still exist?  If they do, how could
	I contact them?  I would also appreciate any information on other
	existing commercial database machines.

	Many thanks in advance.

	Cheers,
	Kam-Fai.