[comp.databases] Is RDBMS unproven technology?

tomr@ashtate (Tom Rombouts) (08/01/90)

Are relational databases an unproven technology regarding
performance?  Just to keep this group lively, here are some 
unauthorized excerpts from an article titled "Rude Awakening"
on page 23 of the July 30, 1990 Computerworld:

"A report by a British consultancy is throwing some cold water 
in the face of the primarily U.S. relational database management
system industry.  The report, titled 'Database, an Evaluation
and Comparison,' attempts to sort out the often misleading claims
of DBMS vendors - and to make some general statements about the
usefulness of RDBMS technology itself.  < background material deleted >
....A key tenet of the report is that RDBMS technology has been
available for 20 years but still has not been proved in large,
complex applications.  The report notes that users associate
these products with poor system performance, even though they may
be flexible and easier to implement."  The article then goes on 
to cite a firm that is reluctant to replace IMS with DB2, and
discusses other sites that use a mixture of relational and
possibly non-relational systems.

Please, let's not start a war here telling each other how
wonderful relational technology is.  Just wanted to bring this
to the attention of this group.  Maybe others out there are
more familiar with the original British report.  Maybe others
have ideas on strategies to prove that "relational" does not
have to mean "overhead."  (Or does it?   :-)  )

I now stand back....

Tom Rombouts  Torrance Techie  tomr@ashtate.A-T.com  V:(213)538-7108

DISCLAIMER:  The above posting is intended to be informational only
and should not reflect my opinions or those of any known corporation.

tim@ohday.sybase.com (Tim Wood) (08/02/90)

In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>Are relational databases an unproven technology regarding
>performance?  
>
>....A key tenet of the report is that RDBMS technology has been
>available for 20 years but still has not been proved in large,
>complex applications.  The report notes that users associate
>these products with poor system performance, even though they may
>be flexible and easier to implement.  The article then goes on 
>to cite a firm that is reluctant to replace IMS with DB2, and
>discusses other sites that use a mixture of relational and
>possibly non-relational systems.

There isn't really enough information in your excerpt to comment on the
article.  The question, "Relational, yes or no?" is becoming less and
less germane.  The question most important to most users is "Will
widget W solve my problem, P?"

Relational systems have so far been deployed in smaller-scale
applications than have hierarchical and network systems.  This is due
to several factors: relational is "newer" (that is, the technology
existed long before successful commercial products) and the older
database architectures were deployed in the days when nearly all
commercial computing resources were centralized and operating in a
batch-processing environment.  In that environment, updates and access
to the database are relatively rigidly controlled.

The appeal of relational systems has been the promise of flexible
access to the database by users far removed from the DP department.
The trend toward decentralized access has been strengthened by the
growth of processing power directly available to individual users, and
by the changing nature of applications themselves.

Relational systems lend themselves well to distributed database, where
by definition there will be fewer, if any, centralized points of
transaction processing activity.  So, an individual site in the
distributed relational database may look "slow" compared to IMS on an
IBM-MVS 3090, but the aggregate throughput of the networked database
can be prodigious.  

This is not to say that there is some theoretical limit to the
performance of individual relational database engines.  Indeed, a major
focus of the industry now is to develop local transaction processing
speeds that rival those of older architectures on a platform of similar
scale.  Today's technology is proving (already has, actually) that the
assertion that relational is slow is out-of-date.  What's more,
relational technology is solving the problems of distributed
applications better than the older architectures, at transaction speeds
that are so far adequate for most applications.  A recent Digital
Review survey asked relational users about their throughput
requirements.  They found that about 90% of applications required no
more than about 12TPS.  This TPS number will surely increase, as will
the ability of relational systems to carry more load.

Organizational reluctance to replace existing non-relational systems
is now very understandable.  Replacements will occur as the economic
benefits of the distributed high-performance relational model increasingly
outweigh the costs of changing.  Actual replacements will be preceded
by gradual integration of RDBMS into the organization's DP framework.
It is important for relational products to allow connection with
existing heterogenous systems, rather than requiring their replacement.
-TW
---
Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
	One day, when I can afford enough lawyers, I will speak for
	    a whole company.  For now, I speak just for myself.

swfc@ulysses.att.com (Shu-Wie F Chen) (08/04/90)

In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
|>>Are relational databases an unproven technology regarding
|>>performance?  
|>>
|>>....A key tenet of the report is that RDBMS technology has been
|>>available for 20 years but still has not been proved in large,
|>>complex applications.  The report notes that users associate
|>>these products with poor system performance, even though they may
|>>be flexible and easier to implement.  The article then goes on 
|>>to cite a firm that is reluctant to replace IMS with DB2, and
|>>discusses other sites that use a mixture of relational and
|>>possibly non-relational systems.
|>

[some deleted stuff]

|>
|>Relational systems have so far been deployed in smaller-scale
|>applications than have hierarchical and network systems.  This is due
|>to several factors: relational is "newer" (that is, the technology
|>existed long before successful commercial products) and the older
|>database architectures were deployed in the days when nearly all
|>commercial computing resources were centralized and operating in a
|>batch-processing environment.  In that environment, updates and access
|>to the database are relatively rigidly controlled.

I don't see how these reasons (which are not incorrect) explain why
relational systems have so far only been deployed in smaller-scale
applications.

|>
|>The appeal of relational systems has been the promise of flexible
|>access to the database by users far removed from the DP department.

RDBMSs have made two contributions:

1. non-procedural access
2. data independence

I don't see what relational systems have to do with "the promise of
flexible access ... far removed from the DP department."  Are you
implying that network communication or client/server is restricted to
relational systems?

|>The trend toward decentralized access has been strengthened by the
|>growth of processing power directly available to individual users, and
|>by the changing nature of applications themselves.

Hmmm.  Last week I spoke with a Sybase tech support person who said that
Sybase's client/server architecture was geared toward having most of the
computation performed at the server end.  My response was "How about all
that CPU power directly available to the user?"  It seems that Sybase
feels that database computation should not be done at the client end (I
read this as personal workstation) because it would take away CPU cycles
for editing, reading news, etc.  They believe they can overcome the CPU
bottleneck at the server end.  This seems to contradict the above
statement by Tim (who works for Sybase).

[sorry for this digression, but Tim's position (which I agree with)
seems to differ from that of his company's]

|>Relational systems lend themselves well to distributed database, where
|>by definition there will be fewer, if any, centralized points of
  ^^^^^^^^^^^^^

Huh?  What definition?

I think relational systems lend themselves well to distributed databases
because they are set-oriented, rather than navigational systems like the
hierarchical and network models.  You can think in terms of sets of
tuples coming from each site instead of thinking on the level of
individual records.
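
A minimal sketch of the set-oriented idea, assuming a relation split
horizontally across two sites. The site data and the helper names
(select_dept, distributed_select) are made up for illustration and do
not describe any particular product:

```python
# Hypothetical data: each site holds a fragment of one relation.
# Tuples are (emp_id, name, dept).
site_a = {(1, "Ada", "eng"), (2, "Bob", "ops")}
site_b = {(3, "Cy", "eng"), (4, "Di", "sales")}


def select_dept(fragment, dept):
    """Run the selection locally at one site; the result is a set of tuples."""
    return {row for row in fragment if row[2] == dept}


def distributed_select(fragments, dept):
    """Send the same predicate to every site and union the partial results."""
    result = set()
    for fragment in fragments:
        result |= select_dept(fragment, dept)
    return result


engineers = distributed_select([site_a, site_b], "eng")
print(sorted(engineers))
```

The caller never follows a record-to-record pointer across sites; it
only combines whole sets, which is the property argued for above.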

|>transaction processing activity.  So, an individual site in the
|>distributed relational database may look "slow" compared to IMS on an
|>IBM-MVS 3090, but the aggregate throughput of the networked database
|>can be prodigious.  

Is this an argument for throughput over response time?  From the user's
point of view, it is much easier to gauge response time.

|>
|>This is not to say that there is some theoretical limit to the
|>performance of individual relational database engines.  Indeed, a major
|>focus of the industry now is to develop local transaction processing
|>speeds that rival those of older architectures on a platform of similar
|>scale.  Today's technology is proving (already has, actually) that the
|>assertion that relational is slow is out-of-date.  What's more,

I think that that assertion was proven incorrect about 10-15 years ago.

|>relational technology is solving the problems of distributed
|>applications better than the older architectures, at transaction speeds
|>that are so far adequate for most applications.  A recent Digital
|>Review survey asked relational users about their throughput
|>requirements.  They found that about 90% of applications required no
|>more than about 12TPS.  This TPS number will surely increase, as will
|>the ability of relational systems to carry more load.

The figure 12TPS by itself is meaningless.  How many users, what
architecture, etc. should accompany any figures.  Sybase claims 34 TPS
for 30(?) users on a Sun-4.  What do other vendors claim?

|>
|>Organizational reluctance to replace existing non-relational systems
|>is now very understandable.  Replacements will occur as the economic

RDBMSs have their benefits.  Non-RDBMSs have their benefits.  Though it
is true that RDBMSs are not as slow as anti-RDBMSers (of the great
debate at SIGMOD in the 70's) claimed them to be, they still do not
match the performance of navigational systems like IMS.  One of the
reasons that many corporations have not moved from IMS to relational
systems is for this exact reason.  12TPS may be acceptable to relational
users, but it surely isn't for IMS users.

|>benefits of the distributed high-performance relational model increasingly
                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Where?  Commercial RDBMS vendors claim high-performance.  What they are
really claiming is that their RDBMS performs faster than the
competition.  1000TPS is high-performance.  12TPS (or 34) is acceptable.

|>outweigh the costs of changing.  Actual replacements will be preceded
|>by gradual integration of RDBMS into the organization's DP framework.
|>It is important for relational products to allow connection with
|>existing heterogenous systems, rather than requiring their replacement.

As I stated earlier, the two major contributions of the relational model
have been non-procedural access and data independence.  However, the
implementation to provide these features will incur overhead that
navigational systems (like hierarchical and network) do not have to pay
for.  For instance, joins are a real big performance killer for
relational systems.  So there is some substance behind the users
associating relational "... products with poor system performance, even
though they may be flexible and easier to implement."[from the original
posting on the British report]

As Tom [the original poster] suggested, let's not start a war telling
each other how wonderful relational technology is.

But to answer Tom's question on whether "relational" has to mean "overhead":
Relational does not mean overhead, but since it provides more "features"
(flexible, easier to implement, easier to use(?)), some overhead *must*
be incurred.

I think a good discussion would be over where the overheads are.  For
starters, relational query compilation has to be smarter.  But they may
not (never?) be smart enough!?!

Flames to /dev/null
Discussion to comp.databases

*swfc

tim@ohday.sybase.com (Tim Wood) (08/06/90)

In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
>|>
>|>Relational systems have so far been deployed in smaller-scale
>|>applications than have hierarchical and network systems.  
>
>I don't see ... why
>relational systems have so far only been deployed in smaller-scale
>applications.

What I'm driving at is that relational has not developed in a DP/MIS
context, and DP/MIS is where most of the large-scale business applications
have traditionally resided.  Relational is the architecture of choice for
the "bottom-up" development of organizational databases, where local DP
departments are creating relational databases to manage their local
operations, and looking for ways to tie all those local databases
together.

>|>The appeal of relational systems has been the promise of flexible
>|>access to the database by users far removed from the DP department.
>
>RDBMSs have made two contributions:
>1. non-procedural access
>2. data independence

True, but most users running canned applications won't be as aware of 
these features as applications programmers, who benefit most from them.  
I was really only discussing the end-users, since they are the largest
group of database utilizers in an organization.  Your comment is
correct and rounds out my point.

>I don't see what relational systems have to do with "the promise of
>flexible access ... far removed from the DP department."  Are you
>implying that network communication or client/server is restricted to
>relational systems?

No, but relational seems to be the context in which client/server is 
most rapidly being deployed.  I do think it's easier to distribute a
relational db than a navigational (thanks, good adjective) one, because
of the looser coupling among data objects.

>|>The trend toward decentralized access has been strengthened by the
>|>growth of processing power directly available to individual users, and
>|>by the changing nature of applications themselves.
>
>Hmmm.  Last week I spoke with a Sybase tech support person who said that
>Sybase's client/server architecture was geared toward having most of the
>computation performed at the server end.  My response was "How about all
>that CPU power directly available to the user?"  

Those MIPS are used for the applications.  That local power makes it economical
to perform complicated analysis and transformations on the data.  Basically,
the server preserves and disseminates existing knowledge, but new
knowledge is created on the front-end.  The front-end then submits that
new knowledge to the server, which may reject it because the knowledge
does not fit the world model known to the server (in slogan-speak, this
is "DBMS enforced integrity").  Or the server accepts it, and the whole 
organization becomes "smarter."  This is still a relatively new
feature of products, an improvement over the case where each application
has to apply the model.

>It seems that Sybase
>feels that database computation should not be done at the client end...
>They believe they can overcome the CPU bottleneck at the server end...

At this point, "database computation" is too vague a term to allow 
a response.

>|>Relational systems lend themselves well to distributed database, where
>|>by definition there will be fewer, if any, centralized [servers]
>
>Huh?  What definition?

If a database is distributed, then the database state is maintained by
more than one server.  The limiting case is where every machine on
the network is of similar size and maintains an equal part of the database.  
A more likely scenario is a server hierarchy, such as in telephone exchanges.

>I think relational systems lend themselves well to distributed databases
>because they are set-oriented, rather than navigational systems like the
>hierarchical and network models.  

That's what I was driving at.  Thanks for the words beyond "so many words".

>|>... the aggregate throughput of the networked database can be prodigious.  
>
>Is this an argument for throughput over response time?  From the user's
>point of view, it is much easier to gauge response time.

Distributed balances both.  It's analogous to caching, or virtual memory
in that you have a small, frequently-used subset of the database that
is local, so average response times are close to what they would be 
if the entire database was local on a behemoth machine.  Yet the whole
database might be so large that it would take a buildingful of 3090's
(or clones :-) to hold it all locally.

I am speaking in generalities here; much design and measurement must go
into one's distributed db schema so that average response time is good
and worst case not awful.  Probably beyond the state of the practice
today.  Maybe one reason why distributed is slow catching on.
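
The caching analogy can be put into rough numbers. This is only a
sketch: the local fraction and the two service times are hypothetical,
and the function names are invented for the example.

```python
def average_response_s(local_fraction, local_s, remote_s):
    """Expected response time when local_fraction of requests stay local."""
    return local_fraction * local_s + (1.0 - local_fraction) * remote_s


def worst_case_s(local_s, remote_s):
    """Worst case is simply the slower of the two paths."""
    return max(local_s, remote_s)


# Hypothetical figures: 90% of requests hit local data in 0.1 s,
# the rest go to a remote site and take 2.0 s.
avg = average_response_s(0.9, 0.1, 2.0)
worst = worst_case_s(0.1, 2.0)
print(f"average {avg:.2f} s, worst case {worst:.1f} s")
```

The average stays close to the local figure only while the local
fraction is high, which is why the schema design work matters.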

>|>... Today's technology is proving (already has, actually) that the
>|>assertion that relational is slow is out-of-date.  What's more,
>
>I think that that assertion was proven incorrect about 10-15 years ago.

It was proven that RDBMS COULD be as fast as existing navigational systems,
but there haven't been competitive products till recently.  "Proof"
for many folks requires no less than a released (or announced :-) product.

>|>... A recent Digital
>|>Review survey asked relational users about their throughput
>|>requirements.  They found that about 90% of applications required no
>|>more than about 12TPS.  
>
>The figure 12TPS by itself is meaningless.  How many users, what
>architecture, etc. should accompany any figures.  Sybase claims 34 TPS
>for 30(?) users on a Sun-4.  What do other vendors claim?

Twelve TPS as measured at the server.  So as you pile on users, response
time will tank (ie go up).  I think the survey intended an implicit
clause, "with acceptable user response time."
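
One hedged way to tie a TPS figure to user count and response time is
the interactive response time law from operational analysis,
R = N / X - Z, where N is the number of users, X the throughput and Z
the think time. The user counts and think time below are hypothetical
and are not taken from the survey:

```python
def response_time_s(users, tps, think_time_s=0.0):
    """Interactive response time law: R = N / X - Z, in seconds.

    Assumes a closed system in steady state in which every user is
    either waiting on a transaction or thinking for think_time_s.
    """
    return users / tps - think_time_s


# Hypothetical populations against a server that tops out at 12 TPS.
for users in (12, 60, 120):
    print(users, "users ->", response_time_s(users, 12.0), "s")
```

With throughput pinned at 12 TPS, response time grows in proportion to
the number of users, which is the "tanking" described above.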

>.... One of the
>reasons that many corporations have not moved from IMS to relational
>systems is [unacceptable performance].  12TPS may be acceptable to relational
>users, but it surely isn't for IMS users.

'Cuss not.  For large DP hardware, you'd better be talking well into
the 100's.  To handle volume, an RDBMS product has to scale with hardware.

>... 1000TPS is high-performance.  12TPS (or 34) is acceptable.

Show me someone getting 1000TPS on a Sun 3/280.  What must not exist in a
product is a performance ceiling above which throughput stops growing
(linearly) with increase in platform scale.  That's the essence of
the perceived "relational bug."

>... [J]oins are a real big performance killer for relational systems.  

Not if they are pre-optimized or pre-computed.

>So there is some substance behind the users
>associating relational "... products with poor system performance, even
>though they may be flexible and easier to implement."[from the original
>posting on the British report]

Sure, the substance is based on historical knowledge.  That knowledge
is being obsoleted by the onset of RDBMSs that scale well.  I believe
users will be able to have both DP-scale performance and ease of use in RDBMS
in the near future.

>But to answer Tom's question on whether "relational" has to mean "overhead":
>Relational does not mean overhead, but since it provides more "features"
>(flexible, easier to implement, easier to use(?)), some overhead *must*
>be incurred.

The question is where to place that overhead.  That's one problem we
(Sybase anyway) are trying to solve.

>I think a good discussion would be over where the overheads are.  For
>starters, relational query compilation has to be smarter.  

Hmm, I've been developing the opinion that query compilation is a largely
solved problem (cost-based optimizers, etc.), but that fundamental things
like I/O management and access methods policies need a lot more work
in RDBMS.  So sounds like we have a good discussion ahead of us :-) .
-TW

dafuller@sequent.UUCP (David Fuller) (08/07/90)

In article <10371@sybase.sybase.com> tim@ohday.sybase.com (Tim Wood) writes:
>In article <1073@ashton.UUCP> tomr@ashton.UUCP (Tom Rombouts) writes:
>>Are relational databases an unproven technology regarding
>>performance?  
>>
>>....A key tenet of the report is that RDBMS technology has been
>>available for 20 years but still has not been proved in large,
>>complex applications.  The report notes that users associate
>>these products with poor system performance, even though they may
>>be flexible and easier to implement.  The article then goes on 
>>to cite a firm that is reluctant to replace IMS with DB2, and
>>discusses other sites that use a mixture of relational and
>>possibly non-relational systems.

Some random thoughts from someone who's in the trenches and deals with
less than, uhh, theoretical arguments...

In my experience with Very Large Databases, the DBMS type is less
important than the quality of the individual system's implementation.
The 10% of the time you spend developing is quickly subsumed by the
requirement to plan for and provide a stable applications environment.

To wit: The typical SQL-based RDBMS is abstract enough from what's
going on down deep to permit gross errors in implementation.  I've 
looked at systems which fetched 100,000 records and threw away every
one except the single tuple of interest.  The fact that it was an RDBMS
was irrelevant.  You coulda been doing IMS or FOCUS and made that
mistake.
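
A small sketch of that mistake, using Python's sqlite3 module purely
as a stand-in. The parts table and both helper functions are
hypothetical; the point is only how many rows cross the interface:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO parts VALUES (?, ?)",
    ((i, f"part-{i}") for i in range(100_000)),
)


def find_wasteful(conn, wanted_id):
    """Fetch every row and discard all but one in application code."""
    fetched = 0
    match = None
    for row in conn.execute("SELECT id, name FROM parts"):
        fetched += 1
        if row[0] == wanted_id:
            match = row
    return match, fetched


def find_selective(conn, wanted_id):
    """Let the DBMS apply the predicate, so only the match comes back."""
    rows = conn.execute(
        "SELECT id, name FROM parts WHERE id = ?", (wanted_id,)
    ).fetchall()
    return (rows[0] if rows else None), len(rows)


print(find_wasteful(conn, 4242))
print(find_selective(conn, 4242))
```

Both calls return the same tuple; the difference is the row count the
application had to look at to get it.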

Axiom 1: There is no substitute for planning.

>
>Relational systems have so far been deployed in smaller-scale
>applications than have hierarchical and network systems.  This is due
>to several factors: relational is "newer" (that is, the technology
>existed long before successful commercial products) and the older
>database architectures were deployed in the days when nearly all
>commercial computing resources were centralized and operating in a
>batch-processing environment.  In that environment, updates and access
>to the database are relatively rigidly controlled.

Sure, relational is "new", but the basic access methods have not 
considerably improved; we still use B-trees and relative files and 
maybe hashed files.  The "relational" aspect is a layer above this.
I can write slow code in any environment; and there is nothing inherent
to the relational model which makes it slower than any other model.

The fact is that noble 3NF implementations almost always get mutated
by harsh reality: that you end up generating "extract" tables and other
de-facto optimizations once you do a simple calculation of how many I/Os
it's gonna take to support your subsecond, online application.

That's reality; you can either spend money for hardware or take a hardnosed
approach to implementation.
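
The "simple calculation" might look something like the sketch below.
Every number in it is an assumption chosen for illustration (tables
touched, I/Os per table, milliseconds per I/O), and it ignores caching
and CPU time entirely:

```python
def lookup_time_ms(tables_touched, ios_per_table, ms_per_io):
    """Serial disk time for one online request, ignoring caching and CPU."""
    return tables_touched * ios_per_table * ms_per_io


# Hypothetical: a normalized design joins 6 tables, each lookup costs
# 4 I/Os (index levels plus the data page), and one I/O takes 25 ms.
normalized_ms = lookup_time_ms(tables_touched=6, ios_per_table=4, ms_per_io=25)

# The same request served from one denormalized "extract" table.
extract_ms = lookup_time_ms(tables_touched=1, ios_per_table=4, ms_per_io=25)

print(normalized_ms, "ms vs", extract_ms, "ms")
```

Under these made-up figures the normalized path eats most of a
one-second budget before any other work is done, which is the pressure
that produces extract tables.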

Second, the biggest horror to big DBMS DBAs is the unknown called
"ad-hoc queries".  It is easy to hurt a production system on many platforms by 
issuing queries from hell that can't ever complete but require massive
sequential scans.  Big DBMS engines usually have strict controls on
adhocery and either prioritize them low or require they complete in batch.

In fact, lots of big systems do overnite extracts and provide an online
system to promote decision support.  Rarely do these systems permit 
queries to "live" data simply because supporting the surge load caused by
adhoc in current implementations costs too much money.

Axiom #2: Ad-hoc means unpredictable, which represents a basic incongruity
against the goal of production.  No current DBMS or implementation knows
how to balance the two in a truly large implementation automatically.

(I have not seen the DBMS yet that sends me mail and counsels "Dave, I've
been reviewing access patterns and I really think you should consider a
clustered index...")

...

In conclusion:

1) There's no free lunch.  Until we find a more expressive mechanism
   for revealing the intent of the user to the DBMS then we're going to
   live with controls over what a particular user can do.  We need to
   be able to control plowing of new furrows thru a DBMS carefully
   versus handling heads-down data entry with predictable speed.

2) Experience at Tandem shows that a true SQL RDBMS doesn't have to be
   slower, in fact the State of California has committed to NonStop SQL
   for their entire vehicle database based on some strenuous benchmarks.

3) We are a long ways away from creating DBMS systems into which data
   can be poured and then relied on to balance access and update needs.
   No matter what your implementation, it will take intelligence and
   forethought to create a successful implementation.

Speaking for myself, as always...

-- 
Dave Fuller				   
Sequent Computer Systems		  Think of this as the hyper-signature.
(312) 318-0050 (humans)			  It means all things to all people.
{uunet,sun,...}!sequent!dafuller

swfc@ulysses.att.com (Shu-Wie F Chen) (08/07/90)

In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>>|>
|>>|>Relational systems have so far been deployed in smaller-scale
|>>|>applications than have hierarchical and network systems.  
|>>
|>>I don't see ... why
|>>relational systems have so far only been deployed in smaller-scale
|>>applications.
|>
|>What I'm driving at is that relational has not developed in a DP/MIS
|>context, and DP/MIS is where most of the large-scale business applications
|>have traditionally resided.  Relational is the architecture of choice for
|>the "bottom-up" development of organizational databases, where local DP
|>departments are creating relational databases to manage their local
|>operations, and looking for ways to tie all those local databases
|>together.
|>

Yes, relational databases are easier to implement (from a DBA's point of view).

|>>|>The appeal of relational systems has been the promise of flexible
|>>|>access to the database by users far removed from the DP department.
|>>
|>>RDBMSs have made two contributions:
|>>1. non-procedural access
|>>2. data independence
|>
|>True, but most users running canned applications won't be as aware of 
|>these features as applications programmers, who benefit most from them.  
|>I was really only discussing the end-users, since they are the largest
|>group of database utilizers in an organization.  Your comment is
|>correct and rounds out my point.

If you are talking about end users and canned applications, the model
used isn't that important.  If you talk about the programmers who
implement the canned application, then it is a different story. 
Frankly, I am now confused.  Your previous arguments made sense for
application programmers, but now you say you were really talking about
end users.

|>>|>Relational systems lend themselves well to distributed database, where
|>>|>by definition there will be fewer, if any, centralized [servers]
|>>
|>>Huh?  What definition?
|>
|>If a database is distributed, then the database state is maintained by
|>more than one server.  The limiting case is where every machine on
|>the network is of similar size and maintains an equal part of the database.  
|>A more likely scenario is a server hierarchy, such as in telephone exchanges.
|>

My question arose because you gave no reason why you thought relational
systems were better for distributed databases.  I then gave my reason below.

|>>I think relational systems lend themselves well to distributed databases
|>>because they are set-oriented, rather than navigational systems like the
|>>hierarchical and network models.  
|>
|>That's what I was driving at.  Thanks for the words beyond "so many words".
|>

|>>|>... Today's technology is proving (already has, actually) that the
|>>|>assertion that relational is slow is out-of-date.  What's more,
|>>
|>>I think that that assertion was proven incorrect about 10-15 years ago.
|>
|>It was proven that RDBMS COULD be as fast as existing navigational systems,
|>but there haven't been competitive products till recently.  "Proof"
|>for many folks requires no less than a released (or announced :-) product.
|>

Well, have there been any released competitive products?  (By
competitive, I don't mean better than other *relational* DBMSs, but
better than any *other* DBMSs.)

|>
|>>... 1000TPS is high-performance.  12TPS (or 34) is acceptable.
|>
|>Show me someone getting 1000TPS on a Sun 3/280.  What must not exist in a
|>product is a performance ceiling above which throughput stops growing
|>(linearly) with increase in platform scale.  That's the essence of 
|>the perceived "relational bug."

This was really a cheap shot on my part.  I was referring to Kai Li's
main-memory database system at Princeton which I believe achieved 1000
TPS.  No, it was not on a Sun 3/280...

|>
|>>... [J]oins are a real big performance killer for relational systems.  
|>
|>Not if they are pre-optimized or pre-computed.

What do you mean by pre-optimized or pre-computed?  What if I performed
a join that was not pre-optimized or pre-computed?

|>
|>>So there is some substance behind the users
|>>associating relational "... products with poor system performance, even
|>>though they may be flexible and easier to implement."[from the original
|>>posting on the British report]
|>
|>Sure, the substance is based on historical knowledge.  That knowledge
|>is being obsoleted by the onset of RDBMSs that scale well.  I believe
|>users will be able to have both DP-scale performance and ease of use in RDBMS
|>in the near future.
|>

How many RDBMSs scale well (besides Sybase, of course ;-)?  Better yet,
how many RDBMSs scale?

|>>But to answer Tom's question on whether "relational" has to mean "overhead":
|>>Relational does not mean overhead, but since it provides more "features"
|>>(flexible, easier to implement, easier to use(?)), some overhead *must*
|>>be incurred.
|>
|>The question is where to place that overhead.  That's one problem we
|>(Sybase anyway) are trying to solve.
|>
|>>I think a good discussion would be over where the overheads are.  For
|>>starters, relational query compilation has to be smarter.  
|>
|>Hmm, I've been developing the opinion that query compilation is a largely
|>solved problem (cost-based optimizers, etc.), but that fundamental things
|>like I/O management and access methods policies need a lot more work
|>in RDBMS.  So sounds like we have a good discussion ahead of us :-) .

I/O management and access methods policies are orthogonal to the data
model.  These issues are just as important in navigational models.

The reason I suggested query compilation as a point of study is that in
navigational systems, the application programmer has to know the
physical layout of the database files in order to write code that could
navigate.  The programmer has to know about the clustering, the indices,
what pointers to chase, etc.  (Please correct me if I am wrong about this. 
I have never had the opportunity to program on a navigational system.) 
Therefore, a *good* application programmer would know the best way to
access the database for a given query and could write optimal code.  On
the other hand, in the relational model, application programmers are
encouraged not to know the underlying physical layout of the database. 
They are dependent on the query compiler to map their logical view and
operations to physical operations.  I don't believe current compilers
have reached the expertise of hand-crafted coders in performing this mapping.

It is certainly easier to talk about relational things since a
declarative language is used instead of a procedural one.  But the
penalty of a declarative language is that it must be translated to a
procedural one.  Though RDBMSs (and in particular, Sybase) can use
precompiled queries to improve performance, this does not solve the
problem of ad-hoc queries.
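
A minimal sketch of the contrast described above, not taken from any
product discussed in this thread.  It assumes Python and its bundled
sqlite3 module; the table names, sample rows, and both function names
are hypothetical.  The navigational version hard-codes the access path
(start at the department, follow its member list), while the declarative
version only states what is wanted and leaves the path to the query
compiler.

```python
import sqlite3

# Hypothetical sample data: departments and their employees.
DEPTS = {10: "Research", 20: "Sales"}
EMPS = [("Ada", 10), ("Bob", 20), ("Cy", 10)]

# Navigational style: the programmer builds and chases the pointers.
# The access path is fixed in the application code.
members = {dept_id: [] for dept_id in DEPTS}
for name, dept_id in EMPS:
    members[dept_id].append(name)

def navigational_lookup(dept_id):
    """Follow the department's member list directly."""
    return sorted(members[dept_id])

# Declarative style: say what is wanted; the DBMS chooses the access path.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE emp (name TEXT, dept_id INTEGER)")
db.executemany("INSERT INTO dept VALUES (?, ?)", DEPTS.items())
db.executemany("INSERT INTO emp VALUES (?, ?)", EMPS)

def declarative_lookup(dept_name):
    """Ask for employees of a named department via a join."""
    rows = db.execute(
        "SELECT emp.name FROM emp JOIN dept ON emp.dept_id = dept.id "
        "WHERE dept.name = ? ORDER BY emp.name",
        (dept_name,),
    ).fetchall()
    return [name for (name,) in rows]
```

Both lookups return the same names for the Research department.  If the
physical layout changes (say, the member lists are reorganized),
navigational_lookup has to be rewritten, whereas the SQL text stays the
same and only the compiler's chosen plan changes.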

BTW, is there such a thing as an ad-hoc query in navigational systems?

Cheers,
*swfc

normb@sequent.UUCP (Norm Browne) (08/08/90)

In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
>
>            ...  Though RDBMSs (and in particular, Sybase) can use
>precompiled queries to improve performance, this does not solve the
>problem of ad-hoc queries.

Nothing solves the "problem" of ad-hoc queries (save of course more
horsepower).  I have never seen a single architecture that could
possibly serve two divergent needs (such as transaction processing and
decision support).  The common and IMO appropriate methodology for
handling these is to keep them separate, one system to handle TP and
another (periodically refreshed) to provide DSS.
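
A rough sketch of the "separate, periodically refreshed" arrangement,
assuming Python's sqlite3 module with two in-memory databases standing
in for the TP and DSS systems.  The table, the rows, and the function
names refresh_dss and dss_total are hypothetical; a real site would
refresh on a schedule rather than by hand.

```python
import sqlite3

tp = sqlite3.connect(":memory:")    # stands in for the transaction-processing system
dss = sqlite3.connect(":memory:")   # stands in for the decision-support copy

tp.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
tp.execute("INSERT INTO sales VALUES ('west', 100)")
tp.commit()

def refresh_dss():
    """Copy the current TP contents over the DSS database (e.g. nightly)."""
    tp.backup(dss)

def dss_total():
    """An ad-hoc style aggregate, run against the DSS copy only."""
    rows = dss.execute("SELECT COALESCE(SUM(amount), 0) FROM sales").fetchall()
    return rows[0][0]

refresh_dss()
tp.execute("INSERT INTO sales VALUES ('east', 50)")
tp.commit()
stale_total = dss_total()    # the DSS copy has not seen the new sale yet
refresh_dss()
fresh_total = dss_total()    # after the refresh it has
```

The price of keeping long queries off the TP system is visible in
stale_total: between refreshes the DSS copy lags behind reality.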

>BTW, is there such a thing as an ad-hoc query in navigational systems?

Focus (from Information Builders) has provided this capability in the
mainframe world (ugh) for years.  The report-writer/query language is
non-procedural and can access such various data structures as VSAM,
IMS, DB2 and SQL/DS, Adabas, Total, IDMS and just about anything else
that runs on a 370.  The end user is almost completely insulated from
the underlying structure.  There are other products that provide some
of this type of functionality (Mark IV, Dyl280).

..NB

tim@ohday.sybase.com (Tim Wood) (08/11/90)

In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
>In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
>|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
>|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood)
>
>If you are talking about end users and canned applications, the model
>used isn't that important.  If you talk about the programmers who
>implement the canned application, then it is a different story. 
>Frankly, I am now confused.  Your previous arguments made sense for
>application programmers, but now you say you were really talking about
>end users.

Making it very simple:
The relational model eases application development.  This tends to
encourage application development.  So the app. programmer and
the users benefit: the users get more needs met because
the app. programmer's job is easier because the relational model makes
app. writing easier.

>Well, have there [b]een any released competitive products (by ...
>competitive, I don't mean better than other *relational* DBMSs, but
>better than any *other* DBMSs)?

Sure.  Relational has made the metrics of non-procedural access and
data independence part of the "competitiveness" equation.  RDBMS's
are competitive because people are buying them.  They are now
competing on another crucial metric, performance, so making navigational
systems even less attractive.

>|>>... [J]oins are a real big performance killer for relational systems.  
>|>Not if they are pre-optimized or pre-computed.
>What do you mean by pre-optimized or pre-computed?  What if I performed
>a join that was not pre-optimized or pre-computed?

Pre-optimized: the query processing strategy is determined and that
strategy is saved in the DBMS in a pre-compiled form.  The DBMS needs
merely to execute the strategy to obtain the results.  Pre-computed:
the RESULTS (not just the strategy for obtaining them) are saved
somewhere (in or out of the DBMS) saving the need to recompute them.  A
long ad-hoc join will tend to dent transaction processing performance
(see Marc Zwieger@Sequent's thread on this).  So maybe you run ad-hoc
users at lower priority, or defer the volume updates for overnight (but
allow your DBMS's view of the world to diverge several hours from
reality), etc.
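
To make the distinction concrete, here is a toy sketch in Python.  It is
an illustration under stated assumptions, not how Sybase or any other
DBMS stores plans: plan_join and the sample tables are hypothetical, and
the only "strategy" decided is which input a hash join builds its table
on.

```python
def plan_join(left, right):
    """Decide the strategy once: build the hash table on the smaller input."""
    build_left = len(left) <= len(right)

    def execute(left_rows, right_rows):
        """Run the saved strategy against the current rows."""
        if build_left:
            build, probe = left_rows, right_rows
        else:
            build, probe = right_rows, left_rows
        table = {}
        for key, value in build:
            table.setdefault(key, []).append(value)
        out = []
        for key, value in probe:
            for match in table.get(key, []):
                # Emit (key, left_value, right_value) whichever side was built.
                out.append((key, match, value) if build_left else (key, value, match))
        return sorted(out)

    return execute

customers = [(1, "Ann"), (2, "Raj")]
orders = [(1, "disk"), (1, "tape"), (2, "modem")]

stored_plan = plan_join(customers, orders)   # pre-optimized: the strategy is saved
result = stored_plan(customers, orders)      # it still has to be executed each time
saved_result = list(result)                  # pre-computed: the results are saved

orders.append((2, "cable"))                  # the data changes afterwards
current = stored_plan(customers, orders)     # the saved plan sees the new row
```

After the append, current has four rows while saved_result still has
three.  That is the divergence from reality mentioned above: a saved
plan only skips the planning step, while saved results skip the
execution too but go stale.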

>How many RDBMSs scale well (besides Sybase, of course ;-)?  Better yet,
>how many RDBMSs scale?

I'm not motivated enough to do the research to answer this question.  
If I was a prospect or a consultant (or in Marketing :-), I would be.  
My point is, the important DBMSs will be ones that can operate efficiently 
at various scales, from departmental 80386(tm) to nerve-center mainframe.

>|>Hmm, I've been developing the opinion that query compilation is a largely
>|>solved problem (cost-based optimizers, etc.), but that fundamental things
>|>like I/O management and access methods policies need a lot more work
>|>in RDBMS.  So sounds like we have a good discussion ahead of us :-) .
>I/O management and access methods policies are orthogonal to the data
>model.  These issues are just as important in navigational models.

Of course, but these issues haven't been as well addressed in relational
models, and they are the ones (in most cases) that are hindering 
the ability of relational systems to scale well.  The fact is,
decreasing numbers of people are interested anymore in making navigational
systems faster.  

>The reason I suggested query compilation as a point of study is that in
>navigational systems, [ good summary of the application programming
>issues deleted ].  [But in relational, programmers ]
>are dependent on the query compiler to map their logical view and
>operations to physical operations.  I don't believe current compilers
>have reached the expertise of [human] coders in performing this mapping.

Probably not.  The important ($) question for most users is, do today's
optimizers do a GOOD ENOUGH job on ENOUGH queries, without doing
a HORRIBLE job on nearly ANY query.  The more decision support (ie ad-hoc
queries) an organization does, the more important the optimizer will be.
However, a decent optimizer has become a check-off item for any RDBMS today,
as it should be: that is one of the things that allows them to perform
at all on those easily-written (but sometimes hard to answer) ad-hoc queries.
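
One way to watch an optimizer make this kind of choice is to ask it for
its plan.  The sketch below assumes Python's sqlite3 module and SQLite's
EXPLAIN QUERY PLAN statement; the table, index name, and plan_for helper
are hypothetical, and the exact plan wording differs between SQLite
versions, so no output is shown.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT)"
)

def plan_for(sql):
    """Return the optimizer's plan description for a query as one string."""
    rows = db.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)

QUERY = "SELECT item FROM orders WHERE customer_id = 42"

before = plan_for(QUERY)   # no usable index yet: a scan of the whole table
db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan_for(QUERY)    # same SQL text, now answered through the index
```

The query text never changes; only the access path does.  That is the
work the application programmer would otherwise do by hand in a
navigational system.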
-Tim
---
Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
	One day, when I can afford enough lawyers, I will speak for
	    a whole company.  For now, I speak just for myself.

ghm@ccadfa.adfa.oz.au (Geoff Miller) (08/13/90)

tim@ohday.sybase.com (Tim Wood) writes:

>Making it very simple:
>The relational model eases application development.  This tends to
>encourage application development.  So the app. programmer and
>the users benefit: the users get more needs met because
>the app. programmer's job is easier because the relational model makes
>app. writing easier.

I would agree with Tim, and particularly with his choice of words  -  
"relational model" rather than "RDBMS".  One can (and we have) successfully
implement databases designed using the relational model without using an
RDBMS, and we still obtain the advantages which Tim points out.  I have 
been concerned for some years now that the marketers of so-called 
"relational" products have persuaded a gullible user community into thinking
that a relational model can only be implemented using an RDBMS, which 
simply is not so!

Geoff Miller (ghm@cc.adfa.oz.au)
Computer Centre, Australian Defence Force Academy

swfc@ulysses.att.com (Shu-Wie F Chen) (08/13/90)

In article <10494@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>In article <13545@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
|>>In article <10419@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood) writes:
|>>|>In article <13532@ulysses.att.com> swfc@ulysses.att.com (Shu-Wie F Chen) writes:
|>>|>>In article <10371@sybase.sybase.com>, tim@ohday.sybase.com (Tim Wood)
|>>
|>>If you are talking about end users and canned applications, the model
|>>used isn't that important.  If you talk about the programmers who
|>>implement the canned application, then it is a different story. 
|>>Frankly, I am now confused.  Your previous arguments made sense for
|>>application programmers, but now you say you were really talking about
|>>end users.
|>
|>Making it very simple:
|>The relational model eases application development.  This tends to
|>encourage application development.  So the app. programmer and
|>the users benefit: the users get more needs met because
|>the app. programmer's job is easier because the relational model makes
|>app. writing easier.
|>

I agree...

|>>Well, have there [b]een any released competitive products (by ...
|>>competitive, I don't mean better than other *relational* DBMSs, but
|>>better than any *other* DBMSs)?
|>
|>Sure.  Relational has made the metrics of non-procedural access and
|>data independence part of the "competitiveness" equation.  RDBMS's
|>are competitive because people are buying them.  They are now
|>competing on another crucial metric, performance, so making navigational
|>systems even less attractive.
|>

Okay.  Though I was expecting the names of the released products with
high performance, I agree with your statement.

|>>|>>... [J]oins are a real big performance killer for relational systems.  
|>>|>Not if they are pre-optimized or pre-computed.
|>>What do you mean by pre-optimized or pre-computed?  What if I performed
|>>a join that was not pre-optimized or pre-computed?
|>
|>Pre-optimized: the query processing strategy is determined and that
|>strategy is saved in the DBMS in a pre-compiled form.  The DBMS needs
|>merely to execute the strategy to obtain the results.  Pre-computed:
|>the RESULTS (not just the strategy for obtaining them) are saved
|>somewhere (in or out of the DBMS) saving the need to recompute them.  A

Yes, pre-optimized and pre-computed queries will definitely be wins. 
But with respect to pre-optimized queries as a solution to improving
joins performance, I had thought that the overhead came from the
*execution* of the strategy, not the determination of the strategy. 
This really is a minor quibbling point on my part; the argument that a
compiled program runs faster than an interpreted one is true regardless
of whether the program contains joins.

[some deleted stuff about ad-hoc queries, scalability of RDBMSs]

|>
|>>|>Hmm, I've been developing the opinion that query compilation is a largely
|>>|>solved problem (cost-based optimizers, etc.), but that fundamental things
|>>|>like I/O management and access methods policies need a lot more work
|>>|>in RDBMS.  So sounds like we have a good discussion ahead of us :-) .
|>>I/O management and access methods policies are orthogonal to the data
|>>model.  These issues are just as important in navigational models.
|>
|>Of course, but these issues haven't been as well addressed in relational
|>models, and they are the ones (in most cases) that are hindering 
|>the ability of relational systems to scale well.  The fact is,
|>decreasing numbers of people are interested anymore in making navigational
|>systems faster.  
|>

I've been using navigational to mean hierarchical and network DBMSs.  It
is true that there is decreasing interest in making these systems
faster.  However, there is increasing interest in object-oriented
systems which are inherently navigational because of the
class-composition hierarchy. (This does not imply that there can not be
non-navigational components.)  This discussion is leading to my last
paragraph further down...


|>>The reason I suggested query compilation as a point of study is that in
|>>navigational systems, [ good summary of the application programming
|>>issues deleted ].  [But in relational, programmers ]
|>>are dependent on the query compiler to map their logical view and
|>>operations to physical operations.  I don't believe current compilers
|>>have reached the expertise of [human] coders in performing this mapping.
|>
|>Probably not.  The important ($) question for most users is, do today's
|>optimizers do a GOOD ENOUGH job on ENOUGH queries, without doing
|>a HORRIBLE job on nearly ANY query.  The more decision support (ie ad-hoc
|>queries) an organization does, the more important the optimizer will be.
|>However, a decent optimizer has become a check-off item for any RDBMS today,
|>as it should be: that is one of the things that allows them to perform
|>at all on those easily-written (but sometimes hard to answer) ad-hoc queries.
|>-Tim
|>---

As far as I can tell, we are agreeing that RDBMS is a proven technology
that has matured quite nicely into commercial products.  It is not
perfect for all applications (e.g. CAD/CAM, software programming
environments, gigantic IMS databases).  There remains work to be done on
improving performance.

The new question is: what comes next?

For starters, I'll cite two references:

1. Third-Generation Data Base System Manifesto by the Committee for
Advanced DBMS Function (Stonebraker, Rowe, Lindsay, Gray, Carey, Brodie,
Bernstein, Beech).

2. The Object-Oriented Database System Manifesto by Atkinson, Bancilhon,
DeWitt, Dittrich, Maier, and Zdonik.

Any comments?

*swfc

swfc@ulysses.att.com (Shu-Wie F Chen) (08/13/90)

In article <1809@ccadfa.adfa.oz.au>, ghm@ccadfa.adfa.oz.au (Geoff
Miller) writes:
|>tim@ohday.sybase.com (Tim Wood) writes:
|>
|>>Making it very simple:
|>>The relational model eases application development.  This tends to
|>>encourage application development.  So the app. programmer and
|>>the users benefit: the users get more needs met because
|>>the app. programmer's job is easier because the relational model makes
|>>app. writing easier.
|>
|>I would agree with Tim, and particularly with his choice of words  -  
|>"relational model" rather than "RDBMS".  One can (and we have) successfully
|>implement databases designed using the relational model without using an
|>RDBMS, and we still obtain the advantages which Tim points out.  I have 
|>been concerned for some years now that the marketers of so-called 
|>"relational" products have persuaded a gullible user community into thinking
|>that a relational model can only be implemented using an RDBMS, which 
|>simply is not so!
|>

Won Kim recently defined an object-oriented database to be a database
that implements the object-oriented model (which he kind of defined ;-).
Following this logic, isn't an RDBMS a database that implements the
relational model?

*swfc