[comp.databases] Recent discussions...

larry@postgres.uucp (Larry Rowe) (09/01/89)

as usual, i haven't been reading this list for a while.  so, after plowing
through 200+ messages i have a few comments.

1. bundled rdbms's
someone made the comment that the issue of embedded dbms was really a
bundling issue.  i think that comment hits the mark.  ansi compatible sql
dbms engines will become a commodity.  commodity products are sold by
vendors that are the low cost producers.  3rd party companies cannot compete
with the hardware vendors on cost to produce and sell because the hardware
vendors have high margins built into the product they are selling -- a system
including both hardware and software.  ibm has bundled an rdbms with OS/2 and dec
is bundling rdb with vms and ultrix.  (rdb/ultrix is actually rtingres
that was licensed by dec.  rti isn't naive.  dec will replace the rti
product with an internally generated product asap.  they are rumored to
be working on one now.  but, they had a time-to-market problem.  the exact
same story holds for risc processors.)  the hardware vendors are doing it
for two reasons.
first, they want to use the rdbms in their other products (e.g., to store
compiler symbol tables for object modules that can be accessed at run-time
by debuggers and to store system management information such as network 
configurations, user authorizations, etc.).

the rdbms vendors have slowly converged to products with roughly the same
functionality and performance.  another couple of releases from all the
vendors and the ansi sql engines will be indistinguishable.

the second reason: someone else jokingly commented that the hardware vendors
are bundling to control customer accounts.  that too hit the mark.  ibm made 
many, many dollars selling ims and hardware to run it to companies doing
large *production* applications (think banks, railroads, 
large manufacturers, etc.) that spend $10M - >$100M on software and 
hardware per year.  ibm controlled the accounts because they sold the
hardware and they sold the software that ran *the very most important
applications*.  ibm and other large computer vendors will do the same in
the future to try to stave off the threat of hardware/os independence.

bottom line: 3rd party dbms companies had better have a strategy to produce
products that will replace the sales dollars they make on their sql
engine today.  my guess is that they have something less than 5 years to
do it.

here's what some companies are doing:

ORACLE: run everywhere and sell applications.  by running everywhere they'll
find some markets not dominated by the hardware vendors.  users want 
applications not dbms, so sell it to them.  of course, you bundle the dbms
in your application and move your profit from the dbms sale to the
application sale.

RTI: distributed dbms, integrate heterogeneous dbms's, and sell 1st rate
development tools.  distributed dbms is new technology.  3rd parties will
lead the hardware vendors.  heterogeneous dbms's cross vendor lines since most 
companies have products from many different hardware vendors.  3rd parties
can win this business for a while too. 1st rate development tools solve 
problems of building applications.  if you can't buy the application you 
want, get the best tools to build it.  for some reason i can't determine,
neither dec nor ibm has produced competitive 4gl's.  it's probably because
the programming tools groups in the companies are database naive.  they 
just don't understand what customers are doing with the dbms.  (note: there are
*many*, *many* tools vendors.  there's probably a shakeout coming in this
market in the next 5-10 years.)

INFORMIX: unix application business.  wingz looks like a winner in the mac
marketplace.  watch for it to appear on other platforms (e.g., unix(!)
and pc's).  then, scrounge for info as to when revenue from wingz is greater
than for rdbms products.  i wouldn't be surprised if it happened within
2-3 years.

SYBASE:  unknown.  but, first they have to catch up with oracle and rti.
i think sybase did $25M this year.  oracle did approximately $600M.  rti
did $130M.  the rdbms market seems to be growing at somewhere around
40% per year.  i don't see a breakout strategy for sybase that will cause
them to win a significant percentage of the rdbms business.  they'll
obviously survive, but are they destined to always be in 3rd place?  they
had a chance with SQL server, but that seems to be a receding opportunity.

UNIFY: application development tools. they've already abandoned the dbms
business.

ASHTON-TATE:  unknown.  their recent decommitment from sql and network products
on dos and their weak financial results suggest they are in for a tough time.
640K pc/dos machines are not a growth market.
AT hasn't shown any ability to produce a competitive product in
another market (remember "dbase for the mac"?).  note: i think AT did 
roughly $250M this year.  they might be a takeover candidate for a 
company that wanted to fire 75% of the people and sell dbase as a 
cash-cow product.  (do i hear computer associates knocking on the door?)

SHAREBASE: unknown.  revenues this year were approximately $40M.  sybase
will probably pass them this year.  i haven't followed their financial
results closely for the past couple of years, but i don't think they've
done very well.  i think they hit higher revenues in previous years.
rumors are that technical people are abandoning ship.  they may not last 
much longer.

2. OO -vs- R dbms wars
i found the discussion interesting.  however, it seems to me that the real
issue is "ease of expression" -vs- "performance".  it is *absolutely* clear
that oodbms's solve a problem for people building CAx products today.  most
CAx companies build their own dbms's (read 10-20 developers).  if they
could buy a product rather than build their own they would be *much* better
off.  all CAx company managers know this and they're looking for a product.
i'm sure one or two of the "O" companies (ontologic, object design,
objectivity, and object sciences) will succeed at selling into this market.
the OO-C++'s that they are producing will provide good performance and
ease of expression for those applications.

at the same time, i'm sure that all 3rd party rdbms vendors will put OO
constructs into their data models (inheritance, path expressions like
"dept.mgr.salary" that eliminate explicit joins, methods into the query
language, object identity, etc.).  the advantages for data modelling are too
substantial to ignore.  query language you ask?  sql of course.  the market
announced loud and clear that's what it wants.  the "O" companies will hear
the same market.  they'll have to provide an sql interface (both embedded
and ad hoc query) or they'll be confined to a niche market.  too many people
are building too many applications and investing in too much training based on 
sql to walk away from it.  unless the "O" companies can show a >10X improvement
in productivity, i doubt that people will be willing to switch.
personally, i doubt that you can do it with just an OO programming language.
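to make the path-expression point concrete, here's a toy sketch (python, used
purely for illustration; the schema and names are made up): "dept.mgr.salary"
is plain reference traversal in the OO model, where the relational formulation
needs an explicit join between the dept and emp tables.

```python
# hypothetical illustration: explicit join vs. path expression

# relational style: two "tables" related by a foreign key (mgr -> emp name)
emps = [{"name": "smith", "salary": 50000}, {"name": "jones", "salary": 70000}]
depts = [{"name": "toys", "mgr": "jones"}]

def mgr_salary_join(dept_name):
    # explicit join: find the dept row, then the matching emp row
    for d in depts:
        if d["name"] == dept_name:
            for e in emps:
                if e["name"] == d["mgr"]:
                    return e["salary"]

# OO style: the dept object holds a direct reference to its manager,
# so "dept.mgr.salary" is pointer traversal -- no join is expressed at all
class Emp:
    def __init__(self, name, salary):
        self.name, self.salary = name, salary

class Dept:
    def __init__(self, name, mgr):
        self.name, self.mgr = name, mgr

jones = Emp("jones", 70000)
toys = Dept("toys", jones)

assert mgr_salary_join("toys") == toys.mgr.salary == 70000
```

both compute the same answer; the difference is how much navigation the
programmer has to spell out.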

of course, the "R" companies (rti, oracle (formerly relational software),
informix (formerly relational database, inc?), etc.) will create OO versions
of their programming languages too.  OO-3GL's and OO-4GL's.  and, i'm sure they
will get forced to do C++ embeddings sometime too.  now, can they create
seamless integrations?  that depends on how hard they want to work at it.  
personally, i think they can do it.  simple thought: every implementation
tactic used by the "O" companies can be put into an "R" company product.  the same
primitives called by the language run-time system to the data storage system
of the OO-C++ products can be implemented within the relational dbms.  not on
top of, within.  several people mentioned that "R" products are putting in
db procedures to win tp1 benchmarks.  currently those procedures are coded
in a 3GL or 4GL.  but, they could just as easily call routines coded in
the dbms implementation language (C) and they could access *any* internal 
interfaces in the backend.  

the difference the OO-C++ products will have is the effort they put into the
programming language runtime environment (object caching, smart pointer
swizzling, locking, etc.).  (btw, dan and jack.  are you planning on
supporting ad hoc queries in C++ that touch objects in the object cache and
in the backend?  how many lines of code are you going to write in your 
distributed query optimizer and executor? i'll bet you'll finesse this for
now because it isn't crucial to your target market.  it will be if you want
to get into the bigger MIS and end-user market.)  the "R" companies could write
this same code but they probably won't unless they believe it is a big
enough market to go after.  and that's where the uncertainty is.  the "O"
companies think their market is potentially huge.  the "R" companies aren't
convinced that it is as large as they think.  this is a standard dilemma
when technology changes.  sometimes big companies win (ibm waited until
apple proved there was a market for pc's, then they waded in and took over
the market) and sometimes they lose (the network dbms vendors didn't believe
that rdbms's mattered and they were destroyed by the "R" companies).
what will happen in this case?  only time will tell.  
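for readers who haven't seen the term, here's a toy sketch of what pointer
swizzling means (python, hypothetical names; no real product works exactly
this way): on first traversal a persistent object id fetched from the backend
is replaced by a direct in-memory reference, so later traversals never touch
the database at all.

```python
# toy sketch of pointer swizzling with an object cache (hypothetical)
disk = {  # stands in for the backend: objects keyed by persistent id
    1: {"name": "engine", "part_of": 2},
    2: {"name": "car", "part_of": None},
}

class PersistentObject:
    def __init__(self, oid, fields):
        self.oid = oid
        self.name = fields["name"]
        self._part_of = fields["part_of"]  # unswizzled: still a raw oid

cache = {}          # oid -> in-memory object
fetch_count = 0     # counts trips to the "backend"

def load(oid):
    global fetch_count
    if oid in cache:
        return cache[oid]        # cache hit: no backend traffic
    fetch_count += 1
    obj = PersistentObject(oid, disk[oid])
    cache[oid] = obj
    return obj

def part_of(obj):
    # swizzle on first traversal: replace the raw oid with a real reference
    if isinstance(obj._part_of, int):
        obj._part_of = load(obj._part_of)
    return obj._part_of

engine = load(1)
car = part_of(engine)            # first traversal: one backend fetch
assert part_of(engine) is car    # second traversal: pure pointer dereference
assert fetch_count == 2          # exactly one fetch per object, ever
```

the locking and cache-consistency machinery the "O" companies are building
is what makes this safe with concurrent updates; the sketch ignores that.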

i wish the "O" companies well.  it's great fun to be in a start-up 
during product development and early missionary work.  
one thing though.  relational products didn't really take off until ibm 
blessed the technology by announcing db2.  i worry about the oodbms 
technology because ibm doesn't have any internal project building prototype 
products that they can pick up and productize. they have starburst, but it
is an extended relational product along the lines of postgres rather than
a true OODBMS.  that suggests the adoption 
cycle could be long.  the investment community only gives companies about 5 
years to make it.  after that they have a *very* hard time raising money.  
the AI companies ran into this problem.
i'm frankly surprised that ontologic is still able to raise investment 
money given that they've built two products that failed.  i suppose  it is
because the new "O" companies have shown that there's money to be made and
ontologic has more experience than the new companies.  

3. Using relational dbms's
someone from servio-logic commented that relational db designs normalized
their data and that forced applications to do joins which were too slow.
in fact, that is the theory, but not the practice.  ted codd wrote a paper
in the early 70's that said you must denormalize your design to meet the
performance requirements of your application.  every smart relational
application builder does this.  unfortunately, there are a lot of
"doorknobs" writing applications that won't violate a theoretical rule.
they fail at writing applications which leads to two comments:
	1) people claim that rdbms's don't perform as well as product X and 
	2) there's money to be made as an application consultant.

in fact, rti stores almost all application descriptions in the rdbms (e.g.,
form definitions, report definitions, and most of the definition of an abf
application -- in fact the only thing not stored in the dbms is the abf
operation code, which was my decision and one of the biggest mistakes we 
made because it complicated all the dbms utilities -- copy-database,
unload-database, etc. had to do something special for abf code).  these
are complex objects with shared subobjects.  the performance of the
interface tools operating on the descriptions in the dbms is evidently 
acceptable since users haven't abandoned the tools.  the trick is to 
precompute a main memory representation of the complex object and store
that in the dbms along with the normalized version.  i developed this idea
about 7 years ago at Berkeley.  we expanded on it in postgres and the
picasso interface toolkit that i've been working on for the past 3 years.
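the trick can be sketched roughly like this (python, with a made-up
form-definition schema; this is an illustration of the idea, not rti's actual
implementation): keep the normalized rows as the source of truth, but also
store a serialized, ready-to-load form of the assembled complex object so the
tools don't re-join and rebuild it on every access.

```python
import json

# normalized "tables" -- the source of truth (hypothetical schema)
forms = [{"id": 1, "name": "order_entry"}]
fields = [
    {"form_id": 1, "label": "customer", "pos": 0},
    {"form_id": 1, "label": "quantity", "pos": 1},
]

def build_form(form_id):
    # the expensive path: join the tables and assemble the complex object
    form = next(f for f in forms if f["id"] == form_id)
    kids = sorted((x for x in fields if x["form_id"] == form_id),
                  key=lambda x: x["pos"])
    return {"name": form["name"], "fields": [k["label"] for k in kids]}

# precompute the assembled object and store it alongside the normalized rows
blob_table = {1: json.dumps(build_form(1))}

def load_form(form_id):
    # the fast path: one fetch, no joins, no assembly
    return json.loads(blob_table[form_id])

assert load_form(1) == build_form(1)
```

the normalized rows still support ad hoc queries and sharing of subobjects;
the blob only exists to make startup of the interface tools fast.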

------
so enough for now...let the responses begin!
	larry

dlw@odi.com (Dan Weinreb) (09/06/89)

In article <16753@pasteur.Berkeley.EDU> larry@postgres.uucp (Larry Rowe) writes:

Thanks for your comments; I agree with most of what you say.

   2. OO -vs- R dbms wars

   of course, the "R" companies (rti, oracle (formerly relational software),
   informix (formerly relational database, inc?), etc.) will create OO versions
   of their programming languages too.  OO-3GL's and OO-4GL's.  and, i'm sure they
   will get forced to do C++ embeddings sometime too.  now, can they create
   seamless integrations?  that depends on how hard they want to work at it.  
   personally, i think they can do it.  simple thought: every implementation
   tactic used by the "O" companies can be put into an "R" company product.  the same
   primitives called by the language run-time system to the data storage system
   of the OO-C++ products can be implemented within the relational dbms.  not on
   top of, within.  several people mentioned that "R" products are putting in
   db procedures to win tp1 benchmarks.  currently those procedures are coded
   in a 3GL or 4GL.  but, they could just as easily call routines coded in
   the dbms implementation language (C) and they could access *any* internal 
   interfaces in the backend.  

Perhaps.  It sounds to me as if you're saying that the "R" companies
can build an OODBMS by essentially making a new product, except
sharing some internal pieces of the existing RDBMS product where
appropriate.  It's possible that this can be done.  I don't know much
about the internals of the ORACLE product or the RTI product.
However, I do understand the basic concepts of System R, and therefore
the basic concepts of DB2, moderately well, and it seems to me that
any system that used significant amounts of that machinery would have
some significant technical disadvantages compared to our approach.  Of
course, maybe I'm just not being clever enough.  We'll see.

   the difference the OO-C++ products will have is the effort they put into the
   programming language runtime environment (object caching, smart pointer
   swizzling, locking, etc.).  

Yes, I think that's a big part of it.

			       (btw, dan and jack.  are you planning on
   supporting ad hoc queries in C++ that touch objects in the object cache and
   in the backend?  how many lines of code are you going to write in your 
   distributed query optimizer and executor? i'll bet you'll finesse this for
   now because it isn't crucial to your target market.  it will be if you want
   to get into the bigger MIS and end-user market.)  

I'll leave it to Jack to talk about queries; he'll be back next week.
But I strongly agree that the important issue is the distinction
between our initial target market versus the MIS, end-user market.
You're quite right that we are concentrating now on the needs of our
initial target market, and deferring issues that we think are mainly
of interest to the MIS and end-user market.

Earlier you commented on the difficulty of getting people to "switch",
by which I presume you meant "switch from relational to
object-oriented".  As we see it, our initial target market is
primarily not using DBMS's at all in the areas that we are hoping to
provide OODBMS's.  That is, they might be using relational DBMS's in
some areas, but there are other areas in which they are not using a
DBMS, and it is primarily those latter areas in which we feel our
product can be of the most value.  So, for the most part, we're not
trying to get people to switch, and we're not trying to compete with
relational databases head-on.  Object-oriented databases, at least for
the foreseeable future, are mainly to fill new needs.

   one thing though.  relational products didn't really take off until ibm 
   blessed the technology by announcing db2.  i worry about the oodbms 
   technology because ibm doesn't have any internal project building prototype 
   products that they can pick up and productize. 

I feel that this would be an important issue for the MIS and end-user
market, but really doesn't matter in our initial target market areas.
Many of the major CAD companies used workstations, Unix, and C long
before IBM endorsed any of those things.  CAx people are not
influenced by IBM nearly as heavily as MIS people.

Regarding normalization in relational systems:

							the trick is to 
   precompute a main memory representation of the complex object and store
   that in the dbms along with the normalized version.

To be fair, though, there are some extra costs incurred by this trick.
You have to make sure that this precomputed representation is
recomputed (or cache-invalidated) whenever there's a change in any
value it depends on.  So someone has to check when those values are
changed; ideally there should be a trigger/integrity-like check, to
prevent slip-ups due to manual error, but even then the checks must
have some runtime cost.  There also must be some storage overhead cost
for storing two different representations of the same data.  It's
certainly a good trick and I'm sure it provides fine performance for
some applications, but in a speed-critical area with many updates, or
when the number of instances is very large, these costs would have to
be considered as part of a tradeoff.
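The bookkeeping I have in mind can be sketched in a few lines (Python, with
hypothetical names; not taken from any real product): every update to the base
data must also invalidate or recompute the stored representation, and that
hook is exactly the runtime cost, with the second copy as the storage cost.

```python
# sketch of the invalidation cost: a trigger-like hook keeps a
# precomputed representation consistent with its base data (hypothetical)
base = {"price": 10, "qty": 3}
derived_cache = {}   # the precomputed representation -- a second copy

def get_total():
    if "total" not in derived_cache:          # recompute on demand
        derived_cache["total"] = base["price"] * base["qty"]
    return derived_cache["total"]

def update(field, value):
    base[field] = value
    derived_cache.pop("total", None)          # the trigger-like invalidation
    # every write now pays this check (the runtime cost), and the cache
    # itself is the storage overhead of keeping two representations

assert get_total() == 30
update("qty", 5)       # without the invalidation, this would leave a stale 30
assert get_total() == 50
```

In an update-heavy application this hook fires on every write, which is why
the trick pays off mainly when reads of the complex object dominate.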

Dan Weinreb		Object Design, Inc.		dlw@odi.com

larry@postgres.uucp (Larry Rowe) (09/07/89)

here are some comments and further questions...

In article <1989Sep5.214702.1377@odi.com> dlw@odi.com writes:

>  Object-oriented databases, at least for
>the foreseeable future, are mainly to fill new needs.

how big a market can this be?  if you are selling a development tool
to the CAx companies, they will have to pay you a runtime license
fee.  since the packages that use your runtime system will run on
workstations and be priced under $2.5K per system, your OEM revenue
will probably be under $500 per copy (more likely in the $100-$200 range).
at that price point, you'll have to sell 100K to make $50M.  now,
how many total machines have apollo and sun sold in their lifetime?
maybe 500K?  assuming that 50% are running a product with an OODBMS
embedded in it, we're talking about a $125M market.  that's a good
place to start.  but, i sure would want to be confident that i was
going to dominate that market (i.e., own 30-60% of the market) or
i might not survive.  the problem that the "O" companies have is that
there are too many of them and probably only 1 that will survive.
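spelling out the arithmetic above (the dollar figures are my estimates, not
reported data):

```python
# back-of-envelope oem market sizing, restated (all inputs are guesses)
royalty = 500                     # runtime fee per copy, upper end of range
copies_needed = 50_000_000 / royalty
assert copies_needed == 100_000   # "you'll have to sell 100K to make $50M"

installed_base = 500_000          # apollo + sun lifetime shipments, estimate
penetration = 0.5                 # fraction running a product with an oodbms
market = installed_base * penetration * royalty
assert market == 125_000_000      # "we're talking about a $125M market"
```

at the more likely $100-$200 royalty, the same installed base yields a
$25M-$50M market, which makes the dominance question even sharper.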

so, the "O" companies are either going to have to get into markets with
2-5M machines (e.g., MAC's and PC's) or they are going to have to broaden
their products.  you may not want to go into the MIS/end-user market, but
i'm not sure how you can grow to be a $100M/year company if you don't.

>Regarding normalization in relational systems:
>
>							the trick is to 
>   precompute a main memory representation of the complex object and store
>   that in the dbms along with the normalized version.
>
>To be fair, though, there are some extra costs incurred by this trick.
>You have to make sure that this precomputed representation is
>recomputed (or cache-invalidated) whenever there's a change in any
>value it depends on.  So someone has to check when those values are
>changed; ideally there should be a trigger/integrity-like check, to
>prevent slip-ups due to manual error, but even then the checks must
>have some runtime cost.  There also must be some storage overhead cost
>for storing two different representations of the same data.  It's
>certainly a good trick and I'm sure it provides fine performance for
>some applications, but in a speed-critical area with many updates, or
>when the number of instances is very large, these costs would have to
>be considered as part of a tradeoff.

all valid points.  look at postgres to see how mike stonebraker and i
designed a database system to handle this problem.  yes, the trick only
works for things that can be replicated.  people who want to store 747
and submarine designs can't replicate their databases.

so, let's discuss storing precomputed joins, which is essentially what the
OODBMS's propose to do when storing complex objects.  surprisingly enough,
that idea has been around in the relational world for a while and it was
implemented by oracle 6-8 years ago.  it doesn't seem to make that much
difference in performance because rtingres hasn't been blown out of the
water by oracle in benchmarks.  in spite of the oracle chest pounding about
their TP1 numbers, the information i've seen suggests that rtingres has been
faster than them for most of the past decade.  of course, dbms performance
doesn't always lead to a sale.

the OODBMS proponents' response will be that TP1 and other MIS applications
don't have the kind of complex objects found in engineering applications
so the precomputed join strategy might not be cost effective.  my experience
is quite different.  MIS applications often have complex objects (e.g.,
application objects composed of 5-20 different object types with numerous
instances in one complex object) with shared subobjects.  these applications
would definitely take advantage of this mechanism.
don't misunderstand my point.  i am not saying that precomputed joins aren't
a viable strategy.  rather, i'm saying they may not be as important as the 
OODBMS folks believe they are.  

but, my bigger point is that if this implementation strategy does become
a significant performance issue, the RDBMS folks will just implement it.

for my money, the major difference between the OODBMS's being developed by
the "O" companies and the RDBMS's currently being marketed is the
application program caching they are implementing.  most benchmarks that 
i've seen that supposedly show why OODBMS's win over RDBMS's (e.g., the
sun, tektronix, and maier benchmarks) seem to be totally dominated by
queries that must be implemented on a main memory database in the
application program address space.  current RDBMS application development
tools don't do this.  but, they probably will.  see my research at berkeley
over the past couple of years on the picasso shared object hierarchy, which
is persistent CLOS with objects stored in postgres.
	larry

nico@unify.UUCP (Nico Nierenberg) (09/07/89)

In article <16753@pasteur.Berkeley.EDU>, larry@postgres.uucp (Larry Rowe) writes:
.
.
.
.
> UNIFY: application development tools. they've already abandoned the dbms
> business.
> 

While it is true that we have a strategy of providing our front end
tools across all the major DBMS products (Oracle, Sybase, etc.), we
still are a major player in the Unix RDBMS market.  Our new ANSI SQL product
Unify 2000, which was introduced this year, is being well received and
has won several benchmarks.

Forgive the commercial, but I needed to respond to this.  By the way,
I am the V.P. of Engineering at Unify.