[comp.databases] Data vs Application oriented design

chris@AUSTIN.LOCKHEED.COM (Chris Wood) (01/07/88)

In article <12673@pyramid.pyramid.com>, eric@pyrps5 (Eric Bergan) writes:
> 	Have all the applications that you have worked on been static
> entities, that didn't change as you after you had initially designed the
> database?  If a network database-based application suddenly needs to
> support a new query that it was not initially designed for (but has the
> data for), a significant redesign may be necessary. In a properly designed
> relational database, the data independence allows you to avoid this
> redesign.
> 

I submit that the above is true because Relational systems are designed
to fit the data, whereas network systems are/were typically designed to
fit the applications.  This means that network systems were very concerned
with performance issues to meet the response time needs of specific
applications.  It is entirely possible to build a CODASYL database design
that does not use the dreaded "imbedded pointers" and is just a set of 
(dare I say it) relations that are unrelated.  True, now you have lost much
of the power of the DBMS, but look how easy it is to reorganize!

Heretical statement follows:

Thus it follows that the Relational model is a SUBSET of the network model.

OOOOOOOOOOhhhhhhh  boy, that ought to generate some flames!

Chris Wood

By the way, keep in mind that I am playing Devil's advocate here.  I have 
used many DBMS's over the years, but now I am using only Relational ones.
(When was the last time a major vendor announced a NEW DBMS product that
wasn't relational?)

larry@postgres.uucp (Larry Rowe) (01/08/88)

<munch line>
recently there have been several messages about relational -vs- non-relational
systems and the difficulty of developing production applications in relational
systems.  i resisted ``putting my 2 cents worth'' in, but couldn't, so...

1. problem with program iteration in SQL.
  this is a problem with the design of SQL.  Embedded QUEL has a retrieve
construct that looks something  like:

	RETRIEVE (name=emp.name, salary=emp.salary)
	WHERE emp.salary > 40000
	{
		<host language stmts>
	}

that executes the host language stmts once for each record that qualifies
in the emp table.  the program variables name and salary are set to the 
appropriate values in each qualifying record.
  the designers of embedded SQL didn't put this construct into their research
prototype, so it wasn't part of the product.  least i sound too much like 
``quel was wonderful, sql sucks,'' i should mention that Univeristy INGRES
did not allow nested queries (i.e., the host language stmts above could not
include another QUEL stmt) because of the way INGRES was first implemented
on a PDP-11 (remember them, 64K byte processes) in 4 or 5 processes.  it
was just too complicated to do at the same time as writing all the rest of
the system.

2. the iteration issue has been solved by just about every programming language
designed with new query language constructs (i.e., no SQL compatible).  I
developed a language at Berkeley, named Rigel, that ran on top of Unversity
INGRES that allowed the example above to be coded as

	for e in emp where e.salary > 40000 do
		<stmts>
	end

where references to fields in qualifying records could be written ``e.name''
in <stmts>.  not hard to do in a new language.  almost impossible to do when
embedding a query language into an existing programming language (unless you
want to write a complete parser for the programming language).

3. relational -vs- non-relational.
  this issue is a little surprising at this late date.  relational systems
use many of the same implementation strategies that non-relational systems
used.  sometimes they work, sometimes they don't.  for example, someone
mentioned the idea of storing precomputed joins (i.e., storing records of
different types close to each other).  ORACLE implemented a version of this
several years ago and it didn't work very well.  (at first, their 
implementation was bad, but i thought they cleaned up later.)  from what
i could tell, it didn't really improve the performance of their system.  now,
why could this happen?  basically, for one of several reasons:

	1. their implementation wasn't good.
	2. performance was dominated by overhead elsewhere in the system
	   (e.g., parsing, planning, or executing the commands OR getting
	   records from disk page and converting/copying them to the 
	   application program OR application program processing of the return
	   records (e.g., converting to print rep, drawing on screen, etc.)
	   dominated elapsed time).
	3. people didn't run the query that this implementation optimizes very
	   often (more below).

so, why does this strategy not work better.  well, let's look at what queries
it improves.  let's suppose we have two relations DEPT(dname,...) and 
EMP(name,...,dept) and that i precompute 
	join (DEPT,EMP, EMP.dept=DEPT.dname)
what queries can take advantage of this structure?

	1. ret(emp.all) where qual(emp)		NOPE
	2. ret(dept.all) where qual(dept)	NOPE
	3. ret(emp.all,dept.all) 
	   where qual(emp,dept) and emp.dept-dept.dname	MAYBE

whether or not #3 goes faster depends on how many dept objects (i.e., dept
record + all emp records) are fetched.  if it is a complete scan, the query
will do *exactly the same number of page reads* that the partition per table
implementation will perform.  hence, no advantage.  what if the query fetches
just one dept object?  this query should go faster because it will require
fewer page reads (1 instead of 2 -- remember appropriate storage structure
design will cluster emp's on dept and hash/btree dept).  so, how much
different will this make?  consider a 1 VAX780 machine.  typical relational
systems take approx. 70 msec to execute a null query (i.e., get through
communication to the backend and process null query and return).  i/o's
are maybe 25 msec.  and it takes 3 msecs to process each return record.
so time to fetch dept object is:
	70 + 25*p + n*3
suppose n=25.  that means p=1 ==> 170 msec and p=2 ==> 205msec (i.e., 20%
faster).  now, can you detect 20% faster queries when the time is less than
1/4 of a second?  probably not.

now, assuming you buy my arguement above (which you're free to disagree with),
why are relational systems slow when compared to non-relational systems?  
several reasons.  first, the non-relational systems didn't compile and plan
queries at run-time.  second, the dbms code was linked into the user program
so that calls from the application program were subroutine calls not remote
procedure calls (i.e., pass msg and switch processes).  third, some relational
products do not have efficient access methods (e.g., do they support hash,
can data be stored in storage structure or is the storage structure really
just an index, was the program and database optimized for performance, etc.).
fourth, many times a performance problem is a locking problem not a storage
or query processing problem.  maybe there's a critical lock/section of code
that limits the performance of the systems.  etc, etc.  i could go on.

4. relational performance
  but, are relational systems really slower?  my experience with benchmarks
between commercial network systems and relational systems on VAX's is NO.
i frankly was very surprised in the early 1980's RTI-INGRES and ORACLE
did as well as they did against SEED, DBMS-32, and other hybrid systems.

in fact, relational systems can be excellent tools for building production
applications.  (btw, to the fellow who questioned whether real production
systems can be implemented with relational systems, the answer is yes.  many
companies are running their entire business, or key segments of their business
or manufacturing DP using relational systems.  moreover, tandem's DBMS is
essentially a relational system and they've made a very good business out
of on-line xact processing applications -- heard of the new york stock exchange.
wonder why their computer systems didn't crash when the market crashed and
trading volumes > 500M shares traded; 3-5 times average.  reason was it was
a tandem system.)

enough preaching.  what problems do relational systems have that probably
account for the complaints registered?  as with most new technologies, it has
been oversold.  novice users are not going to write production applications
that are bug free and highly efficient.  moreover, experienced programmers
are going to have to learn some new ways of building and optimizing systems.
and finally, relational products are going to have to mature so that they can
satisfy the needs of more users.  i believe the biggest problem that relational
products have today is providing the tools to make it easier to solve the
performance problems that people invariably fall into (e.g., database design
tools - both logical and physical design, on-line performance monitors,
application design tools to simplify the construction of OLTP applications
(see tandem pathway product or ibm CICS product), etc.).
	larry rowe

apeed@ihlpg.ATT.COM (Andrew B. Peed) (01/09/88)

In article <69@coot.AUSTIN.LOCKHEED.COM>, chris@AUSTIN.LOCKHEED.COM (Chris Wood) writes:
> 
> Heretical statement follows:
> 
> Thus it follows that the Relational model is a SUBSET of the network model.
> 
> OOOOOOOOOOhhhhhhh  boy, that ought to generate some flames!

No flames here, Chris. I quite vividly remember hearing this theory in a
lecture during my Intro to Database Theory class at Michigan Tech. Made sense
to me at the time, and it still does now.

Dr. John strikes again! (If you're still getting news feed at MTU, Hi Folks!)


=============================================================================
Andrew B. Peed                                        UUCP: ihnp4!ihlpg!apeed
AT&T Bell Laboratories                BITNET: moss!ihlpg!apeed at rutgers.edu
=============================================================================

gupta@cullsj.UUCP (Yogesh Gupta) (01/09/88)

In article <89@pasteur.Berkeley.Edu>, larry@postgres.uucp (Larry Rowe) writes:
> 
> 3. relational -vs- non-relational.
>    ...
> fourth, many times a performance problem is a locking problem not a storage
> or query processing problem. 

Yes!  As most pre-relational systems do NOT support level 3 consistency,
the lock management overhead within a transaction is less than that in a
relational system which supports level 3 consistency.  Also, the amount
of concurrency is higher which shows up in multi-user performance.  I think
that there is NO reason why a relational database system should perform
slower than a pre-relational system for the SAME functionality.
-- 
Yogesh Gupta			| If you think my company will let me
Cullinet Software, Inc.		| speak for them, you must be joking.

UH2@PSUVM.BITNET (Lee Sailer) (01/12/88)

In article <69@coot.AUSTIN.LOCKHEED.COM>, chris@AUSTIN.LOCKHEED.COM (Chris Wood) says:
>
>Thus it follows that the Relational model is a SUBSET of the network model.
     
I can sort of agree with that.  It is similar to saying that Modula 2 is
a subset of Ada.  One language philosophy is to build in EVERY feature
that might ever be useful.  A different philosophy is to build in
JUST ENOUGH features to make everything possible.
     
The Relational model provides a subset of the features, but still allows
one to build whatever you need to build.
     
Why is this minimalist approach popular?  Well, it is a lot easier to
implement a minimal system and be reasonably sure that it is correct.
Look how much easier it is to create a M2 compiler than an Ada compiler,
and forget about COBOL and FORTRAN.  Likewise, a new computer system will
see its first Relational DBMS long before its first Network system.
     
     
BUT!
     
The *ideal* relational system also supports some capabilties that the Network
model does not, such as seamless integration of database and data dictionary.
Also, much of the modern query optimization depends on the use of a
minimal, mathematically coherent set of operators.  As soon as
ad hoc (though "useful") features are added to the language, optimaization
techniques begin to fail.
     
Languages are designed to be easy to use.  They are also designed to be
easy to port, and to produce efficient code.  Some ease of use is worth
sacrificing if it increases portability and efficiency enough.
     
Remember, too.  Anything currently available is already obsolete.
     

pavlov@hscfvax.harvard.edu (G.Pavlov) (01/15/88)

> Heretical statement follows:
> 
> Thus it follows that the Relational model is a SUBSET of the network model.
> 
> OOOOOOOOOOhhhhhhh  boy, that ought to generate some flames!
> 
> Chris Wood
> 
  I would say that the case is more that the principles embodied in the rela-
  tional model are "universal" data organization principles.  E.g., we tried,
  to whatever extent possible, to apply normalization, reduce redundancy, etc,
  while we were still using Codasyl-based systems.

  Packages which present a relational logical view to the programmer have the
  attribute of making it easier to adhere to these principles.  I say "logi-
  cal view" because in reality, are there any systems out there which are 
  truly relational internally ?

  greg pavlov, fstrf, amherst, ny 

allbery@ncoast.UUCP (Brandon S. Allbery) (01/16/88)

[We haven't seen the line-eater since we sent Minnifield and Dixon after it!]

As quoted from <69@coot.AUSTIN.LOCKHEED.COM> by chris@AUSTIN.LOCKHEED.COM (Chris Wood):
+---------------
| applications.  It is entirely possible to build a CODASYL database design
| that does not use the dreaded "imbedded pointers" and is just a set of 
| (dare I say it) relations that are unrelated.  True, now you have lost much
| of the power of the DBMS, but look how easy it is to reorganize!
| 
| Thus it follows that the Relational model is a SUBSET of the network model.
+---------------

No argument.  Isn't Unify a direct example of this?
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
       {well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery
 PS/2:  Half a computer.  OS/2:  Half an operating system for half a computer.
-------------------------------------------------------------------------------
ATTENTION!!  Please do not send mail to ncoast through necntc.  This applies to
all mailing lists and all mail not directed to comp.sources.misc.  Thank you.
        (This message provided by myself and Jeff Janock <necntc!jj>.)