[comp.databases] Database implementation/theory issues?

gupta@cullsj.UUCP (Yogesh Gupta) (02/17/88)

I find that this group does not have much discussion about
either the theory or the implementation of DBMS issues.
Why is that?  I know that quite a few people who are involved
in DBMS research as well as product development read this group.
So a lack of people cannot be the reason.  Any comments?
-- 
Yogesh Gupta			| If you think my company will let me
Cullinet Software, Inc.		| speak for them, you must be joking.

UH2@PSUVM.BITNET (Lee Sailer) (02/18/88)

In article <232@cullsj.UUCP>, gupta@cullsj.UUCP (Yogesh Gupta) says:
>
>I find that this group does not have much discussion about
>either the theory or the implementation of DBMS issues.
>Why is that?

Because it is hard to draw diagrams of entities with relationships
between them.  (semi 8-)

gorry@smidefix.liu.se (Goran Rydquist) (02/23/88)

In article <232@cullsj.UUCP> gupta@cullsj.UUCP (Yogesh Gupta) writes:
>I find that this group does not have much discussion about
>either the theory or the implementation of DBMS issues.

I'd like to start! 

The greatest evil in today's database theory is the unquestioned (?) assumption
that the data model must be based on records. This is true for all three of
the accepted and currently used data models, namely the hierarchical, the
network and the relational models.

I quote:
	Models which provide additional file structure around the records 
	(eg sequencing, hierarchies, CODASYL networks) overcome some of the
	functional limitations of records. None of them overcome all the
	limitations. Furthermore, by building on top of record structures,
	they retain all the underlying ambiguities. In some cases, they simply
	add more options for representing something which could already be
	represented in several ways in record structure. 
[W. Kent, "Limitations of Record-Based Information Models", ACM Transactions on
Database Systems, vol 4, no 1, pp 107-131, March 1979]

I would like to hear comments, historical motivations, views etc.

---
Goran Rydqvist			gorry@majestix.liu.se
---------

dc@gcm (Dave Caswell) (02/28/88)

In article <725@smidefix.liu.se> gorry@smidefix.liu.se (Goran Rydquist) writes:
)
)The greatest evil in today's database theory is the unquestioned (?) assumption
)that the data model must be based on records. This is true for all three of
)the accepted and currently used data models, namely the hierarchical, the
)network and the relational models.
)
)I quote:
)	Models which provide additional file structure around the records 
)	(eg sequencing, hierarchies, CODASYL networks) overcome some of the
)	functional limitations of records. None of them overcome all the
)	limitations. Furthermore, by building on top of record structures,
)	they retain all the underlying ambiguities. In some cases, they simply
)	add more options for representing something which could already be
)	represented in several ways in record structure. 
)[W. Kent, "Limitations of Record-Based Information Models", ACM Transactions on
)Database Systems, vol 4, no 1, pp 107-131, March 1979]
)
)I would like to hear comments, historical motivations, views etc.


I'm not sure how to respond to this.  You say that something is the greatest 
evil without giving any reasons why.  What are the functional limitations of 
records?  What would you like to order by (use for sequencing) if not a field in
a record?  Do records have a natural sequencing apart from their contents?

You say "In some cases, they simply add more options ..".   Again I ask what
do you want to represent that can not be represented in records?   What does 
"adding file structure" or files in general have to do with data models?
What are all the underlying ambiguities?

In twenty-five pages the author must have had some arguments, why don't you 
summarize them?

cy@ashtate (Cy Shuster) (03/02/88)

What is the specific meaning of "records" that causes problems here? Is it
the grouping together of third-normal data (physically and logically), or
is it the problem of "record definitions" with hierarchies, such as a date
composed of month, day, year subfields? (i.e., composite domains).

To my mind, there is an undeniable benefit of data independence in the
relational model (allowing the application and the database to change
independent of each other), at the cost in many cases of lessened performance,
from the underlying engine having to support arbitrary joins. It's perhaps akin
to assembler language vs. a higher-level one: a good programmer can make
impressive performance gains in assembler, at an increased maintenance cost.
By the same token, a good programmer can take advantage of knowledge of the
physical database layout, especially in a network system, for very good
performance. But changing the network means changing all the code. You pays
your money, and you takes yer choice.
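
The trade-off can be sketched roughly as follows, in Python. This is only an
illustration under assumptions: the customer/orders data, the table names and
the link fields (first_order, next_in_set) are all hypothetical, and SQLite
merely stands in for "a relational engine". The relational half states what is
wanted and leaves the access path to the engine; the network half walks stored
links, so the code mirrors the physical layout.

```python
import sqlite3

# Relational style: the query names the data; the engine chooses how to join.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customer VALUES (1, 'Smith');
    INSERT INTO orders VALUES (10, 1, 'widget');
    INSERT INTO orders VALUES (11, 1, 'gadget');
""")
relational = [row[0] for row in db.execute(
    "SELECT o.item FROM customer c JOIN orders o ON o.customer_id = c.id "
    "WHERE c.name = 'Smith' ORDER BY o.id")]


# Network style: the application follows owner/member links by hand.
class Order:
    def __init__(self, item):
        self.item = item
        self.next_in_set = None   # link to the next member of the owner's set


class Customer:
    def __init__(self, name):
        self.name = name
        self.first_order = None   # link to the first member record


smith = Customer("Smith")
widget, gadget = Order("widget"), Order("gadget")
smith.first_order = widget
widget.next_in_set = gadget

navigational = []
order = smith.first_order
while order is not None:          # this loop hard-codes the link structure
    navigational.append(order.item)
    order = order.next_in_set
```

Both halves produce the same list of items. If the links were reorganized,
the traversal loop would have to be rewritten, whereas the SELECT would stay
the same as long as the tables keep their columns.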

--Cy--     dBASE Mac Development    UUCP:...seismo!scgvaxd!ashtate!cy

marti@ethz.UUCP (Robert Marti) (03/02/88)

In article <725@smidefix.liu.se>, gorry@smidefix.liu.se (Goran Rydquist) writes:
> The greatest evil in today's database theory is the unquestioned (?) assumption
> that the data model must be based on records.  [ ... ]
> 
> I quote:
> 	[ ... Quote ... ]
> [W. Kent, "Limitations of Record-Based Information Models", ACM Transactions on
> Database Systems, vol 4, no 1, pp 107-131, March 1979]
> 
> I would like to hear comments, historical motivations, views etc.
> 
> Goran Rydqvist			gorry@majestix.liu.se
> ---------

The ideas contained in Bill Kent's paper are now more than 10 years
old.  Even today, however, there are not many systems which implement
them.  This is partly due to the fact that their implementation has
turned out to be decidedly non-trivial:  Apart from deciding on how
to map the data model onto secondary storage you also have to consider
access path, recovery, and concurrency control issues.

A few systems have been built as research prototypes -- CCA's work on
DAPLEX (LDM, DDM), Xerox PARC's Cypress, and the TAXIS project at U
Toronto come to mind.  But there are certainly no widely used products
which support semantic data models on the market today, although the
situation seems to be changing with all the interest in applying database
techniques to engineering and AI applications:  A new generation of
object-oriented database systems is emerging, e.g., Servio Logic's
GemStone and Ontologic's VBase products.

Still, users of record-oriented database systems apparently have been
able to build a lot of very useful applications with this technology.

-- 
Robert Marti                    Phone:       +41 1 256 52 36
Institut fur Informatik
ETH Zentrum/SOT                 CSNET/ARPA:  marti%ifi.ethz.ch@relay.cs.net
CH-8092 Zurich, Switzerland     UUCP:        ...uunet!mcvax!ethz!marti

gorry@senilix.liu.se (Goran Rydquist) (03/04/88)

I wrote this some time ago

>)The greatest evil in today's database theory is the unquestioned (?) assumption
>)that the data model must be based on records.

and got answers like

>In twenty-five pages the author must have had some arguments

so ... I'll give you some of my own arguments, much inspired by the original
article of course.

Let's start with a definition [also by W. Kent].

"By record we mean here a fixed sequence of field values, conforming to a
static description usually contained in catalogs and/or in programs. The
description consists mainly of name, length and data type for each field."

The static description in the definition is usually referred to as the schema.
The phrase "conforming to", implies that the schema is *extracted* - that is
the information needed to interpret (or at least process) the data is stored
separately from the record itself.

The major idea of the record is that the schema is extracted. The motivation
is that we save space by avoiding the repetition of the same information. The
reversed view is the distributed schema. By a distributed schema I mean that
the information to interpret a data instance is stored explicitly together
with that instance. Casually glancing at a record data model, the schema
appears to be distributed. The extraction is a computational, machine-oriented
way of handling large amounts of data.

The space saved by extracting the schema easily becomes illusory. The
resulting rigid system does not handle variation well and a user is confronted
with the unnatural requirement of predicting the worst case. This estimate is
then allocated in every instance, resulting in much waste.
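
A minimal sketch of the two views, in Python, may make the distinction
concrete. The field widths (20 bytes each) and the names PERSON_SCHEMA,
pack_person and unpack_person are assumptions made up for the illustration,
not taken from Kent's paper.

```python
import struct

# Extracted schema: name, length and type of each field are stored apart
# from the data. The 20-byte widths are a hypothetical "worst case" estimate.
PERSON_SCHEMA = [("name", "20s"), ("address", "20s"), ("age", "B")]
PERSON_FORMAT = "<" + "".join(fmt for _, fmt in PERSON_SCHEMA)


def pack_person(name, address, age):
    """Encode one fixed-length record; the bytes carry no field names."""
    return struct.pack(PERSON_FORMAT, name.encode(), address.encode(), age)


def unpack_person(raw):
    """Decode a record; this only works with the separately stored schema."""
    values = struct.unpack(PERSON_FORMAT, raw)
    result = {}
    for (field, fmt), value in zip(PERSON_SCHEMA, values):
        if fmt.endswith("s"):
            value = value.rstrip(b"\x00").decode()  # strip the padding
        result[field] = value
    return result


# Distributed schema: every instance carries its own description, so
# instances are free to differ in which fields they have.
stan = {"name": "Stan Smith", "address": "Park Avenue 32", "age": 24}
ann = {"name": "Ann Smith", "address": "Main street 1", "age": 22,
       "maiden-name": "Jones"}

raw = pack_person("Stan Smith", "Park Avenue 32", 24)
```

Every packed record here occupies the same number of bytes however short the
actual values are, which is the worst-case allocation described above, and
the packed layout has no place for ann's extra field.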

A person is a good example of an entity in the real world. What attributes
would be needed to model a person? Consider name, address, social security
number, length, age, sex, maiden name etc. Not all of these attributes are
needed for every person instance - some people haven't got a social security
number, only girls have maiden names etc.

Person 1                         Person 2
----------------------           ----------------------
name        "Stan Smith"         name        "Ann Smith"
address     "Park Avenue 32"     address     "Main street 1"
length      6'                   length      5'
age         24                   age         22
                                 maiden-name "Jones"

To accommodate the variations, we could:

	- Define the record format to include the union of all relevant
	fields, where not all the fields are expected to have values in
	every record. These null values naturally lead to storage overhead;
	a user or application programmer is forced to predict every possible
	field that may appear in a person record; and there is no
	restriction on which fields should have values when.

	- Allow the same field to have different meanings in different
	records.  The meaning of the field would then be interpreted by
	adding an extra type field to the record. Unfortunately the
	interpretation of this record will only be known by the application
	that conceived it. The database and independent applications treat
	the two conceptually associated fields as separate chunks of data,
	with no known restrictions. Further, space will be wasted if not all
	the data in the union happens to be of equal size.

	- Define a new record type for every combination of fields. This
	approach eliminates the storage space overhead, but if the data
	varies too much, the system will be littered with record types. The
	desired correspondence between entity and record disappears
	completely, and no restrictions exist that prevent two records from
	modeling the same entity at the same time.
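
The three options above can be sketched roughly as follows, in Python. All
class and field names (PersonUnion, PersonTagged, extra_kind and so on) are
hypothetical and chosen only for the illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


# Option 1: one record type with the union of all fields; unused ones are null.
@dataclass
class PersonUnion:
    name: str
    address: str
    age: int
    ssn: Optional[str] = None
    maiden_name: Optional[str] = None


# Option 2: one overloaded field whose meaning depends on a separate tag field.
# Nothing ties extra_kind to extra_value except the application's convention.
@dataclass
class PersonTagged:
    name: str
    address: str
    age: int
    extra_kind: str    # e.g. "ssn" or "maiden_name"
    extra_value: str


# Option 3: one record type per combination of fields.
@dataclass
class PersonBasic:
    name: str
    address: str
    age: int


@dataclass
class PersonWithMaidenName:
    name: str
    address: str
    age: int
    maiden_name: str


def null_count(record):
    """Count the fields of a record instance that hold no value."""
    return sum(getattr(record, f.name) is None for f in fields(record))


stan = PersonUnion("Stan Smith", "Park Avenue 32", 24)
ann = PersonUnion("Ann Smith", "Main street 1", 22, maiden_name="Jones")

# Option 3 does not stop two record types from describing the same entity.
ann_basic = PersonBasic("Ann Smith", "Main street 1", 22)
ann_full = PersonWithMaidenName("Ann Smith", "Main street 1", 22, "Jones")
```

With option 1, null_count shows the wasted slots per instance; with option 3,
ann_basic and ann_full coexist without complaint, and every further optional
field doubles the number of possible record types.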

Suppose we have a bank account record type. An account can be allocated to
either a corporation, or to a person. This relationship is naturally modeled
by having an owned-by field in the account record.

The problem arises because persons are identified by social security number,
while corporations are identified by name (string). The record modeling
problems and the possible solutions are similar to the ones that were
described in the previous example. We could possibly use a generic pointer
type or something like that, but what we really want is that the value of the
owned-by field should be able to assume more than one type.
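
A minimal sketch of what "more than one type" could look like, in Python. The
names PersonId, CorporationId, Owner and Account are hypothetical, and the
sample values are made up.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class PersonId:
    ssn: str     # persons are identified by social security number


@dataclass(frozen=True)
class CorporationId:
    name: str    # corporations are identified by name


# The owned-by field may hold either kind of identifier.
Owner = Union[PersonId, CorporationId]


@dataclass
class Account:
    number: str
    owned_by: Owner


def describe_owner(account):
    """Interpret the owned-by value according to its own type."""
    owner = account.owned_by
    if isinstance(owner, PersonId):
        return "person with SSN " + owner.ssn
    if isinstance(owner, CorporationId):
        return "corporation named " + owner.name
    raise TypeError("unknown owner type: " + type(owner).__name__)


personal = Account("1001", PersonId("123-45-6789"))
corporate = Account("1002", CorporationId("Acme Inc."))
```

Here the value itself says which kind of owner it refers to, so no separate
tag field has to be kept in step with it by the application.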

A record is far from self-describing. Consider the problem of coding a generic
procedure that prints records in a common format. Such a procedure must
minimally know what data it is going to print, and the format of this data.
Programming languages typically use compilation to hard-wire the schema into
the code, which leaves no possibility of querying the record instance about
its composition.
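
As a rough illustration in Python (the function name format_record and the
layout string "<10sB" are assumptions for the example): a generic print
procedure is easy when the instance can be asked for its own field names, and
stuck when it is handed bare record bytes.

```python
import struct


def format_record(record):
    """Format any self-describing record; no schema is compiled in."""
    width = max(len(name) for name in record)
    return "\n".join(
        f"{name.ljust(width)}  {value!r}" for name, value in record.items()
    )


ann = {"name": "Ann Smith", "age": 22, "maiden-name": "Jones"}
listing = format_record(ann)

# A packed fixed-format record is just bytes. Nothing in it says that it
# holds a 10-byte string followed by a 1-byte integer, so a generic
# procedure cannot print it without being given the layout from outside.
raw = struct.pack("<10sB", b"Ann Smith", 22)
name, age = struct.unpack("<10sB", raw)   # layout repeated by the caller
```

format_record works unchanged for a record with any set of fields, while the
unpack call has to repeat the layout that was used when packing.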

Yeah man!
		- gry
---
Goran Rydqvist			gorry@majestix.liu.se
---------

allbery@ncoast.UUCP (Brandon Allbery) (03/14/88)

As quoted from <733@senilix.liu.se> by gorry@senilix.liu.se (Goran Rydquist):
+---------------
| I wrote this some time ago
| 
| >)The greatest evil in today's database theory is the unquestioned (?) assumption
| >)that the data model must be based on records.
| 
| and got answers like
| 
| >In twenty-five pages the author must have had some arguments
| 
| so ... I'll give you some of my own arguments, much inspired by the original
| article of course.
+---------------

Which he proceeds to do.

One problem.  I don't see any incriminating evidence against *records*; I
see incriminating evidence against *static schemas*, a different kettle of
fish entirely.  The concept of a *record* (i.e. an object composed of fields)
still remains in the new system.

The proposed system sounds to me like an obvious extension to relational
databases.  (Relational purists will probably flame me to death for that!)
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
       {well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery