[comp.databases] "Relational" databases

stone@cunixb.cc.columbia.edu (Glenn Stone) (04/09/91)

The idea of "relationality" is used in very different ways.  I'm trying 
to find a good general discussion of the topic, either in print or
electronic form.  Any suggestions would be appreciated.

Glenn Stone

bodhi@isc-br.ISC-BR.COM (Satyabodhi) (04/15/91)

Everyone sincerely interested in understanding, using or developing
relational database managers should read E.F. Codd's new book,
"The RELATIONAL MODEL for DATABASE MANAGEMENT: VERSION 2", 1990,
from Addison-Wesly Publishing Company, Inc.

Codd states his reasons for writing this book in the preface including:
" the need to assemble all of the relational model in one document for 
DBMS vendors, users, and inventors of new data models who seem to be 
unaware of the scope of the relational model and the scope of database 
management".

I found this book much easier reading than Codd's earlier papers.  He
now usually speaks of rows and columns instead of tuples and
attributes.  He doesn't insist on a new private vocabulary.  And, new
vocabulary is not needed now as is was earlier because Codd's ideas
and his formerly new vocabulary are now generally accepted in the
DBMS world.

This book does not strive for abstract theoretical purity.  It is 
practical, sometime annoyingly so.  For example, in Chapter 19 there
is a fine discussion of the support that a DBMS ought to have for
user defined functions, written in a host language such as C.
"The DBMS requires that each user-defined function and its inverse (if 
any) be written in one of the host languages, and compiled before the 
function is stored in the catalog."  ... "The term 'host language' is 
used to identify any one of the so-called general-purpose programming 
languages such as FORTRAN, COBOL, and PL/1.  The relational model
supports at least these three as languages with which the principal
relational language can communicate."

What does support for PL/1 and COBOL have to do with an abstract
model?! The quote above seem to imply that a DBMS is not _truly_
relational if it doesn't support the favorite languages of IBM
mainframe application developers, a nauseating thought.

Meanwhile, the book gives no help to an implementor worried about
the problems of linking user defined functions.  That is fairly
consistent with Codd's stance as an advocate for users and an
adversary of implementors.

Much of the book is a convincing polemic for the support of domains
as extended data types, defined in the catalog.  This portion of
Codd's exposition is exceptionally clear.  I was formerly confused
about a number of concepts concerning the implementation of domain
support.  If I understand him correctly, Codd maintains that

A domain is an extended data type defined by:
 1. name;
 2. basic data type such as integer, floating point or characters;
 3. value constraint (that may not refer to another table).  Codd calls 
    this a range of permitted values;
 4. whether it is meaningful to apply the less_than comparator to
    values of this type.

All column definitions include the ID of the domain on which the 
column is based.  The column may have a further more restrictive
statement of the range of permitted values.
Two columns are comparable only when their domains are identical or
when an authorized use asserts a "domain check override". 

LIMITED INHERITANCE
Domains inherit the properties of the basic data type, but may not
inherit the properties of another domain.  So, you cannot define
a domain called LABEL and then another domain FUNCTION_LABEL based
on the qualities of LABEL.  This makes the implementation easy.
Users  cannot define base datatypes.

Codd lists nine good practical reasons for supporting domains.  Read 
the book!

Domains are related to referential integrity support, but a domain's
range of values cannot be defined in terms of a relation.  Consider
a two character state abbreviation domain, named ST.  The permitted
values might be stated as "in 'AK', 'AL', 'AZ', ... ",etc. Or, the range
might be any two uppercase alphabetic characters.  The later allows
the creation of a table of states which will allow new additions like
LU for Luna.  
There seems to be no immediate automated way of ensuring that all
base tables with columns defined over domain ST will be defined with
a referential constraint referring to the abbreviation column in the
states table.  At least the DBA can query the catalog and list all
base tables other that 'states' with columns on domain ST without a
referential constraint.  This query could be included in a trigger on
column definitions (create and alter) and the referential constraint
could be imposed via that trigger.  This is awfully round-about and
constitutes what Codd might call an excessive burden on the user.
Codd frequently refers to RM/V3 which may clean this up.

-- 
Satyabodhi              |  Domain:  bodhi@idun.ISC-BR.COM
                        |  UUCP:    ...!uunet!isc-br!idun!bodhi