[comp.databases] A few words on the "normalization"

marti@ethz.UUCP (Robert Marti) (08/05/89)

In article <36270005@hpindwa.HP.COM>, donovan@hpindwa.HP.COM (Donovan Hsieh)
writes in repsonse to some earlier message of mine:

> My point is, it is difficult to formulate all possible MVDs for a given 
> relational database scheme in the real world applications. Certainly, finding 
> some or several of the MVDs are easier. If the designer choose to use the
> synthese algorithm to normalize the database scheme, incomplete set of FDs &
> MVDs can lead to incorrect result.

I see your point, and it's certainly valid.  However -- as I am sure you
know -- a designer does not have to specify ALL functional and multi-
valued dependencies to achieve correct third or fourth normal form.
All she needs to do is provide a superset of an FD cover (or superset of
a dependency basis in the case of MVDs) from which all FDs (MVDs) can be
inferred.  For real world examples, finding these FDs and MVDs is
essentially the same thing as finding the relationships between different
entity types with their association cardinalities in an ER model.
In either case, you get an incorrect model of the world if the designer
overlooks some relationships between attributes (in the relational model)
or entity types (in the ER model).

-- 
Robert Marti                      Phone:      +41 1 256 52 36
Institut fur Informationssysteme
ETH-Zentrum                       CSNET/ARPA: marti%inf.ethz.ch@relay.cs.net
CH-8092 Zurich, Switzerland       UUCP:       ...uunet!mcvax!ethz!marti

donovan@hpindwa.HP.COM (Donovan Hsieh) (08/08/89)

	
In the note, Robert Marti <marti@ethz.UUCP> at ETH Zuerich wrote :

> I see your point, and it's certainly valid.  However -- as I am sure you
> know -- a designer does not have to specify ALL functional and multi-
> valued dependencies to achieve correct third or fourth normal form.
> All she needs to do is provide a superset of an FD cover (or superset of
> a dependency basis in the case of MVDs) from which all FDs (MVDs) can be
> inferred.  For real world examples, finding these FDs and MVDs is
> essentially the same thing as finding the relationships between different
> entity types with their association cardinalities in an ER model.
> In either case, you get an incorrect model of the world if the designer
> overlooks some relationships between attributes (in the relational model)
> or entity types (in the ER model).

In theory, the suggested way to design a fully normalized relational database
scheme is as follows :

	1. Formulate all possible set of FDs & MVDs from the real world 
	   enterprise
	2. Compute the "optimal covers" from the set of FDs & MVDs to derive
	   the optimal set of FDs & MVDs. Note that a minimal set is not
	   necessarily optimal.	
	3. Apply whatever algorithms you may choose to normalize the set
	   of optimal FDs & MVDs to the highest possible normal form.

But in reality, the difficulties are :

	1. You'll probably never be able to prove that you have obtained
	   all possible FDs & MVDs for the real world enterprise except for
	   those toy databases. Not to mention what is the superset of the
	   the FDs & MVDs. It is different to say that "I have found the
	   complete set of FDs & MVDs" and "I can produce the superset of
	   FDS & MVDs for a GIVEN set of FDs & MVDs from which all original
	   dependencies can be inferred".

	2. At least for the current time, there is probably no polinomial
	   time algorithm for finding an "optimal cover" for a set of FDs &
	   MVDs. This is a NP-complete problem.

	3. Even if you have successfully completed the above two steps, the
	   net result of a mechanical normalization process does not always
	   produce the most efficient schema for real world queries. This is
	   due to that all normalization algorithms are purely "syntactical"
	   instead of "semantic". They do not have the knowledge of what are
	   the real world database requirements (from the perspective of
	   query execution efficiency or referential integrity). This is
	   why "Semantic" and "Object-Oriented" data models have become
	   popular.


Donovan Hsieh
Hewlett-Packard
Business Network Division

marti@ethz.UUCP (Robert Marti) (08/10/89)

In article <36270009@hpindwa.HP.COM>, donovan@hpindwa.HP.COM (Donovan Hsieh)
writes as contribution to our ongoing (never-ending :-) discussion:
> In theory, the suggested way to design a fully normalized relational database
> scheme is as follows: [ ... ]
> But in reality, the difficulties are :
> 	1. You'll probably never be able to prove that you have obtained
> 	   all possible FDs & MVDs for the real world enterprise except for
> 	   those toy databases. [ ... ]
I agree.  Absolutely.  I never claimed anything else in my previous
postings.  The point I was trying to make was:  If a designer is not
aware of certain relationships (dependencies) between different data
items, or if the designer is mistaken about the kind of relationship
between data items (i.e one-to-many instead of many-to-many, or FD
instead of MVD), you will end up with incorrect models of the world
no matter what design methodology you use.

> 	3. [ ... ] the net result of a mechanical normalization process does
>          not always produce the most efficient schema for real world queries.
Again I agree.  I might even go further and claim that it hardly ever
produces the most efficient schema.

>          This is due to that all normalization algorithms are purely
>          "syntactical" instead of "semantic". They do not have the knowledge
>          of what are the real world database requirements (from the
>          perspective of query execution efficiency or referential integrity).
Well, in the end, all information in the computer is purely syntactic,
even if you use a so-called semantic model.  In order to capture what
you call "real world database requirements", you'd probably want a
design tool / algorithm which also takes into consideration
- arbitrary integrity constraints (including referential integrity)
- expected data volumes (if known)
- "canned" update transactions and queries and their frequency (if known)
(The paper by Finkelstein et al. in ACM ToDS, Vol.13, No.1, March 88
presents a design tool which represents a step in this direction.)

-- 
Robert Marti                      Phone:      +41 1 256 52 36
Institut fur Informationssysteme
ETH-Zentrum                       CSNET/ARPA: marti%inf.ethz.ch@relay.cs.net
CH-8092 Zurich, Switzerland       UUCP:       ...uunet!mcvax!ethz!marti