[comp.databases] A few words on the "normaliztion"

donovan@hpindwa.HP.COM (Donovan Hsieh) (07/27/89)

There are two important factors for the designers to consider in designing 
a good relational database system : performance and data integrity &
consistency. Unfortunately, the normalization theory which is commonly used to
enforce the data integrity can often cause performance degredation. Since
a normalized (to 3/4 NF) database although eliminates update anomalies, but
it also increaes the cost for jointing normalized tables (or relations) and 
creating user views.

It is easy to claim that a simple database which contains only a few tables to
be fully normalized to its highest possible normal form. But in the real
world applications, we are always faced with multi dimensional data attributes
and complex relationships which sometimes we need to compromise between those
two factors. Thus a good database design really should ballance between those 
two boundaries, and sometimes even adopt unorthodox approaches to gain the 
performance and usability.

When I design a database application, I always evaluate what is the
best data models (or even using only staright flat files or ISAM files) 
which will provide the most benefit for the users. If a relational database is
a good fit, I normally use an Extended Entity-Relationship model to capture
the real world semantics which is coupled with functional/multi-valued
dependencies. Currently there are several normalization tools available on the
market. A typical one would be from Chen & Association which will allow the
database scheme to be normalized to 3NF or BCNF. In rare cases 4NF is required
by the real world applications. Above all, multi-valued dependencies are very 
difficult to formulate for the 4NF, and it is also very expensive for the
database system to enforce the multi-valued update consistency.

After the database is fully normalized to my best knowledge, I will apply 
certain user requirements to justify if any normalized tables make sense. 
Most normalization tools which use synthetic algorithms have very little
knowledge of the real world operations. Human intervention should be always
applied at the final decision for optimal fine tuning.

A good database system design is more than just normalizing the tables into
normal forms. There are various aspects of issues and problems involved
in the design process which require special treatment and attention.


Donovan Hsieh
Business Network Division
Hewlett-Packard

marti@ethz.UUCP (Robert Marti) (07/31/89)

In article <36270004@hpindwa.HP.COM>, donovan@hpindwa.HP.COM (Donovan Hsieh)
writes:
> [ ... ]  Above all, multi-valued dependencies are very difficult to
> formulate for the 4NF,
Huh?  I don't see what's so difficult about stating something like
empno -->> language (employees, identified by empno, have the ability
to speak/understand multiple languages).

> and it is also very expensive for the database system to enforce
> [the] multi-valued update consistency.
No DBMS enforces 4NF (or "multi-valued update consistency", as you call
it).  The database designer enforces 4NF, essentially by defining a
table per MVD.  Note that this is no differnet from 3NF:  The designer
enforces 3NF by defining appropriate tables and keys.

> A good database system design is more than just normalizing the tables into
> normal forms. There are various aspects of issues and problems involved
> in the design process which require special treatment and attention.
Yes.

-- 
Robert Marti                      Phone:      +41 1 256 52 36
Institut fur Informationssysteme
ETH-Zentrum                       CSNET/ARPA: marti%inf.ethz.ch@relay.cs.net
CH-8092 Zurich, Switzerland       UUCP:       ...uunet!mcvax!ethz!marti

donovan@hpindwa.HP.COM (Donovan Hsieh) (08/03/89)

In reply from <marti@ethz.UUCP>, marti%inf.ethz.ch@relay.cs.net (Robert Marti) 
wrote :

> Huh?  I don't see what's so difficult about stating something like
> empno -->> language (employees, identified by empno, have the ability
> to speak/understand multiple languages).
My point is, it is difficult to formulate all possible MVDs for a given 
relational database scheme in the real world applications. Certainly, finding 
some or several of the MVDs are easier. If the designer choose to use the
synthese algorithm to normalize the database scheme, incomplete set of FDs &
MVDs can lead to incorrect result.

> No DBMS enforces 4NF (or "multi-valued update consistency", as you call
> it).  The database designer enforces 4NF, essentially by defining a
> table per MVD.  Note that this is no differnet from 3NF:  The designer
> enforces 3NF by defining appropriate tables and keys.
You are quite right. It is the DB designer's responsibility to enforce the
NF consistency. I have used a wrong word for saying "database system to 
enforce the consistency".

Donovan Hsieh
Hewlett-Packard
Business Network Division

rlh@fciva.FRANKLIN.COM (Ramon L. Holt) (08/04/89)

In reply from <marti@ethz.UUCP>, marti%inf.ethz.ch@relay.cs.net (Robert Marti) 
wrote :
>
> No DBMS enforces 4NF (or "multi-valued update consistency", as you call
> it).  The database designer enforces 4NF, essentially by defining a
> table per MVD.  Note that this is no differnet from 3NF:  The designer
> enforces 3NF by defining appropriate tables and keys.

I have been following along this line of discussion and understand most of
it and have even learned something new (like that keeping similar data in 
as few places as possible in called normalization) but could someone explain 
3NF, 4NF and MVD ? Could you also provide some simple examples that would
show the differences between these levels of normalization (they are levels 
dealing with normalization aren't they?).

Thank you.

Rame.

-- 
Ramon L. Holt				Franklin Mortgage Capital Corp.
(703) 448-3300				7900 Westpark Drive, Suite A-130
uunet!fciva!rlh				McLean, Virginia  22102

donovan@hpindwa.HP.COM (Donovan Hsieh) (08/04/89)

/ hpindwa:comp.databases / rlh@fciva.FRANKLIN.COM (Ramon L. Holt) / 11:46 am  Aug  3, 1989 /

> I have been following along this line of discussion and understand most of
> it and have even learned something new (like that keeping similar data in 
> as few places as possible in called normalization) but could someone explain 
> 3NF, 4NF and MVD ? Could you also provide some simple examples that would
> show the differences between these levels of normalization (they are levels 
> dealing with normalization aren't they?).

In this notes file entry title "Do you normalize ?" reply #11, I have listed
4 books which may be helpful for those who like to know more about the
normal form theory. The first two books are suitable for those who are not
interested in mathematic, and the later are for more mathematic background
readers.

Donovan Hsieh
Hewlett-Packard
Business Network Division