[comp.databases] SQL Duplicate Row Deletion ???

tdoyle@cim-vax.honeywell.com (04/02/91)

In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> I've made a nice mistake loading data twice over several days into a table, and
> would like to delete just one of the duplicate rows.  The number of duplicates
> is on the order of 10,000 long rows, and the table has 250,000 rows total, so
> unloading, manipulating the data, and reloading the data is out of the
> question.  I can identify the duplicate rows with a select joining the table
> to itself, but I can't use the same technique to delete the rows.
> 
> Anyone have a nifty solution to this problem?  The database manager is DB2, but
> I'd think that any SQL based language would have the same problem.
> 
> Thanks in advance,
> 
> Phil Zampino

Select identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).
Then delete from the original where the record is the same as in TEMP.
This will get rid of ALL instances of the duplicates (including the original).
Then add records from TEMP into ORIG to restore one copy.
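
In rough SQL it might look like this (only a sketch: the column names are
invented, the temp-table creation is product dependent, and I have not tested
it against DB2):

    -- one copy of each row that occurs more than once
    INSERT INTO TEMP
        SELECT C1, C2, C3
          FROM ORIG
         GROUP BY C1, C2, C3
        HAVING COUNT(*) > 1;

    -- remove every copy of those rows from the original
    -- (columns that can be NULL would need extra handling here)
    DELETE FROM ORIG
     WHERE EXISTS (SELECT * FROM TEMP
                    WHERE TEMP.C1 = ORIG.C1
                      AND TEMP.C2 = ORIG.C2
                      AND TEMP.C3 = ORIG.C3);

    -- put one copy of each back
    INSERT INTO ORIG SELECT * FROM TEMP;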

Bipin Prasad bprasad@honcim1.honeywell.com

SYSPMZT@gecrdvm1.crd.ge.com (04/02/91)

I've made a nice mistake loading data twice over several days into a table, and
would like to delete just one of the duplicate rows.  The number of duplicates
is on the order of 10,000 long rows, and the table has 250,000 rows total, so
unloading, manipulating the data, and reloading the data is out of the
question.  I can identify the duplicate rows with a select joining the table
to itself, but I can't use the same technique to delete the rows.

Anyone have a nifty solution to this problem?  The database manager is DB2, but
I'd think that any SQL based language would have the same problem.

Thanks in advance,

Phil Zampino

SYSPMZT@gecrdvm1.crd.ge.com (04/03/91)

In article <1991Apr1.163615.56@cim-vax.honeywell.com>,
tdoyle@cim-vax.honeywell.com says:
>
>In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com
>writes:
>> I've made a nice mistake loading data twice over several days into a table,
>and
>> would like to delete just one of the duplicate rows.
>> Anyone have a nifty solution to this problem?  The database manager is DB2,
>>
>> Phil Zampino
>

>Select identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).
>Then delete from the original where the record is the same as in TEMP.
>This will get rid of ALL instances of the duplicates (including the original).
>Then add records from TEMP into ORIG to restore one copy.
>
>Bipin Prasad bprasad@honcim1.honeywell.com

Thanks; I got a lot of personal replies with some clever suggestions for this
problem.  A similar solution is:

Create temp_table like original_table
Select distinct * from original_table into temp_table
Delete from original_table
Select * from temp_table into original_table.

I am now the happy owner of a uniquely rowed table, and plan to put a unique
index on the table to protect it from my own stupid actions.
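
For the record, that protective step is just something like the following (the
index and column names are placeholders; use whatever actually identifies a row
in your table):

    CREATE UNIQUE INDEX ORIG_UX
        ON ORIGINAL_TABLE (KEYCOL1, KEYCOL2)

With that in place, a second load of the same data gets rejected instead of
silently doubling the table.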

Thanks to all replying,
      Phil Zampino

beal@paladin.owego.ny.us (Alan Beal) (04/03/91)

tdoyle@cim-vax.honeywell.com writes:

>In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
>> I've made a nice mistake loading data twice over several days into a table, and
>> would like to delete just one of the duplicate rows.
>> 
>> Anyone have a nifty solution to this problem?  The database manager is DB2, but
>> I'd think that any SQL based language would have the same problem.

I often use QMF as a tool for doing this; see below.

>Select identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).

How about in QMF:
   SELECT DISTINCT * FROM TABLE
Then:
   SAVE DATA AS TEMP
TEMP should now contain the unique rows.  Note that QMF creates a table
called TEMP in the tablespace specified by the SPACE parameter in your
QMF profile.  Typically, QMF's default tablespace is used by everyone,
may contain many tables, and may be in serious need of a reorganization.

>Then delete from the original where the record is the same as in TEMP.

Or using my example, delete all the records in the original table using:
   DELETE FROM TABLE
You can save time by using the LOAD utility with REPLACE and dummy input.  Of
course be careful if you have multiple tables in the tablespace.  The 
DELETE will run faster if the tablespace is segmented.

>This will get rid of ALL instances of the duplicates (including the original).
>Then add records from TEMP into ORIG to restore one copy.

Then reinsert all the data back in from TEMP:
   INSERT INTO TABLE
   SELECT * FROM TEMP
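
Pulled together (and ignoring the LOAD shortcut), the whole session is roughly
this; a sketch only, with TABLE and TEMP standing for whatever names you really
use:

   SELECT DISTINCT * FROM TABLE     -- run the query in QMF
   SAVE DATA AS TEMP                -- QMF command, not SQL
   DELETE FROM TABLE                -- empty the original
   INSERT INTO TABLE                -- reload the unique rows
   SELECT * FROM TEMP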

Of course, it would be wise to backup your table before doing this.  And
finally, I assume you realize having a unique index would have prevented
this problem in the first place.
-- 
Alan Beal
Internet: beal@paladin.Owego.NY.US
USENET:   {uunet,uunet!bywater!scifi}!paladin!beal

mspayer@PacBell.COM (Mike Payer) (04/03/91)

In article <91091.141528SYSPMZT@GECRDVM1.BITNET> SYSPMZT@gecrdvm1.crd.ge.com writes:
>I've made a nice mistake loading data twice over several days into a table, and
>would like to delete just one of the duplicate rows.  The number of duplicates
>is on the order of 10,000 long rows, and the table has 250,000 rows total, so
>unloading, manipulating the data, and reloading the data is out of the
		SOME STUFF DELETED
>Anyone have a nifty solution to this problem?  The database manager is DB2, but
>
		SOME STUFF DELETED
>Phil Zampino

I just did the same thing about a week ago.  I am using Informix isql and some
4GL.  What I did was run a select statement that gave me the row id number of
each duplicate entry, and then I deleted the duplicates by row id.  This worked
fine for me as I had about 1000 duplicate rows.
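
Something like this, in other words (table name, column, and rowid value are
made up):

   SELECT rowid, * FROM stock WHERE part_no = 1234;   { lists every copy }
   DELETE FROM stock WHERE rowid = 257;               { removes just one of them }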

Hope this helps
      ____________________________________________________________
     /\                  Michael S Payer Jr                       \
     \_|     Administrator              FAX 415-867-0344          |
       |     Emergency Control Center   pacbell!pbecc!msp         |
       |                   pacbell!pbhya!mspayer                  |
       |                                                          |
       |   ____________SOMETIMES THE DRAGON WINS__________________|__
        \_/_________________________________________________________/

drack@titan.tsd.arlut.utexas.edu (Dave Rackley) (04/03/91)

In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu ( Tsao) writes:

>I am now using Ingres, I am surprised that it allows the user to append
>duplicated rows (including a null row) into a table unless you define a key
>using create unique index.

>Would anybody give me an example justifying the need for duplicated tuples
>in a table ?

I cannot justify the need for them, but in real-time data collection duplicates
are often created by multiple sensors at a single source.  It then becomes a
data reduction/analysis issue to remove duplicates, while recording the fact
that duplicate data captures occurred.

Why let this happen?  Our data is always dirty--we have to clean it up before
the customer gets his hands on it!


--

  DISCLAIMER:  I, hereby, disclaim any disclaimer except this disclaimer.  

+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+
| David Rackley                 | Now comes the really weird part...You know  |
| Applied Research Laboratories | the part where I know how to tap dance, but |
| The University of Texas       | only while wearing golf shoes...            |
| Austin, TX.  78758            |        ...Ponderous, man, really ponderous! |
+=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=+

tdoyle@cim-vax.honeywell.com (04/03/91)

In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao) writes:
> In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> |> I've made a nice mistake loading data twice over several days into a table, and
> ......
> 
> 	I am now using Ingres, I am surprised that it allows the user to append 
> duplicated rows (including a null row) into a table unless you define a key
> using create unique index.
> 
> 	Would anybody give me an example justifying the need for duplicated tuples
> in a table ?
> 
>  

In a "clean" database implementation there is NO excuse for such behavior,
but alas it is simple/machine efficient? for DBMS vendors to do it this way.

The process of relational modelling means that ultimately the
tuple/relation is supposed to be in third normal form (or a close
approximation) and thus should have one and only one set of attributes that
uniquely identifies the object. This set of attributes should then automatically
have a unique index. None of the popular DBMSs enforces this.

Another problem with the popular DBMS's is that they don't treat null values
correctly. If the attributes in the unique index are null, then multiple
entries with null should be permitted (using the theory that null != null, i.e
the two nulls are different).

Bipin Prasad      bprasad@honcim1.honeywell.com

sweiger@sequent.com (04/04/91)

In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu writes:
>I am now using Ingres, I am surprised that it allows the user to append 
>duplicated rows (including a null row) into a table unless you define a key
>using create unique index.
>
>Would anybody give me an example justifying the need for duplicated tuples
>in a table ?
>
> 
SQL specifically allows duplicate keys and nulls as attributes.  The only
way to force unique primary keys in SQL is to CREATE UNIQUE INDEX using 
that key, a construct that almost every SQL dialect supports.  In theory,
relational databases are not supposed to contain duplicate keys.  In
theory, there is always a primary key which identifies the tuple and does
not contain null attributes.  SQL deviates from the relational model in this
regard, and many words of criticism have been written about this deviation 
from the model.  It certainly does complicate RDBMS algorithms.  And joins
can produce some surprising results.  But my personal opinion is this:  
A wise database designer makes sure that his database has a primary key.  
But the fact that SQL allows duplicates recognizes some legitimate, real 
world situations where it just isn't feasible, or perhaps desirable in 
a performance sense, to create a primary key for every tuple.

--Mark

tdoyle@cim-vax.honeywell.com (04/04/91)

In article <DRACK.91Apr3083024@diablo.titan.tsd.arlut.utexas.edu>, drack@titan.tsd.arlut.utexas.edu (Dave Rackley) writes:
> 
> I cannot justify the need for them, but in real-time data collection duplicates
> are often created by multiple sensors at a single source.  It then becomes a
> data reduction/analysis issue to remove duplicates, while recording the fact
> that duplicate data captures occurred.
> 
> Why let this happen?  Our data is always dirty--we have to clean it up before
> the customer gets his hands on it!

This may be a little esoteric, but I would suggest that the table is not
in third normal form. If the table represents data measurements from various
sensors, this would indicate that one attribute is missing from the table:
i.e. sensor-id. If the table represents simply the sequential measurement
then a date-time or a counter attribute may be required.

Once the database "correctly" models the world, then it is quite
possible (when the user-view of the object does not include one of the
primary identifiers) that the tuple presented in the view corresponds to
more than one tuple in the original relation. But this does not excuse the
original deficiency of the relational DBMSs.

mao@eden.Berkeley.EDU (Mike Olson) (04/04/91)

In <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao)
expresses surprise that ingres permits insertion of duplicate rows, and
asks for

> an example justifying the need for duplicated tuples in a table ?

the reason that all commercial systems allow you to insert duplicates is
that, unless an index exists on the table, checking each new row for a duplicate
turns every insertion into a scan of the entire table.  dups are bad and everyone
admits that, but
nobody seems to be willing to pay the cost of guaranteeing they never happen.

we've also discovered (in postgres) that you don't always know what equality
means for a user-defined data type, so you can't in principle guarantee that
dups never happen without extra work on the part of the user.  actually, it
turns out that type extensibility is pretty hard for a lot of reasons like
this, but that's off-topic...
					mike olson
					postgres research group
					uc berkeley
					mao@postgres.berkeley.edu

cdm@gem-hy.Inel.GOV (Dale Cook) (04/04/91)

In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
|> In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao) writes:
|> > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
|> > |> I've made a nice mistake loading data twice over several days into a table, and
|> > ......
|> > 
|> > 	I am now using Ingres, I am surprised that it allows the user to append 
|> > duplicated rows (including a null row) into a table unless you define a key
|> > using create unique index.
|> > 
|> > 	Would anybody give me an example justifying the need for duplicated tuples
|> > in a table ?
|> > 
|> >  
|> 
|> In a "clean" database implementation there is NO excuse for such behavior,
|> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
|> 
|> The process of relational modelling means that ultimately the
|> tuple/relation is supposed to be in third normal form (or a close
|> approximation) and thus should have one and only one set of attributes that
|> uniquely identifies the object. This set of attributes should then automatically
|> have a unique index. None of the popular DBMSs enforces this.
|> 

Sorry, but I must disagree.  Attributes do not _identify_ an instance of an
entity, they merely describe it.  A subtle, but important, distinction.
There is nothing in data modelling that says multiple instances of an entity
or relationship cannot have identical attributes.  What data modelling says
you MUST have is an instance of an entity uniquely identifiable by a "primary
key".  It is sufficient to have a simple sequence number to satisfy this
rule.  Relational DBMS systems satisfy this requirement for you.  Suffice it 
to say that every row in the table has its own internal unique identifier.
If the sequence number is unimportant information, why not let the DBMS keep
track of it for you?
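
To make that concrete (a toy example of my own, not anyone's real schema):
whether you carry the sequence number yourself or let the DBMS's internal row
identifier do the job, the idea is the same:

    CREATE TABLE READINGS
        (SEQ_NO     INTEGER NOT NULL,
         SENSOR_VAL DECIMAL(9,2))

    CREATE UNIQUE INDEX READINGS_UX ON READINGS (SEQ_NO)

Two readings with the same SENSOR_VAL coexist happily; the sequence number keeps
them distinct without pretending the measurements themselves differ.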

|> Another problem with the popular DBMS's is that they don't treat null values
|> correctly. If the attributes in the unique index are null, then multiple
|> entries with null should be permitted (using the theory that null != null,
|> i.e. the two nulls are different).
|> 

I don't follow this.

|> Bipin Prasad      bprasad@honcim1.honeywell.com

----------------------------------------------------------------------
--- Dale Cook          "I was born at night, but by god, it wasn't 
    cdm@inel.gov         last night."
The opinions are mine.  The following disclaimer is my employers.
----------------------------------------------------------------------
========== long legal disclaimer follows, press n to skip ===========

Neither the United States Government or the Idaho National Engineering
Laboratory or any of their employees, makes any warranty, whatsoever,
implied, or assumes any legal liability or responsibility regarding any
information, disclosed, or represents that its use would not infringe
privately owned rights.  No specific reference constitutes or implies
endorsement, recommendation, or favoring by the United States
Government or the Idaho National Engineering Laboratory.  The views and
opinions expressed herein do not necessarily reflect those of the
United States Government or the Idaho National Engineering Laboratory,
and shall not be used for advertising or product endorsement purposes.

miket@blia.sharebase.com (Mike Tossy) (04/05/91)

In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
> In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu ( Tsao) writes:
> > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> > |> I've made a nice mistake loading data twice over several days into a table, and
> > ......
> > 
> > 	Would anybody give me an example justifying the need for duplicated tuples
> > in a table ?
> > 
> >  
> 
> In a "clean" database implementation there is NO excuse for such behavior,
> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
> 
> The process of relational modelling means that ultimately the
> tuple/relation is supposed to be in third normal form (or a close
> approximation) and thus should have one and only one set of attributes that
> uniquely identifies the object. This set of attributes should then automatically
> have a unique index. None of the popular DBMSs enforces this.

ShareBase III (an ANSI SQL based system) does create a unique index (by
default) on the primary key.  Unfortunately the ANSI Standard requires that
duplicate rows be permitted - a really dumb idea - so we allow duplicate
rows if the DBA requests.  (Not that there is much useful that can be done
with them, dirty data excluded.)

ShareBase II is QUEL based, like Ingres and others, and only permits
duplicates when there is no sorting (clustered) index.  This is mostly an
expense issue - finding duplicates without indices can be expensive.  Lots of
other things, like garbage collection, typically break when there is no
clustered index.  All our manuals and classes stress that a clustered index
is required on every table for proper system operation.  I'd guess this is
a good rule for the other QUEL based systems.

By the way, duplicates are one of the fundamental semantic differences between
SQL based and QUEL based systems.  QUEL defines duplicate tuples (rows) as
having no meaning and the optimizer is allowed (encouraged?!) to delete
duplicates.  SQL says that duplicates have meaning (although it doesn't give
you a way to manipulate them!) and the optimizer can only delete duplicates
at user request (using the distinct clause).  Thus projections which do not
include the primary key (including most aggregates) can give very different
results under SQL semantics and QUEL semantics.  And, just because you are
using SQL syntax does not mean you are getting SQL semantics.  Most of the
QUEL based systems long ago delivered SQL; but with QUEL semantics!  It is
the semantics (not the syntax) that keeps several of the vendors from passing
the NIST ANSI SQL tests.
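
A made-up example of the kind of divergence I mean: given an EMP table holding
the rows (SMITH, TOYS), (JONES, TOYS), (BROWN, SHOES), the projection

   SELECT DEPT FROM EMP

returns TOYS, TOYS, SHOES under SQL semantics but only TOYS, SHOES if the system
quietly folds duplicates, and a COUNT or AVG over that column differs the same
way.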

> ...
>
> Bipin Prasad      bprasad@honcim1.honeywell.com

--
      >>>>>>  The above statements are only my opinions <<<<<<

Mike Tossy					ShareBase Corporation
miket@sharebase.com				14600 Winchester Blvd
(408) 378-7575 ext2200				Los Gatos, CA 95030
	(ShareBase is a subsidiary of Teradata Corporation)

tdoyle@cim-vax.honeywell.com (04/05/91)

In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes:
> 
> Sorry, but I must disagree.  Attributes do not _identify_ an instance of an
> entity, they merely describe it.  A subtle, but important, distinction.
> There is nothing in data modelling that says multiple instances of an entity
> or relationship cannot have identical attributes.

Attributes are the ONLY way to describe an object. You may CHOOSE to use the
rowid as an additional attribute for the sake of convenience (or in some 
extreme case that I have not encountered, a necessity), but don't try
to pass that off as data modelling.

Note that the person who originally posted the request has now created a 
unique index to avoid the problem.

> 
> |> Another problem with the popular DBMS's is that they don't treat null values
> |> correctly. If the attributes in the unique index are null, then multiple
> |> entries with null should be permitted (using the theory that null !=
> null, i.e
> |> the two nulls are different).
> |> 
> 
> I don't follow this.
> 

Most reasonable databases treat NULL as a special value for an attribute. 
It means that the value is UNKNOWN. This special value is different from
0 for numeric data and blank for strings. Thus if you had the following
data in your table:

          Person	balance ($)
            A		10
	    B		20
	    C		30
	    D		NULL
      	    E		NULL

SELECT COUNT(*) FROM T WHERE balance > 0        will return 3
SELECT AVG(balance) FROM T                      will return 20, not 12
(where T is the table above)

Since one null is not the same as another null, the balance of D is not the same
as the balance of E. This has implications on how JOINs between tables are
executed, how selections are performed, how boolean logic is performed, etc.
(NULL != NULL; the result of the comparison (NULL > NULL) is unknown).

Most major relational databases do this well/correctly.

WHERE THEY FAIL IS WHEN an attribute that has null values is part of
the primary key. Then they treat NULL as a single value (which is how NULL is
represented internally in the relational DBMSs). This really hoses things.

As an example: 

	Following is a table where employee number is the key:

	emp-no	name	status	salary   ...etc
        (unique
	index)

	1	A	...    ...............
	2       B       ....     ...........
	NULL    C	....................      New hire, emp-no pending
	NULL    D	....................      New hire, emp-no pending

Or take this example, where purchase requisitions are assigned a purchase order
number:

   Tables:  1. Purchase Order  (unique index POnumber)
	    2. Invoices		(unique index POnumber, InvoiceNumber)

Table 1. Purchase Order:

	POnumber  Status amount	vendor limit clause ... etc
        (unique
	 index)

	1	expire	20000  ...................
	2	current	20000  ...................
	NULL	new	20000  ...................  POnumber unassigned, waiting
	NULL	new	20000  ...................  for signoff process
-----------------------------------------------------------------


You get the picture.
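
In sketch form (names made up, and exact behavior varies from product to
product), the failure looks like this:

    CREATE UNIQUE INDEX EMP_UX ON EMPLOYEE (EMP_NO)

    INSERT INTO EMPLOYEE (EMP_NO, NAME) VALUES (NULL, 'C')   -- accepted
    INSERT INTO EMPLOYEE (EMP_NO, NAME) VALUES (NULL, 'D')   -- rejected by a DBMS
                                                              -- that treats NULL as
                                                              -- one ordinary value

A system that honored NULL != NULL would accept both new hires and let the real
employee numbers be filled in later.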

Bipin Prasad      bprasad@honcim1.honeywell.com

rob.bbs@shark.cs.fau.edu (Robert Rittenhouse) (04/05/91)

> In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu writes:
> >I am now using Ingres, I am surprised that it allows the user to append 
> >duplicated rows (including a null row) into a table unless you define a key
> >using create unique index.
> >
> >Would anybody give me an example justifying the need for duplicated tuples
> >in a table ?
> >
> > 
An example: I'm doing a parts explosion which consists of recursively
searching a parts tree (a fun trick in and of itself) and inserting
the components required in a parts required table. Sometimes two
parts will use the same component--hence two identical rows will
be inserted in the parts required table. Certainly could avoid doing
so by having a sequence number or some such unique identifier but
there seems no good reason to do so.
(I fix the parts table when finished with the explosion, btw).
Rob R.

clifton@egsner.cirr.com (Clifton Bean) (04/07/91)

In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> I've made a nice mistake loading data twice over several days into a table, and
> would like to delete just one of the duplicate rows.
> 
> Anyone have a nifty solution to this problem?  The database manager is DB2, but
> I'd think that any SQL based language would have the same problem.

------------------------------------------------------------------------------

I use Informix.  Their tables all contain a "hidden" index on the row_id of
the records within the table.  In Informix, I would code the following:

                 Select rowid, * from table
                  where ... = ...  (you supply the fieldname and contents)

You would then use one of the row_ids in your delete from table statement.
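
For instance (the rowid value here is invented):

                 Delete from table
                  where rowid = 257

which removes just that one physical row and leaves its duplicate alone.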

Check your documentation's index for a reference to this type of "automatic"
index - I would think this type of index is a logical part of any sql package.
-- 
*******************************************************************************
Clifton M. Bean                          USENET: clifton@lodestar.lonestar.org
Page & Addison, P.C., Dallas, TX           UUCP: ...!egsner!lodestar!clifton
Work Phone: (214) 960-0933 (9-6 CDT)     Home Phone: (214) 307-2242 (pm CDT)

pavlov@canisius.UUCP (Greg Pavlov) (04/08/91)

In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
> > 	I am now using Ingres, I am surprised that it allows the user to append 
> > duplicated rows (including a null row) into a table unless you define a key
> > using create unique index.
> > 
> In a "clean" database implementation there is NO excuse for such behavior,
> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
> 
  Ignoring the "theoretical" argument of what an "attribute" really is, I would
  argue very strongly that I do NOT want my DBMS to make such assumptions. 
  INGRES generally provides one with the tools to enforce whatever "view" of 
  "good" database "practice" one may have and one can use them accordingly. 
  But that does not mean that a given person's "view" is useful, optimal, or
  even "correct".  

  In our shop, for instance, all tables are required to have
  unique keys.  But if we receive a m{ae}ss of data from elsewhere that has to
  be manipulated, massaged, etc, to get it into a workable form, we first load
  it in "raw" into temporary tables to see what we actually have and to use
  INGRES itself to perform the reformatting/transformation.  During this work,
  I don't want INGRES to make any other-view presuppositions about what we
  should/should not be doing.

  greg pavlov, fstrf, amherst, ny
  pavlov@stewart.fstrf.org 

cdm@gem-hy.Inel.GOV (Dale Cook) (04/08/91)

In article <1991Apr5.074844.59@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
|> In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes:
|> > 
|> > Sorry, but I must disagree.  Attributes do not _identify_ an instance of an
|> > entity, they merely describe it.  A subtle, but important, distinction.
|> > There is nothing in data modelling that says multiple instances of an entity
|> > or relationship cannot have identical attributes.
|> 
|> Attributes are the ONLY way to describe an object. You may CHOOSE to use the
                                  ^^^^^^^^
Isn't that what I said?

|> rowid as an additional attribute for the sake of convenience (or in some 
|> extreme case that I have not encountered, a necessity), but don't try
|> to pass that off as data modelling.

You're right; what I meant to do was make a distinction between describing
an instance of an object and identifying it.  Obviously, in order to make
use of rowid as the primary key, one should properly model the data in order
to come to the conclusion that a sequence number is an appropriate primary
key.  My implication was that the primary key was not an attribute; that
implication is false.  

But we digress.  The original question is whether a true RDBMS should allow
rows with identical attributes.  I still claim it should.  As long as the 
DBMS internally identifies the row uniquely, the implementation violates no
rules of normalization.  As long as the DBMS allows you the ability to 
create unique keys via some mechanism, the implementor has no problem.  To
enforce uniqueness between all identified attributes unnecessarily removes
a perfectly valid option from the database implementor.

As a matter of practicality as well, why should I pay for the DBMS to ensure
every row has at least one attribute with a different value, when I don't
need it?  What you are asking for is a unique index on the entire row.  Is
that what you want?  I really doubt it.
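
In other words something like this (toy names, five columns standing in for
however many the table really has), checked on every single insert:

    CREATE UNIQUE INDEX WHOLE_ROW_UX
        ON BIG_TABLE (COL1, COL2, COL3, COL4, COL5)

That is a lot of overhead to buy behavior most applications never asked for.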

|> 
|> Note that the person who originally posted the request has now created a 
|> unique index to avoid the problem.
|>

Which is exactly what should have been done.  The real problem is not with
the DBMS; it is with unfounded expectations about the behavior of the DBMS.

----------------------------------------------------------------------
--- Dale Cook          "You can sum this game up in one word -
    cdm@inel.gov        'you never know'".  --- J. Andujar
The opinions are mine.  The following disclaimer is my employers.
----------------------------------------------------------------------
========== long legal disclaimer follows, press n to skip ===========

Neither the United States Government or the Idaho National Engineering
Laboratory or any of their employees, makes any warranty, whatsoever,
implied, or assumes any legal liability or responsibility regarding any
information, disclosed, or represents that its use would not infringe
privately owned rights.  No specific reference constitutes or implies
endorsement, recommendation, or favoring by the United States
Government or the Idaho National Engineering Laboratory.  The views and
opinions expressed herein do not necessarily reflect those of the
United States Government or the Idaho National Engineering Laboratory,
and shall not be used for advertising or product endorsement purposes.

davek@informix.com (David Kosenko) (04/09/91)

In article <1991Apr7.054945.8610@egsner.cirr.com> clifton@egsner.cirr.com (Clifton Bean) writes:
>
>I use Informix.  Their tables all contain a "hidden" index on the row_id of
>the records within the table.  In Informix, I would code the following:

	Actually, we do not have any "hidden" indexes, though it is very
commonly misunderstood that we do.  Rowid is best described as an l-value,
or the location of a particular record in a database table.  With SE databases
(using a Unix file for the data rows), the rowid represents the offset into
the file where the row is found.  With Turbo/OnLine, it provides a page number/
slot table number  combination, which is used in a similar fashion to locate
the row in question.  In no case is there any true index built on this rowid
value.  You can use the rowid in a WHERE clause, in which case you are asking 
for the row at that particular location.

Dave Kosenko
Informix Client Services

-- 
Disclaimer: These opinions subject to change without notice.
**************************************************************************
The heart and the mind on a parallel course, never the two shall meet.
						-E. Saliers

cakers@oracle.com (Charles Akers) (04/10/91)

The various methods described in the articles on deleting duplicate
rows in a SQL database table do not do it in a single SQL statement.
If the table involved has a keyfield and there are duplicates of keys,
the following SQL statement will delete all rows except the one row
for each keyfield which has the lowest rowid:

If the table is named table_X and the keyfield is named keycol:

DELETE FROM table_X t_alias
WHERE ROWID >
 (SELECT MIN(ROWID)
  FROM table_X
  WHERE keycol=t_alias.keycol);

This statement will delete extra rows with the same keyfield even if
the values in other (non-key) columns are not the same.  That is not a
problem if you have just inserted the same rows twice and want only
one copy of each row.

If there is no column in the table which was unique prior to inserting
the duplicates, the following 3 statements will eliminate duplicates
of rows which have identical values in all columns (requires your SQL
to have a RENAME command).

CREATE TABLE tmp_tab AS
 SELECT DISTINCT *
 FROM table_X;
DROP TABLE table_X;
RENAME tmp_tab TO table_X;

-------------
Charles Akers
-------------