[comp.databases] SQL query for duplicates

larned@uhccux.uhcc.hawaii.edu (Paul Larned) (12/22/89)

I am starting an application in Oracle v.5.1b Prof. for MS-DOS for a
network of PCs which will use RDBMS v.6 for OS/2--this after years of
writing applications in a dBASE environment.  I am importing a fixed-
column data file from a UNIFY database into an ORACLE-created table.
The primary key field I want to use has several duplicates in it from
an unprotected UNIFY environment.  I am having trouble writing a SQL
statement to identify the duplicate 4-digit numbers (in a character
field) in the new database so that they can be changed or deleted.  I
hope I am not reduced to listing all rows and visually identifying the
duplicates.  There are about 1000 rows.  I would appreciate any
suggestions.

daved@usperb.Dayton.NCR.COM (Dave Dresselhouse) (12/22/89)

In article <5784@uhccux.uhcc.hawaii.edu> larned@uhccux.uhcc.hawaii.edu (Paul Larned) writes:
>I am starting an application in Oracle v.5.1b Prof. for MS-DOS for a
>network of PCs which will use RDBMS v.6 for OS/2--this after years of
>writing applications in a dBASE environment.  I am importing a fixed-
>column data file from a UNIFY database into an ORACLE-created table.
>The primary key field I want to use has several duplicates in it from
>an unprotected UNIFY environment.  I am having trouble writing a SQL
>statement to identify the duplicate 4-digit numbers (in a character
>field) in the new database so that they can be changed or deleted.  I
>hope I am not reduced to listing all rows and visually identifying the
>duplicates.  There are about 1000 rows.  I would appreciate any
>suggestions.
Try This:
SELECT <KEY_COLUMN>, COUNT(*)
FROM <TABLE>
GROUP BY <KEY_COLUMN>
HAVING COUNT(*) > 1
...This should do the trick. Your output will list only duplicate key
values, along with the number of duplicates for each key.
-----------------------------------------------------------------------------
Dave Dresselhouse                          dave.dresselhouse@Dayton.NCR.COM
NCR Corporation                            (513) 445-4449
Dayton, OH
-----------------------------------------------------------------------------

may@28182.dec.com (Patrick May) (12/23/89)

In article <5784@uhccux.uhcc.hawaii.edu>, larned@uhccux.uhcc.hawaii.edu (Paul Larned) writes...
>I am starting an application in Oracle v.5.1b Prof. for MS-DOS for a
>network of PCs which will use RDBMS v.6 for OS/2--this after years of
>writing applications in a dBASE environment.  I am importing a fixed-
>column data file from a UNIFY database into an ORACLE-created table.
>The primary key field I want to use has several duplicates in it from
>an unprotected UNIFY environment.  I am having trouble writing a SQL
>statement to identify the duplicate 4-digit numbers (in a character
>field) in the new database so that they can be changed or deleted.  I
>hope I am not reduced to listing all rows and visually identifying the
>duplicates.  There are about 1000 rows.  I would appreciate any
>suggestions.


     I can think of two ways to remove duplicates from an Oracle database
table.  The first will work with any SQL-based database:

          1.  Create a temporary table with the same structure as the
              one containing the duplicates.

          2.  INSERT INTO new_tab
              SELECT DISTINCT *
              FROM dup_tab;

          3.  Delete all records from dup_tab and refill it from new_tab.

     This works if entire rows are duplicated.  If you just want to find
duplicate values in a particular field, the following should be helpful:

          SELECT DISTINCT col_name, count(*)
          FROM dup_tab
          GROUP BY col_name
          HAVING count(*) > 1;

     This will display all duplicate values for col_name and the number of
times that value appears in dup_tab.

     The second method is Oracle specific.  In every Oracle table there is a
column named ROWID.  The value stored in this column is unique across the table
(and, I believe, across the entire database).  There are some restrictions on
its use (documented in the manuals), but you should be able to use it to specify
the exact row you wish to delete.

Patrick

barrym@infmx.UUCP (Barry Mednick) (12/23/89)

To find duplicates in a table,

  SELECT rowid, x.prime_key
      FROM tabname x, tabname y
         WHERE  x.prime_key = y.prime_key
           AND  x.rowid NOT = y.rowid

tn@hpldola.HP.COM (Ted Neff) (12/23/89)

How about the following:

	SELECT fieldname
	FROM tablename
	GROUP BY fieldname
	HAVING COUNT(*) > 1

steve.jackson@canremote.uucp (STEVE JACKSON) (12/24/89)

>I >hope I am not reduced to listing all rows and visually identifying
the >duplicates.  There are about 1000 rows.  I would appreciate any
>>suggestions.
How about:
=======================================================================
REM The select for update is required for updates and deletes based
REM on rowid.
select * from table
for update of column
/
REM All rows having duplicate column keys will be deleted except for
REM one in each group having the minimum rowid
delete from table x
where rowid > 
(select min(rowid)
from table
where column = x.column)
/
# copied from Oracle Technical Bulletin, June 1988
sjackson
---
 * Via ProDoor 3.1R