tdoyle@cim-vax.honeywell.com (04/02/91)
In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> I've made a nice mistake loading data twice over several days into a table, and
> would like to delete just one of the duplicate rows. The number of duplicates
> is on the order of 10,000 long rows, and the table has 250,000 rows total, so
> unloading, manipulating the data, and reloading the data is out of the
> question. I can identify the duplicate rows with a select joining the table
> to itself, but I can't use the same technique to delete the rows.
>
> Anyone have a nifty solution to this problem? The database manager is DB2, but
> I'd think that any SQL based language would have the same problem.
>
> Thanks in advance,
>
> Phil Zampino

Select the identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).
Then delete from the original where the record is the same as in TEMP.
This will rid ALL instances of the duplicates (including the original).
Then add records from TEMP into ORIG to restore one copy.

Bipin Prasad    bprasad@honcim1.honeywell.com
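Bipin's three-step recipe can be sketched against a modern SQL engine. Below is a minimal illustration using SQLite from Python; the table and column names (`orig`, `part`, `qty`) are invented for the example, and `SELECT ... GROUP BY ... HAVING COUNT(*) > 1` stands in for DB2's duplicate-finding self-join:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A toy table that was loaded twice, so every row appears two times.
cur.execute("CREATE TABLE orig (part TEXT, qty INTEGER)")
rows = [("bolt", 10), ("nut", 20), ("washer", 5)]
cur.executemany("INSERT INTO orig VALUES (?, ?)", rows * 2)

# Step 1: copy one instance of each duplicated row into a temp table.
cur.execute("""CREATE TEMP TABLE dups AS
               SELECT part, qty FROM orig
               GROUP BY part, qty
               HAVING COUNT(*) > 1""")

# Step 2: delete ALL instances of those rows (including the "original").
cur.execute("""DELETE FROM orig
               WHERE EXISTS (SELECT 1 FROM dups
                             WHERE dups.part = orig.part
                               AND dups.qty = orig.qty)""")

# Step 3: re-insert exactly one copy of each from the temp table.
cur.execute("INSERT INTO orig SELECT * FROM dups")

print(cur.execute("SELECT COUNT(*) FROM orig").fetchone()[0])  # 3
```

Note that step 2 deliberately removes every copy, which is why step 3 is needed; the whole sequence should run inside one transaction so a failure between steps cannot lose data.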
SYSPMZT@gecrdvm1.crd.ge.com (04/02/91)
I've made a nice mistake loading data twice over several days into a table, and
would like to delete just one of the duplicate rows. The number of duplicates
is on the order of 10,000 long rows, and the table has 250,000 rows total, so
unloading, manipulating the data, and reloading the data is out of the
question. I can identify the duplicate rows with a select joining the table
to itself, but I can't use the same technique to delete the rows.

Anyone have a nifty solution to this problem? The database manager is DB2, but
I'd think that any SQL based language would have the same problem.

Thanks in advance,

Phil Zampino
SYSPMZT@gecrdvm1.crd.ge.com (04/03/91)
In article <1991Apr1.163615.56@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com says:
>In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
>> I've made a nice mistake loading data twice over several days into a table, and
>> would like to delete just one of the duplicate rows.
>>
>> Anyone have a nifty solution to this problem? The database manager is DB2,
>>
>> Phil Zampino
>
>Select the identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).
>Then delete from the original where the record is the same as in TEMP.
>This will rid ALL instances of the duplicates (including the original).
>Then add records from TEMP into ORIG to restore one copy.
>
>Bipin Prasad    bprasad@honcim1.honeywell.com

Thanks; I got a lot of personal replies with some clever suggestions to this
problem. A similar solution is:

    Create temp_table like original_table
    Select distinct * from original_table into temp_table
    Delete from original_table
    Select * from temp_table into original_table

I am now the happy owner of a uniquely rowed table, and plan to put a unique
index on the table to protect it from my own stupid actions.

Thanks to all replying,

Phil Zampino
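Phil's four statements map almost directly onto SQLite. A minimal sketch with invented table/column names, using `CREATE TABLE ... AS SELECT DISTINCT` in place of the separate `CREATE ... LIKE` plus insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE original_table (a TEXT, b INTEGER)")
cur.executemany("INSERT INTO original_table VALUES (?, ?)",
                [("x", 1), ("y", 2)] * 2 + [("z", 3)])  # 5 rows, 2 duplicated

# Keep one copy of every distinct row in a scratch table ...
cur.execute("""CREATE TEMP TABLE temp_table AS
               SELECT DISTINCT * FROM original_table""")
# ... empty the original, then copy the de-duplicated rows back.
cur.execute("DELETE FROM original_table")
cur.execute("INSERT INTO original_table SELECT * FROM temp_table")

print(cur.execute("SELECT COUNT(*) FROM original_table").fetchone()[0])  # 3
```

Unlike the temp-table-of-duplicates approach, this one rewrites the whole table, so for a 250,000-row table it trades extra I/O for simpler SQL.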
beal@paladin.owego.ny.us (Alan Beal) (04/03/91)
tdoyle@cim-vax.honeywell.com writes:
>In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
>> I've made a nice mistake loading data twice over several days into a table, and
>> would like to delete just one of the duplicate rows.
>>
>> Anyone have a nifty solution to this problem? The database manager is DB2, but
>> I'd think that any SQL based language would have the same problem.

I often use QMF as a tool for doing this; see below.

>Select the identified duplicates into a temporary table (SELECT UNIQUE INTO TEMP).

How about in QMF:

    SELECT DISTINCT * FROM TABLE

Then:

    SAVE DATA AS TEMP

TEMP should now contain the unique rows. Note that QMF creates a table called
TEMP in the tablespace specified by the SPACE parameter in your QMF profile.
Typically, QMF's default tablespace is used by everyone, may contain many
tables, and may be in serious need of a reorganization.

>Then delete from the original where the record is the same as in TEMP.

Or, using my example, delete all the records in the original table using:

    DELETE FROM TABLE

You can save time by using the LOAD utility with REPLACE and dummy input. Of
course, be careful if you have multiple tables in the tablespace. The DELETE
will run faster if the tablespace is segmented.

>This will rid ALL instances of the duplicates (including the original).
>Then add records from TEMP into ORIG to restore one copy.

Then reinsert all the data back in from TEMP:

    INSERT INTO TABLE SELECT * FROM TEMP

Of course, it would be wise to back up your table before doing this. And
finally, I assume you realize that a unique index would have prevented this
problem in the first place.
--
Alan Beal
Internet: beal@paladin.Owego.NY.US
USENET: {uunet,uunet!bywater!scifi}!paladin!beal
mspayer@PacBell.COM (Mike Payer) (04/03/91)
In article <91091.141528SYSPMZT@GECRDVM1.BITNET> SYSPMZT@gecrdvm1.crd.ge.com writes:
>I've made a nice mistake loading data twice over several days into a table, and
>would like to delete just one of the duplicate rows. The number of duplicates
>is on the order of 10,000 long rows, and the table has 250,000 rows total, so
>unloading, manipulating the data, and reloading the data is out of the
> [SOME STUFF DELETED]
>Anyone have a nifty solution to this problem? The database manager is DB2, but
> [SOME STUFF DELETED]
>Phil Zampino

I just did the same thing about a week ago. I am using Informix isql and some
4GL. What I did was run a select statement that gave me the rowid, and then
deleted by rowid for the duplicate entries. This worked fine for me, as I had
about 1000 duplicate rows.

Hope this helps.

Michael S Payer Jr                        FAX 415-867-0344
Administrator, Emergency Control Center   pacbell!pbecc!msp
                                          pacbell!pbhya!mspayer

            ____________SOMETIMES THE DRAGON WINS____________
drack@titan.tsd.arlut.utexas.edu (Dave Rackley) (04/03/91)
In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu (Tsao) writes:
>I am now using Ingres, I am surprised that it allows the user to append
>duplicated rows (including a null row) into a table unless you define a key
>using create unique index.
>Would anybody give me an example justifying the need for duplicated tuples
>in a table ?

I cannot justify the need for them, but in real-time data collection,
duplicates are often created by multiple sensors at a single source. It then
becomes a data reduction/analysis issue to remove duplicates while recording
the fact that duplicate data captures occurred.

Why let this happen? Our data is always dirty--we have to clean it up before
the customer gets his hands on it!
--
DISCLAIMER: I, hereby, disclaim any disclaimer except this disclaimer.
David Rackley                   | Now comes the really weird part...You know
Applied Research Laboratories   | the part where I know how to tap dance, but
The University of Texas         | only while wearing golf shoes...
Austin, TX. 78758               | ...Ponderous, man, really ponderous!
tdoyle@cim-vax.honeywell.com (04/03/91)
In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu (Tsao) writes:
> In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> |> I've made a nice mistake loading data twice over several days into a table, and
> ......
>
> I am now using Ingres, I am surprised that it allows the user to append
> duplicated rows (including a null row) into a table unless you define a key
> using create unique index.
>
> Would anybody give me an example justifying the need for duplicated tuples
> in a table ?

In a "clean" database implementation there is NO excuse for such behavior,
but alas, it is simpler/more machine-efficient for DBMS vendors to do it this
way.

The process of relational modelling means that ultimately the tuple/relation
is supposed to be in third normal form (or a close approximation), and thus
should have one and only one set of attributes that uniquely identifies the
object. This set of attributes should then automatically get a unique index.
None of the popular DBMSs enforce this.

Another problem with the popular DBMSs is that they don't treat null values
correctly. If the attributes in the unique index are null, then multiple
entries with null should be permitted (using the theory that null != null,
i.e., the two nulls are different).

Bipin Prasad    bprasad@honcim1.honeywell.com
sweiger@sequent.com (04/04/91)
In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu writes:
>I am now using Ingres, I am surprised that it allows the user to append
>duplicated rows (including a null row) into a table unless you define a key
>using create unique index.
>
>Would anybody give me an example justifying the need for duplicated tuples
>in a table ?

SQL specifically allows duplicate keys and nulls as attributes. The only way
to force unique primary keys in SQL is to CREATE UNIQUE INDEX using that key,
a construct that almost every SQL dialect supports.

In theory, relational databases are not supposed to contain duplicate keys.
In theory, there is always a primary key which identifies the tuple and does
not contain null attributes. SQL deviates from the relational model in this
regard, and many words of criticism have been written about this deviation
from the model. It certainly does complicate RDBMS algorithms, and joins can
produce some surprising results.

But my personal opinion is this: a wise database designer makes sure that his
database has a primary key. But the fact that SQL allows duplicates
recognizes some legitimate, real-world situations where it just isn't
feasible, or perhaps desirable in a performance sense, to create a primary
key for every tuple.

--Mark
tdoyle@cim-vax.honeywell.com (04/04/91)
In article <DRACK.91Apr3083024@diablo.titan.tsd.arlut.utexas.edu>, drack@titan.tsd.arlut.utexas.edu (Dave Rackley) writes:
>
> I cannot justify the need for them, but in real-time data collection duplicates
> are often created by multiple sensors at a single source. It then becomes a
> data reduction/analysis issue to remove duplicates, while recording the fact
> that duplicate data captures occurred.
>
> Why let this happen? Our data is always dirty--we have to clean it up before
> the customer gets his hands on it!

This may be a little esoteric, but I would suggest that the table is not in
third normal form. If the table represents data measurements from various
sensors, this would indicate that one attribute is missing from the table:
i.e., sensor-id. If the table represents simply the sequential measurements,
then a date-time or a counter attribute may be required.

Once the database "correctly" models the world, then it is quite possible
(when the user-view of the object does not include one of the primary
identifiers) that the tuple presented in the view corresponds to more than
one tuple in the original relation. But this does not excuse the original
deficiency of the relational DBMSs.
mao@eden.Berkeley.EDU (Mike Olson) (04/04/91)
In <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu (Tsao)
expresses surprise that ingres permits insertion of duplicate rows, and
asks for
> an example justifying the need for duplicated tuples in a table ?
the reason that all commercial systems allow you to insert duplicates is
that, unless an index exists on the table, every insertion turns into
a scan of the entire table. dups are bad and everyone admits that, but
nobody seems to be willing to pay the cost of guaranteeing they never happen.
we've also discovered (in postgres) that you don't always know what equality
means for a user-defined data type, so you can't in principle guarantee that
dups never happen without extra work on the part of the user. actually, it
turns out that type extensibility is pretty hard for a lot of reasons like
this, but that's off-topic...
mike olson
postgres research group
uc berkeley
mao@postgres.berkeley.edu
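Olson's point — that nobody pays the cost of a no-duplicates guarantee unless an index exists — is exactly why the standard protection in this thread is a unique index: it gives the engine a cheap place to check each insert and makes it reject the second copy outright. A small SQLite sketch (table and index names invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE emp (emp_no INTEGER, name TEXT)")
# Without this index, verifying uniqueness would require scanning the
# whole table on every insert; with it, the check is a cheap index probe.
cur.execute("CREATE UNIQUE INDEX emp_pk ON emp (emp_no)")

cur.execute("INSERT INTO emp VALUES (1, 'A')")
try:
    cur.execute("INSERT INTO emp VALUES (1, 'A')")  # second load of same row
except sqlite3.IntegrityError:
    print("duplicate rejected")
```

This is the same protection Phil added after the fact; with it in place, the double load would have failed on the first duplicate row instead of silently succeeding.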
cdm@gem-hy.Inel.GOV (Dale Cook) (04/04/91)
In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
|> In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu (Tsao) writes:
|> > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
|> > |> I've made a nice mistake loading data twice over several days into a table, and
|> > ......
|> >
|> > I am now using Ingres, I am surprised that it allows the user to append
|> > duplicated rows (including a null row) into a table unless you define a key
|> > using create unique index.
|> >
|> > Would anybody give me an example justifying the need for duplicated tuples
|> > in a table ?
|>
|> In a "clean" database implementation there is NO excuse for such behavior,
|> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
|>
|> Since the process of relational modelling means that ultimately the
|> tuple/relation is supposed to be in third normal form (or a close
|> approximation) and thus should have one and only one set of attributes that
|> uniquely identify the object. This set of attributes then should automatically
|> have a unique index. None of the popular DBMS enforce this.

Sorry, but I must disagree. Attributes do not _identify_ an instance of an
entity, they merely describe it. A subtle, but important, distinction.
There is nothing in data modelling that says multiple instances of an entity
or relationship cannot have identical attributes.

What data modelling says you MUST have is an instance of an entity uniquely
identifiable by a "primary key". It is sufficient to have a simple sequence
number to satisfy this rule. Relational DBMS systems satisfy this requirement
for you; suffice it to say that every row in the table has its own internal
unique identifier. If the sequence number is unimportant information, why not
let the DBMS keep track of it for you?
|> Another problem with the popular DBMS's is that they don't treat null values
|> correctly. If the attributes in the unique index are null, then multiple
|> entries with null should be permitted (using the theory that null != null, i.e.
|> the two nulls are different).

I don't follow this.

|> Bipin Prasad    bprasad@honcim1.honeywell.com
----------------------------------------------------------------------
Dale Cook                  "I was born at night, but by god, it wasn't
cdm@inel.gov                last night."

The opinions are mine. The following disclaimer is my employer's.
----------------------------------------------------------------------
========== long legal disclaimer follows, press n to skip ===========
Neither the United States Government or the Idaho National Engineering
Laboratory or any of their employees, makes any warranty, whatsoever,
implied, or assumes any legal liability or responsibility regarding any
information, disclosed, or represents that its use would not infringe
privately owned rights. No specific reference constitutes or implies
endorsement, recommendation, or favoring by the United States Government
or the Idaho National Engineering Laboratory. The views and opinions
expressed herein do not necessarily reflect those of the United States
Government or the Idaho National Engineering Laboratory, and shall not be
used for advertising or product endorsement purposes.
miket@blia.sharebase.com (Mike Tossy) (04/05/91)
In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
> In article <1991Apr3.010838.2063@eng.ufl.edu>, tsao@helios.tcad.ee.ufl.edu (Tsao) writes:
> > In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> > |> I've made a nice mistake loading data twice over several days into a table, and
> > ......
> >
> > Would anybody give me an example justifying the need for duplicated tuples
> > in a table ?
>
> In a "clean" database implementation there is NO excuse for such behavior,
> but alas it is simple/machine efficient? for DBMS vendors to do it this way.
>
> Since the process of relational modelling means that ultimately the
> tuple/relation is supposed to be in third normal form (or a close
> approximation) and thus should have one and only one set of attributes that
> uniquely identify the object. This set of attributes then should automatically
> have a unique index. None of the popular DBMS enforce this.

ShareBase III (an ANSI SQL based system) does create a unique index (by
default) on the primary key. Unfortunately, the ANSI standard requires that
duplicate rows be permitted - a really dumb idea - so we allow duplicate rows
if the DBA requests it (not that there is much useful that can be done with
them, dirty data excluded).

ShareBase II is QUEL based, like Ingres and others, and only permits
duplicates when there is no sorting (clustered) index. This is mostly an
expense issue - finding duplicates without indices can be expensive. Lots of
other things, like garbage collection, typically break when there is no
clustered index. All our manuals and classes stress that a clustered index
is required on every table for proper system operation. I'll guess this is a
good rule for the other QUEL based systems.

By the way, duplicates are one of the fundamental semantic differences
between SQL based and QUEL based systems.
QUEL defines duplicate tuples (rows) as having no meaning, and the optimizer
is allowed (encouraged?!) to delete duplicates. SQL says that duplicates have
meaning (although it doesn't give you a way to manipulate them!), and the
optimizer can only delete duplicates at user request (using the DISTINCT
clause). Thus projections which do not include the primary key (including
most aggregates) can give very different results under SQL semantics and QUEL
semantics. And just because you are using SQL syntax does not mean you are
getting SQL semantics: most of the QUEL based systems long ago delivered SQL,
but with QUEL semantics! It is the semantics (not the syntax) that keeps
several of the vendors from passing the NIST ANSI SQL tests.

> ...
> Bipin Prasad    bprasad@honcim1.honeywell.com
--
>>>>>> The above statements are only my opinions <<<<<<
Mike Tossy                      ShareBase Corporation
miket@sharebase.com             14600 Winchester Blvd
(408) 378-7575 ext 2200         Los Gatos, CA 95030
(ShareBase is a subsidiary of Teradata Corporation)
tdoyle@cim-vax.honeywell.com (04/05/91)
In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes:
>
> Sorry, but I must disagree. Attributes do not _identify_ an instance of an
> entity, they merely describe it. A subtle, but important, distinction.
> There is nothing in data modelling that says multiple instances of an entity
> or relationship cannot have identical attributes.

Attributes are the ONLY way to describe an object. You may CHOOSE to use the
rowid as an additional attribute for the sake of convenience (or in some
extreme case that I have not encountered, a necessity), but don't try to pass
that off as data modelling. Note that the person who originally posted the
request has now created a unique index to avoid the problem.

> |> Another problem with the popular DBMS's is that they don't treat null values
> |> correctly. If the attributes in the unique index are null, then multiple
> |> entries with null should be permitted (using the theory that null != null, i.e.
> |> the two nulls are different).
>
> I don't follow this.

Most reasonable databases treat NULL as a special value for an attribute. It
means that the value is UNKNOWN. This special value is different from 0 for
numeric data and blank for strings. Thus, if you had the following data in
your table:

    Person    balance ($)
    A         10
    B         20
    C         30
    D         NULL
    E         NULL

    SELECT COUNT(*) ... WHERE balance > 0    will return 3
    SELECT AVG(balance)                      will return 20, not 12

Since one null is not the same as another null, the balance of D is not the
same as the balance of E. This has implications for how JOINs between tables
are executed, how selections are performed, how boolean logic is performed,
etc. (NULL != NULL; the result of the comparison (NULL > NULL) is unknown).
Most major relational databases do this well/correctly. WHERE THEY FAIL is
when an attribute that has null values is part of the primary key. Then they
treat NULL as a single value (which is how NULL is represented internally in
the relational DBMSs). This really hoses things.
As an example, following is a table where employee number is the key:

    emp-no           name   status   salary   ...etc
    (unique index)
    1                A      ...      ...............
    2                B      ....     ...........
    NULL             C      ....................     New hire, emp-no pending
    NULL             D      ....................     New hire, emp-no pending

Or take this example, where purchase requisitions are assigned a purchase
order number:

    Tables:  1. Purchase Order  (unique index POnumber)
             2. Invoices        (unique index POnumber, InvoiceNumber)

    Table 1. Purchase Order:

    POnumber         Status    amount   vendor   limit clause ... etc
    (unique index)
    1                expire    20000    ...................
    2                current   20000    ...................
    NULL             new       20000    ...................   POnumber unassigned,
                                                              waiting for signoff
    NULL             new       20000    ...................   process

-----------------------------------------------------------------
You get the picture.

Bipin Prasad    bprasad@honcim1.honeywell.com
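Bipin's claimed NULL behavior is easy to check against SQLite, which happens to implement the semantics he argues for: aggregates and comparisons skip NULLs, NULL never compares equal to NULL, and a unique index admits multiple NULL keys. (Table and column names are invented; other engines differ, which is exactly his complaint about NULLs in unique indexes.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE acct (person TEXT, balance INTEGER)")
cur.executemany("INSERT INTO acct VALUES (?, ?)",
                [("A", 10), ("B", 20), ("C", 30), ("D", None), ("E", None)])

# Comparisons with NULL are unknown, so D and E drop out of the WHERE:
print(cur.execute("SELECT COUNT(*) FROM acct WHERE balance > 0").fetchone()[0])  # 3
# Aggregates ignore NULLs: 60 / 3, not 60 / 5.
print(cur.execute("SELECT AVG(balance) FROM acct").fetchone()[0])  # 20.0
# NULL != NULL, so D and E do not join to each other on balance:
print(cur.execute("""SELECT COUNT(*) FROM acct x JOIN acct y
                     ON x.balance = y.balance""").fetchone()[0])  # 3
# SQLite treats each NULL key as distinct, so this succeeds despite D and E:
cur.execute("CREATE UNIQUE INDEX acct_bal ON acct (balance)")
```

The last statement is the contested point in the thread: an engine that treats NULL as a single value would reject that index because two rows share a NULL key.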
rob.bbs@shark.cs.fau.edu (Robert Rittenhouse) (04/05/91)
> In article <1991Apr3.010838.2063@eng.ufl.edu> tsao@helios.tcad.ee.ufl.edu writes:
> >I am now using Ingres, I am surprised that it allows the user to append
> >duplicated rows (including a null row) into a table unless you define a key
> >using create unique index.
> >
> >Would anybody give me an example justifying the need for duplicated tuples
> >in a table ?

An example: I'm doing a parts explosion, which consists of recursively
searching a parts tree (a fun trick in and of itself) and inserting the
components required into a parts-required table. Sometimes two parts will use
the same component--hence two identical rows will be inserted in the
parts-required table.

I certainly could avoid doing so by having a sequence number or some such
unique identifier, but there seems no good reason to do so. (I fix the parts
table when finished with the explosion, btw.)

Rob R.
clifton@egsner.cirr.com (Clifton Bean) (04/07/91)
In article <91091.141528SYSPMZT@GECRDVM1.BITNET>, SYSPMZT@gecrdvm1.crd.ge.com writes:
> I've made a nice mistake loading data twice over several days into a table, and
> would like to delete just one of the duplicate rows.
> Anyone have a nifty solution to this problem? The database manager is DB2, but
> I'd think that any SQL based language would have the same problem.
------------------------------------------------------------------------------
I use Informix. Their tables all contain a "hidden" index on the row_id of
the records within the table. In Informix, I would code the following:
Select rowid, * from table
where ... = ... (you supply the fieldname and contents)
You would then use one of the row_ids in your delete from table statement.
Check your documentation's index for a reference to this type of "automatic"
index - I would think this type of index is a logical part of any sql package.
--
*******************************************************************************
Clifton M. Bean USENET: clifton@lodestar.lonestar.org
Page & Addison, P.C., Dallas, TX UUCP: ...!egsner!lodestar!clifton
Work Phone: (214) 960-0933 (9-6 CDT) Home Phone: (214) 307-2242 (pm CDT)
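SQLite keeps the same kind of per-row locator, so Clifton's select-then-delete-by-rowid recipe can be sketched there (Informix syntax differs slightly; the table and data are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE parts (part TEXT)")
cur.executemany("INSERT INTO parts VALUES (?)",
                [("bolt",), ("bolt",), ("nut",)])

# Step 1: select the rowids of the offending duplicates.
ids = [r[0] for r in
       cur.execute("SELECT rowid FROM parts WHERE part = 'bolt'")]

# Step 2: delete one specific rowid, leaving the other copy intact.
cur.execute("DELETE FROM parts WHERE rowid = ?", (ids[1],))

print(cur.execute("SELECT COUNT(*) FROM parts").fetchone()[0])  # 2
```

This works even though the two "bolt" rows are identical in every declared column, which is precisely why a row locator is the handy escape hatch here.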
pavlov@canisius.UUCP (Greg Pavlov) (04/08/91)
In article <1991Apr3.085710.57@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
> > I am now using Ingres, I am surprised that it allows the user to append
> > duplicated rows (including a null row) into a table unless you define a key
> > using create unique index.
>
> In a "clean" database implementation there is NO excuse for such behavior,
> but alas it is simple/machine efficient? for DBMS vendors to do it this way.

Ignoring the "theoretical" argument of what an "attribute" really is, I would
argue very strongly that I do NOT want my DBMS to make such assumptions.
INGRES generally provides one with the tools to enforce whatever "view" of
"good" database "practice" one may have, and one can use them accordingly.
But that does not mean that a given person's "view" is useful, optimal, or
even "correct".

In our shop, for instance, all tables are required to have unique keys. But
if we receive a m{ae}ss of data from elsewhere that has to be manipulated,
massaged, etc., to get it into a workable form, we first load it "raw" into
temporary tables to see what we actually have, and use INGRES itself to
perform the reformatting/transformation. During this work, I don't want
INGRES to make any other-view presuppositions about what we should/should not
be doing.

greg pavlov, fstrf, amherst, ny
pavlov@stewart.fstrf.org
cdm@gem-hy.Inel.GOV (Dale Cook) (04/08/91)
In article <1991Apr5.074844.59@cim-vax.honeywell.com>, tdoyle@cim-vax.honeywell.com writes:
|> In article <1991Apr4.004438.15592@inel.gov>, cdm@gem-hy.Inel.GOV (Dale Cook) writes:
|> >
|> > Sorry, but I must disagree. Attributes do not _identify_ an instance of an
|> > entity, they merely describe it. A subtle, but important, distinction.
|> > There is nothing in data modelling that says multiple instances of an entity
|> > or relationship cannot have identical attributes.
|>
|> Attributes are the ONLY way to describe an object.
                                   ^^^^^^^^
Isn't that what I said?

|> You may CHOOSE to use the
|> rowid as an additional attribute for the sake of convenience (or in some
|> extreme case that I have not encountered, a necessity), but don't try
|> to pass that off as data modelling.

You're right; what I meant to do was make a distinction between describing an
instance of an object and identifying it. Obviously, in order to make use of
rowid as the primary key, one should properly model the data in order to come
to the conclusion that a sequence number is an appropriate primary key. My
implication was that the primary key was not an attribute; that implication
is false.

But we digress. The original question is whether a true RDBMS should allow
rows with identical attributes. I still claim it should. As long as the DBMS
internally identifies the row uniquely, the implementation violates no rules
of normalization. As long as the DBMS gives you the ability to create unique
keys via some mechanism, the implementor has no problem. To enforce
uniqueness across all identified attributes unnecessarily removes a perfectly
valid option from a database implementor.

As a matter of practicality as well, why should I pay for the DBMS to ensure
every row has at least one attribute with a different value, when I don't
need it? What you are asking for is a unique index on the entire row. Is
that what you want? I really doubt it.
|> Note that the person who originally posted the request has now created a
|> unique index to avoid the problem.

Which is exactly what should have been done. The real problem is not with the
DBMS; it is with unfounded expectations about the behavior of the DBMS.
----------------------------------------------------------------------
Dale Cook                  "You can sum this game up in one word -
cdm@inel.gov                'you never know'."  --- J. Andujar

The opinions are mine.
----------------------------------------------------------------------
davek@informix.com (David Kosenko) (04/09/91)
In article <1991Apr7.054945.8610@egsner.cirr.com> clifton@egsner.cirr.com (Clifton Bean) writes:
>
>I use Informix. Their tables all contain a "hidden" index on the row_id of
>the records within the table. In Informix, I would code the following:

Actually, we do not have any "hidden" indexes, though it is very commonly
misunderstood that we do. Rowid is best described as an l-value, or the
location of a particular record in a database table. With SE databases
(using a Unix file for the data rows), the rowid represents the offset into
the file where the row is found. With Turbo/OnLine, it provides a page
number/slot table number combination, which is used in a similar fashion to
locate the row in question. In no case is there any true index built on this
rowid value. You can use the rowid in a WHERE clause, in which case you are
asking for the row at that particular location.

Dave Kosenko
Informix Client Services
--
Disclaimer: These opinions subject to change without notice.
**************************************************************************
The heart and the mind on a parallel course, never the two shall meet.
                                                            -E. Saliers
cakers@oracle.com (Charles Akers) (04/10/91)
The various methods described in the articles on deleting duplicate rows in a
SQL database table do not do it in a single SQL statement. If the table
involved has a keyfield and there are duplicates of keys, the following SQL
statement will delete all rows except the one row for each keyfield value
which has the lowest rowid. If the table is named table_X and the keyfield is
named keycol:

    DELETE FROM table_X t_alias
     WHERE ROWID > (SELECT MIN(ROWID) FROM table_X
                     WHERE keycol = t_alias.keycol);

This statement will delete extra rows with the same keyfield even if the
values in other (non-key) columns are not the same. That is not a problem if
you have just inserted the same rows twice and want only one copy of each
row.

If there is no column in the table which was unique prior to inserting the
duplicates, the following 3 statements will eliminate duplicates of rows
which have identical values in all columns (requires your SQL to have a
RENAME command):

    CREATE TABLE tmp_tab AS SELECT DISTINCT * FROM table_X;
    DROP TABLE table_X;
    RENAME tmp_tab TO table_X;

-------------
Charles Akers
-------------
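The single-statement delete above also works in SQLite, which exposes a `rowid` of its own; the one adjustment is that SQLite does not accept an alias on the DELETE target, so the correlated subquery aliases the inner table instead. A sketch with invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_x (keycol INTEGER, val TEXT)")
cur.executemany("INSERT INTO table_x VALUES (?, ?)",
                [(1, "a"), (1, "a"), (2, "b"), (2, "b"), (3, "c")])

# For each keycol, keep only the row with the lowest rowid.
# (The unqualified table_x inside the subquery refers to the outer
# DELETE target, since the inner occurrence is aliased as t2.)
cur.execute("""DELETE FROM table_x
               WHERE rowid > (SELECT MIN(t2.rowid) FROM table_x t2
                              WHERE t2.keycol = table_x.keycol)""")

print(cur.execute("SELECT COUNT(*) FROM table_x").fetchone()[0])  # 3
```

As Akers notes, this keeps one arbitrary survivor per key even when the non-key columns differ, so it only de-duplicates cleanly when the extra rows really are exact copies.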