saunders@refine.enet.dec.com (Kyle Saunders) (05/13/91)
Hi folks, I was wondering how you would implement cursor based UPDATE's and DELETE's. I have given some thought to this and I cannot see how the RDBMS can properly update the correct row given the fact that the table to be updated may not have _any_ indexes, and not all fields are guarenteed to be included in the query. This means there is no way to uniquely determine which row in the original base table is being updated. - Kyle saunders@refine.enet.dec.com
tmp@mentor.cc.purdue.edu (Tom Putnam) (05/13/91)
In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes: > > I was wondering how you would implement cursor based UPDATE's and >DELETE's. > > I have given some thought to this and I cannot see how the RDBMS can >properly update the correct row given the fact that the table to be updated >may not have _any_ indexes, and not all fields are guarenteed to be included >in the query. This means there is no way to uniquely determine which row in >the original base table is being updated. I recently encountered this problem in a slightly different context: I wanted to do an "ordered update". My database keeps a list of people who have registered for a course and a counter in each record which denotes the order of registration. The permits me to implement a "waiting list" when a course fills. In this context, I can't declare the cursor for UPDATE since I want it ordered. The work-around was to recognize that (at least under INFORMIX), there is a field called ROWID for each row. I can select it, then use it to give me an "absolute pointer" to a record. Here is my code sequence using INFORMIX 4GL: declare reseq_c cursor for select rowid, * from what where what.sid = this_sid and what.cid = this_cid order by internal, regdate, regseq open reseq_c let i = 1 foreach reseq_c into this_row, wtmp.* update what set regseq = i where rowid = this_row let i = i + 1 end foreach close reseq_c Of course, one should be absolutely paranoid about ever keeping a ROWID around for any longer than this sort of use. Systems need to be able to physically move rows to other locations as the database is updated. -- Tom Putnam Assistant Director Internet: tmp@mentor.cc.Purdue.EDU Purdue University Computing Center BITNET: TMP@PURCCVM Mathematical Sciences Bldg. Phone: (317) 494-1787 West Lafayette, IN 47907
saunders@refine.enet.dec.com (Kyle Saunders) (05/14/91)
In article <12260@mentor.cc.purdue.edu>, tmp@mentor.cc.purdue.edu (Tom Putnam) writes... |In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com |(Kyle Saunders) writes: |> |> I was wondering how you would implement cursor based UPDATE's and |>DELETE's. |> |> I have given some thought to this and I cannot see how the RDBMS can |>properly update the correct row given the fact that the table to be updated |>may not have _any_ indexes, and not all fields are guarenteed to be included |>in the query. This means there is no way to uniquely determine which row in |>the original base table is being updated. | |I recently encountered this problem in a slightly different context: I wanted |to do an "ordered update". My database keeps a list of people who have |registered for a course and a counter in each record which denotes the order of |registration. The permits me to implement a "waiting list" when a course |fills. In this context, I can't declare the cursor for UPDATE since I want it |ordered. The work-around was to recognize that (at least under INFORMIX), |there is a field called ROWID for each row. I can select it, then use it to |give me an "absolute pointer" to a record. Here is my code sequence using [ INFORMIX code deleted ] | |Of course, one should be absolutely paranoid about ever keeping a ROWID around |for any longer than this sort of use. Systems need to be able to physically |move rows to other locations as the database is updated. |-- |Tom Putnam Assistant Director Well, your solution is exactly what I am trying to avoid. I'm _writing_ the SQL RDBMS, so an SQL work-around is not what I need. It looks like I will have to require an index to be created on at least one of the field in the query definition of the cursor. I want to avoid such kludges as the "hidden" ROWID column that INFORMIX evidently has. It doesn't provide for the physical abstraction that the Relational Model is supposed to have. I suppose you could say that requiring an index also violates the relational model, but something tells me that when you do: DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields] most RDBMS's will create an index on the fly if one doesn't exist on the fields to update. I would do this too, but I don't implement indexes yet. - Kyle
jfr@locus.com (Jon Rosen) (05/14/91)
In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes: >In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com >(Kyle Saunders) writes: >> I was wondering how you would implement cursor based UPDATE's and >>DELETEs. I have given some thought to this and cannot see how the RDBMS can >>properly update the correct row given the fact that the table to be updated >>may not have _any_ indexes, and not all fields are guarenteed to be included >>in the query. This means there is no way to uniquely determine which row in >>the original base table is being updated. ><some stuff deleted...> >.... The work-around was to recognize that (at least under INFORMIX), >there is a field called ROWID for each row. I can select it, then use it to >give me an "absolute pointer" to a record. ><code deleted...> >Of course, one should be absolutely paranoid about ever keeping a ROWID around >for any longer than this sort of use. Systems need to be able to physically >move rows to other locations as the database is updated. Kyle questions how cursor-based UPDATEs and DELETEs can be implemented. If this refers to how an RDBMS implements it, the answer is related to Tom's response... I.e., internally, the RDBMS must keep some kind of a relative row id that it uses to link the cursor-retrieved row back to the actual data row. In fact, this implementation is quite common. It even leads to those databases such as Informix and Oracle actually materializing the ROWID into a user-accessible column which can be used to do direct-access updates against rows without using RDBMS cursors and the UPDATE/DELETE WHERE CURRENT OF statement. Also, it is not as bad as Tom thinks to keep using this ROWID. Usually, the ROWID is a logical construction and has nothing (or little) to do with the physical location/inode/record number of the actual row. So moving the row will generally not change its ROWID. I have seen an application in Oracle that even uses ROWID to build links between tables (UGGGH) in order to improve performance. This requires that the ROWID be persistent. There can be difficulties if rows are deleted and the associated ROWID references in other linked tables are not (a "referential integrity" issue) since ROWIDs can be reissued once the original row is no longer available. A better solution would be a fine granularity TIMESTAMP... In fact, Postgres uses this solution to individualize rows and to allow for update/delete with full history. All versions of rows are kept (until explicitly purged) and you can browse through outdated versions of rows at will. Of course, the default is to look at the current version of a row. Unfortunately, the ROWID solution begs several questions. First of all, a materialized ROWID is by no means universal. Most RDBMS (including DB2, OS/2 EE, RDB, Ingres, etc.) do not have such an abstraction. In addition, neither ROWID nor updatable cursors deal with data integrity across COMMITs. When data is being updated on-line, it is not advisable to hold a cursor open across the workstation or terminal i/o (otherwise locks are kept for long periods of time). Thus, if you use a cursor to select some rows for display, you still should COMMIT prior to displaying them. Now, your updatable cursor is no longer updatable. In fact, you will either have to reread the rows with a new updatable cursor or use a direct UPDATE. Since you have no lock on the data while you are using it on-screen, someone else can come along, read and modify the row without you knowing it. Once again, either a very fine granularity TIMESTAMP or an update/change count column is needed to ensure that your update does not screw up some other data that has been changed since your read. Alternatively, you have to do an UPDATE with a WHERE clause on every column of the TABLE giving the original value in order to make sure that the row has not been changed (bad idea from a performance standpoint). As for the duplicate row issue, don't you know that relational data bases aren't supposed to have duplicate rows??? :-) Jon Rosen
saunders@refine.enet.dec.com (Kyle Saunders) (05/14/91)
In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes... |In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes: |>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com |>(Kyle Saunders) writes: |>> I was wondering how you would implement cursor based UPDATE's and |>>DELETEs. I have given some thought to this and cannot see how the RDBMS can |>>properly update the correct row given the fact that the table to be updated |>>may not have _any_ indexes, and not all fields are guarenteed to be included |>>in the query. This means there is no way to uniquely determine which row in |>>the original base table is being updated. |><some stuff deleted...> |>.... The work-around was to recognize that (at least under INFORMIX), |>there is a field called ROWID for each row. I can select it, then use it to |>give me an "absolute pointer" to a record. |><code deleted...> |>Of course, one should be absolutely paranoid about ever keeping a ROWID around |>for any longer than this sort of use. Systems need to be able to physically |>move rows to other locations as the database is updated. | |Kyle questions how cursor-based UPDATEs and DELETEs can be implemented. |If this refers to how an RDBMS implements it, the answer is related |to Tom's response... I.e., internally, the RDBMS must keep some kind of |a relative row id that it uses to link the cursor-retrieved row back to |the actual data row. In fact, this implementation is quite common. It |even leads to those databases such as Informix and Oracle actually |materializing the ROWID into a user-accessible column which can be |used to do direct-access updates against rows without using RDBMS cursors |and the UPDATE/DELETE WHERE CURRENT OF statement. Keeping an explicit pointer for each row generated by an OPEN cursor to the original base table is a _major_ kludge IMHO. That's special casing this action. What's worse is letting users access this ROWID. If ROWID is provided in any SELECT, why?!? Users will abuse this bit of information. The user should not see the underlying physical implementation of the Relational Model, if at all possible. |Also, it is not as bad as Tom thinks to keep using this ROWID. Usually, |the ROWID is a logical construction and has nothing (or little) to do |with the physical location/inode/record number of the actual row. So How is it a logical construction? How is it safe to be anything _but_ the actual record number (guarenteed to be unique without having to look-up or generate anything)? [ stuff about integrity deleted ] |As for the duplicate row issue, don't you know that relational data |bases aren't supposed to have duplicate rows??? :-) I never mentioned duplicate rows. |Jon Rosen - Kyle
jfr@locus.com (Jon Rosen) (05/15/91)
In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes: >|> I was wondering how you would implement cursor based UPDATE's and >|>DELETE's. >|> <PREVIOUS POSTS DELETED> > Well, your solution is exactly what I am trying to avoid. I'm >_writing_ the SQL RDBMS, so an SQL work-around is not what I need. It looks >like I will have to require an index to be created on at least one of the >field in the query definition of the cursor. > I want to avoid such kludges as the "hidden" ROWID column that >INFORMIX evidently has. It doesn't provide for the physical abstraction that >the Relational Model is supposed to have. I suppose you could say that >requiring an index also violates the relational model, but something tells me >that when you do: >DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields] >most RDBMS's will create an index on the fly if one doesn't exist on the >fields to update. I would do this too, but I don't implement indexes yet. I see what you are looking for now. You need an internal implementation. Your statement about the relational model requiring physical abstration is off-target. The relational model says nothing about the physical implementation and in fact, there is absolutely nothing about the "right" and "wrong" ways to implement a relational data base. The relational mode er model requires complete logical abstraction. The user's VIEW of the data must be relational. The underlying implementation can do ANYTHING it wants or needs in order to create a complete and potentially efficient view of the relational model. In fact, this almost ALWAYS involves internal tuple ids or row ids. The issue is whether these internal ids are externally manifested (this would appear to be in violation of the relational model although some argue otherwise). REQUIRING an index violates the relational model (you should always be able to do any operation solely by reference to the table's data). However, REQUIRING a PRIMARY KEY is, in fact, intrinsic in the relational model. The relational model DOES NOT ALLOW FOR DUPLICATE ROWS IN TABLES. Thus, every table ALWAYS has a primary key, even if it consists of every attribute (column) of the row. Keep in mind that PRIMARY KEYS are NOT INDEXES (even though almost every physical implementation that enforces primary keys implements them using some kind of indexing or hashing mechanism for performance). Your choices for uniquely identifying rows are limited to either the primary key (no matter how many columns that involves) or an arbitrary row id that you keep for each row. If you are going to allow duplicate rows in your table, you can ONLY use a row id since there is no way to know which duplicate row is which since there is no primary key where there are duplicate rows. In effect, the row id becomes an arbitrary primary key. No matter how you dice and slice it, that's all there is. Jon Rosen
jfr@locus.com (Jon Rosen) (05/15/91)
In article <22653@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes: > >In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes... >|In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes: >|>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com >|>(Kyle Saunders) writes: >|>> I was wondering how you would implement cursor based UPDATE's and >|>>DELETEs. I have given some thought to this and cannot see how the RDBMS can >|>>properly update the correct row given the fact that the table to be updated >|>>may not have _any_ indexes, and not all fields are guarenteed to be included >|>>in the query. This means there is no way to uniquely determine which row in >|>>the original base table is being updated. >|><lotsa stuff deleted...> > > Keeping an explicit pointer for each row generated by an OPEN cursor >to the original base table is a _major_ kludge IMHO. That's special casing >this action. What's worse is letting users access this ROWID. If ROWID is >provided in any SELECT, why?!? Users will abuse this bit of information. The >user should not see the underlying physical implementation of the Relational >Model, if at all possible. Why do you feel this is a kludge? You have to do SOMETHING! In another post you mention creating some kind of index on the fly when a DECLARE CURSOR is used with a FOR UPDATE clause .... Well, what else is an index but a key plus a rowid (albeit a strictly internal one) which lets you get back to the original row? If you don't have the primary key (i.e., whether that is a subset of the columns or the complete set of columns is irrelevant as long as you have a primary key and no duplicate rows) available in your query, you either have to grab a unique row reference and include it with each row (call it what you will, a row id, a pointer, a link, I don't care it still gets you back to the row) or you have to supplement the query to include the entire primary key. If the primary key has lots of columns (sometimes the whole row) this can certainly degrade performance more than keeping a row id. And I don't think it is any less a kludge than the row id. > >|Also, it is not as bad as Tom thinks to keep using this ROWID. Usually, >|the ROWID is a logical construction and has nothing (or little) to do >|with the physical location/inode/record number of the actual row. So > > How is it a logical construction? How is it safe to be anything _but_ >the actual record number (guarenteed to be unique without having to look-up or >generate anything)? Hmmm, I have to go back and look at my original post. What I was trying to convey is that a ROWID in Oracle is a logical construct, not necessarily a physical one. I believe it is a uniquely generated sequential number that is guaranteed not to change even if the row is moved. If this is not the case (my Oracle recollections may be fuzzy) it certainly SHOULD be. My TIMESTAMP rowid would be similar. This is as opposed to using something like the sector-offset value for the location of the row on disk as the rowid. Obviously, this is fraught with peril since the row can move somewhere else and its internal id can then change. But I agree I may have been unclear on this point. >|As for the duplicate row issue, don't you know that relational data >|bases aren't supposed to have duplicate rows??? :-) > > I never mentioned duplicate rows. > Oh... I thought you did... I looked back and you are right, you didnot. If that is the case, you have an implicit primary key in all tables and you are home free either with a rowid or by including all values of the primary key for each table in the query. Any other approach is NOT supported by the relational model. That is actually the main point. The relational model expressly supports updating ONLY through the use of the primary key which means you have to have one and it has to be available at the time of the update. The only problem with that solution is that sometimes the performance using the primary key sucks. Thus, the use of rowids (kludgy or otherwise). Jon Rosen No flames intended :-)
saunders@refine.enet.dec.com (Kyle Saunders) (05/15/91)
In article <22637@shlump.lkg.dec.com> jfr@locus.com (Jon Rosen) writes: |In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunder s) writes: |>|> I was wondering how you would implement cursor based UPDATE's and |>|>DELETE's. |>|> <PREVIOUS POSTS DELETED> |>like I will have to require an index to be created on at least one of the |>field in the query definition of the cursor. |> I want to avoid such kludges as the "hidden" ROWID column that |>INFORMIX evidently has. It doesn't provide for the physical abstraction that |>the Relational Model is supposed to have. I suppose you could say that |>requiring an index also violates the relational model, but something tells me |>that when you do: |>DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields] |>most RDBMS's will create an index on the fly if one doesn't exist on the |>fields to update. I would do this too, but I don't implement indexes yet. |I see what you are looking for now. You need an internal implementation. |Your statement about the relational model requiring physical abstration |is off-target. The relational model says nothing about the physical |implementation and in fact, there is absolutely nothing about the "right" No, I was stating that I felt allowing the user to see the internal row-id, such that it is, would be violating the Relational Model. You are right that the Relational Model says nothing about the physical representation, except that the user should _never_ see it. |and "wrong" ways to implement a relational data base. The relational mode |er model requires complete logical abstraction. The user's VIEW of the |data must be relational. The underlying implementation can do ANYTHING |it wants or needs in order to create a complete and potentially efficient |view of the relational model. In fact, this almost ALWAYS involves |internal tuple ids or row ids. The issue is whether these internal ids |are externally manifested (this would appear to be in violation of the |relational model although some argue otherwise). REQUIRING an index This is what I was referring to above. |violates the relational model (you should always be able to do any operation |solely by reference to the table's data). However, REQUIRING a PRIMARY KEY I know that requiring requiring an index would violate the Relational Model. That is why I did not want to do that. That is why I suggested that perhaps the RDBMS should just create the index on the fly, _transparent_ to the user, and drop it when finished. |is, in fact, intrinsic in the relational model. The relational model |DOES NOT ALLOW FOR DUPLICATE ROWS IN TABLES. Thus, every table ALWAYS |has a primary key, even if it consists of every attribute (column) |of the row. Keep in mind that PRIMARY KEYS are NOT INDEXES (even though |almost every physical implementation that enforces primary keys implements |them using some kind of indexing or hashing mechanism for performance). Thanks for the lecture. I already knew all that. |Your choices for uniquely identifying rows are limited to either the |primary key (no matter how many columns that involves) or an arbitrary |row id that you keep for each row. If you are going to allow duplicate |rows in your table, you can ONLY use a row id since there is no way to |know which duplicate row is which since there is no primary key where |there are duplicate rows. In effect, the row id becomes an arbitrary |primary key. No matter how you dice and slice it, that's all there is. | |Jon Rosen |< You are missing the point, I think. When you create a temporary table with DECLARE cursor CURSOR FOR query; OPEN cursor; the columns in the above query may not include all columns or the primary key. Therefore, in order to find the row in the base table that corresponds to the row in the temporary table, you need to have an index on one of the fields that is included in the query. - Kyle
saunders@refine.enet.dec.com (Kyle Saunders) (05/15/91)
In article <22653@shlump.lkg.dec.com> jfr@locus.com (Jon Rosen) writes: |In article <22653@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saund` s) writes: |> |>In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes... |>|In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam` writes: |>|>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com |>|>(Kyle Saunders) writes: |>|>> I was wondering how you would implement cursor based UPDATE's and |>|>>DELETEs. I have given some thought to this and cannot see how the RDBMS can |>|>>properly update the correct row given the fact that the table to be updated |>|>>may not have _any_ indexes, and not all fields are guarenteed to be includ` |>|>>in the query. This means there is no way to uniquely determine which row ` |>|>>the original base table is being updated. |>|><lotsa stuff deleted...> |> |> Keeping an explicit pointer for each row generated by an OPEN cursor |>to the original base table is a _major_ kludge IMHO. That's special casing |>this action. What's worse is letting users access this ROWID. If ROWID is |>provided in any SELECT, why?!? Users will abuse this bit of information. The |>user should not see the underlying physical implementation of the Relational |>Model, if at all possible. | | Why do you feel this is a kludge? You have to do SOMETHING! In another | post you mention creating some kind of index on the fly when a DECLARE | CURSOR is used with a FOR UPDATE clause .... Well, what else is an | index but a key plus a rowid (albeit a strictly internal one) which lets | you get back to the original row? If you don't have the primary key (i.e., | whether that is a subset of the columns or the complete set of columns | is irrelevant as long as you have a primary key and no duplicate rows) | available in your query, you either have to grab a unique row reference | and include it with each row (call it what you will, a row id, a pointer, | a link, I don't care it still gets you back to the row) or you have to | supplement the query to include the entire primary key. If the primary | key has lots of columns (sometimes the whole row) this can certainly | degrade performance more than keeping a row id. And I don't think it | is any less a kludge than the row id. I don't see a problem with creating an index on the fly. This should fit nicely within the overall design of the RDBMS. I had a problem with adding a "hidden" column to store the base-table row id to the newly create temporary table. This wouldn't be done in any other case. That's the kludgy part I was referring to. [ stuff deleted that doesn't matter anymore :-) ] | |>|As for the duplicate row issue, don't you know that relational data |>|bases aren't supposed to have duplicate rows??? :-) |> |> I never mentioned duplicate rows. |> | | Oh... I thought you did... I looked back and you are right, you didnot. | If that is the case, you have an implicit primary key in all tables | and you are home free either with a rowid or by including all values | of the primary key for each table in the query. Any other approach | is NOT supported by the relational model. That is actually the main | point. The relational model expressly supports updating ONLY through | the use of the primary key which means you have to have one and it | has to be available at the time of the update. The only problem | with that solution is that sometimes the performance using the | primary key sucks. Thus, the use of rowids (kludgy or otherwise). | Ok, if it is necessary to have the primary key in the query that defines the cursor and it does not violate the Relational Model by requiring it, then that is the answer to my question. You are correct in that the Relational Model only supports unique address of rows through the primary key. Sigh. I guess I should have thought of that earlier. I just wanted to be as flexible as possible. That, and the fact that I haven't implemented primary/foreign key support yet (working on the precompiler/development tools). | Jon Rosen | No flames intended :-) No flames here either, previous post included (I hope :-). - Kyle
billc@ingres.com (Bill Coffin, x3387) (05/16/91)
> As for the duplicate row issue, don't you know that relational data > bases aren't supposed to have duplicate rows??? :-) But a cursor can be opened on a query that returns duplicate rows. There are other problems in UPDATE/DELETE through a cursor. That's why ANSI SQL86 has some very strict rules about updatability of cursors. For instance, what does it mean to update a cursor if there's an aggregate function in the query? Or any data derived by mathematical operations, or, in fact, any data that is not a 1-1 representation of the data in the queried table? You generally can't update/delete through a join, but that's a whole 'nother can of worms. In fact, cursor UPDATE/DELETE has all the problems of updating through views, and then some more. Another anomaly to consider is the ORDER BY. What does it mean to update one of the fields that you sorted on? (SQL86 gets around this by stating that the query produces a "snapshot" of the data, which you can then perturb. This model is insufficient for solving many UPDATE/DELETE ambiguities, but may be enough for the ORDER BY problem. Just look out for the "Halloween Problem"!) I could go on. I once implemented a cursor interface to an SQL dialect that didn't support cursors. I found it was very difficult unless I enforced the existence of a unique index. (The rowid was not reliable in this particular environment.) A unique ID does solve a large number of problems. -- Bill Coffin, aka billc@ingres.com FETCH INTO :standard_disclaimer, :gratuitous_witticism, :obscure_quote
michael@cutler.uucp (05/16/91)
In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes: > ... Usually, the ROWID is a logical construction and has nothing (or > little) to do with the physical location/inode/record number of the > actual row. So moving the row will generally not change its ROWID. ... > > Unfortunately, the ROWID solution begs several questions. First of all, > a materialized ROWID is by no means universal. Most RDBMS (including > DB2, OS/2 EE, RDB, Ingres, etc.) do not have such an abstraction. ... Just for the record, INGRES does have (and has always had) such an abstraction. Each row in an INGRES table or index has a TID, or tuple identifier. Use it as you would any other column. If memory serves, the TID is indeed related to physical location. The TID is something to the effect of the high order m bits is the INGRES page within the table the row is located, and the low order n bits is the index of the row within the INGRES page. Michael D. Shields {UUCP: ...rutgers!ogicse!orstcs!cutler!michael} AVP, MIS {Internet: michael%cutler.uucp@cs.orst.edu} Cutler & Company, Inc. Medford, OR 97504 (503) 770-9000
rap@n5pph007.uucp (Allen Pippin,RTP,8189,) (05/16/91)
Many relational managers make use of a "full" scan of some sort. For example SQL/DS has the dreaded "dbspace scan", DB2 and Oracle have the tablespace scan. These functions make a complete pass over all data in the table applying update/delete to those rows which match the predicate. In fact, in certain cases the query optimizers for SQL/DS, DB2, and Oracle will pass up the use of an index an use the full scan method to reduce total I/O to satisfy a request. By the way, the ROWID concept is in every RDBMS I am aware of, but it is often hidden and cannot be accessed via the SQL language. Hope this helps... ------------------------------------------------------------------------------ Allen Pippin | Reply to: Dept 1165, Northern Telecom, Inc. | rap%n5pph007@rti.rti.org RTP NC, 27709 |
davidm@uunet.UU.NET (David S. Masterson) (05/16/91)
>>>>> On 14 May 91 18:47:57 GMT, jfr@locus.com (Jon Rosen) said: Jon> In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes: Kyle> |>I was wondering how you would implement cursor based UPDATE's and Kyle> |>DELETE's. Kyle> |> <PREVIOUS POSTS DELETED> Kyle> Well, your solution is exactly what I am trying to avoid. I'm _writing_ Kyle> the SQL RDBMS, so an SQL work-around is not what I need. It looks like Kyle> I will have to require an index to be created on at least one of the Kyle> field in the query definition of the cursor. Kyle> I want to avoid such kludges as the "hidden" ROWID column that INFORMIX Kyle> evidently has. It doesn't provide for the physical abstraction that the Kyle> Relational Model is supposed to have. Jon> I see what you are looking for now. You need an internal implementation. Jon> Your statement about the relational model requiring physical abstration Jon> is off-target. The relational model says nothing about the physical Jon> implementation and in fact, there is absolutely nothing about the "right" Jon> and "wrong" ways to implement a relational data base. This is the meaning of physical abstraction, so Kyle's point is not off target. Kyle's point is that ROWID is a physical implementation in that it places an order on relations which are, by definition, unordered. Jon> The relational model requires complete logical abstraction. The user's Jon> VIEW of the data must be relational. The underlying implementation can Jon> do ANYTHING it wants or needs in order to create a complete and Jon> potentially efficient view of the relational model. In fact, this almost Jon> ALWAYS involves internal tuple ids or row ids. The issue is whether Jon> these internal ids are externally manifested Exactly. What the relational system does internally is its own business, but when those internal decisions begin manifesting themselves on the interface, then some aspect of the relational model is lost. See Codd's new book. -- ==================================================================== David Masterson Consilium, Inc. (415) 691-6311 640 Clyde Ct. uunet!cimshop!davidm Mtn. View, CA 94043 ==================================================================== "If someone thinks they know what I said, then I didn't say it!"
jfr@locus.com (Jon Rosen) (05/17/91)
In article <1991May15.171152.557@ingres.Ingres.COM> billc@ingres.com (Bill Coffin, x3387) writes: >> As for the duplicate row issue, don't you know that relational data >> bases aren't supposed to have duplicate rows??? :-) > >But a cursor can be opened on a query that returns duplicate rows. > Ah, but as I will illustrate, in a proper implementation of the relational model, a cursor can NOT be used to update rows if duplicates are returned. >There are other problems in UPDATE/DELETE through a cursor. No sh*t :-)... That's why "updatable cursors" are such a drag. Let's not forget that updatable cursors are NOT part of the relational model but are in fact an arbitrary construct imposed on implementations due to the impedance mismatch between a set-oriented language like SQL and a record-at-a-time-oriented language like COBOL or C. In addition, (and probably most likely), IBM, when implementing SQL, had to pay attention to the needs of its most "important" users, i.e., the MIS shops, with their oodles of millions of lines of COBOL code with VSAM and QSAM data. Most of these programs do things like loop through a file and update in place those rows that need to be updated. Updatable cursors give you a construct that allows you to convert to SQL without as much pain as if you started requiring primary key retrieval on every selection. ><stuff deleted> >You generally can't update/delete through a join, but that's >a whole 'nother can of worms. In fact, >cursor UPDATE/DELETE has all the problems of >updating through views, and then some more. Most of this is related to the fact that IBM's SQL (and most others) do not enforce the concept of a primary key. >A unique ID does solve a large number of problems. Here is a quote from Chris Date's book "A Guide To The SQL Standard, Second Edition" (Addison-Wesley 1989). Appendix F.9 - Aspects of the Relational Model Not Supported "Primary keys provide the SOLE (emphasis added) record-level addressing mechanism within the relational model. That is, the ONLY (Date's emphasis) system-guaranteed method of identifying an individual record is via the combination (R,k) where R is the name of the relation and k is the primary key value for the record concerned. Every relation (to BE a relation (Date's emphasis)) is required to have a primary key. Primary keys are (of course) required to be unique; in the case of real (i.e., base) relations, they are also required to be wholly non-null. Comment: In fact, of course, the relational model requires every relation to satisfy the requirement that there be no duplicate rows (this is implied by the primary key requirement). The fact that SQL does permit duplicate rows should be regarded as another grave mistake in the original design of the language. End comment. SQL currently provides the mechanisms that allow users to apply the primary key discipline for themselves, but it does not understand the semantics associated with that discipline. As a result, SQL support for certain other functions is either deficient or lacking entirely, as I now explain. <some stuff deleted here> 3. An understanding of primary keys is required in order to support the updating of views correctly. SQL's rules for updating of views are in fact disgracefully ad hoc. <stuff deleted> (a) Projections are logically updatable if and only if they preserve the primary key of the underlying relation. However, SQL supports updates not on projections per se, but on what might be called column subsets where a column subset is any subset of the columns of the underlying table for which duplicate elimination is NOT requested - with a "user beware" if that subset does not in fact include the underlying primary key. (b) Any restriction should be logicially updatable. SQL however, does not permit such updates if duplicate elimination is requested. <lotsa stuff deleted here> (c) A join of two tables on their primary keys should always be updatable.... However SQL does not currently allow updates on any joins at all. (End of Date's excerpt) Anyway, Date's view is consistent with what I have been posting to Kyle and other threads. Primary keys are REQUIRED by the relational model and are the only dependable mechanism to update rows in a table. I used to think this was a flaw in the relational model (i.e., you know,m these math guys don't understand the real world, where there are duplicate rows, abscence of keys, etc). I have come to understand, however, that it is really a failing of us programmers to recognize that the math guys are right and that we need to insist on more rigor in the definition of our data bases so that the relational model can be used properly, rather than coming up with kludgy ideas like updatable cursors, etc. Jon Rosen =================================================== "I know I put it in here somewhere!!" ===================================================
tmp@mentor.cc.purdue.edu (Tom Putnam) (05/17/91)
As a relative novice to SQL, my original reply on this question has certainly raised some interesting discussion. I must say I have been pleased to note the overall tone that all of us seem to be trying to achieve independence of our applications from the physical structure of the databse. In article <CIMSHOP!DAVIDM.91May16094622@uunet.UU.NET> cimshop!davidm@uunet.UU.NET (David S. Masterson) writes: >... >This is the meaning of physical abstraction, so Kyle's point is not off >target. Kyle's point is that ROWID is a physical implementation in that it >places an order on relations which are, by definition, unordered. I don't recall who said it, but ROWID as used in my example in fact implies no physical ordering. It is conceptually a unique value which somehow gives unique access to a record. Yes, one could say to oneself "I know ROWID as implemented in this version of <name-your-favorite-product> is in fact the physical row number of the record within the file." The ways people talk about using such unique identifiers seem to reflect their understanding that the ROWID does not require some kind of unique index data structure to exist. But the use I made did not interpret the value of ROWID in any way, nor do I think it ..."places an order on relations which are, by definition, unordered." As I mentioned to several of you in off-line mail, I am still uncomfortable with its use. It is probably because I know that I can use ROWID to more or less "physically" gain direct access to a record without having to use an index which depends upon a value which is explicitly in each record. So I am taking advantage of implementation knowledge that tells me this is probably much more efficient than writing the update statement with a bunch of WHERE clauses that uniquely but less-efficiently defines the record. By the way, I understand that some (older?) versions of Informix maintainted ROWIDs using SMALLINT data type which created all kinds of grief if the database exceeded 32767 records :-( The current version 4.0 uses type INTEGER so we don't run into trouble until we exceed 2,147,483,647 records. That should give me a couple more weeks to worry about it :-) -- Tom Putnam Assistant Director Internet: tmp@mentor.cc.Purdue.EDU Purdue University Computing Center BITNET: TMP@PURCCVM Mathematical Sciences Bldg. Phone: (317) 494-1787 West Lafayette, IN 47907
aland@informix.com (Colonel Panic) (05/22/91)
In article <12455@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes: >... >By the way, I understand that some (older?) versions of Informix maintainted >ROWIDs using SMALLINT data type which created all kinds of grief if the >database exceeded 32767 records :-( Not in any version that I've ever heard of... -- Alan Denney aland@informix.com {pyramid|uunet}!infmx!aland Circle number 408 on Usenet Reader Service Card.