[comp.databases] how to implement cursor based UPDATE/DELETE

saunders@refine.enet.dec.com (Kyle Saunders) (05/13/91)

	Hi folks,

	I was wondering how you would implement cursor based UPDATE's and 
DELETE's.

	I have given some thought to this and I cannot see how the RDBMS can
properly update the correct row given the fact that the table to be updated
may not have _any_ indexes, and not all fields are guarenteed to be included
in the query.  This means there is no way to uniquely determine which row in
the original base table is being updated.

	- Kyle

	saunders@refine.enet.dec.com

tmp@mentor.cc.purdue.edu (Tom Putnam) (05/13/91)

In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
(Kyle Saunders) writes:
>
>	I was wondering how you would implement cursor based UPDATE's and 
>DELETE's.
>
>	I have given some thought to this and I cannot see how the RDBMS can
>properly update the correct row given the fact that the table to be updated
>may not have _any_ indexes, and not all fields are guarenteed to be included
>in the query.  This means there is no way to uniquely determine which row in
>the original base table is being updated.

I recently encountered this problem in a slightly different context: I wanted
to do an "ordered update".  My database keeps a list of people who have
registered for a course and a counter in each record which denotes the order of
registration.  The permits me to implement a "waiting list" when a course
fills.  In this context, I can't declare the cursor for UPDATE since I want it
ordered.  The work-around was to recognize that (at least under INFORMIX),
there is a field called ROWID for each row.  I can select it, then use it to
give me an "absolute pointer" to a record.  Here is my code sequence using
INFORMIX 4GL:

	declare reseq_c cursor for
		select rowid, * from what 
		where what.sid = this_sid and what.cid = this_cid
		order by internal, regdate, regseq

	open reseq_c
	let i = 1
	foreach reseq_c into this_row, wtmp.*

		update what set regseq = i
			where rowid = this_row
		let i = i + 1

	end foreach
	close reseq_c

Of course, one should be absolutely paranoid about ever keeping a ROWID around
for any longer than this sort of use.  Systems need to be able to physically
move rows to other locations as the database is updated.
-- 
Tom Putnam                               Assistant Director
Internet: tmp@mentor.cc.Purdue.EDU       Purdue University Computing Center
  BITNET: TMP@PURCCVM                    Mathematical Sciences Bldg.
   Phone: (317) 494-1787                 West Lafayette, IN 47907

saunders@refine.enet.dec.com (Kyle Saunders) (05/14/91)

In article <12260@mentor.cc.purdue.edu>, tmp@mentor.cc.purdue.edu (Tom Putnam) writes...
|In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
|(Kyle Saunders) writes:
|>
|>	I was wondering how you would implement cursor based UPDATE's and 
|>DELETE's.
|>
|>	I have given some thought to this and I cannot see how the RDBMS can
|>properly update the correct row given the fact that the table to be updated
|>may not have _any_ indexes, and not all fields are guarenteed to be included
|>in the query.  This means there is no way to uniquely determine which row in
|>the original base table is being updated.
| 
|I recently encountered this problem in a slightly different context: I wanted
|to do an "ordered update".  My database keeps a list of people who have
|registered for a course and a counter in each record which denotes the order of
|registration.  The permits me to implement a "waiting list" when a course
|fills.  In this context, I can't declare the cursor for UPDATE since I want it
|ordered.  The work-around was to recognize that (at least under INFORMIX),
|there is a field called ROWID for each row.  I can select it, then use it to
|give me an "absolute pointer" to a record.  Here is my code sequence using
	[ INFORMIX code deleted ]
| 
|Of course, one should be absolutely paranoid about ever keeping a ROWID around
|for any longer than this sort of use.  Systems need to be able to physically
|move rows to other locations as the database is updated.
|-- 

|Tom Putnam                               Assistant Director

	Well, your solution is exactly what I am trying to avoid.  I'm
_writing_ the SQL RDBMS, so an SQL work-around is not what I need.  It looks
like I will have to require an index to be created on at least one of the
field in the query definition of the cursor.
	  I want to avoid such kludges as the "hidden" ROWID column that
INFORMIX evidently has.  It doesn't provide for the physical abstraction that
the Relational Model is supposed to have.  I suppose you could say that
requiring an index also violates the relational model, but something tells me
that when you do:
DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields]
most RDBMS's will create an index on the fly if one doesn't exist on the
fields to update.  I would do this too, but I don't implement indexes yet.

	- Kyle

jfr@locus.com (Jon Rosen) (05/14/91)

In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes:
>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
>(Kyle Saunders) writes:
>>	I was wondering how you would implement cursor based UPDATE's and 
>>DELETEs. I have given some thought to this and cannot see how the RDBMS can
>>properly update the correct row given the fact that the table to be updated
>>may not have _any_ indexes, and not all fields are guarenteed to be included
>>in the query.  This means there is no way to uniquely determine which row in
>>the original base table is being updated.
><some stuff deleted...>
>....  The work-around was to recognize that (at least under INFORMIX),
>there is a field called ROWID for each row.  I can select it, then use it to
>give me an "absolute pointer" to a record.
><code deleted...>
>Of course, one should be absolutely paranoid about ever keeping a ROWID around
>for any longer than this sort of use.  Systems need to be able to physically
>move rows to other locations as the database is updated.

Kyle questions how cursor-based UPDATEs and DELETEs can be implemented.  
If this refers to how an RDBMS implements it, the answer is related
to Tom's response... I.e., internally, the RDBMS must keep some kind of
a relative row id that it uses to link the cursor-retrieved row back to
the actual data row.  In fact, this implementation is quite common.  It
even leads to those databases such as Informix and Oracle actually
materializing the ROWID into a user-accessible column which can be 
used to do direct-access updates against rows without using RDBMS cursors
and the UPDATE/DELETE WHERE CURRENT OF statement.                  

Also, it is not as bad as Tom thinks to keep using this ROWID.  Usually,
the ROWID is a logical construction and has nothing (or little) to do
with the physical location/inode/record number of the actual row.  So
moving the row will generally not change its ROWID.  I have seen 
an application in Oracle that even uses ROWID to build links between
tables (UGGGH) in order to improve performance.  This requires that the
ROWID be persistent.  There can be difficulties if rows are deleted
and the associated ROWID references in other linked tables are not
(a "referential integrity" issue) since ROWIDs can be reissued once
the original row is no longer available.  A better solution would be
a fine granularity TIMESTAMP... In fact, Postgres uses this solution
to individualize rows and to allow for update/delete with full history.
All versions of rows are kept (until explicitly purged) and you can
browse through outdated versions of rows at will.  Of course, the
default is to look at the current version of a row.

Unfortunately, the ROWID solution begs several questions.  First of all, a
materialized ROWID is by no means universal.  Most RDBMS (including
DB2, OS/2 EE, RDB, Ingres, etc.) do not have such an abstraction.
In addition, neither ROWID nor updatable cursors deal with data
integrity across COMMITs.  When data is being updated on-line, it is
not advisable to hold a cursor open across the workstation or terminal
i/o (otherwise locks are kept for long periods of time).  Thus, if you
use a cursor to select some rows for display, you still should COMMIT
prior to displaying them.  Now, your updatable cursor is no longer
updatable.  In fact, you will either have to reread the rows with a
new updatable cursor or use a direct UPDATE.  Since you have no lock 
on the data while you are using it on-screen, someone else can come
along, read and modify the row without you knowing it.  

Once again, either a very fine granularity TIMESTAMP or an update/change
count column is needed to ensure that your update does not screw up
some other data that has been changed since your read.  Alternatively,
you have to do an UPDATE with a WHERE clause on every column of the
TABLE giving the original value in order to make sure that the row has
not been changed (bad idea from a performance standpoint).  

As for the duplicate row issue, don't you know that relational data
bases aren't supposed to have duplicate rows??? :-)

Jon Rosen

saunders@refine.enet.dec.com (Kyle Saunders) (05/14/91)

In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes...
|In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes:
|>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
|>(Kyle Saunders) writes:
|>>	I was wondering how you would implement cursor based UPDATE's and 
|>>DELETEs. I have given some thought to this and cannot see how the RDBMS can
|>>properly update the correct row given the fact that the table to be updated
|>>may not have _any_ indexes, and not all fields are guarenteed to be included
|>>in the query.  This means there is no way to uniquely determine which row in
|>>the original base table is being updated.
|><some stuff deleted...>
|>....  The work-around was to recognize that (at least under INFORMIX),
|>there is a field called ROWID for each row.  I can select it, then use it to
|>give me an "absolute pointer" to a record.
|><code deleted...>
|>Of course, one should be absolutely paranoid about ever keeping a ROWID around
|>for any longer than this sort of use.  Systems need to be able to physically
|>move rows to other locations as the database is updated.
| 
|Kyle questions how cursor-based UPDATEs and DELETEs can be implemented.  
|If this refers to how an RDBMS implements it, the answer is related
|to Tom's response... I.e., internally, the RDBMS must keep some kind of
|a relative row id that it uses to link the cursor-retrieved row back to
|the actual data row.  In fact, this implementation is quite common.  It
|even leads to those databases such as Informix and Oracle actually
|materializing the ROWID into a user-accessible column which can be 
|used to do direct-access updates against rows without using RDBMS cursors
|and the UPDATE/DELETE WHERE CURRENT OF statement.                  

	Keeping an explicit pointer for each row generated by an OPEN cursor
to the original base table is a _major_ kludge IMHO.  That's special casing
this action.  What's worse is letting users access this ROWID.  If ROWID is
provided in any SELECT, why?!?  Users will abuse this bit of information.  The
user should not see the underlying physical implementation of the Relational
Model, if at all possible.

|Also, it is not as bad as Tom thinks to keep using this ROWID.  Usually,
|the ROWID is a logical construction and has nothing (or little) to do
|with the physical location/inode/record number of the actual row.  So 

	How is it a logical construction?  How is it safe to be anything _but_
the actual record number (guarenteed to be unique without having to look-up or
generate anything)?

	[ stuff about integrity deleted ]

|As for the duplicate row issue, don't you know that relational data
|bases aren't supposed to have duplicate rows??? :-)

	I never mentioned duplicate rows.  

|Jon Rosen

	- Kyle

jfr@locus.com (Jon Rosen) (05/15/91)

In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes:
>|>	I was wondering how you would implement cursor based UPDATE's and 
>|>DELETE's.
>|> <PREVIOUS POSTS DELETED>
>	Well, your solution is exactly what I am trying to avoid.  I'm
>_writing_ the SQL RDBMS, so an SQL work-around is not what I need.  It looks
>like I will have to require an index to be created on at least one of the
>field in the query definition of the cursor.
>	  I want to avoid such kludges as the "hidden" ROWID column that
>INFORMIX evidently has.  It doesn't provide for the physical abstraction that
>the Relational Model is supposed to have.  I suppose you could say that
>requiring an index also violates the relational model, but something tells me
>that when you do:
>DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields]
>most RDBMS's will create an index on the fly if one doesn't exist on the
>fields to update.  I would do this too, but I don't implement indexes yet.

I see what you are looking for now.  You need an internal implementation.
Your statement about the relational model requiring physical abstration
is off-target.  The relational model says nothing about the physical
implementation and in fact, there is absolutely nothing about the "right"
and "wrong" ways to implement a relational data base.  The relational mode
er model requires complete logical abstraction.  The user's VIEW of the
data must be relational.  The underlying implementation can do ANYTHING 
it wants or needs in order to create a complete and potentially efficient
view of the relational model.  In fact, this almost ALWAYS involves 
internal tuple ids or row ids.  The issue is whether these internal ids
are externally manifested (this would appear to be in violation of the
relational model although some argue otherwise).  REQUIRING an index 
violates the relational model (you should always be able to do any operation
solely by reference to the table's data).  However, REQUIRING a PRIMARY KEY 
is, in fact, intrinsic in the relational model.  The relational model
DOES NOT ALLOW FOR DUPLICATE ROWS IN TABLES.  Thus, every table ALWAYS
has a primary key, even if it consists of every attribute (column) 
of the row.  Keep in mind that PRIMARY KEYS are NOT INDEXES (even though
almost every physical implementation that enforces primary keys implements
them using some kind of indexing or hashing mechanism for performance).  

Your choices for uniquely identifying rows are limited to either the
primary key (no matter how many columns that involves) or an arbitrary
row id that you keep for each row.  If you are going to allow duplicate
rows in your table, you can ONLY use a row id since there is no way to
know which duplicate row is which since there is no primary key where
there are duplicate rows.  In effect, the row id becomes an arbitrary 
primary key.  No matter how you dice and slice it, that's all there is.

Jon Rosen

jfr@locus.com (Jon Rosen) (05/15/91)

In article <22653@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes:
>
>In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes...
>|In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes:
>|>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com 
>|>(Kyle Saunders) writes:
>|>>	I was wondering how you would implement cursor based UPDATE's and 
>|>>DELETEs. I have given some thought to this and cannot see how the RDBMS can
>|>>properly update the correct row given the fact that the table to be updated
>|>>may not have _any_ indexes, and not all fields are guarenteed to be included
>|>>in the query.  This means there is no way to uniquely determine which row in
>|>>the original base table is being updated.
>|><lotsa stuff deleted...>
>
>	Keeping an explicit pointer for each row generated by an OPEN cursor
>to the original base table is a _major_ kludge IMHO.  That's special casing
>this action.  What's worse is letting users access this ROWID.  If ROWID is
>provided in any SELECT, why?!?  Users will abuse this bit of information.  The
>user should not see the underlying physical implementation of the Relational
>Model, if at all possible.

  Why do you feel this is a kludge?  You have to do SOMETHING!  In another
  post you mention creating some kind of index on the fly when a DECLARE
  CURSOR is used with a FOR UPDATE clause .... Well, what else is an
  index but a key plus a rowid (albeit a strictly internal one)  which lets
  you get back to the original row?  If you don't have the primary key (i.e.,
  whether that is a subset of the columns or the complete set of columns
  is irrelevant as long as you have a primary key and no duplicate rows)
  available in your query, you either have to grab a unique row reference
  and include it with each row (call it what you will, a row id, a pointer,
  a link, I don't care it still gets you back to the row) or you have to
  supplement the query to include the entire primary key.  If the primary
  key has lots of columns (sometimes the whole row) this can certainly
  degrade performance more than keeping a row id.  And I don't think it
  is any less a kludge than the row id.  

>
>|Also, it is not as bad as Tom thinks to keep using this ROWID.  Usually,
>|the ROWID is a logical construction and has nothing (or little) to do
>|with the physical location/inode/record number of the actual row.  So 
>
>	How is it a logical construction?  How is it safe to be anything _but_
>the actual record number (guarenteed to be unique without having to look-up or
>generate anything)?

  Hmmm, I have to go back and look at my original post.  What I was trying
  to convey is that a ROWID in Oracle is a logical construct, not necessarily
  a physical one.  I believe it is a uniquely generated sequential number
  that is guaranteed not to change even if the row is moved.  If this is
  not the case (my Oracle recollections may be fuzzy) it certainly SHOULD
  be.  My TIMESTAMP rowid would be similar.  This is as opposed to using
  something like the sector-offset value for the location of the row on
  disk as the rowid.  Obviously, this is fraught with peril since the row
  can move somewhere else and its internal id can then change.  But I agree
  I may have been unclear on this point.

>|As for the duplicate row issue, don't you know that relational data
>|bases aren't supposed to have duplicate rows??? :-)
>
>	I never mentioned duplicate rows.  
>

  Oh... I thought you did... I looked back and you are right, you didnot.
  If that is the case, you have an implicit primary key in all tables
  and you are home free either with a rowid or by including all values
  of the primary key for each table in the query.  Any other approach
  is NOT supported by the relational model.  That is actually the main
  point.  The relational model expressly supports updating ONLY through
  the use of the primary key which means you have to have one and it
  has to be available at the time of the update.  The only problem 
  with that solution is that sometimes the performance using the
  primary key sucks.  Thus, the use of rowids (kludgy or otherwise).

  Jon Rosen
  No flames intended :-)

saunders@refine.enet.dec.com (Kyle Saunders) (05/15/91)

In article <22637@shlump.lkg.dec.com> jfr@locus.com (Jon Rosen) writes:
|In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunder
s) writes:
|>|>     I was wondering how you would implement cursor based UPDATE's and
|>|>DELETE's.
|>|> <PREVIOUS POSTS DELETED>

|>like I will have to require an index to be created on at least one of the
|>field in the query definition of the cursor.
|>         I want to avoid such kludges as the "hidden" ROWID column that
|>INFORMIX evidently has.  It doesn't provide for the physical abstraction that
|>the Relational Model is supposed to have.  I suppose you could say that
|>requiring an index also violates the relational model, but something tells me
|>that when you do:
|>DECLARE cursor-name CURSOR FOR select-query FOR UPDATE [optional-fields]
|>most RDBMS's will create an index on the fly if one doesn't exist on the
|>fields to update.  I would do this too, but I don't implement indexes yet.

|I see what you are looking for now.  You need an internal implementation.
|Your statement about the relational model requiring physical abstration
|is off-target.  The relational model says nothing about the physical
|implementation and in fact, there is absolutely nothing about the "right"

	No, I was stating that I felt allowing the user to see the internal
row-id, such that it is, would be violating the Relational Model.  You are
right that the Relational Model says nothing about the physical representation,
except that the user should _never_ see it.

|and "wrong" ways to implement a relational data base.  The relational mode
|er model requires complete logical abstraction.  The user's VIEW of the
|data must be relational.  The underlying implementation can do ANYTHING
|it wants or needs in order to create a complete and potentially efficient
|view of the relational model.  In fact, this almost ALWAYS involves
|internal tuple ids or row ids.  The issue is whether these internal ids
|are externally manifested (this would appear to be in violation of the
|relational model although some argue otherwise).  REQUIRING an index

	This is what I was referring to above.

|violates the relational model (you should always be able to do any operation
|solely by reference to the table's data).  However, REQUIRING a PRIMARY KEY

	I know that requiring requiring an index would violate the Relational
Model.  That is why I did not want to do that.  That is why I suggested that
perhaps the RDBMS should just create the index on the fly, _transparent_ to the
user, and drop it when finished.

|is, in fact, intrinsic in the relational model.  The relational model
|DOES NOT ALLOW FOR DUPLICATE ROWS IN TABLES.  Thus, every table ALWAYS
|has a primary key, even if it consists of every attribute (column)
|of the row.  Keep in mind that PRIMARY KEYS are NOT INDEXES (even though
|almost every physical implementation that enforces primary keys implements
|them using some kind of indexing or hashing mechanism for performance).

	Thanks for the lecture.  I already knew all that.

|Your choices for uniquely identifying rows are limited to either the
|primary key (no matter how many columns that involves) or an arbitrary
|row id that you keep for each row.  If you are going to allow duplicate
|rows in your table, you can ONLY use a row id since there is no way to
|know which duplicate row is which since there is no primary key where
|there are duplicate rows.  In effect, the row id becomes an arbitrary
|primary key.  No matter how you dice and slice it, that's all there is.
|
|Jon Rosen
|<

	You are missing the point, I think.  When you create a temporary
table with DECLARE cursor CURSOR FOR query; OPEN cursor; the columns in the
above query may not include all columns or the primary key.  Therefore, in 
order to find the row in the base table that corresponds to the row in the
temporary table, you need to have an index on one of the fields that is 
included in the query.

	- Kyle

saunders@refine.enet.dec.com (Kyle Saunders) (05/15/91)

In article <22653@shlump.lkg.dec.com> jfr@locus.com (Jon Rosen) writes:
|In article <22653@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saund`
s) writes:
|>
|>In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes...
|>|In article <12260@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam`
writes:
|>|>In article <22626@shlump.lkg.dec.com> saunders@refine.enet.dec.com
|>|>(Kyle Saunders) writes:
|>|>>   I was wondering how you would implement cursor based UPDATE's and
|>|>>DELETEs. I have given some thought to this and cannot see how the RDBMS can
|>|>>properly update the correct row given the fact that the table to be updated
|>|>>may not have _any_ indexes, and not all fields are guarenteed to be includ`
|>|>>in the query.  This means there is no way to uniquely determine which row `
|>|>>the original base table is being updated.
|>|><lotsa stuff deleted...>
|>
|>      Keeping an explicit pointer for each row generated by an OPEN cursor
|>to the original base table is a _major_ kludge IMHO.  That's special casing
|>this action.  What's worse is letting users access this ROWID.  If ROWID is
|>provided in any SELECT, why?!?  Users will abuse this bit of information.  The
|>user should not see the underlying physical implementation of the Relational
|>Model, if at all possible.
|
|  Why do you feel this is a kludge?  You have to do SOMETHING!  In another
|  post you mention creating some kind of index on the fly when a DECLARE
|  CURSOR is used with a FOR UPDATE clause .... Well, what else is an
|  index but a key plus a rowid (albeit a strictly internal one)  which lets
|  you get back to the original row?  If you don't have the primary key (i.e.,
|  whether that is a subset of the columns or the complete set of columns
|  is irrelevant as long as you have a primary key and no duplicate rows)
|  available in your query, you either have to grab a unique row reference
|  and include it with each row (call it what you will, a row id, a pointer,
|  a link, I don't care it still gets you back to the row) or you have to
|  supplement the query to include the entire primary key.  If the primary
|  key has lots of columns (sometimes the whole row) this can certainly
|  degrade performance more than keeping a row id.  And I don't think it
|  is any less a kludge than the row id.

	I don't see a problem with creating an index on the fly.  This should
fit nicely within the overall design of the RDBMS.  I had a problem with
adding a "hidden" column to store the base-table row id to the newly create
temporary table.  This wouldn't be done in any other case.  That's the kludgy
part I was referring to.

	[ stuff deleted that doesn't matter anymore :-) ]
|
|>|As for the duplicate row issue, don't you know that relational data
|>|bases aren't supposed to have duplicate rows??? :-)
|>
|>      I never mentioned duplicate rows.
|>
|
|  Oh... I thought you did... I looked back and you are right, you didnot.
|  If that is the case, you have an implicit primary key in all tables
|  and you are home free either with a rowid or by including all values
|  of the primary key for each table in the query.  Any other approach
|  is NOT supported by the relational model.  That is actually the main
|  point.  The relational model expressly supports updating ONLY through
|  the use of the primary key which means you have to have one and it
|  has to be available at the time of the update.  The only problem
|  with that solution is that sometimes the performance using the
|  primary key sucks.  Thus, the use of rowids (kludgy or otherwise).
|
	Ok, if it is necessary to have the primary key in the query that
defines the cursor and it does not violate the Relational Model by requiring
it, then that is the answer to my question.  You are correct in that the
Relational Model only supports unique address of rows through the primary key.
Sigh.  I guess I should have thought of that earlier.  I just wanted to be as
flexible as possible.  That, and the fact that I haven't implemented 
primary/foreign key support yet (working on the precompiler/development tools).

|  Jon Rosen
|  No flames intended :-)
	
	No flames here either, previous post included (I hope :-).

	- Kyle

billc@ingres.com (Bill Coffin, x3387) (05/16/91)

> As for the duplicate row issue, don't you know that relational data
> bases aren't supposed to have duplicate rows??? :-)

But a cursor can be opened on a query that returns duplicate rows.

There are other problems in UPDATE/DELETE through a cursor.  That's why ANSI SQL86 has some
very strict rules about updatability of cursors.  For instance, what does it mean to
update a cursor if there's an aggregate function in the query?  Or any data derived by
mathematical operations, or, in fact, any data that is not a 1-1 representation of the
data in the queried table?  You generally can't update/delete through a join, but that's
a whole 'nother can of worms.  In fact, cursor UPDATE/DELETE has all the problems of
updating through views, and then some more.

Another anomaly to consider is the ORDER BY.  What does it mean to update one of the 
fields that you sorted on?  (SQL86 gets around this by stating that the query produces
a "snapshot" of the data, which you can then perturb.  This model is insufficient for solving
many UPDATE/DELETE ambiguities, but may be enough for the ORDER BY problem.  Just look 
out for the "Halloween Problem"!)

I could go on.  I once implemented a cursor interface to an SQL dialect that didn't
support cursors.  I found it was very difficult unless I enforced the existence of a 
unique index.  (The rowid was not reliable in this particular environment.)
A unique ID does solve a large number of problems.
--

Bill Coffin, aka billc@ingres.com 
 FETCH INTO :standard_disclaimer, :gratuitous_witticism, :obscure_quote

michael@cutler.uucp (05/16/91)

In article <24334@oolong.la.locus.com>, jfr@locus.com (Jon Rosen) writes:

> ... Usually, the ROWID is a logical construction and has nothing (or
> little) to do with the physical location/inode/record number of the
> actual row.  So moving the row will generally not change its ROWID. ...
> 
> Unfortunately, the ROWID solution begs several questions.  First of all,
> a materialized ROWID is by no means universal.  Most RDBMS (including
> DB2, OS/2 EE, RDB, Ingres, etc.) do not have such an abstraction. ...

Just for the record, INGRES does have (and has always had) such an
abstraction.  Each row in an INGRES table or index has a TID, or
tuple identifier.  Use it as you would any other column.

If memory serves, the TID is indeed related to physical location.  The
TID is something to the effect of the high order m bits is the INGRES
page within the table the row is located, and the low order n bits
is the index of the row within the INGRES page.

Michael D. Shields        {UUCP: ...rutgers!ogicse!orstcs!cutler!michael}
AVP, MIS                  {Internet: michael%cutler.uucp@cs.orst.edu}
Cutler & Company, Inc.
Medford, OR  97504
(503) 770-9000

rap@n5pph007.uucp (Allen Pippin,RTP,8189,) (05/16/91)

Many relational managers make use of a "full" scan of some 
sort.  For example SQL/DS has the dreaded "dbspace scan", DB2 and
Oracle have the tablespace scan.  These functions make a complete pass
over all data in the table applying update/delete to those rows which match
the predicate.   In fact, in certain cases the query optimizers for
SQL/DS, DB2, and Oracle will pass up the use of an index an use the
full scan method to reduce total I/O to satisfy a request.  

By the way, the ROWID concept is in every RDBMS I am aware of, but it is
often hidden and cannot be accessed via the SQL language. 

Hope this helps...


------------------------------------------------------------------------------
Allen Pippin                               | Reply to: 
Dept 1165, Northern Telecom, Inc.          |   rap%n5pph007@rti.rti.org
RTP NC, 27709                              |

davidm@uunet.UU.NET (David S. Masterson) (05/16/91)

>>>>> On 14 May 91 18:47:57 GMT, jfr@locus.com (Jon Rosen) said:

Jon> In article <22637@shlump.lkg.dec.com> saunders@refine.enet.dec.com (Kyle Saunders) writes:

Kyle> |>I was wondering how you would implement cursor based UPDATE's and 
Kyle> |>DELETE's.
Kyle> |> <PREVIOUS POSTS DELETED>

Kyle> Well, your solution is exactly what I am trying to avoid.  I'm _writing_
Kyle> the SQL RDBMS, so an SQL work-around is not what I need.  It looks like
Kyle> I will have to require an index to be created on at least one of the
Kyle> field in the query definition of the cursor.

Kyle> I want to avoid such kludges as the "hidden" ROWID column that INFORMIX
Kyle> evidently has.  It doesn't provide for the physical abstraction that the
Kyle> Relational Model is supposed to have.  

Jon> I see what you are looking for now.  You need an internal implementation.
Jon> Your statement about the relational model requiring physical abstration
Jon> is off-target.  The relational model says nothing about the physical
Jon> implementation and in fact, there is absolutely nothing about the "right"
Jon> and "wrong" ways to implement a relational data base.

This is the meaning of physical abstraction, so Kyle's point is not off
target.  Kyle's point is that ROWID is a physical implementation in that it
places an order on relations which are, by definition, unordered.

Jon> The relational model requires complete logical abstraction.  The user's
Jon> VIEW of the data must be relational.  The underlying implementation can
Jon> do ANYTHING it wants or needs in order to create a complete and
Jon> potentially efficient view of the relational model.  In fact, this almost
Jon> ALWAYS involves internal tuple ids or row ids.  The issue is whether
Jon> these internal ids are externally manifested

Exactly.  What the relational system does internally is its own business, but
when those internal decisions begin manifesting themselves on the interface,
then some aspect of the relational model is lost.  See Codd's new book.

--
====================================================================
David Masterson					Consilium, Inc.
(415) 691-6311					640 Clyde Ct.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"

jfr@locus.com (Jon Rosen) (05/17/91)

In article <1991May15.171152.557@ingres.Ingres.COM> billc@ingres.com (Bill Coffin, x3387) writes:
>> As for the duplicate row issue, don't you know that relational data
>> bases aren't supposed to have duplicate rows??? :-)
>
>But a cursor can be opened on a query that returns duplicate rows.
> 

  Ah, but as I will illustrate, in a proper implementation of the
  relational model, a cursor can NOT be used to update rows if
  duplicates are returned.            

>There are other problems in UPDATE/DELETE through a cursor. 

  No sh*t :-)... That's why "updatable cursors" are such a drag.

  Let's not forget that updatable cursors are NOT part of the
  relational model but are in fact an arbitrary construct imposed
  on implementations due to the impedance mismatch between a
  set-oriented language like SQL and a record-at-a-time-oriented
  language like COBOL or C.  In addition, (and probably most
  likely), IBM, when implementing SQL, had to pay attention to
  the needs of its most "important" users, i.e., the MIS shops,
  with their oodles of millions of lines of COBOL code with VSAM
  and QSAM data.  Most of these programs do things like loop 
  through a file and update in place those rows that need to
  be updated.  Updatable cursors give you a construct that allows
  you to convert to SQL without as much pain as if you started
  requiring primary key retrieval on every selection.

><stuff deleted>
>You generally can't update/delete through a join, but that's
>a whole 'nother can of worms.  In fact,
>cursor UPDATE/DELETE has all the problems of
>updating through views, and then some more.

  Most of this is related to the fact that IBM's SQL (and most others)
  do not enforce the concept of a primary key.

>A unique ID does solve a large number of problems.

  Here is a quote from Chris Date's book "A Guide To The SQL Standard,
  Second Edition" (Addison-Wesley 1989).  

  Appendix F.9 - Aspects of the Relational Model Not Supported

  "Primary keys provide the SOLE (emphasis added) record-level
   addressing mechanism within the relational model.  That is, the
   ONLY (Date's emphasis) system-guaranteed method of identifying
   an individual record is via the combination (R,k) where R is
   the name of the relation and k is the primary key value for the
   record concerned.  Every relation (to BE a relation (Date's emphasis))
   is required to have a primary key.  Primary keys are (of course)
   required to be unique; in the case of real (i.e., base) relations,
   they are also required to be wholly non-null.

   Comment: In fact, of course, the relational model requires every
   relation to satisfy the requirement that there be no duplicate
   rows (this is implied by the primary key requirement).  The fact
   that SQL does permit duplicate rows should be regarded as another
   grave mistake in the original design of the language. End comment.

   SQL currently provides the mechanisms that allow users to apply
   the primary key discipline for themselves, but it does not understand
   the semantics associated with that discipline. As a result, SQL
   support for certain other functions is either deficient or lacking
   entirely, as I now explain.

   <some stuff deleted here>

   3. An understanding of primary keys is required in order to support
   the updating of views correctly.  SQL's rules for updating of views
   are in fact disgracefully ad hoc.  <stuff deleted>
     (a) Projections are logically updatable if and only if they preserve
         the primary key of the underlying relation.  However, SQL supports
         updates not on projections per se, but on what might be called
         column subsets where a column subset is any subset of the
         columns of the underlying table for which duplicate elimination
         is NOT requested - with a "user beware" if that subset does not
         in fact include the underlying primary key.  
     (b) Any restriction should be logicially updatable.  SQL however,
         does not permit such updates if duplicate elimination is requested.
         <lotsa stuff deleted here>
     (c) A join of two tables on their primary keys should always be
         updatable.... However SQL does not currently allow updates on
         any joins at all. 

(End of Date's excerpt)          

Anyway, Date's view is consistent with what I have been posting to Kyle
and other threads.  Primary keys are REQUIRED by the relational model
and are the only dependable mechanism to update rows in a table.  

I used to think this was a flaw in the relational model (i.e., you know,m
these math guys don't understand the real world, where there are duplicate
rows, abscence of keys, etc).  I have come to understand, however, that
it is really a failing of us programmers to recognize that the math guys
are right and that we need to insist on more rigor in the definition of
our data bases so that the relational model can be used properly, rather
than coming up with kludgy ideas like updatable cursors, etc.  

Jon Rosen
===================================================
"I know I put it in here somewhere!!"
===================================================

tmp@mentor.cc.purdue.edu (Tom Putnam) (05/17/91)

As a relative novice to SQL, my original reply on this question has certainly
raised some interesting discussion.  I must say I have been pleased to note the 
overall tone that all of us seem to be trying to achieve independence of our
applications from the physical structure of the databse.

In article <CIMSHOP!DAVIDM.91May16094622@uunet.UU.NET> 
cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
>...
>This is the meaning of physical abstraction, so Kyle's point is not off
>target.  Kyle's point is that ROWID is a physical implementation in that it
>places an order on relations which are, by definition, unordered.

I don't recall who said it, but ROWID as used in my example in fact implies
no physical ordering.  It is conceptually a unique value which somehow gives
unique access to a record.  Yes, one could say to oneself "I know ROWID as
implemented in this version of <name-your-favorite-product> is in fact the 
physical row number of the record within the file."  The ways people talk about
using such unique identifiers seem to reflect their understanding that the
ROWID does not require some kind of unique index data structure to exist.
But the use I made did not interpret the value of ROWID in any way, nor do I
think it ..."places an order on relations which are, by definition, unordered."

As I mentioned to several of you in off-line mail, I am still uncomfortable 
with its use.  It is probably because I know that I can use ROWID to more or
less "physically" gain direct access to a record without having to use an index
which depends upon a value which is explicitly in each record.  So I am taking
advantage of implementation knowledge that tells me this is probably much more
efficient than writing the update statement with a bunch of WHERE clauses that
uniquely but less-efficiently defines the record.

By the way, I understand that some (older?) versions of Informix maintainted
ROWIDs using SMALLINT data type which created all kinds of grief if the
database exceeded 32767 records :-(  The current version 4.0 uses type INTEGER
so we don't run into trouble until we exceed 2,147,483,647 records.  That
should give me a couple more weeks to worry about it :-)

-- 
Tom Putnam                               Assistant Director
Internet: tmp@mentor.cc.Purdue.EDU       Purdue University Computing Center
  BITNET: TMP@PURCCVM                    Mathematical Sciences Bldg.
   Phone: (317) 494-1787                 West Lafayette, IN 47907

aland@informix.com (Colonel Panic) (05/22/91)

In article <12455@mentor.cc.purdue.edu> tmp@mentor.cc.purdue.edu (Tom Putnam) writes:
>...
>By the way, I understand that some (older?) versions of Informix maintainted
>ROWIDs using SMALLINT data type which created all kinds of grief if the
>database exceeded 32767 records :-(  

Not in any version that I've ever heard of...

--
Alan Denney      aland@informix.com      {pyramid|uunet}!infmx!aland
          Circle number 408 on Usenet Reader Service Card.