[comp.databases] Informix 4GL Question?

andrew@uel.uel.co.uk (Andrew Josey) (02/05/88)

When deleting records in my database I check whether they are
referenced by any other records. This is currently done by code
similar to the following :

let counter = 0
select count (*) into counter from table
  where table.column = key
if counter > 0
then 
    cannot delete as referenced
else
    delete
end if

As some of my database tables are large, I would like to find
an alternative technique that terminates after finding the first
match instead of searching the whole table.

There may be something in the manual - but I have not spotted it.
Thanks in advance. Please reply directly by e-mail.
-- 
 Andrew Josey,	AT&T Unix Europe, a Division of AT&T (UK) Ltd.
 International House, Ealing Broadway, London W5 5DB, England, UK
 uucp:{ mcvax!ukc, attunix} uel!andrew
 { The usual disclaimer .... }

daveb@geac.UUCP (David Collier-Brown) (02/22/88)

In article <714@uel.uel.co.uk> andrew@uel.uel.co.uk (Andrew Josey) writes:
>When deleting records in my database I check whether they are
>referenced by any other records. This is currently done by code
>similar to the following :
>
>let counter = 0
>select count (*) into counter from table
>  where table.column = key
>if counter > 0
>...
>As some of my database tables are large, I would like to find
>an alternative technique that terminates after finding the first
>match instead of searching the whole table.

   This is interesting, in that it is a perfectly plausible request
that does not **seem** to be well-defined in the relational
algebra/calculus (and perhaps even less well-defined in the *#%!@?&&
sublanguages we get with them).

   Can someone with a deeper knowledge of relational theory comment
on this, especially with respect to saying
	1) "give me one of"  -- the current problem
	2) "give me the first n of" -- from a problem in bibliographic
					search
	3) "give me any n of"

 --dave (this exists for a reason. I wonder what it is) c-b
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

lchirica@polyslo.UUCP (Laurian Chirica) (02/24/88)

In article <2314@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>In article <714@uel.uel.co.uk> andrew@uel.uel.co.uk (Andrew Josey) writes:
>>When deleting records in my database I check whether they are
>>referenced by any other records. This is currently done by code
>>similar to the following :
>>
>>let counter = 0
>>select count (*) into counter from table
>>  where table.column = key
>>if counter > 0
>>...
>>As some of my database tables are large, I would like to find
>>an alternative technique that terminates after finding the first
>>match instead of searching the whole table.
>
>   This is interesting, in that it is a perfectly plausible request
>that does not **seem** to be well-defined in the relational
>algebra/calculus (and perhaps even less well-defined in the *#%!@?&&
>sublanguages we get with them).
>

I never worked with INFORMIX but if I remember correctly, it has
"cursors" which allow the type of processing you are asking for:

/* Suggestion --This is a schetch of the code involved */
/*  this code would be embedded in a Pascal or C program */

 define cursor X as select <anything>
	            from <relations> 
		    where <qualification>

 open cursor X

 while there are tuples left
      fetch cursor X    /* that is successive tuples that */
                        /* satisfy <qualification> */
      if I_want_to_quit then break else continue
end while 
/* end of suggestion */

P.S. Referring to your example "select count(*) .... etc., you seem
to believe that the system will scan the entire relation in order
to processes the query.  I do not know enough about INFORMIX but
any relational DBMS worth paying for will NOT scan the entire relation
to get the answer, IF "key" in your text is a primary key.  Any reasonable
query processor will do a keyed retrieval on "key" and return a 0 or 1
as an answer without scaning the relation.

P.P.S  The code in your example performs what is known in relational
database theory as a referential integrity check.  There are systems
(e.g., UNIFY) that will do that check automatically for you.

TO THE NET PEOPLE:  Does anyone know what other DBMSs support
referential integrity?

-- 
Laurian M. Chirica
Computer Science Department
California Polytechnic State University (CAL POLY)
San Luis Obispo, CA 93407 - (805) 756-1332

larry@postgres.uucp (Larry Rowe) (02/24/88)

the referenced messages asked about relational calculus queries that
answer the queries:
	1. get the first occurrence of some predicate
	2. get the first N occurrences of some predicate
	3. return true/false if an occurrence exists
QUEL has an aggregate function that does numbers 1 and 3.  the ANY
aggregate terminates when the first record is returned.  i believe
it returns 1 if a record is found, 0 otherwise.  for example,
the following query returns 1 if there is anybody in the toy department:

	RETRIEVE (x=ANY(emp.name where emp.dept= "toy"))

assuming that the emp relation has 100K tuples, the query terminates 
when it finds the first person in the toy department.  in other words
it terminates the scan implied by the aggregate function.  i'm not certain
whether university ingres or rti ingres will use an index on the
dept attribute to avoid scanning the primary data table, but i suspect 
they do.

this illustrates query 3.  query 1 is simple to do, just put the ANY
aggregate in the predicate rather than the target list.  for example,

	RETRIEVE (emp.all)
	WHERE ANY(emp.name by emp.dept where emp.dept= "toy") = 1

the by-clause is needed to link the tuple variable in the ANY
aggregate to the record constructed in the target list.

query 2 is not easy.  it can be solved for fixed N by doing a series of
nested aggregates (not possible in SQL) but this is really gruesome.  a
better strategy would be to have a specific operator to implement this
type of query.  maybe something like

	FIRST $n OF (RETRIEVE ...)

of course, this should probably also be an aggregate so you can count the
number of departments in which the 5th highest salary is greater than $10K...

one problem with this new formulation is that it may or may not lead to
more efficient processing.  suppose you asked for the 5th element from
a retrieve.  if the retrieved data doesn't have to be sorted you could
gain a substantial win by stopping the query early as in the ANY aggregate
above.  however, if the data had to be sorted (e.g., get the N-th highest),
i doubt that you would save very much because you still have to process
the query and sort the return set to determine the N-th highest (unless the
data was presorted in some storage structure).  once you have to sort, you
won't save much time if the answer set is very large.  my sense is that
the sorted version is needed much more often than the non-sorted version
when you ask for something where N is not equal to 1. 

sigh, yet another little query optimization to put into your optimizers.
	larry

mike@blipyramid.BLI.COM (Mike Ubell) (02/25/88)

In article <974@pasteur.Berkeley.Edu>, larry@postgres.uucp (Larry Rowe) writes:
> the referenced messages asked about relational calculus queries that
> answer the queries:
> 	1. get the first occurrence of some predicate
> 	2. get the first N occurrences of some predicate
> 	3. return true/false if an occurrence exists
> QUEL has an aggregate function that does numbers 1 and 3.  the ANY
> this illustrates query 3.  query 1 is simple to do, just put the ANY
> aggregate in the predicate rather than the target list.  for example,
> 
> 	RETRIEVE (emp.all)
> 	WHERE ANY(emp.name by emp.dept where emp.dept= "toy") = 1
> 
> the by-clause is needed to link the tuple variable in the ANY
> aggregate to the record constructed in the target list.

The above aggregate returns all emp records where emp.dept = "toy".
Quel joins the by list to ALL records in the original relation not
just the first one.

andy@garfield.columbia.edu (Andy Lowry) (02/27/88)

In article <974@pasteur.Berkeley.Edu> larry@postgres.UUCP (Larry Rowe) writes:
>the referenced messages asked about relational calculus queries that
>answer the queries:
>	1. get the first occurrence of some predicate
>	2. get the first N occurrences of some predicate
>	3. return true/false if an occurrence exists

Regarding the 2nd type of query:
>...  if the retrieved data doesn't have to be sorted you could
>gain a substantial win by stopping the query early as in the ANY aggregate
>above.  however, if the data had to be sorted (e.g., get the N-th highest),
>i doubt that you would save very much because you still have to process
>the query and sort the return set to determine the N-th highest (unless the
>data was presorted in some storage structure).

Note that the problem of finding the Nth largest (or smallest) element
in a list is a linear-time operation, and does not require sorting the
list.  An O(n) number-of-comparisons algorithm is presented in section
3.6 of Aho, Hopcroft & Ullman's "The Design and Analysis of Computer
Algorithms" (Addison-Wesley, Reading MA, 1974).  Even without
expending the effort of learning and implementing this algorithm, some
savings can be achieved using well-known sorting algorithms that yield
a correctly placed element on each iteration (like bubble sort or heap
sort).  The sort procedure could simply leave the tail end of the list
unsorted once the first N elements have been correctly placed.  (The
resulting algorithms will not be linear in n, but the operation will
be cheaper than a full sort using the same algorithm).

-Andy Lowry

allbery@ncoast.UUCP (Brandon Allbery) (02/29/88)

As quoted from <2314@geac.UUCP> by daveb@geac.UUCP (David Collier-Brown):
+---------------
| In article <714@uel.uel.co.uk> andrew@uel.uel.co.uk (Andrew Josey) writes:
| >let counter = 0
| >select count (*) into counter from table
| >  where table.column = key
| >if counter > 0
| >...
| >As some of my database tables are large, I would like to find
| >an alternative technique that terminates after finding the first
| >match instead of searching the whole table.
| 
|    This is interesting, in that it is a perfectly plausible request
| that does not **seem** to be well-defined in the relational
| algebra/calculus (and perhaps even less well-defined in the *#%!@?&&
| sublanguages we get with them).
+---------------

Dunno about relational calculus, but I would say:

SELECT COUNT(*) INTO :counter FROM DUAL
	WHERE EXISTS (SELECT 'x' FROM table WHERE table.column = :key);

This could be done from Informix-4GL if you create the DUAL table (a table
with one column containing one record, for the express purpose of performing
SQL operations that do not requite tables to execute but DO require a table
to satisfy SQL syntax).

create table dual (
	dummy char(1)
);
insert into dual values ("X");

The EXISTS operator doesn't have to search beyond the first match, and the
DUAL table should be fairly fast in use.  To make it faster, run the SQL
statement as:

	whenever error continue
	select "x" from dual where exists ...
	if status != 0 then ...

Now the SELECT succeeds and returns one record or fails and returns nothing.

That is actually how it would be done from Oracle, which is where I got
this trick from.

	EXEC SQL WHENEVER NOT FOUND GO TO notfound;
	EXEC SQL SELECT "x" FROM DUAL WHERE EXISTS (SELECT ...);
	/* records exist */
	
notfound:
	/* records don't exist */

For Accell types:  SET/SELECT does this automatically, see the manual.

	set $dummy to select #field from table where #field = $key;

Since the SET command can only handle one value if there is no EXECUTING
phrase, the first record found is returned and the SELECT statement stops
executing.  To see if it found anything, check the value of $dummy or the
value of status$().  (Set $dummy to UNDEFINED first if you check the value;
it'll be either UNDEFINED or the same as $key afterward.)  Of course, for the
case above the fastest way is to define an explicit relationship, after which
the delete operation will fail if there are any related records....

I admit that it's not the cleanest in the case of standard SQL, but it can
be done with some small measure of optimization.
-- 
	      Brandon S. Allbery, moderator of comp.sources.misc
       {well!hoptoad,uunet!hnsurg3,cbosgd,sun!mandrill}!ncoast!allbery

brianc@daedalus (Brian Colfer) (03/04/88)

In article <2314@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>In article <714@uel.uel.co.uk> andrew@uel.uel.co.uk (Andrew Josey) writes:
>>When deleting records in my database I check whether they are
>>referenced by any other records. This is currently done by code
>>similar to the following :
>>
>>let counter = 0
>>select count (*) into counter from table
>>  where table.column = key
>>if counter > 0
>>...
>>As some of my database tables are large, I would like to find
>>an alternative technique that terminates after finding the first
>>match instead of searching the whole table.
>
>   This is interesting, in that it is a perfectly plausible request
>that does not **seem** to be well-defined in the relational
>algebra/calculus (and perhaps even less well-defined in the *#%!@?&&
>sublanguages we get with them).

      Relational theory explicitly assumes that data is abstract and
therefore the sublanguages (SQL,etc.) should adhere to this assumption.
From E.F.Codd in Database programming and design vol. 1,no 2 (Feb. 88)

"What is important is that you can get at data in different relations 
by specifying conditions on the relationship between them.  It's true to 
say that a relation of the relationship model and a table are not the 
same thing.  A relation is a special kind of table in which ... it is possible
to shuffle the rows without changing information content....
[Here's the big one!]
In a relational database there is no notion of "nextness"..."

Following there is no such thing as the first or last
	so how can SQL support it?!

>
>   Can someone with a deeper knowledge of relational theory comment
>on this, especially with respect to saying
>	1) "give me one of"  -- the current problem

SELECT tbl1.col1
	INTO temptbl.col1
	FROM tbl1
	WHERE tbl1.col1 = unique_key OR tbl1.coln = unique_key

NOTE you can eleminate the INTO line and the value would be readinto
the virtual table known as the output device.

>	2) "give me the first n of" -- from a problem in bibliographic
>					search
>	3) "give me any n of"

Non-sensical from a relationship model perspective.  Plus from a
application perspective to arbitrarialy select a set of row(s) because of 
its position is illogical.  The better task formulation is:

Give me all the rows with these meaningful (date, author, publisher,
title) limtations. 

Or output the whole query to a file and then use some utility head/tail
to browse the content.

==========================================================================
             : UCSF Dept. of Lab. Medicine : brianc@cgl.ucsf.edu 
Brian Colfer : PH. 415-476-2325            : brianc@daedalus.ucsf.edu 
             :                             : brianc@ucsfcca.bitnet
==========================================================================

daveb@geac.UUCP (David Collier-Brown) (03/06/88)

>In article <2314@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>>   This is interesting, in that it is a perfectly plausible request
>>that does not **seem** to be well-defined in the relational
>>algebra/calculus (and perhaps even less well-defined in the *#%!@?&&
>>sublanguages we get with them).

In article <1169@ucsfcca.ucsf.edu> brianc@daedalus.UUCP (Brian Colfer) writes:
>      Relational theory explicitly assumes that data is abstract and
>therefore the sublanguages (SQL,etc.) should adhere to this assumption.
>From E.F.Codd in Database programming and design vol. 1,no 2 (Feb. 88)
[further clarification elided]

>>	2) "give me the first n of" -- from a problem in bibliographic
>>					search
>>	3) "give me any n of"
>
>Non-sensical from a relationship model perspective.  

  That's what I was hoping to discover.  Relational theory excludes
significant consideration of certain uses of the data which it
describes.  And that is a prefectly reasonable thing for it to do:
it is concerned with the nature and organization of data, not ad-hoc
manipulations such as the ones I raised.

  This raises a semantic-gap problem, though, in that selected
applications need operations which are not well-defined within the
constraints of the theory.  
>                                                             Plus from a
>application perspective to arbitrarialy select a set of row(s) because of 
>its position is illogical. 

  Quite the contrary!

  In this case, one is doing bibliographic query on a very large
database over a finite-speed link with limited display area for the
results (lets assume bus speeds and a 1M*1M bit screen).
  Human inspection of an arbitrary subset, *traditionally*
the first N encountered, is required to discover if the data
requested is the data desired.  Response is required to take less
than a minute.

  Without a theory which accepts such illogical (:-) constraints as
human, processing, communication and display limitations, one has
two choices:
	1) kludge it, or
	2) improve the theory.

  I know how to do (1).  Any suggestions on (2)?

--dave c-b

ps: As suggested in other postings, one can kludge this within SQL,
    with a wee bit of subtlety.
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

brianc@daedalus (Brian Colfer) (03/10/88)

In article <2390@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>
>  That's what I was hoping to discover.  Relational theory excludes
>significant consideration of certain uses of the data which it
>describes.  And that is a prefectly reasonable thing for it to do:
>it is concerned with the nature and organization of data, not ad-hoc
>manipulations such as the ones I raised.

  See my followup with CURSORS.

>
>  This raises a semantic-gap problem, though, in that selected
>applications need operations which are not well-defined within the
>constraints of the theory.  

   I see a distinction between the issues of a good user interface and 
   a good database model.  I think that the relational model is clearly
   a good model.  I don't see this as advocating kludges rather it is 
   building on sound theory and research with a modular approach.

>>                                                             Plus from a
>>application perspective to arbitrarialy select a set of row(s) because of 
>>its position is illogical. 
>
>  Quite the contrary!
>
>  In this case, one is doing bibliographic query on a very large
   ...
>  Human inspection of an arbitrary subset, *traditionally*
>the first N encountered, is required to discover if the data
>requested is the data desired.  Response is required to take less
>than a minute.

CURSORS would accomadate this. But even here I think its better for the
user to be arbitrary rather than the relational model.  For example 
get a count and then select the set with the min. date  and allow for 
an interupt if its still too long. 

>
>  Without a theory which accepts such illogical (:-) constraints as
>human, processing, communication and display limitations, one has
>two choices:
>	1) kludge it, or

CURSORS should do it.

>	2) improve the theory.

WHY? I know that I all I can come up with are impressions of possible
extensions.  The foundation of the relational model is sound.   It is
based on predicate calculus which has a long and established history.

==========================================================================
             : UC San Francisco            : brianc@daedalus.ucsf.edu 
Brian Colfer : Dept. of Lab. Medicine      : ...!ucbvax!cgl!brianc 
             :  PH. 415-476-2325           : brianc@ucsfcca.bitnet
==========================================================================

daveb@geac.UUCP (David Collier-Brown) (03/11/88)

>In article <2390@geac.UUCP> daveb@geac.UUCP (David Collier-Brown) writes:
>>
>>  That's what I was hoping to discover.  Relational theory excludes
>>significant consideration of certain uses of the data which it
>>describes.  And that is a prefectly reasonable thing for it to do:
>>it is concerned with the nature and organization of data, not ad-hoc
>>manipulations such as the ones I raised.
>
In article <1180@ucsfcca.ucsf.edu> brianc@daedalus.UUCP (Brian Colfer) writes:
>  See my followup with CURSORS.

  Seen and appreciated, thanks. (other people made equivalent
suggestions).

	Side comment: someone on the net is forwarding
	latest-message-first... I keep getting things sorted
	by descending date. End of side comment.

>   a good database model.  I think that the relational model is clearly
>   a good model.  I don't see this as advocating kludges rather it is 
>   building on sound theory and research with a modular approach.
[...]  The foundation of the relational model is sound.   It is
>based on predicate calculus which has a long and established history.

  Yes, I agree predicate calculus is effective and surprisingly
descriptive:
  (A x | x is a woman (E w | w is work (w belongs to x -> w is undone))

  What is the next logical tool? The one which can describe cursors.
(I admit to having forgotten enough that I have trouble reading my
copy of Copi).

 --daveb
-- 
 David Collier-Brown.                 {mnetor yunexus utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind) 
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.