[comp.databases] SQL Precompilers

cohend@dg-rtp.dg.com (Dave Cohen) (07/26/90)

I would like to find out from experienced users/developers which
SQL RDBMSs offer true preoptimization. I'm referring to precompilers
that process host languages with embedded SQL statements, 
optimize the statements based on database statistics and
indexes, and generate calls that refer to some statement identifier.
Simply massaging the SQL into calls that pass the actual
statement text is definitely not good enough. Neither is forcing
users to actually make calls that store that statement in the database
(i.e., Sybase stored statements). It appears that most of the database
vendor hype ignores precompilation, but precompiled applications
run faster and are easier to code than call level SQL.
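
To make the distinction concrete, here is a rough sketch (the function
names, the statement identifier, and the table are all invented for
illustration; this is no particular vendor's interface).  Given an
embedded statement in a C host program:

    EXEC SQL SELECT balance INTO :bal FROM accounts WHERE acct_no = :acct;

a text-passing precompiler merely generates something like

    /* statement text shipped, parsed, and optimized on every run */
    db_exec(conn, "SELECT balance FROM accounts WHERE acct_no = ?",
            &acct, &bal);

while a true preoptimizing precompiler picks the plan once, at precompile
time, and generates a call against a statement identifier:

    /* only the identifier and the host variables cross the interface
       at run time; the plan was chosen from statistics at precompile
       time */
    db_exec_stmt(conn, STMT_47, &acct, &bal);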

In particular, I'd like to hear about the 'biggies' : DB2 (pretty sure
this does), Oracle, Ingres, Informix, Sybase, Empress, etc.

If you want, email me and I'll post a summary.
Thanks.

David Cohen                      |    "There's nothin' wrong with goin'    
cohend@dg-rtp.dg.com             |     nowhere, baby, but we should be   
{world}!mcnc!rti!dg-rtp!cohend   |     should be goin' nowhere fast."       
Data General Corporation, RTP, NC|                  - Streets of Fire      

bg0l+@andrew.cmu.edu (Bruce E. Golightly) (07/27/90)

I'm not sure that you'd like the kind of pre-optimization you described. If the
characteristics of the data base changed to any real extent, performance
could fall right through the floor. Good performance can be related to 
optimization that makes use of statistics on the data base AS IT PRESENTLY
EXISTS. Old stats may be very misleading!!

Something that works much like what you describe is a real stored procedure.
Ingres 6.2 and up allow the creation of things they call database procedures.
They are SQL "scripts" that are stored as objects within the data base. They
may have permissions assigned, protecting the procedure and the data it
references from the user.

#############################################################################
                                     !                                      
Bruce E. Golightly                   !  bg0l@andrew.cmu.edu (BeeGeeZeroEl)  
Chief of Data Base Development       !  (412)268-8560                       
Telecommunications Group             !                                      
Carnegie Mellon University           !  UCC 117, 4910 Forbes Ave.           
                                     !  Pittsburgh PA 15213-3890            
President, Western Penna IUA         !
#############################################################################

rbw00@ccc.amdahl.com ( 213 Richard Wilmot) (07/27/90)

Regarding precompiling of SQL queries Bruce E. Golightly wrote:

>I'm not sure that you'd like the kind of pre-optimization you described. If the
>characteristics of the data base changed to any real extent, performance
>could fall right through the floor. Good performance can be related to
>optimization that makes use of statistics on the data base AS IT PRESENTLY
>EXISTS. Old stats may be very misleading!!

While it is true that query efficiency can suffer from reliance on out-of-
date population and value distribution statistics, there are many
applications where dynamically determining the access path is not efficient.
The type of transaction used in the TPC benchmarks, for example, will
benefit markedly from precompiling (by the query optimizer) the database
accesses. TPC benchmarks are very short financial update transactions.

It might be worthwhile someday for the DB vendors to arrange their systems
so that growth beyond some thresholds in certain populations/value sets
could automatically trigger recompilation of designated queries.

I understand that precompiled DB2 access modules are invalidated and recompiled
when there are any structural changes to associated tables or indexes.

In highly repetitive systems (e.g. banking, airlines) compilation is popular
because one can amortize a single round of optimization over many
executions of the same code.
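
To put rough numbers on the amortization (the figures below are invented
for illustration):

    #include <stdio.h>

    int main(void)
    {
        double t_opt  = 0.050;   /* sec to parse + optimize (assumed)  */
        double t_exec = 0.005;   /* sec to execute the plan (assumed)  */
        long   n      = 100000;  /* executions of the same transaction */

        /* re-optimize on every execution vs. optimize once, run n times */
        printf("dynamic:     %.1f sec\n", n * (t_opt + t_exec)); /* 5500.0 */
        printf("precompiled: %.1f sec\n", t_opt + n * t_exec);   /* ~500   */
        return 0;
    }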
-- 
  Dick Wilmot  | I declaim that Amdahl might disclaim any of my claims.
                 (408) 746-6108

mao@eden (Mike Olson) (07/27/90)

In <c0PZ02YZ01UI01@JUTS.ccc.amdahl.com>, rbw00@ccc.amdahl.com (Richard Wilmot)
writes

> While it is true that query efficiency can suffer from reliance on out-of-
> date population and value distribution statistics, there are many
> applications where dynamically determining the access path is not efficient.
> The type of transaction used in the TPC benchmarks, for example, will
> benefit markedly from precompiling (by the query optimizer) the database
> accesses. TPC benchmarks are very short financial update transactions.
> ...
> In highly repetitive systems (e.g. banking, airlines) compilation is popular
> because one can amortize a single round of optimization over many
> executions of the same code.

sharebase (formerly britton lee) supported this level of interaction when
i worked there some time ago.  you could submit a query, a parsed query
tree, or a plan to the database engine from a user program.  in the last
case, the engine would execute the plan.  this is almost always a bad idea,
but, as you note, even bad ideas have their moments.
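
schematically, the three entry points looked something like this (the
names are invented; the real interface differed):

    bl_send_text(eng, "select balance from accounts where acct = 17");
    bl_send_tree(eng, tree);    /* client-parsed query tree           */
    bl_send_plan(eng, plan);    /* client-chosen plan; the engine
                                   just executes it, trusting the
                                   client's optimization              */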

					mike olson
					postgres research group
					uc berkeley
					mao@postgres.berkeley.edu

davek@infmx.UUCP (David Kosenko) (07/27/90)

In article <1990Jul26.141643.6361@dg-rtp.dg.com> cohend@dg-rtp.dg.com writes:
->I would like to find out from experienced users/developers which
->SQL RDBMSs offer true preoptimization. I'm referring to precompilers
->that process host languages with embedded SQL statements, 
->optimize the statements based on database statistics and
->indexes, and generate calls that refer to some statement identifier.
->Simply massaging the SQL into calls that pass the actual
->statement text is definitely not good enough. Neither is forcing
->users to actually make calls that store that statement in the database
->(i.e., Sybase stored statements). It appears that most of the database
->vendor hype ignores precompilation, but precompiled applications
->run faster and are easier to code than call level SQL.

	Are you sure you want optimization of embedded SQL statements
to occur at compile time?  Consider what would happen if your schema
changed (adding/dropping indexes) or the profile of your database
changed (e.g. a previously small table grew very large).  That implies
having to recompile your application for optimal performance for
the new situation.
	Informix embedded languages will pass the SQL string to the
engine process via standard function calls.  The engine will then
optimize the statement for processing.  Why is this not "good enough"?
Doesn't that provide the most dynamic and optimal possibility for
statement optimization?
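	In outline, the dynamic approach looks like this (db_run is an
invented name for illustration, not an actual Informix call):

    /* the statement text goes to the engine on every execution, so
       each run is optimized against the statistics of the moment --
       there is no stored plan to go stale */
    db_run(conn, "SELECT balance FROM accounts WHERE acct_no = ?",
           &acct, &bal);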

Dave
-- 
Disclaimer:  The opinions expressed herein | There's more than one answer
are by no means those of Informix Software | to these questions pointing me
(though they make you wonder about the     | in a crooked line...
 strange people they hire).                | 

davidm@uunet.UU.NET (David S. Masterson) (07/28/90)

In article <1990Jul26.141643.6361@dg-rtp.dg.com> cohend@dg-rtp.dg.com
(Dave Cohen) writes:

   I would like to find out from experienced users/developers which SQL RDBMSs
   offer true preoptimization. I'm referring to precompilers that process host
   languages with embedded SQL statements, optimize the statements based on
   database statistics and indexes, and generate calls that refer to some
   statement identifier.  Simply massaging the SQL into calls that pass the
   actual statement text is definitely not good enough. Neither is forcing
   users to actually make calls that store that statement in the database (i.e.,
   Sybase stored statements). It appears that most of the database vendor hype
   ignores precompilation, but precompiled applications run faster and are
   easier to code than call level SQL.

Why would you want the embedded approach to SQL?  Regardless of how well the
precompiler is implemented, the support for it by other products will never be
good.  Consider how many utilities in Unix are screwed up with embedded SQL in
a C program (for instance, FCREF, INDENT and VGRIND).  In my book, the
function call approach for sending SQL statements to the database is the
correct one (a shame I haven't been given the chance to use it).  It breaks
far fewer of the utility routines that are very important to the programmer.
Also, if you
shift back and forth between languages (like C and C++), there is no new
syntax to learn or tricks to apply (like faking an SQL precompiler for C to
not mess up C++ code).

I believe that the statement "precompiled applications run faster and are
easier to code than call level SQL" may be true with what is often supplied,
but is wrong as a general statement.  In order to optimize a precompiled
statement the way that is suggested, certain aspects of the SQL must be fixed
for that statement (relations to deal with, columns within the relations,
operators to be applied).  This cuts out any dynamic SQL capability.  A very
good precompiler might determine the difference between a statement that will
be treated dynamically and one that will not and, therefore, make appropriate
optimizations.  However, that is not always possible (for instance):

		SELECT col FROM table
		WHERE col = :var;

A precompiler might make the assumption that everything is known and,
therefore, optimizable, but what if the user chose to fill "var" in with "1 or
col = 2" at run-time?  It changes the whole complexion of the statement
(especially from the optimizer's perspective).  This can get far more complex,
but it shows the problems that can happen when mixing potentially dynamic
statements with statements that can be optimized.  The decision of what is
optimizable and what isn't can be put in the hands of the user (who should
know, right?), either by limiting users to only optimizable statements or by
extending the SQL syntax with some sentinel that tells the precompiler to
perform optimizations.  In doing this, though, what has been bought?
(decreased flexibility or increased user confusion)
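
The two situations can be pulled apart in the host program (a sketch;
db_run is an invented call name):

    /* true host-variable binding: only the VALUE of var changes at
       run time; the statement's shape is fixed, so a precompiled
       plan stays applicable */
    EXEC SQL SELECT col INTO :result FROM table WHERE col = :var;

    /* program-built statement text: if the user supplies
       "1 or col = 2", the statement's whole shape -- and the best
       plan -- changes, so nothing useful can be fixed at precompile
       time */
    sprintf(stmt, "SELECT col FROM table WHERE col = %s", user_text);
    db_run(conn, stmt, &result);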

A function call approach will generally tend toward the sentinel approach
(although the sentinel is now the name of the function, not a change to the
SQL syntax), so there might be some programmer confusion, but it is just in
learning new function calls, not a change in the language itself.  Also,
precompilers that worked with database engines typically embedded a set-up
call in the program to inform the database of an optimized function (after
all, the compiled program might run against a copy of the database that it
was compiled against, which would not know about any statements stored in the
original database).  This could certainly be done by the programmer using a
function call approach; it's just not as automatic.

   In particular, I'd like to hear about the 'biggies' : DB2 (pretty sure
   this does), Oracle, Ingres, Informix, Sybase, Empress, etc.

Britton-Lee did it with their early IDL/C precompiler, but I'm not sure if
they do it anymore (there was more flexibility in not doing it).  DEC's
Rdb/VMS precompiles SQL statements into macro assembler code that it then
calls with a generated function call (that's what it looks like, anyway).  I
don't believe Ingres did this in v5, have no idea about v6.  Sybase did use a
function call approach (no precompiler), but they may have expanded their
repertoire since last I saw.  Unsure about the others.
--
===================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mt. View, CA  94043
===================================================================
"If someone thinks they know what I said, then I didn't say it!"

jas@llama.Ingres.COM (Jim Shankland) (07/31/90)

In article <26518@pasteur.Berkeley.EDU> mao@postgres.Berkeley.EDU (Mike Olson) writes:
>In <c0PZ02YZ01UI01@JUTS.ccc.amdahl.com>, rbw00@ccc.amdahl.com (Richard Wilmot)
>writes:
>> In highly repetitive systems (e.g. banking, airlines) compilation is popular
>> because one can amortize a single round of optimization over many
>> executions of the same code.
>
>sharebase (formerly britton lee) supported this level of interaction when
>i worked there some time ago.  you could submit a query, a parsed query
>tree, or a plan to the database engine from a user program.  in the last
>case, the engine would execute the plan.  this is almost always a bad idea,
>but, as you note, even bad ideas have their moments.

I'd quibble with the words "almost always".  Banking, airline, and other
TP applications are hardly arcane applications that are "almost never"
executed.  In fact, at the risk of a slight overstatement, I claim that
anything *other* than submitting a pre-baked query plan for execution
is almost always a bad idea.  Surely no-one believes that Wells Fargo
Bank runs a query optimizer every time I inquire about my checking account
balance?

jas

These are my opinions.  Ingres Corp. neither knows nor cares about them.

jas@llama.Ingres.COM (Jim Shankland) (07/31/90)

In article <4907@infmx.UUCP> davek@infmx.UUCP (David Kosenko) writes:
>	Are you sure you want optimization of embedded SQL statements
>to occur at compile time?  Consider what would happen if your schema
>changed (adding/dropping indexes) or the profile of your database
>changed (e.g. a previously small table grew very large).  That implies
>having to recompile your application for optimal performance for
>the new situation.

All true.  Life is hard.  You have to weigh this against the cost of
re-compiling the query every blessed time you run it.  Also, it's not
too hard to invalidate stored query plans when, e.g., indexes are added
or dropped.  Finally, queries can be recompiled by the DBA from time
to time (nightly is easy if it's not a 24-hour shop).

>	Informix embedded languages will pass the SQL string to the
>engine process via standard function calls.  The engine will then
>optimize the statement for processing.  Why is this not "good enough"?
>Doesn't that provide the most dynamic and optimal possibility for
>statement optimization?

It's not good enough because the cost of optimization may dwarf the cost
of execution.  If the query is going to be executed thousands of times,
this can be a real lose.  It's analogous to recompiling all C programs
every time you run them because a system header file may have changed.

jas

These are my opinions; Ingres Corp. neither knows nor cares about them.

jeffl@sybase.Sybase.COM (Jeff Lichtman) (08/02/90)

>>...or a plan to the database engine from a user program...
>>this is almost always a bad idea...
> I claim that anything *other* than submitting a pre-baked query plan for
> execution is almost always a bad idea...

Precompiled queries are good.  I believe (please correct me if I'm wrong)
that the original posting was talking about user-written query plans
hard-coded into applications and sent to the server.  The problem here
is that the compilation and plan storage are being done in the wrong
place.  It's much better for the DBMS to compile and keep the plan,
so all the user has to do is invoke it by name.
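
Schematically (a sketch with invented names, not exact Sybase syntax):

    /* compile once, inside the DBMS, which keeps the plan ...      */
    db_run(conn, "create proc get_bal as select balance from accounts"
                 " where acct_no = @acct");

    /* ... then any application invokes the stored plan by name     */
    db_call(conn, "get_bal", &acct, &bal);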
---
Jeff Lichtman at Sybase
{mtxinu,pacbell}!sybase!jeffl  -or- jeffl@sybase.com
"Saints should always be judged guilty until they are proved innocent..."

jas@llama.Ingres.COM (Jim Shankland) (08/03/90)

In article <10388@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>Precompiled queries are good.  I believe (please correct me if I'm wrong)
>that the original posting was talking about user-written query plans
>hard-coded into applications and sent to the server.  The problem here
>is that the compilation and plan storage are being done in the wrong
>place.  It's much better for the DBMS to compile and keep the plan,
>so all the user has to do is invoke it by name.

Yes; I agree entirely.  My apologies if I misunderstood.  Perhaps it's
just as well that we made this explicit, though, since someone else
asked why it wasn't good enough to feed SQL to the DBMS engine
on every query execution.

jas

setas01@usgcdh.uucp (08/04/90)

 Re: SQL Precompilers

  David Cohen from Data General Corporation, RTP, NC writes:
 
> I would like to find out from experienced users/developers which
> SQL RDBMSs offer true preoptimization. I'm referring to precompilers
> that process host languages with embedded SQL statements, 
> optimize the statements based on database statistics and
> indexes, and generate calls that refer to some statement identifier.
> Simply massaging the SQL into calls that pass the actual
> statement text is definitely not good enough. Neither is forcing
> users to actually make calls that store that statement in the database
> (i.e., Sybase stored statements). It appears that most of the database
> vendor hype ignores precompilation, but precompiled applications
> run faster and are easier to code than call level SQL.

> In particular, I'd like to hear about the 'biggies' : DB2 (pretty sure
> this does), Oracle, Ingres, Informix, Sybase, Empress, etc.

> If you want, email me and I'll post a summary.
> Thanks.

> David Cohen                      |    "There's nothin' wrong with goin'    
> cohend@dg-rtp.dg.com             |     nowhere, baby, but we should be   
> {world}!mcnc!rti!dg-rtp!cohend   |     should be goin' nowhere fast."       
> Data General Corporation, RTP, NC|                  - Streets of Fire      

I agree that there has been much hype in this area.  In fact, there
is only one DBMS on the mini-market (VAX/VMS, UNIX) that has this
capability:  CA-DB from Computer Associates.  None of Oracle,
Ingres, Sybase, etc. supports precompiled queries.  As a consequence,
it is the fastest DBMS on the market when using ANSI embedded SQL
in standard programming languages, which is an extremely important
factor in portability.  All the "big" vendors seem to be saying that
if you need high performance, you should lock your future into their
product (this may sound like bashing, but it is very annoying to see
marketing hype from companies espousing Standards as well as open
architecture)!

CA-DB optimizes the query at preprocess time and stores the access
plans in the database.  At run time, these access plans are executed.
Of course, CA-DB also has to deal with changes in the database that
could invalidate an access plan, and it does so automatically.

If you need more information, please call me.

Ashok Sethi
(408) 922-2666, Ext. 5636

miket@blia.BLI.COM (Mike Tossy) (08/07/90)

In article <319.26b9b861@usgcdh.uucp>, setas01@usgcdh.uucp writes:

> I agree that there has been much hype in this area.  In fact, there
> is only one DBMS on the mini-market (VAX/VMS, UNIX) that has this
> capability:  CA-DB from Computer Associates.  ...

ShareBase also has this capability.

>                                             ...  As a consequence,
> it is the fastest DBMS on the market when using ANSI embedded SQL
						  ^^^^^^^^^^^^^^^^^
> in standard programming languages, which is an extremely important
> factor in portability.  All the "big" vendors seem to be saying that
> if you need high performance, you should lock your future into their
> product (this may sound like bashing, but it is very annoying to see
> marketing hype from companies espousing Standards as well as open
> architecture)!

I think you are engaged in exactly the kind of marketing hype that you
find annoying.  There is an ANSI SQL validation test suite available from
the National Institute of Standards and Technology (NIST); has CA passed it?
I find the tone of your paragraph very annoying. 


--
						Teradata Corporation
Mike Tossy                                      ShareBase Division
miket@blia.bli.com                              14600 Winchester Blvd
(408) 378-7575 ext2200                          Los Gatos, CA 95030
						(Formerly: Britton Lee, Inc.)
	 These are only my opinions.

nigelc@cognos.UUCP (Nigel Campbell) (08/07/90)

In article <319.26b9b861@usgcdh.uucp> setas01@usgcdh.uucp writes:
>
> Re : SQL Precompilers

>CA-DB optimizes the query at preprocess time and stores the access
>plans in the database.  At run time, these access plans are executed.
>Of course, CA-DB has to also deal with changes in the database that
>could invalidate an access plan and it does it automatically.
>

Just my 2 cents about using systems that support storing the compile
unit in a db.

If your application is replicated across multiple instances of the
db (as happens in government offices in Canada quite frequently), you
have to remember to migrate the compile units along with your exe files
when updates to programs occur.

If your process manipulates multiple databases, then you would have
several compile units for the exe file, not one.  This increases the
chance of errors when distributing updates to the application.

Some systems will not store the reopt'd strategy for a compile unit marked
dirty if the transaction is read-only, so where is the gain over not
storing a compile unit?  As you may never know when this has occurred, you
end up running a batch process that goes around all the databases and uses
a DBA tool (if you have it) to reopt each compile unit ready for the next
day.

-- 
Nigel Campbell          Voice: (613) 783-6828                P.O. Box 9707
Cognos Incorporated       FAX: (613) 738-0002                3755 Riverside Dr.
uucp: nigelc@cognos.uucp || uunet!mitel!sce!cognos!nigelc    Ottawa, Ontario
                                                             CANADA  K1G 3Z4

cohend@roadkill.rtp.dg.com (Dave Cohen) (08/08/90)

In article <CIMSHOP!DAVIDM.90Jul27111745@uunet.UU.NET>,
cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
|> 
|> Why would you want the embedded approach to SQL?  Regardless of how well
|> the precompiler is implemented, the support for it by other products will
|> never be good.  Consider how many utilities in Unix are screwed up with
|> embedded SQL in a C program (for instance, FCREF, INDENT and VGRIND).  In
|> my book, the function call approach for sending SQL statements to the
|> database is the correct one (a shame I haven't been given the chance to
|> use it).  It breaks far fewer of the utility routines that are very
|> important to the programmer.  Also, if you shift back and forth between
|> languages (like C and C++), there is no new syntax to learn or tricks to
|> apply (like faking an SQL precompiler for C to not mess up C++ code).
Two important points here, David:
   1) C to C++ is a trivial language switch. Howza bout Cobol to C?
      Function calls will change dramatically, embedded SQL will
      change very little.
   2) There is an ANSI standard for embedded SQL. No such animal
       exists for function calls, implying no portability among
       vendors.
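
For example, the ANSI embedded form in a C host (table and column
names invented):

    EXEC SQL SELECT balance INTO :bal
             FROM accounts WHERE acct_no = :acct;

The Cobol version of this statement differs only in the host variable
declarations and in ending with END-EXEC; a call-level program would
instead change every function name and calling convention between the
two languages.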
|> 
|> I believe that the statement "precompiled applications run faster and are
|> easier to code than call level SQL" may be true with what is often
|> supplied, but is wrong as a general statement.  In order to optimize a
|> precompiled statement the way that is suggested, certain aspects of the
|> SQL must be fixed for that statement (relations to deal with, columns
|> within the relations, operators to be applied).  This cuts out any
|> dynamic SQL capability.  A very good precompiler might determine the
|> difference between a statement that will be treated dynamically and one
|> that will not and, therefore, make appropriate optimizations.  However,
|> that is not always possible (for instance):
|> 
|> 		SELECT col FROM table
|> 		WHERE col = :var;
|> 
|> A precompiler might make the assumption that everything is known and,
|> therefore, optimizable, but what if the user chose to fill "var" in with
|> "1 or col = 2" at run-time?  It changes the whole complexion of the
|> statement (especially from the optimizer's perspective).  This can get
|> far more complex,
Are you saying that changing a variable's value will slow down a
preoptimized statement more than the dynamic statement's costs for
lexing and parsing?  This is absurd!!!!

|> but it shows the problems that can happen when mixing potentially dynamic
|> statements with statements that can be optimized.  The decision of what is
|> optimizable and what isn't can be put in the hands of the user (who should
|> know, right?), either by limiting users to only optimizable statements or
|> by extending the SQL syntax with some sentinel that tells the precompiler
|> to perform optimizations.  In doing this, though, what has been bought?
|> (decreased flexibility or increased user confusion)
|> 
|> A function call approach will generally tend toward the sentinel approach
|> (although the sentinel is now the name of the function, not a change to
|> the SQL syntax), so there might be some programmer confusion, but it is
|> just in learning new function calls, not a change in the language itself.
|> Also, precompilers that worked with database engines typically embedded a
|> set-up call in the program to inform the database of an optimized function
|> (after all, the compiled program might run against a copy of the database
|> that it was compiled against, which would not know about any statements
|> stored in the original database).  This could certainly be done by the
|> programmer using a function call approach; it's just not as automatic.
|> 
|>    In particular, I'd like to hear about the 'biggies' : DB2 (pretty sure
|>    this does), Oracle, Ingres, Informix, Sybase, Empress, etc.
|> 
|> Britton-Lee did it with their early IDL/C precompiler, but I'm not sure if
|> they do it anymore (there was more flexibility in not doing it).  DEC's
|> Rdb/VMS precompiles SQL statements into macro assembler code that it then
|> calls with a generated function call (that's what it looks like, anyway).
|> I don't believe Ingres did this in v5, have no idea about v6.  Sybase did
|> use a function call approach (no precompiler), but they may have expanded
|> their repertoire since last I saw.  Unsure about the others.
|> --
|> ===================================================================
|> David Masterson					Consilium, Inc.
|> uunet!cimshop!davidm				Mt. View, CA  94043
|> ===================================================================
|> "If someone thinks they know what I said, then I didn't say it!"
|> 
David Cohen                      |    "There's nothin' wrong with goin'    
cohend@dg-rtp.dg.com             |     nowhere, baby, but we should be   
{world}!mcnc!rti!dg-rtp!cohend   |     should be goin' nowhere fast."       
Data General Corporation, RTP, NC|                  - Streets of Fire      

clh@tfic.bc.ca (Chris Hermansen) (08/08/90)

In article <10388@sybase.sybase.com> jeffl@sybase.Sybase.COM (Jeff Lichtman) writes:
>>>...or a plan to the database engine from a user program...
>>>this is almost always a bad idea...
>> I claim that anything *other* than submitting a pre-baked query plan for
>> execution is almost always a bad idea...
>
>Precompiled queries are good.  I believe (please correct me if I'm wrong)
>that the original posting was talking about user-written query plans
>hard-coded into applications and sent to the server.  The problem here
>is that the compilation and plan storage are being done in the wrong
>place.  It's much better for the DBMS to compile and keep the plan,
>so all the user has to do is invoke it by name.

Why not some make-like software in the middle that senses when it needs
to recompile/reoptimize the query based on some dependencies (i.e., the entry
in the tables table is newer than the precompiled query) and/or some
size-related rules based on the table sizes at compile time (which could
possibly be stored with the precompiled query)?  That way, if the short
table became longer, or the schema changed, or ...
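
Something like (a rough sketch; the helper names are invented):

    /* make-style check: rebuild the plan if the schema changed or
       the table has drifted well past its size at compile time */
    if (schema_stamp("accounts") > plan->compiled_at ||
        row_count("accounts")    > 10 * plan->rows_at_compile)
        plan = recompile(query_text);   /* re-optimize, re-store */
    execute(plan);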

All this said by someone not knowing much about query optimization.

Chris Hermansen                         Timberline Forest Inventory Consultants
Voice: 1 604 733 0731                   302 - 958 West 8th Avenue
FAX:   1 604 733 0634                   Vancouver B.C. CANADA
clh@tfic.bc.ca                          V5Z 1E5

C'est ma facon de parler.

davidm@uunet.UU.NET (David S. Masterson) (08/10/90)

In article <1990Aug8.145522.10694@dg-rtp.dg.com> cohend@roadkill.rtp.dg.com
(Dave Cohen) writes:

   In article <CIMSHOP!DAVIDM.90Jul27111745@uunet.UU.NET>,
   cimshop!davidm@uunet.UU.NET (David S. Masterson) writes:
   |> 
   |> Why would you want the embedded approach to SQL?  Regardless of how well
   |> the precompiler is implemented, the support for it by other products will
   |> never be good.  Consider how many utilities in Unix are screwed up with
   |> embedded SQL in a C program (for instance, FCREF, INDENT and VGRIND).  
   |> In my book, the function call approach for sending SQL statements to the
   |> database is the correct one (a shame I haven't been given the chance to
   |> use it).  It breaks far fewer of the utility routines that are very important to
   |> the programmer.  Also, if you shift back and forth between languages 
   |> (like C and C++), there is no new syntax to learn or tricks to apply 
   |> (like faking an SQL precompiler for C to not mess up C++ code).

   Two important points here, David:
      1) C to C++ is a trivial language switch. Howza bout Cobol to C?
	 Function calls will change dramatically, embedded SQL will
	 change very little.
      2) There is an ANSI standard for embedded SQL. No such animal
	  exists for function calls, implying no portability among
	  vendors.

First, I am not suggesting abandonment of ANSI-SQL (although there are plenty
of reasons for that -- see Codd[90]), just embedded SQL.  Functions can
certainly take SQL as input.

1. SQL will change very little in either case.  The support code around the
SQL will change in both cases.  Agreed, function calls may be a dramatic
change when moving from environment to environment, but (I contend that)
embedded SQL causes a more dramatic change in the use of one environment (for
instance, the breaking of the UNIX support tools for C programming).  Which is
more important (how many environments do you use regularly)?

2. As has been seen, there is limited support for the ANSI-SQL standard.
Almost everyone goes beyond the standard in some way (therefore, cutting
direct portability).  Also, the standard, itself, is limited in what it
(currently) standardizes, so the vendors often have to go beyond the standard
to handle their specifics (as I see from the Rdb/VMS manuals).  Certainly,
this will get better over time, but so could the standard of function call
names, IF people desire it.

   |> I believe that the statement "precompiled applications run faster and are
   |> easier to code than call level SQL" may be true with what is often 
   |> supplied, but is wrong as a general statement.  In order to optimize a 
   |> precompiled statement the way that is suggested, certain aspects of the
   |> SQL must be fixed for that statement (relations to deal with, columns 
   |> within the relations, operators to be applied).  This cuts out any 
   |> dynamic SQL capability.  A very good precompiler might determine the 
   |> difference between a statement that will be treated dynamically and one 
   |> that will not and, therefore, make appropriate optimizations.  However, 
   |> that is not always possible (for instance):
   |> 
   |> 		SELECT col FROM table
   |> 		WHERE col = :var;
   |> 
   |> A precompiler might make the assumption that everything is known and,
   |> therefore, optimizable, but what if the user chose to fill "var" in
   |> with "1 or col = 2" at run-time.  It changes the whole complexion of 
   |> the statement (especially from the optimizer's perspective).  This can 
   |> get far more complex,

   Are you saying that changing a variable's value will slow down a
   preoptimized statement more than the dynamic statement's costs for
   lexing and parsing?  This is absurd!!!!

No!!  I merely said that, if you make no assumptions about what a statement
means, then it is very hard to preoptimize any SQL statements.  However,
thinking about SQL, I think my thought process in making the statements above
was wrong which makes the statements unintelligible.  Suffice it to say,
there is nothing I see to suggest that a function call approach would be any
different in performance than an embedded approach.  Also, "more user
friendly" to me is a caveat emptor -- we could start religious wars over
statements like that.
--
====================================================================
David Masterson					Consilium, Inc.
uunet!cimshop!davidm				Mtn. View, CA  94043
====================================================================
"If someone thinks they know what I said, then I didn't say it!"

setas01@cai.com (08/14/90)

Subject: SQL pre-compilers.

Bruce E. Golightly (bg0l@andrew.cmu.edu) writes:

>I'm not sure that you'd like the kind of pre-optimization you described. If the
>characteristics of the data base changed to any real extent, performance
>could fall right through the floor. Good performance can be related to 
>optimization that makes use of statistics on the data base AS IT PRESENTLY
>EXISTS. Old stats may be very misleading!!

I agree that old stats may be misleading if the access plan is not
modified to reflect the changes made to the database, so the access plans
must change depending on the new stats.  This is taken care of in CA-DB.
CA-DB provides a command to update statistics and invalidate all related
queries.  Moreover, if there are changes to fields or indexes, all compiled
queries accessing those tables are automatically invalidated.

An invalidated query is re-compiled the next time it is executed: CA-DB
automatically compiles it before executing it and stores the new access
plan for future use.  The user does not have to do any of this manually.
CA-DB version 1.5 also provides a command to validate all compiled queries
that have been invalidated.

This is the kind of pre-optimization supported by CA-DB from Computer
Associates, a relational DBMS available on VAX/VMS and UNIX.
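
In outline, this is the familiar compile-on-demand pattern (a sketch of
the idea only, not CA-DB internals):

    /* an invalidated plan is rebuilt, using current statistics, the
       first time it is needed, then reused until invalidated again */
    if (!plan->valid) {
        plan = optimize(query_text);
        store_plan(plan);
    }
    execute(plan);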

Ashok Sethi                    UUCP      :     mit-eddie!usgcdh!setas01    
			       INTERNET	 :     USGCDH!setas01@eddie.mit.edu