[comp.databases] Intelligent Databases

davek@rtech.rtech.com (Dave Kellogg) (12/03/89)

In article <7323@sybase.sybase.com> tim@binky.UUCP (Tim Wood) writes:

>Client/server is a specific technical concept.  No one owns it, no one
>can claim they own it.
>This is quite different from the "intelligent
>database" slogan, which doesn't express a specific technical concept.

First off, as evidenced by the recent debates about client/server and 
other debates which haven't been on the net, it's pretty clear that
client/server isn't quite so "specific" as you may think.  Although I
haven't been following the nitty-gritty too closely, have you all (and do
correct me if I'm wrong) been distinguishing between the (at least) 4
different implementations of the SERVER side of client/server?

To me, it seems like you are off discussing schedulers when you could be
discussing higher-level architectural issues.  Sure, when a database takes
control of the OS it could do better scheduling (does anyone?), but how
about answering the original (more than month-old) question from the
guy at Kodak about basic differences in client/server systems?

As I see it, the biggest differences come in the process implementation 
of the SERVER.  That is, I see at least 4 different types of client/server
systems (not to mention hardware vendor database systems, like Tandem) each
with their own strengths and weaknesses.

(examples not guaranteed 100% correct, but I'll try my best)

1. Server-Per-User

	One single-threaded server process per user (i.e. a two-process-per-user model).
	Examples:  Oracle on UNIX, Informix, Ingres Release 5

	Strength:  multiprocessing
	Weakness:  overhead on heavy loads

2. "No Server" or Shared Memory Server

	No distinct OS server process exists.  Rather, the database engine
	code is mapped into shared memory and runs in the context of the
	application code, usually in a protected access mode (e.g. on
	VAX/VMS in executive or supervisor mode).
	Examples:  Rdb, Oracle on VMS.

	Strength:  Reduces processes on VAX/VMS (in networks, too?)
	Weakness:  UNIX has no such access modes, so this is not implementable
		   without sacrificing the memory protection of the server
		   (it must degenerate to server-per-user to protect the
		   server's address space on UNIX)

3. Single Server

	Single multithreaded server which has exclusive control over a set of 
	databases. 
	Example:  Sybase

	Strength:  Reduced per-user overhead
	Weakness:  No multiprocessor support

4. Multiserver

	Multiple multithreaded servers which may access the same set of databases.
	Examples:  Ingres 6.x, Interbase[?]

	Strength:  Reduced per-user overhead without excluding multiprocessors
	Weakness:  No parallelization of queries (which nobody does yet anyway);
		   does symmetric multiprocessing, not parallel processing

The above seems like a better level for discussion, and I would be appreciative
if anyone could correct any misconceptions in the above.  And back to my other
theme, if client/server is such a specific "technical term" then how come I
can easily fire off four separate "client/server" architectures and
differentiate them in all of about 20 lines?
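
To make the contrast concrete, here's a rough C sketch of the heart of model
3: one process multiplexing every client over select().  Model 1 would
instead fork() a dedicated backend per connection.  Everything here (the
port, the handle_request() stub) is made up for illustration; no vendor's
engine actually looks like this.

/*
 * Toy sketch of the "single server" model (#3 above): one OS process
 * multiplexing all client connections with select().  The "server-per-user"
 * model (#1) would instead fork() a dedicated backend per accepted client.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/select.h>
#include <netinet/in.h>
#include <arpa/inet.h>

#define MAX_CLIENTS FD_SETSIZE

static void handle_request(int fd)            /* stand-in for parse/plan/execute */
{
    char buf[512];
    ssize_t n = read(fd, buf, sizeof buf);
    if (n > 0)
        write(fd, buf, n);                    /* echo instead of a query result */
}

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(4000);              /* arbitrary port */
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 8);

    int clients[MAX_CLIENTS];
    int nclients = 0;

    for (;;) {
        fd_set fds;
        FD_ZERO(&fds);
        FD_SET(listener, &fds);
        int maxfd = listener;
        for (int i = 0; i < nclients; i++) {
            FD_SET(clients[i], &fds);
            if (clients[i] > maxfd) maxfd = clients[i];
        }
        select(maxfd + 1, &fds, NULL, NULL, NULL);

        if (FD_ISSET(listener, &fds) && nclients < MAX_CLIENTS)
            clients[nclients++] = accept(listener, NULL, NULL);

        /* One process services every user in turn: low per-user overhead,
         * but only one CPU can be used -- the weakness noted above. */
        for (int i = 0; i < nclients; i++)
            if (FD_ISSET(clients[i], &fds))
                handle_request(clients[i]);
    }
}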

>Tim continues...
>This is quite different from the "intelligent
>database" slogan, which doesn't express a specific technical concept.

Finally, given that I've demonstrated my primary point (that client/server 
is by no means a "specific technical term," but rather to a large extent 
a simple marketism), I will add that "intelligent database" is certainly
in the same league of technical descriptiveness as client/server.  And, in
fact, I'll wager that in the upcoming months/years both intelligent
database and intelligent systems will come to receive the same general
attention as did client/server.

For completeness, I'll add that Ingres Corp defines an intelligent database
[server] as one that manages all three of 

	1. simple business data (characters, numbers and the like)
	2. knowledge of data interrelationships (e.g. referential integrity)
	   and knowledge of business policies.
	3. objects more complex than characters and numbers (e.g. ordered
	   pairs, arrays, vectors, matrices, latitude/longitude, cubes, circles,
	   time series, etc.)

Knowledge is managed primarily via rules, which are differentiated from 
triggers in the data sheets I mentioned in the last posting.  At the highest
level it's simply that rules allow an unlimited number of rules per table
per operation (triggers, and I'm sure Tim will correct me if I'm wrong,
allow only one trigger per table per operation), and recursion (which
triggers lack?).  In addition, there
are some forward-chaining issues and behavior at the boundary value.  (Tim,
are triggers still silent when they don't forward chain as expected, and 
is the forward chaining depth now greater than zero?)

"Objects" (things more complex than characters and numbers) are managed via
standard SQL, with the notable exception that users may define their own 
datatypes from scratch (not simply a domain-like remapping of an existing 
type) along with user-defined "builtin" SQL functions, and user-defined 
context-sensitive overloading of classical operators (e.g. what does "+"
mean in the context of a point or complex number?).
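
As a purely hypothetical illustration of what "from scratch" means in
practice: the type author supplies a representation plus C routines the
engine calls when it sees "+" or needs to compare values.  This is NOT the
actual Ingres registration interface, just the shape of the idea.

/*
 * Hypothetical sketch of what a user-defined datatype hands the server:
 * a representation plus C routines the engine calls for operators,
 * comparisons and "builtin" SQL functions on that type.
 */
#include <math.h>

typedef struct {                 /* the new SQL type "complex" */
    double re, im;
} complex_t;

/* called by the (hypothetical) engine for "a + b" on two complex columns */
static complex_t complex_add(complex_t a, complex_t b)
{
    complex_t r = { a.re + b.re, a.im + b.im };
    return r;
}

/* called for ORDER BY / comparisons; here we order by magnitude */
static int complex_cmp(complex_t a, complex_t b)
{
    double ma = hypot(a.re, a.im), mb = hypot(b.re, b.im);
    return (ma > mb) - (ma < mb);
}

/* a user-defined "builtin" SQL function, e.g. magnitude(z) */
static double complex_magnitude(complex_t z)
{
    return hypot(z.re, z.im);
}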

Finally, and thanks to those who've hung in this long, I'll add that the
term intelligent database does have some credibility in the literature as
well.  Although our implementation is quite different from the hypothetical
[?] one mentioned in the following database text, many of the ideas and 
principles in the area of building intelligence in at the server level
are similar to those mentioned in:

	"Intelligent Databases"
	Kamran Parsaye et al.
	John Wiley & Sons, 1989


Dave Kellogg
All opinions are of course my own and not necessarily those of my employer

jkrueger@dev.dtic.dla.mil (Jonathan Krueger) (12/04/89)

In addition, for:

>1. Server-Per-User
>4. Multiserver

	Strength:  can use process abstractions to control DBMS
		   e.g. priorities, limits, quotas, accounting,
		   monitoring, profiling, suspend, kill, dump core,
		   assign pager, communicate with other processes;
		   with appropriate system management, can do this
		   to granularity of single user or application.
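
For instance (a rough sketch, not any product's launcher; the backend path
is invented), a per-user backend can be niced and CPU-limited with the
ordinary calls before it is exec'd:

#include <stdio.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/wait.h>

int main(void)
{
    pid_t pid = fork();
    if (pid == 0) {                        /* per-user backend process */
        struct rlimit rl = { 3600, 3600 }; /* at most an hour of CPU   */
        setrlimit(RLIMIT_CPU, &rl);
        setpriority(PRIO_PROCESS, 0, 10);  /* run at lower priority    */
        execl("/usr/lib/dbms_backend", "dbms_backend", (char *)0);
        perror("execl");                   /* only reached on failure  */
        return 1;
    }
    waitpid(pid, NULL, 0);                 /* account for / monitor it */
    return 0;
}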

-- Jon
-- 
Jonathan Krueger    jkrueger@dgis.daitc.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

tim@binky.sybase.com (Tim Wood) (12/04/89)

In article <4220@rtech.rtech.com> davek@rtech.UUCP (Dave Kellogg) writes:
>In article <7323@sybase.sybase.com> tim@binky.UUCP (Tim Wood) writes:
>
>>Client/server is a specific technical concept.  
>	First off, as evidenced by the recent debates about client/server and 
>other debates which haven't been on the net, it's pretty clear that
>client/server isn't quite so "specific" as you may think.  
>	As I see it, the biggest differences come in the process implementation 
>of the SERVER.  That is, I see at least 4 different types of client/server
>systems: 
>1. Server-Per-User [i.e. front-end/back-end process pair per user]
>2. "No Server" [i.e. DBMS linked with user application in one process]
>3. Single Server [i.e. DBMS is one process multiplexed across n users]
>4. Multiserver [i.e. DBMS is multiple cooperating multiplexed processes]

You've given a good overview of RDBMS architectures. 

>And back to my other theme, if client/server is such a specific 
>"technical term" than how come I can easily fire off four separate 
>"client/server" architectures and differentiate them in all of about 20 lines?

Those aren't client/server architectures, those are RDBMS internal 
architectures.

Only #3 & #4 allow enough control over the workings of the DBMS to
achieve the specialization of function that is part of the definition
of being a server in a client/server environment.  

>>This is quite different from the "intelligent
>>database" slogan, which doesn't express a specific technical concept.
>
>Finally, given that I've demonstrated my primary point (that client/server 
>is by no means a "specific technical term," but rather to a large extent 
>a simple marketism), 

I'm not yet satisfied on this.  One can examine a computing installation for
the characteristics of a client/server organization: is the user's ("client's")
view of computing resources that of a directory of named services which the
client can access (if properly authorized) without regard to how each
service is implemented, where it resides, etc.?  Do the offerers of
the services ("servers") have full integrity control over the resources
they make available, and do they define the interface through which those
resources will be used?  Is there an overall "object orientation" in the
environment?  If the answers to these questions are yes, then IMO you are
probably looking at a client/server-based environment.
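
As a toy illustration of the "directory of named services" criterion (the
table, hosts and service names below are invented), the client asks for a
service by name and learns nothing about how or where it is implemented:

#include <stdio.h>
#include <string.h>

struct service {
    const char *name;     /* what the client knows        */
    const char *host;     /* where it happens to live     */
    int         port;     /* how to reach it, today       */
};

static const struct service directory[] = {
    { "payroll_db",   "vax1.example.com", 4000 },
    { "parts_db",     "sun4.example.com", 4000 },
    { "mail_gateway", "hub.example.com",    25 },
};

/* client-side lookup: access by name, no knowledge of implementation */
static const struct service *lookup(const char *name)
{
    for (size_t i = 0; i < sizeof directory / sizeof directory[0]; i++)
        if (strcmp(directory[i].name, name) == 0)
            return &directory[i];
    return NULL;
}

int main(void)
{
    const struct service *s = lookup("payroll_db");
    if (s)
        printf("connect to %s:%d\n", s->host, s->port);
    return 0;
}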

>I will add that "intelligent database" is certainly
>in the same league of technical descriptiveness as client/server.  
>
>For completeness, I'll add that Ingres Corp defines an intelligent database
	This is what I have wanted to know...
>[server] as one that manages all three of 
>	1. simple business data (characters, numbers and the like)
>	2. knowledge of data interrelationships (e.g. referential integrity)
>	   and knowledge of business policies.
>	3. objects more complex than characters and numbers (e.g. ordered
>	   pairs, arrays, vectors, matrices, latitude/longitude, cubes, circles,
>	   time series, etc.)
>	Knowledge is managed primarily via rules, which are differentiated from 
>triggers in the data sheets I mentioned in the last posting.  
>	"Objects" (things more complex than characters and numbers) are managed 
>via standard SQL, [except] that users may define their own datatypes from 
>scratch (not simply a domain-like remapping of an existing type) along 
>with user-defined "builtin" SQL functions, and user-defined 
>context sensitive overloading of classical operators (e.g. what does "+"
>mean in the context of a point or complex number?)

No's 1 & 2 and the rules feature belong under client/server, IMO.  They 
pertain to the full responsibility of the DBMS to maintain integrity.
A "server" can't really be one without the intelligence you describe.

No. 3 could be client or server, as I see it.  On one hand, ADT's
(oops, abstract data types--echo from my RTI days :-) are a nice 
piece of object-orientation in the DBMS.  On the other hand, it
might be more flexible in practice for the application to handle 
the many arbitrary object types that a user could create.  Many 
abstract operations might be sufficiently handled in the presentation-level
stuff running on the client.  Why burden the DBMS with these functions,
especially if they are CPU intensive?  Then you could be stealing server
cycles from bread-and-butter transactions.  I'm just not sure if ADT's 
are closer to the front-end processing than to the data management, and
more of a decision-support feature.  

>[...] rules allow unlimited rules/table/operation (triggers,
>and I'm sure Tim will correct me if I'm wrong ....
>(Tim, are triggers still silent when they don't forward chain as expected, 
>and is the forward chaining depth now greater than zero?)

I'll get to this in a later posting; I'm not at work so can't research an
answer as complete as you'd want.  By "forward chaining", I assume that's
what we call "cascading", i.e. triggers firing other triggers.

>Finally, and thanks to those who've hung in this long, I'll add that the
>term intelligent database does have some credibility in the literature as
>well...  many of the ideas and principles in the areas of bulding 
>intelligence in at the server level are similar to those mentioned in:
>"Intelligent Databases", Kamran Parsave et al., Wiley Publications, 1989
>
>Dave Kellogg

Now I'm more prepared to accept "intelligent database".  Thanks.
-TW






Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
		This message is solely my personal opinion.
		It is not a representation of Sybase, Inc.  OK.

forrest@phobos.sybase.com (Jon Forrest) (12/04/89)

In article <4220@rtech.rtech.com> davek@rtech.UUCP (Dave Kellogg) writes:
>3. Single Server
>
>	Single multithreaded server which has exclusive control over a set of 
>	databases. 
>	Example:  Sybase
>
>	Strength:  Reduced per-user overhead
>	Weakness:  No multiprocessor support
		   ^^^^^^^^^^^^^^^^^^^^^^^^^

Does this mean that if we come out with a multiprocessor server that
we'll be perfect?

----
Anything you read here is my opinion and in no way represents Sybase, Inc.

Jon Forrest WB6EDM
forrest@sybase.com
{pacbell,sun,{uunet,ucbvax}!mtxinu}!sybase!forrest
415-596-3422

dlw@odi.com (Dan Weinreb) (12/05/89)

There could be a category in between your categories 3 and 4: a single
multithreaded server (in the sense that all threads run within a
single address space) that nevertheless is capable of utilizing
hardware multiprocessors, because the threads are implemented by the
operating system.  Sequent's version of Unix is one example of an
operating system that can do this.  In earlier postings we heard about
standardization efforts in this area.
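
A small illustration of that in-between category: one address space,
kernel-scheduled threads that a multiprocessor can run in parallel.  POSIX
threads are used here purely for illustration; Sequent's own facilities
differ.

#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4

static void *worker(void *arg)
{
    long id = (long)arg;
    /* each thread would service a session or run part of a query here */
    printf("worker %ld running, sharing one buffer cache and lock table\n", id);
    return NULL;
}

int main(void)
{
    pthread_t tid[NWORKERS];
    for (long i = 0; i < NWORKERS; i++)
        pthread_create(&tid[i], NULL, worker, (void *)i);  /* kernel threads */
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(tid[i], NULL);
    return 0;
}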

Dan Weinreb		Object Design, Inc.		dlw@odi.com

Bob_Campbell.ZORRO@gateway.qm.apple.Com (Bob Campbell) (12/05/89)

In article <7335@sybase.sybase.com> tim@binky.sybase.com (Tim Wood) writes:
> Many abstract operations might be sufficiently handled in the presentation-level
> stuff running on the client.  Why burden the DBMS with these functions,
> especially if they are CPU intensive?  Then you could be stealing server
> cycles from bread-and-butter transactions.  I'm just not sure if ADT's 
> are closer to the front-end processing than to the data management, and
> more of a decision-support feature.  

I agree that in a few cases like Pictures and Sounds there is very little
to do on the server that is useful.  However, Abstract Data Types provide
much more than just blob data storage.  I have always understood Abstract
Data Types to include data encapsulation, which means that I can define a
type called "sound" which includes the raw sound, as well as other
information like sound format, sound playing time, sound title, and a
description of the sound.  This data encapsulation provides information
which is useful to the server when searching and sorting.  This
information is specific to the Abstract Data Type and must be understood
at the server level for the server to do intelligent queries.
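
A rough sketch of that encapsulation (the layout is invented, not any
product's storage format): the metadata travels with the blob, so the server
can search and sort on it even though it never interprets the samples.

typedef struct {
    char          title[64];       /* searchable by the server            */
    char          format[16];      /* e.g. "ulaw", "pcm16"                */
    double        seconds;         /* playing time: sortable, comparable  */
    unsigned long nbytes;          /* size of the raw samples             */
    unsigned char *samples;        /* opaque blob the server stores as-is */
} sound_t;

/* the kind of per-type routine a server would need for, say,
 *   SELECT ... ORDER BY playing_time(clip)   */
static int sound_cmp_by_length(const sound_t *a, const sound_t *b)
{
    return (a->seconds > b->seconds) - (a->seconds < b->seconds);
}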

None of the existing databases that I have used provides all of the
primitive operations that I would like to use.  Neither Sybase nor Ingres
(nor any of the few other databases I have used) provides even what I
consider the minimal subset of string-matching operators.

All of my transactions are "bread-and-butter" even if they are CPU
intensive; after all, we put the database on a big machine so that the
smaller machines spend less time sorting and searching and more time
interacting with the user.  If we have to retrieve large amounts of
data and then filter them at the workstation, then we might as well put
the whole database on the workstation (and by the way, neither Sybase nor
Ingres runs on my Macintosh...).

I like what Ingres is doing with Abstract Data Types, and I hope that some
of the other companies take a good look at what they are doing.

I don't have a cute trailer**************
Applelink: BOBC (BOBC@Applelink.apple.com)
Quickmail: Bob Campbell@ZORRO (Bob_Campbell.ZORRO@gateway.qm.apple.com)

nico@unify.uucp (Nico Nierenberg) (12/06/89)

In article <7337@sybase.sybase.com> forrest@sybase.com writes:
>In article <4220@rtech.rtech.com> davek@rtech.UUCP (Dave Kellogg) writes:
>>3. Single Server
>>
>>	Single multithreaded server which has exclusive control over a set of 
>>	databases. 
>>	Example:  Sybase
>>
>>	Strength:  Reduced per-user overhead
>>	Weakness:  No multiprocessor support
>		   ^^^^^^^^^^^^^^^^^^^^^^^^^
>
>Does this mean that if we come out with a multiprocessor server that
>we'll be perfect?
>

I realize that you are being funny, but there is a serious point here.

The answer is no.  When you come out with a multi-process server it will
have the advantage of multiprocessor support, but it will have the
disadvantage of having to coordinate multiple OS processes in shared
memory.  Like all software, DBMS design is a set of trade-offs, and this
is one of them.

The reduced per-user overhead was most significant on some of the older
machines like the original Sun-3, whose relatively primitive CPUs made
re-mapping large processes very expensive.  With the 68030 this re-mapping
is virtually free.  The major advantage of the single-process server today
is reduced memory utilization.

Of course there is the additional advantage of not needing to coordinate
the usage of shared data structures between multiple server processes.
It is this advantage which will vanish when Sybase introduces a multiple
server product.
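
To make that coordination cost concrete, here's a toy C sketch of server
processes sharing one buffer-cache counter through System V shared memory,
serialized here (crudely) by a file lock; a real engine would keep
semaphores or spinlocks in the shared segment itself.  The key, the lock
path and the "free page" counter are all invented for the example.

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/file.h>

struct cache_header {
    int free_pages;                /* shared state all servers must agree on */
};

int main(void)
{
    /* the shared segment every server process attaches */
    int shmid = shmget(0x5EBA5E, sizeof(struct cache_header), IPC_CREAT | 0600);
    struct cache_header *hdr = shmat(shmid, NULL, 0);

    /* a crude lock serializing all servers' access to the shared header */
    int lockfd = open("/tmp/toy_dbms.lock", O_CREAT | O_RDWR, 0600);

    flock(lockfd, LOCK_EX);        /* every touch of shared state pays this */
    hdr->free_pages -= 1;          /* e.g. claim a buffer page */
    flock(lockfd, LOCK_UN);

    printf("free pages now: %d\n", hdr->free_pages);
    return 0;
}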



Nicolas Nierenberg
Unify Corp.

jkrueger@dev.dtic.dla.mil (Jonathan Krueger) (12/06/89)

tim@binky.sybase.com (Tim Wood) writes:

>Those aren't client/server architectures, those are RDBMS internal 
>architectures.

OK, first principles then: how do you define "architecture"?
For instance, would you accept Blaauw [1970]?

>ADTs ... are a nice piece of object-orientation in the DBMS.

Equally true the other way around.  And equally silly.
Each model has its merits.

>On the other hand, it
>might be more flexible in practice for the application to handle 
>the many arbitrary object types that a user could create.  Many 
>abstract operations might be sufficiently handled in the presentation-level
>stuff running on the client.  Why burden the DBMS with these functions,
>especially if they are CPU intensive? 

Because it's demonstrably unsafe to do so, it doesn't support
distributing the load, and it makes many query optimizations
impossible.  Your company makes quite a deal out of the first
reason, with respect to integrities: why should we expect less
of domain integrity?

>Then you could be stealing server
>cycles from bread-and-butter transactions.

One buys a server to execute one's database engine.  It needs
what it needs.  If one's data types are expensive, it needs more.

>I'm just not sure if ADT's 
>are closer to the front-end processing than to the data management, and
>more of a decision-support feature.  

Neither: they're one's data.  Consider arbitrarily large fixed point,
for instance.  Expensive to implement regardless of where you put it.
Now, what would its use be?

-- Jon
-- 
Jonathan Krueger    jkrueger@dgis.daitc.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?

tim@binky.sybase.com (Tim Wood) (12/07/89)

In article <3584@dev.dtic.dla.mil> jkrueger@dev.dtic.dla.mil (Jonathan Krueger) writes:
>tim@binky.sybase.com (Tim Wood) writes:
>>Those aren't client/server architectures, those are RDBMS internal 
>>architectures.
>
>OK, first principles then: how do you define "architecture"?
>For instance, would you accept Blaauw [1970]?

Well, if you hum a few bars, I can fake it. :-)

Seriously, I'm not familiar with the reference, and would appreciate
a summary of the results.

My point is, what happens in the DBMS is not the whole computing environment.
I'm talking about a higher-level architecture, one that the users of
the DBMS server are a part of, as well as the DBMS itself.  The overall
computing environment may or may not have the same patterns of order
as are found in the DBMS internals.

>>ADTs ... are a nice piece of object-orientation in the DBMS.
>
>Equally true the other way around.  And equally silly.
>Each model has its merits.

Which, wha, way are we going?  I don't understand your point.
ADT model vs. DBMS model ??

>>On the other hand, it might be more flexible in practice for the
>>application to handle the many arbitrary object types that a user could
>>create.  Many abstract operations might be sufficiently handled in the
>>presentation-level stuff running on the client.  Why burden the DBMS
>>with these functions, especially if they are CPU intensive?
>
>Because it's demonstrably unsafe to do so, it doesn't support
>distributing the load, and it makes many query optimizations
>impossible.  Your company makes quite a deal out of the first
>reason, with respect to integrities: why should we expect less
>of domain integrity?

Good point.  I would be interested to know how many current or projected
applications out there need high TPS throughput on complex user-defined 
datatypes.  Or whether most of the TPS-critical stuff operates on prosaic
things like ints, f4/8's, dates and moneys.  

>>Then you could be stealing server
>>cycles from bread-and-butter transactions.
>
>One buys a server to execute one's database engine.  It needs
>what it needs.  If one's data types are expensive, it needs more.

Agreed, I'm just commenting that the simple-transaction and the ADT
workloads probably won't mix well for throughput in the same server.

>>I'm just not sure if ADT's are closer to the front-end processing than
>>to the data management, and more of a decision-support feature.
>
>Neither: they're one's data.  Consider arbitrarily large fixed point,
>for instance.  Expensive to implement regardless of where you put it.
>Now, what would its use be?

Again, it depends on user needs.  Some users might need nothing more
than BLOB type in the server, and will do most manipulation on the
front-end, at least for the near term.  Anyone want to comment?
A syntactically-correct C-language datatype would be neat. :-)
As for "bignum"s, I don't have an application idea offhand (although
tracking the Federal debt might be one. :-)  You?
-TW




Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
tim@sybase.com          {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
		This message is solely my personal opinion.
		It is not a representation of Sybase, Inc.  OK.

jkrueger@dgis.dtic.dla.mil (Jon) (12/08/89)

tim@binky.sybase.com (Tim Wood) writes:

>ADT model vs. DBMS model ??

No, ADT's vs. object-oriented.   Both those models have their merits.

>Good point.  I would be interested to know how many current or projected
>applications out there need high TPS throughput on complex user-defined 
>datatypes.  Or whether most of the TPS-critical stuff operates on prosaic
>things like ints, f4/8's, dates and moneys.

Good point.  I'd like to know too.  But there's also a chicken-and-egg
effect here.  How much TPS-critical stuff doesn't use ADT's because no
one knows how to make them fast?  In how many cases is it also true
that the local folk just don't understand safer methods, and won't be
persuaded by any amount of data that they can be made to go fast?

>Again, it depends on user needs.  Some users might need nothing more
>than BLOB type in the server, and will do most manipulation on the
>front-end, at least for the near term.  Anyone want to comment?

Then they don't have data types.  They have shared persistent storage.
Outside of atomicity and serializability, it's not that different
from ordinary files.

>As for "bignum"s, I don't have an application idea offhand (although
>tracking the Federal debt might be one. :-)  You?

Yep, you're dead on.  Money datatypes are usually fixed width.  They do
all right for dollars; lira or yen are problems.  Arbitrary sizing is an
efficient way to handle differences in magnitude between currencies and
inflation within currencies.  Of course, the implementation of such animals
brings up a host of embedded language problems.
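
To make "arbitrary sizing" concrete, here's a toy C sketch in which an
amount is just a decimal digit string of whatever length is needed; sign,
scale and currency handling are omitted, and the figures in main() are
placeholders, not real numbers.

#include <stdio.h>
#include <string.h>

/* add two non-negative decimal strings; out must have room for the result */
static void dec_add(const char *a, const char *b, char *out)
{
    int la = (int)strlen(a), lb = (int)strlen(b);
    int lo = (la > lb ? la : lb) + 1;        /* room for a final carry */
    int carry = 0;
    out[lo] = '\0';
    for (int i = 0; i < lo; i++) {
        int da = (i < la) ? a[la - 1 - i] - '0' : 0;
        int db = (i < lb) ? b[lb - 1 - i] - '0' : 0;
        int s = da + db + carry;
        out[lo - 1 - i] = (char)('0' + s % 10);
        carry = s / 10;
    }
}

int main(void)
{
    char total[64];
    /* amounts grow past any fixed word size without changing the type */
    dec_add("2857430000000", "155000000000", total);
    /* drop at most one leading zero left by the unused carry slot */
    printf("total: %s\n", total[0] == '0' ? total + 1 : total);
    return 0;
}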

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
Isn't it interesting that the first thing you do with your
color bitmapped window system on a network is emulate an ASR33?