[comp.arch] DataBase Machines

kfw@ecrcvax.UUCP (Fai Wong) (10/10/89)

	Hi,

	I am seeking information on commercial database machines.  I've
	read about the Britton-Lee IDM, the Intel iDBP and the Teradata
	DBC.  The information I have describing these machines is out
	of date (ca 1985).  Could anyone tell me if these machines are
	still being made?  If they are, how and where could I obtain more
	information about them?  I would also appreciate any information
	on any other commercial database machines.

	Thanks a billion in advance.

	Cheers,
	Kam-Fai. (please send e-mail to kfw@ecrcvax.UUCP)	

johnl@esegue.segue.boston.ma.us (John R. Levine) (10/10/89)

In article <785@ecrcvax.UUCP> kfw@ecrcvax.UUCP (Kam-Fai Wong) writes:
>	I am seeking for information on commercial database machines. I've
>	read about the Britton-Lee IDM, the Intel iDBP and the Teradata
>	DBC. ...

I was pretty much the only OEM user of the iDBP.  I wrote a compiler for
its command language (a tokenized relational algebra with a lot of other
stuff) which either ran standalone or embedded in C, and a few demo
applications, a text searcher and a QBE subset.  The logical design was
great, but the implementation by a bunch of iRMX jocks left a lot to
be desired.  They never finished it and withdrew it before it shipped.

The design was basically relational, except that you could store pointers
either to precompute joins or to implement oldthink architectures, and
you could also have pointers to regions of unstructured data for text or
images.  I liked the design better than the IDM's because it let you
manipulate your data in flexible ways; it was easy, for example, to get
each record in file A followed by the records in file B that are joined
to it, without having to generate a joined file and retrieve the A record
for every B record, stuff like that.
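The access pattern I mean can be sketched in modern pseudocode (Python
here; this is purely illustrative, not iDBP command language): B records
carry stored pointers to their parent A record, so you can walk out each
A record followed by its related B records without ever materializing
the join.

```python
# Illustrative sketch only: "pointers" are modeled as dictionary keys.
file_a = {1: {"id": 1, "name": "widgets"},
          2: {"id": 2, "name": "gadgets"}}

# Each A key maps to the B records whose stored pointer names it.
b_by_a_ptr = {1: [{"qty": 10}, {"qty": 3}],
              2: [{"qty": 7}]}

def records_with_children(a_records, children):
    """Yield each A record followed by the B records joined to it,
    following stored pointers instead of computing a join."""
    for key, a_rec in a_records.items():
        yield ("A", a_rec)
        for b_rec in children.get(key, []):
            yield ("B", b_rec)

result = list(records_with_children(file_a, b_by_a_ptr))
```

Contrast this with generating a joined file, where the A record would be
repeated (and re-fetched) once per B record.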
-- 
John R. Levine, Segue Software, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@esegue.segue.boston.ma.us, {ima|lotus|spdcc}!esegue!johnl
Massachusetts has over 100,000 unlicensed drivers.  -The Globe

lamaster@ames.arc.nasa.gov (Hugh LaMaster) (02/28/90)

In reviewing my (old) files on DataBase machines (e.g. ShareBase == Britton Lee)
it appeared that the main "problem" that a DBM is supposed to address is
the limitation of main memory bandwidth.  Specifically, if processing
can be done "in the channel" rather than on the main processor(s), then
only data which will be used later will be written to memory.  Also, some
level of parallelization can take place, and cycle consuming operations that
can be done in parallel can be offloaded.  Presumably this allows
cheaper machines with limited memory bandwidth to avoid wasting that
bandwidth on reading data into memory which will be filtered out at the
first pass, or wasting it on "trivial but expensive" operations such as
compression.  Presumably the result is cheaper parallelism.

Now, this is an issue which reappears in many forms:

Specialized network processors,
Specialized graphics processors, and,
Specialized DataBase processors.

The question with any such processor is whether it provides a big
enough performance boost to keep the product ahead of general purpose
machines, which are usually on a much faster design cycle.


I have several questions:

1)	Just what kind of performance, by various appropriate measures, do the
	current crop of DBMs provide vs. standard architecture machines?
	(BTW - I notice that there are now 2 measures of standard TPS
	units - any enlightenment and/or correspondence between the two
	appreciated.)  Does anyone know what kind of performance IBM gets
	out of ACP and RDBMSs, say, on its biggest iron?  How about smaller
	machines?  Sun was quoting pretty high numbers, by one of the TPS
	measures, on the 4/490, for a bus-based workstation server.
	How about other, more complex operations?

2)  Has anyone considered an extended filesystem approach for Unix,
	wherein specialized DataBase operations are supported in
	the filesystem, and a specialized bus-based (e.g. VME, FutureBus)
	processor is attached to the system?  This would appear to be more
	flexible and would allow many companies to access the functionality,
	the same way that they do with new disk controllers, etc.  What primitive
	operations should be supported: powerful enough to reduce traffic
	to memory by a large fraction, general enough to use with a wide
	variety of database systems?  What I envision is a definition for a
	dbfs, database filesystem, which would support various RDBMSs such as 
	Oracle and Sybase via multiple dbfs's.  The parallelism would have 
	to be across controller boards/filesystems, with processors in the
	controllers for the operations.  You might support better keyed access,
	compression, lock support (what kind?), disk optimizations (the usual),
	and any other useful parallelizable operations.  In configuration terms,
	you might have 4 controller boards on your VME (say) system, and get 
	some fraction of parallel performance across the boards.
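To make the idea concrete, here is a toy sketch of such a dbfs
primitive (the name dbfs_scan and its interface are my inventions, not
any shipping product): the host ships a predicate to the controller,
which evaluates it "in the channel" and returns only matching records,
so filtered-out data never consumes host memory bandwidth.

```python
# Hypothetical dbfs filter-pushdown primitive, simulated in user space.
def dbfs_scan(records, predicate):
    """Simulate a controller-side scan: apply the predicate near the
    disk and hand back only the rows the host will actually use."""
    return [r for r in records if predicate(r)]

# Host-side view: instead of reading every record into memory and
# filtering in the RDBMS, the host receives the pre-filtered subset.
table = [{"acct": i, "balance": i * 100} for i in range(1000)]
rich = dbfs_scan(table, lambda r: r["balance"] > 90000)
```

With several controller boards, each would run dbfs_scan over its own
filesystem's partition of the table, giving the across-board parallelism
described above.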

Comments?

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035     
  Phone:  (415)604-6117       

jkrueger@dgis.dtic.dla.mil (Jon) (03/01/90)

lamaster@ames.arc.nasa.gov (Hugh LaMaster) writes:

>[about database machines as an example of the tradeoffs
>involved in designing specialized machines of various sorts]

>The question with any such processor is whether it provides a big
>enough performance boost to keep the product ahead of general purpose
>machines, which are usually on a much faster design cycle.

It seems (I have no numbers) that database machines provided a big
enough boost during roughly 1983 to 1988.  After that, those KMs (killer
micros) and shortened design cycles narrowed the performance gap and
reversed the price/performance gap.

But a lot of database machines, then and now, justify their purchase
not by providing better performance for the database user but rather by
giving the general purpose machine back to everybody else.  Database
applications can be bad citizens in general timesharing environments.
People like to offload them to a separate machine.

Of course, since users connect to their databases from front ends
running on other machines, the catch is how many problems partition
themselves poorly between front and back end, thus bottlenecking on
network throughput and sometimes latency.  No amount of performance in
the workstation or database machine helps in this case, and it is common.

>I have several questions:

>1)	Just what kind of performance, by various appropriate measures, do the 
>	current crop of DBMs provide (BTW - I notice that there are now
>	2 measures of standard TPS units - any enlightenment and/or
>	correspondence between the two appreciated) vs standard architecture
>	machines.

Warning: TPS measurements are susceptible to manipulations that make
string inlining on Dhrystone look honest to a fault.  And TPS's are
almost never measured or reported with anything but intent to deceive.
And even when done honestly, just as compiler optimizations achievable
on small synthetic benchmarks can be poor predictors of performance for
real applications, vendor X's TPS's on TP1 or DebitCredit can lead you
down a garden path.

A simple example is that DebitCredit requires you to scale table size
by measured TPS's: you have to titrate it until your measured TPS's are
obtained against a table of appropriate size.  Reported TPS's seldom
come with enough information to determine whether this was done.
This is just a simple example; it gets much, much worse.
There are many, many variables that can lead to misleading results, and
misleadingly high results are seldom an accident.
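The scaling rule itself is simple; as commonly described (and later
codified in TPC-A), table sizes must grow in proportion to the claimed
rating, roughly 1 branch, 10 tellers, and 100,000 accounts per TPS.  A
sketch (the exact constants are my recollection, so check the benchmark
definition before relying on them):

```python
# Required DebitCredit table sizes for a claimed TPS rating.
# A report that omits table sizes leaves no way to check this was honored.
def required_table_sizes(claimed_tps):
    return {"branches": 1 * claimed_tps,
            "tellers": 10 * claimed_tps,
            "accounts": 100_000 * claimed_tps}

sizes = required_table_sizes(50)   # a 50-TPS claim needs 5M accounts
```

Running a 50-TPS claim against, say, a 10,000-account table that fits
in cache is exactly the kind of quiet manipulation I mean.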

Even when done honestly, TPS's are a highly derived unit.  They're even
further from characterizing overall DBMS performance than "MIPS" are
from characterizing overall computational performance.  TPS results
aren't comparable across software, hardware, or communications pathways
between front and back ends.  E.g. you can use results to measure
improvements due to software (e.g. single versus multiple server
processes) by comparing on the same hardware.  But you can't compare
results from database machines with results from different hardware and
software; the data simply isn't meaningful.

And when the data is collected, all it tells you is how well you did on
short, simple queries.  This is useful to people who want to attach
their point-of-sale system to a general purpose database engine, but of
very limited use to the rest of us.  For instance, TPS numbers don't
contribute a thing to predicting performance of CAD/CAM databases.

>2)  Has anyone considered an extended filesystem approach for Unix,
>	wherein specialized DataBase operations are supported in
>	the filesystem

See:

	The Use of Technological Advances to Enhance
	Database System Performance, P. Hawthorn and M.
	Stonebraker, ACM-SIGMOD Conference Proceedings,
	ACM-SIGMOD International Conference on Management
	of Data, Boston, Massachusetts, June 1979.

	Performance Enhancements to a Relational Database
	System, M.  Stonebraker, J. Woodfill, J.
	Ranstrom, M. Murphy, M. Meyer, and E.  Allman,
	ACM Transactions on Database Systems, vol. 8, no.
	2, June 1983.

Both reprinted in:

	The INGRES papers, M. Stonebraker, ed.,
	Addison-Wesley, 1986, chapters 5 and 6.

> and, a specialized bus-based (e.g. VME, FutureBus)
>	processor is attached to the system.  This would appear to be more
>	flexible and allow many companies to access the functionality, the
>	same way that they do with new disk controllers, etc.

This idea is new as far as I know.

>  What primitive
>	operations should be supported: powerful enough to reduce traffic
>	to memory by a large fraction, general enough to use with a wide
>	variety of database systems?  What I envision is a definition for a
>	dbfs, database filesystem, which would support various RDBMSs such as 
>	Oracle and Sybase via multiple dbfs's.  The parallelism would have 
>	to be across controller boards/filesystems, with processors in the
>	controllers for the operations.  You might support better keyed access,
>	compression, lock support (what kind?), disk optimizations (the usual),
>	and any other useful parallelizable operations.  In configuration terms,
>	you might have 4 controller boards on your VME (say) system, and get 
>	some fraction of parallel performance across the boards.

Analysis of existing database systems shows that simple, short queries
have distinctly different performance characteristics from
complex queries.  (TPS (sometimes) predicts performance for
applications in which the former predominate).  Things that help
are fast commits, fast acquisition and relinquishing of locks,
deferred writes, buffering of various sorts.  Some of this can
be helped by specialized hardware, but currently the bottlenecks
tend to be software.

Also, the software can go a long way toward reducing raw hardware
requirements.  For instance, a log file can record committed
transactions, from which the main table is silently (safely,
transparently) updated.  This decouples latency from throughput.  You
can then buy a high latency device to store your tables and a high
throughput device to store the log file.  The economies achievable by
not requiring both in a single device are said to be substantial
(again, no numbers, sorry).
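The decoupling works roughly like this (an illustrative sketch, not any
particular DBMS): a commit only has to append to the fast sequential
log, and the main table is brought up to date later, so you can buy the
log device for low latency and the table device for cheap capacity.

```python
log = []           # fast, append-only device: bounds commit latency
main_table = {}    # slower, high-capacity device: bounds total storage cost

def commit(txn_id, key, value):
    """A transaction is durable as soon as its record hits the log."""
    log.append((txn_id, key, value))

def apply_log():
    """Silently (safely, transparently) roll the log into the table."""
    while log:
        _, key, value = log.pop(0)
        main_table[key] = value

commit(1, "acct42", 100)
commit(2, "acct7", 250)   # both commits return before any table write
apply_log()               # table catches up in the background
```

Neither device has to be good at both jobs, which is where the claimed
economies come from.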

Things that help complex queries are, well, complex.  Hardware assists
are more likely to help here, but in very close collaboration with
matching software.  For example, keeping as many parallel paths busy as
possible is intimately connected with query decomposition.  To saturate
your special hardware resources, you have to know which parts of the
query can execute in parallel, how much faster the total query will
(most likely) complete if a given amount of resource is invested in a
given part of the query at a given stage, not to mention the costs
and latencies of fetching from databases distributed over multiple machines.
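A toy decomposition shows the shape of the problem (names and data are
illustrative): the independent restrictions on each table can run on
parallel paths, but the join depends on both results, so it is a serial
stage no amount of special hardware can hide.

```python
from concurrent.futures import ThreadPoolExecutor

def scan(table, pred):
    """A restriction: the kind of independent work parallel paths can absorb."""
    return [r for r in table if pred(r)]

orders = [{"cust": c % 5, "amt": c} for c in range(100)]
customers = [{"cust": c, "region": "west" if c < 3 else "east"}
             for c in range(5)]

# The planner has identified two independent parts of the query...
with ThreadPoolExecutor(max_workers=2) as pool:
    big_f = pool.submit(scan, orders, lambda r: r["amt"] >= 90)
    west_f = pool.submit(scan, customers, lambda r: r["region"] == "west")
    big, west = big_f.result(), west_f.result()

# ...but the join must wait for both: a serial stage in the plan.
west_ids = {r["cust"] for r in west}
joined = [o for o in big if o["cust"] in west_ids]
```

Deciding how much resource each parallel part deserves, at which stage,
is the real-time query-decomposition problem the software has to solve.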

Thus my guess is that general purpose machines will continue to have an
edge for at least several years.  Special hardware of various sorts
will help some, but until the software can make intelligent use of it
in real time, its potential will be mostly unrealized.

-- Jon
-- 
Jonathan Krueger    jkrueger@dtic.dla.mil   uunet!dgis!jkrueger
The Philip Morris Companies, Inc: without question the strongest
and best argument for an anti-flag-waving amendment.