[comp.databases] Query performance in commercial RDBMS

oysteing@idt.unit.no (Oystein Groevlen) (05/15/91)

Thank you to everybody who responded to my request concerning query
performance in commercial RDBMS. Our master thesis is due in the end
of June and I will make the results of our findings available to those
of you who have requested it after that date. 

Below I have summarized the responses I got to my last posting. I
would be very thankful for any further comments on the topic. Why not
post to the newsgroup so we can some interesting discussion going on
here. If you prefer replying by mail, I will summarize to the net. 

-------------------------------------------------------------------------------
From: jpk@ingres.com (Jon Krueger)

> What do you consider the most important aspects for choosing a DBMS?

Quality of the product and the company.  Productivity achievable with
development tools.  Quality achievable for resulting applications
without sacrificing that productivity.

> How important is the performance of queries?

Today, for most database applications, performance is a solved problem.
Design your database correctly and adequate performance will not require
special effort.  There are some applications for which this is not true;
they no longer define the common case.

> Another impression is that often the main reason for choosing a big
> RDBMS is the tools that comes with it that make it easy to build
> applications on it and design nice user interfaces.

Or the perception that such tools must exist if the RDBMS is sold
by a big company.  Buyers who spend time running benchmarks instead
of first verifying that tools exist make this mistake.

> We were not impressed by the performance measured on ORACLE. For
> General Range Queries (GRQ) with low sectivity on the attributes,
> indexes was of no help. That means that the best you can do is a full
> scan of the whole table. As far as I know a good disk today can deliver
> data at a speed somewhere around 1-3 MB/sec. It will take some time to
> search through 1 GigaByte at that speed! 

That's why our product comes with a state-of-the-art query optimizer
that uses statistical information about your data.  It often decides
correctly *not* to use an index when it estimates that fewer rows
would be returned from a different query plan.  Products that simply
use indexes when present will indeed force exhaustive scans needlessly.

> Our question is if people with large databases are satisfied with this
> query performance. Maybe the queries of the  GRQ-type is considered
> impossible in the same way as walking on water. It would have been nice to
> be able to do it, but it is really not possible. 

Before concluding that, please evaluate Ingres.

> Our impression is that many databases which contains very large data
> volumes have a pretty simple data model, or at least the relations
> between those tables that contains most of the data is pretty
> straightforward. In most of the examples of very large databases (> 1
> GB) that we have come across, the structure of the databases have been
> hierarchical. Is this your impression? Maybe the data model has to be
> simple in order to be able to collect large data volumes. 

This was once an article of belief among large systems.  Products
like ours, and large operations using them, have provided counter-examples.

-------------------------------------------------------------------------------
From: ethan@thinc.COM (Ethan.Lish@THINC.COM)

> What do you consider the most important aspects for choosing a DBMS?

	(Since you say 'DBMS' I will assume you mean the entire package
	rather then just the front-end vs the back-end.  Because one may also
	write Progress application to use a DBIM - Database Interface Module
	to "attach" to *other* DBMS Back-ends: Oracle, RMS, Rdb...)

	Reliability: My data must be safe.

	Rapid Development: Easy 4Gl

	Professional Look: My screens and reports have to *look* good

> How important is the performance of queries? 

	Yes, one usually varies the underling hardware to increase
	performance, once committed to a development environment.

>                                              We have the impression
> that many organizations when needing a tool for storing and retrieving
> data, choose one of the big commercial RDBMS without considering if
> they really need it or if it is the best tool for their applications.

	I agree, but this says more about the organization rather than the
	package.  This type of organization must have *some* set agenda that
	lead them to such a conclusion.

-------------------------------------------------------------------------------
From: brian%edat@uunet.UU.NET

Here are the results of a study a did a few years ago for my
organization.  I interviewed most of the engineering staff and
accumulated a list of features that they thought were relevant in
purchasing a database.  I then resubmitted the list and asked
everyone to grade each item on a scale of 1-10 10 meaning the most
important issue.  This is what I found:

				Engrs	Mngmt	Final Average
Data Integrity & Recovery	10.0	10.0	10.0
Reliability			 9.7	10.0	 9.8
Development Tools		 9.0	 9.0	 9.0
Network Interoperability	 9.6	 7.5	 9.0
Audit Systems			 8.6	 8.8	 8.6
Distributed Capability		 8.5	 8.8	 8.6
Performance			 8.5	 8.8	 8.6
Maintenance			 8.0	 8.8	 8.2
Implementation			 7.9	 8.5	 8.1
Decision Support Systems	 8.6	 6.5	 8.0
Ad Hod Queries			 7.2	 8.0	 7.4
Standards Compliance		 6.7	 8.0	 7.1
Resource Requirements		 6.4	 7.0	 6.6
Marketing/Accounting Tools	 5.6	 7.0	 6.0
Company Financials		 5.4	 6.8	 5.8
Cost				 4.6	 7.0	 5.3
Third Party Interfaces		 5.6	 4.3	 5.2
Source Code			 5.3	 4.8	 5.1

There were more Engineers than Managers, hence their lower
weighting.

-------------------------------------------------------------------------------
From: grossman@eola.cs.ucf.edu (David Grossman)
To: oysteing@idt.unit.no

>Our impression is that many databases which contains very large data
>volumes have a pretty simple data model, or at least the relations
>between those tables that contains most of the data is pretty
>straightforward.

Your right about this. I worked as a systems programmer for two years
before returning to school and people do not use complex production 
databases. Performance is the key, but I think your making a mistake
to focus on Oracle as a database used to provide performance. People
who need performance don't buy Oracle unless they are simply
uninformed. The major IBM products and database machines are used...these
include DB2 (over 7500 installations at large mainframe sites), TERADATA,
etc. 

I agree data with low selectivity isn't helped by B-tree's, but its
been my experience that people don't do that kind of a search on-line.
Searches of that nature are typically done during non-prime-time
processing...there is a need for them and if someone can think of
a better  tool to use than b-trees I think its a wide open research
area...not easy, but there is plenty of room for new ideas.

-- 
Oystein Groevlen 
Division of Computer Systems and Telematics
The Norwegian Institute of Technology
The University of Trondheim
Email: oysteing@idt.unit.no