[comp.databases] Query Performance in commercial DBMS

oysteing@idt.unit.no (Oystein Groevlen) (04/13/91)

In my master's project at The Norwegian Institute of Technology I am
working on the performance of queries in DBMSs for large data volumes.
We have tested the ORACLE RDBMS (V6) with respect to query
performance.

We would be very pleased if some of the readers of this newsgroup
could send us some comments on the questions/statements below. 

What do you consider the most important aspects when choosing a DBMS?
How important is the performance of queries? We have the impression
that many organizations, when they need a tool for storing and retrieving
data, choose one of the big commercial RDBMSs without considering whether
they really need it or whether it is the best tool for their applications.
Another impression is that the main reason for choosing a big RDBMS is
often the tools that come with it, which make it easy to build
applications on it and to design nice user interfaces.

We were not impressed by the performance measured on ORACLE. For
General Range Queries (GRQ) with low selectivity on the attributes,
indexes were of no help. That means that the best you can do is a full
scan of the whole table. As far as I know, a good disk today can deliver
data at a speed somewhere around 1-3 MB/sec. It will take some time to
search through 1 gigabyte at that speed!
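
To put a rough number on that, here is a small back-of-envelope
sketch (in Python; the 1 GB table size and the 1-3 MB/sec transfer
rates are just the assumed figures from above, not measurements):

    # Estimate how long a full table scan takes when an index cannot
    # help and the whole table must be read sequentially from disk.
    # Sizes and transfer rates are assumptions, not measured values.

    def full_scan_seconds(table_bytes, bytes_per_sec):
        """Time to read the whole table sequentially from disk."""
        return table_bytes / bytes_per_sec

    GB = 1024 ** 3
    MB = 1024 ** 2

    for rate_mb in (1, 3):
        secs = full_scan_seconds(1 * GB, rate_mb * MB)
        print(f"At {rate_mb} MB/sec: {secs:.0f} seconds "
              f"(about {secs / 60:.0f} minutes)")

    # At 1 MB/sec: 1024 seconds (about 17 minutes)
    # At 3 MB/sec:  341 seconds (about  6 minutes)

So even in the best case, a query that forces a full scan of a 1 GB
table costs several minutes of pure disk transfer time, before any
CPU work on the tuples.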

Our question is whether people with large databases are satisfied with
this query performance. Maybe queries of the GRQ type are considered
impossible in the same way as walking on water: it would have been nice
to be able to do it, but it is really not possible.

If any of you have experience with situations where query performance
has really been a problem, we would be pleased to hear more about it.
Examples of interesting information are:
	- The size of the data volume.
	- The data model.
	- The types of queries that were the problem.
	- The life cycle of the data. Is a tuple, once inserted, almost
	  never updated, or do updates and deletions happen all the
	  time?
	        
Our impression is that many databases which contain very large data
volumes have a pretty simple data model, or at least that the
relationships between the tables that contain most of the data are
pretty straightforward. In most of the examples of very large databases
(> 1 GB) that we have come across, the structure of the database has
been hierarchical. Is this your impression? Maybe the data model has to
be simple for it to be possible to collect large data volumes.

When I write data volumes of 1 GB, I mean the size of the part of the
database that can be part of a query (not including pictures and other
unformatted data).

We would be very pleased to get any response to this article. Either
post to this group or mail me; I will summarize any responses I get to
the newsgroup.

If you are interested in any of the results of our project, feel free
to ask us.

Thanks,

-- 
Oystein Groevlen 
Division of Computer Systems and Telematics
The Norwegian Institute of Technology
The University of Trondheim
Email: oysteing@idt.unit.no