oysteing@idt.unit.no (Oystein Groevlen) (04/13/91)
In my master's project at The Norwegian Institute of Technology I am working on the performance of queries in DBMSs for large data volumes. We have tested the ORACLE RDBMS (V6) with respect to query performance. We would be very pleased if some of the readers of this newsgroup could send us comments on the questions/statements below.

What do you consider the most important aspects when choosing a DBMS? How important is query performance? We have the impression that many organizations, when they need a tool for storing and retrieving data, choose one of the big commercial RDBMSs without considering whether they really need it or whether it is the best tool for their applications. Another impression is that often the main reason for choosing a big RDBMS is the set of tools that come with it, which make it easy to build applications on it and design nice user interfaces.

We were not impressed by the performance we measured on ORACLE. For General Range Queries (GRQ) with low selectivity on the attributes, indexes were of no help. That means that the best you can do is a full scan of the whole table. As far as I know, a good disk today can deliver data at somewhere around 1-3 MB/sec. It will take some time to search through 1 gigabyte at that speed!

Our question is whether people with large databases are satisfied with this query performance. Maybe queries of the GRQ type are considered impossible in the same way as walking on water: it would have been nice to be able to do it, but it is really not possible.

If any of you have experience with situations where query performance has really been a problem, we would be pleased to hear more about it. Examples of interesting information are:
- The size of the data volume.
- The data model.
- The types of queries that were the problem.
- The life cycle of the data. Is a tuple, once inserted, almost never updated, or do updates and deletions happen all the time?
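As a back-of-envelope illustration of the point above, the full-scan time for a low-selectivity range query is simply table size divided by sequential transfer rate. The sketch below (in Python, purely illustrative; the 1-3 MB/sec figures are the ones assumed in this post) works that arithmetic out:

```python
# Estimate full-table-scan time for a query that an index cannot help
# with, assuming the disk transfer rates mentioned in the post (1-3 MB/s).

def full_scan_seconds(table_bytes, mb_per_sec):
    """Time to read the whole table once at the given transfer rate."""
    return table_bytes / (mb_per_sec * 1024 * 1024)

GIGABYTE = 1024 ** 3

for rate in (1, 3):
    t = full_scan_seconds(GIGABYTE, rate)
    print(f"1 GB at {rate} MB/sec: {t / 60:.1f} minutes")
```

So even at the optimistic end of the range, a single low-selectivity query over a 1 GB table keeps the disk busy for several minutes.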
Our impression is that many databases containing very large data volumes have a pretty simple data model, or at least that the relations between the tables holding most of the data are pretty straightforward. In most of the examples of very large databases (> 1 GB) that we have come across, the structure of the database has been hierarchical. Is this your impression? Maybe the data model has to be simple in order to make it possible to collect large data volumes.

When I write of data volumes of 1 GB, I mean the size of the part of the database that can be part of a query (not including pictures and other unformatted data).

We would be very pleased for any response to this article. Either post to this group or mail me. I will summarize any responses I get to the newsgroup. If you are interested in any of the results of our project, feel free to ask us.

Thanks,
--
Oystein Groevlen
Division of Computer Systems and Telematics
The Norwegian Institute of Technology
The University of Trondheim
Email: oysteing@idt.unit.no