[comp.misc] RDBMS performance - disks, cache, O/S

john@geac.UUCP (John Henshaw) (08/18/87)

Every time I try to have a meaningful discussion regarding the relative
performance of various RDBMSs, the issue of comparing disk/memory performance
arises. In particular, benchmarks such as the "DeWitt benchmark" or
"TP1" are well defined [:-)] in terms of what they are attempting to do;
however, the *incredible* multitude of machine configurations often
defeats any attempt at meaningful comparison. At best, vendors are able to
provide "best-case" numbers, which are useless to most potential customers.

Even given two identical hardware configurations (ha!), one must be
careful to insist that the O/S is compiled identically, providing the same
amount of cache, etc., to executing processes. Whether each RDBMS
benchmarked always (or never) uses the O/S cache, raw partitions, etc.,
is important for comparison purposes.
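
For concreteness: TP1-style numbers are usually quoted for a short
debit/credit transaction along these lines. A rough sketch in Python against
SQLite (purely illustrative; the table and column names are my own invention,
not part of any official TP1 definition):

    # rough sketch of a TP1-style debit/credit transaction, using Python's
    # sqlite3 module purely for illustration.  the table and column names are
    # invented here; they are not from any official TP1 specification.
    import sqlite3

    def tp1_transaction(conn, account_id, teller_id, branch_id, delta):
        cur = conn.cursor()
        # the classic debit/credit shape: touch the account, teller and branch
        # balances, then append a history record, all in one transaction.
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                    (delta, account_id))
        cur.execute("UPDATE tellers  SET balance = balance + ? WHERE teller_id  = ?",
                    (delta, teller_id))
        cur.execute("UPDATE branches SET balance = balance + ? WHERE branch_id  = ?",
                    (delta, branch_id))
        cur.execute("INSERT INTO history (account_id, teller_id, branch_id, delta) "
                    "VALUES (?, ?, ?, ?)",
                    (account_id, teller_id, branch_id, delta))
        conn.commit()   # the commit (log flush) is where the disk/cache questions bite

What the benchmark really exercises, of course, is everything around those
four statements: how much of the I/O lands in the O/S cache versus a raw
partition, and how the commit is made durable.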

So folks, how do we compare RDBMS products meaningfully? What are the
*important* issues, and how can one compensate for the *differences*? 
Has anyone attempted to catalogue a "fundamental list of prerequisites"
to benchmarking?

-john-
-- 
John Henshaw,			(mnetor, yetti, utgpu !geac!john)
Geac Computers Ltd.		"Try to fit the social norm... and be a
Markham, Ontario			 good man in a storm..."

larry@xanadu.uucp (Larry Rowe) (08/19/87)

In article <1170@geac.UUCP> john@geac.UUCP (John Henshaw) writes:
>Every time I try to have a meaningful discussion regarding the relative
>performance of various RDBMSs, the issue of comparing disk/memory performance
>arises. ...

unfortunately, it is much worse than that.  i've noticed that the load factor
and distribution of free pages on a disk system can have a dramatic
impact on the avg i/o time, hence the benchmark time.  
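
to see how much the free-page layout alone can matter, here's a toy
back-of-the-envelope simulation (python; the cylinder and page counts are
made up) comparing head movement for the same logical scan when the pages
are laid out contiguously versus scattered over free space:

    # toy simulation: average cylinders moved per page read for the same
    # 500-page scan, with contiguous allocation vs. pages scattered across
    # a 1000-cylinder disk.  the numbers are invented; the point is the ratio.
    import random

    CYLINDERS = 1000
    PAGES = 500

    def avg_seek(cyl):
        moves = [abs(cyl[i + 1] - cyl[i]) for i in range(len(cyl) - 1)]
        return sum(moves) / len(moves)

    contiguous = [i // 10 for i in range(PAGES)]        # ~10 pages per cylinder, in order
    scattered = [random.randrange(CYLINDERS) for _ in range(PAGES)]

    print("contiguous:", avg_seek(contiguous), "cylinders per read")
    print("scattered: ", avg_seek(scattered), "cylinders per read")

on a typical run the scattered layout averages a few hundred cylinders of
head movement per read, against a fraction of a cylinder for the contiguous
one; that is exactly the kind of hidden variable that can swamp whatever you
thought you were measuring.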

my sense is that two systems are roughly comparable unless there is a major
difference in their numbers.  for example, suppose system A does .5 TP1's a
second and system B does 5 TP1's a second.  a difference that large is
probably real and attributable to differences in the implementations (e.g.,
how good is the query optimizer, is reading/writing the log optimized, does
the system do group commits, etc.).  however, even with this big a difference
it may just be that system A isn't installed properly (e.g., not enough memory
allocated for buffers, or locks, or ...).
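
of the implementation differences above, group commit is maybe the least
familiar.  very roughly (and this is just a toy sketch of the idea in python,
not any vendor's actual code), a committer thread batches the log records of
whatever transactions have queued up and makes them all durable with a single
flush, so the cost of the flush is shared:

    # toy sketch of group commit: many transactions share one log flush
    # (fsync) instead of each paying for a flush of its own.
    import os, queue, threading

    log_fd = os.open("toy.log", os.O_WRONLY | os.O_CREAT | os.O_APPEND)
    pending = queue.Queue()                        # (log_record, done_event) pairs

    def group_committer():
        while True:
            first = pending.get()                  # block until someone wants to commit
            batch = [first]
            while not pending.empty():             # sweep up everyone else waiting
                batch.append(pending.get_nowait())
            os.write(log_fd, b"".join(rec for rec, _ in batch))
            os.fsync(log_fd)                       # one flush covers the whole batch
            for _, done in batch:
                done.set()                         # every txn in the batch is now durable

    threading.Thread(target=group_committer, daemon=True).start()

    def commit(log_record):
        done = threading.Event()
        pending.put((log_record, done))
        done.wait()                                # return only once our record is on disk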

in general, my advice is...
1. pick a representative benchmark you want to compare the systems on.
(choose this carefully; don't let a vendor define it for you, since the
vendor will slant the benchmark toward what he knows his system does well.)
2. run the benchmark.
3. show it to both vendors, ask them to help you improve it.

then, throw out the results from the benchmark and ask yourself:
1. which system was easiest to use?
2. which vendor helped you the most?  (remember, the vendor had better
help you before you spend your money, because he's not likely to give
you more help after he's got your money.)
3. which system gave you twice the maximum performance you think you need?

these questions will help you figure out which system/vendor is likely
to help you solve your problem.  lastly, find another user of the system
that has a nearly identical application and workload to your environment
and ask him which system solves his problem.

life's not easy, but you don't have to find the absolute best-performing
system.
	larry

pavlov@hscfvax.UUCP (840033@G.Pavlov) (08/20/87)

In article <1170@geac.UUCP>, john@geac.UUCP (John Henshaw) writes:
> Every time I try to have a meaningful discussion regarding the relative
> performance of various RDBMSs, the issue of comparing disk/memory performance
> arises....
> the *incredible* multitude of machine configurations often
> defeats the task of meaningful comparison. At best, vendors are able to
> provide "best" numbers, useless to most potential customers...
> So folks, how do we compare RDBMS products meaningfully? What are the
> *important* issues, and how can one compensate for the *differences*? 
> Has anyone attempted to catalogue a "fundamental list of prerequisites"
> to benchmarking?
> 
  It depends on the reasons one wishes to benchmark.  If it is an intellectual
  exercise, then one may as well pretend to be able to interpolate for op sys,
  i/o, and cpu differences (one should also toss in the factor of new releases
  of the dbms systems involved, since these have typically improved performance
  at the rate of 30 to 40 percent with each iteration).

  If one already has a system and an application and wishes to obtain the "best"
  dbms for it, then there is a straightforward (tho labor-intensive) way to
  accomplish it: obtain copies of the dbms's one wishes to consider, implement a
  subset of the application under each one, and benchmark the performance.  In
  doing so, vary the quantity of data involved and the number of simulated
  simultaneous users, to get a sense of degradation under load and/or size
  (a skeleton of such a harness is sketched below).  Implement the application
  using any and all optimizations, features, etc., available in each dbms
  (unless contrary to prudent operation).
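
  A skeletal version of such a harness, in Python, might look like the
  following.  The scale factors, user counts, and the DbmsUnderTest stub are
  placeholders for one's own application subset, not anything taken from a
  real product:

    # skeleton of the load/size sweep described above: run the same transaction
    # mix while scaling the data volume and the number of simulated users, and
    # report transactions per second so degradation under load shows up clearly.
    import threading, time

    class DbmsUnderTest:                     # stand-in for the real driver/API
        def run_transaction(self, n_rows):
            time.sleep(0.01)                 # pretend the transaction took 10 ms

    def measure(dbms, n_rows, n_users, seconds=10):
        stop = threading.Event()
        counts = [0] * n_users

        def user(i):
            while not stop.is_set():
                dbms.run_transaction(n_rows)     # the representative transaction mix
                counts[i] += 1

        threads = [threading.Thread(target=user, args=(i,)) for i in range(n_users)]
        for t in threads:
            t.start()
        time.sleep(seconds)
        stop.set()
        for t in threads:
            t.join()
        return sum(counts) / seconds             # transactions per second

    dbms = DbmsUnderTest()
    for n_rows in (10000, 100000, 1000000):      # vary the quantity of data
        for n_users in (1, 4, 16, 64):           # vary the simulated simultaneous users
            print(n_rows, n_users, measure(dbms, n_rows, n_users))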

  When all is done, compare the results in light of the overall functionality of
  each dbms, and the methods each dbms used to achieve its performance (e.g.,
  if dbms a and b appear to be "equal" in speed, but dbms b's file structure
  requires 150-megabyte restores if something should fail, then a may be the
  better choice).

  In summary: yes, the total number of combinations from which benchmark
  results may be garnered is as good as infinite.  But one can make useful
  comparisons if one has a useful purpose.  And no, there are no shortcuts
  (meaningful ones, anyway).


       greg pavlov, fstrf, amherst, ny