[comp.databases] benchmarks

bgolden@infmx.UUCP (Bernard Golden) (11/29/89)

There has been some discussion in this group recently about the desire for a
'real' (i.e., 'trustable') benchmark for database products.  One of the
major complaints is that benchmarks with the same name (e.g., 'debit-credit')
vary from test to test and therefore test results are not directly 
comparable.

An organization called the Transaction Processing Performance Council has
been addressing this problem by developing a general benchmark that will 
not be subject to individual test tweaking.  Instead, the benchmark is
very specifically defined and no modification will be allowed.  Of course
individual test situations will be audited as well.  All of the major 
database vendors are participating in the Council.   Having the test
performed does not necessarily imply that the results must be made public,
however.  The benchmark is scheduled to be available in the very near
future.

-b

john@anasaz.UUCP (John Moore) (11/30/89)

In article <2715@infmx.UUCP> bgolden@infmx.UUCP (Bernard Golden) writes:
]There has been some discussion in this group recently about the desire for a
]'real' (i.e., 'trustable') benchmark for database products.  One of the
]major complaints is that benchmarks with the same name (e.g., 'debit-credit')
]vary from test to test and therefore test results are not directly 
]comparable.

My company is currently developing a hotel reservation system - something
with much more complex database activity than a debit-credit
benchmark, and with high performance needs (60 transactions per second
with 50,000 terminals). We developed a detailed benchmark to simulate
our application, and ran it on Oracle and Informix. We were able
to predict the performance of our application quite accurately FROM
THE DEBIT CREDIT benchmarks. What we did was analyze the DC benchmark
in terms of internal database operations. We then performed the same
analysis on our benchmark. We were then able to take the results from
the DC benchmark and apply them to a model of our own.
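The approach described above can be sketched in modern terms. This is a hypothetical illustration, not John's actual model: all operation counts, weights, and the measured DC rate are made up for the example.

```python
# Hypothetical sketch of the modeling approach above: break the
# debit-credit (DC) benchmark into internal database operations, derive
# per-operation costs from the measured DC rate, then price a more
# complex transaction mix with those costs.  All numbers are invented.

# Operation counts per DC transaction (assumed composition).
DC_OPS = {"indexed_read": 3, "update": 3, "append": 1}

# Measured DC result on the target system (made-up figure).
DC_TPS = 40.0

def per_op_budget(dc_ops, dc_tps, weights):
    """Split the per-transaction time budget across operation types
    using assumed relative cost weights for each operation."""
    total_weight = sum(dc_ops[op] * weights[op] for op in dc_ops)
    budget = 1.0 / dc_tps  # seconds per DC transaction
    return {op: budget * weights[op] / total_weight for op in weights}

# Assumed relative costs (an update costs ~2x an indexed read, etc.)
WEIGHTS = {"indexed_read": 1.0, "update": 2.0, "append": 0.5}

op_cost = per_op_budget(DC_OPS, DC_TPS, WEIGHTS)

# A (hypothetical) reservation transaction doing far more work.
RESV_OPS = {"indexed_read": 12, "update": 5, "append": 2}
resv_time = sum(op_cost[op] * n for op, n in RESV_OPS.items())
print(f"predicted reservation TPS: {1.0 / resv_time:.1f}")
```

The value of the method is that it calibrates per-operation costs on the system actually being evaluated, so the prediction carries over even though the transaction mix differs.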
-- 
John Moore (NJ7E)           mcdphx!anasaz!john asuvax!anasaz!john
(602) 861-7607 (day or eve) long palladium, short petroleum
7525 Clearwater Pkwy, Scottsdale, AZ 85253
The 2nd amendment is about military weapons, NOT JUST hunting weapons!

dhepner@hpisod2.HP.COM (Dan Hepner) (12/01/89)

From: bgolden@infmx.UUCP (Bernard Golden)
> 
> An organization called the Transaction Processing Performance Council has
> been addressing this problem by developing a general benchmark that will 
> not be subject to individual test tweaking.  Instead, the benchmark is
> very specifically defined and no modification will be allowed.

I'm with you, Bernard, in wishing this were true, but it can't be so.

Benchmarking heterogeneous systems will always be subject to individual
test tweaking.  Witness dhrystone.  And TPC-A will not include source
code (how could it?).

Benchmarking is a game.  Like football.  The goal in this game is to
figure out how to show the most Transactions Per Second (TPS) while
still meeting the rules.  Clever game players will be rewarded,
whether or not this cleverness is of value external to the benchmark.
One is reminded of the hypothetical C compilers we heard about which 
checked every source file to see if it were the dhrystone benchmark, 
and if so, generated code which showed extreme speed.  It won't
be quite like that, but we are going to see some "features" which
are useless unless you want to run the TPC benchmark.

Some cases in point:

1) If you use a large file as part of your application, does that
   file have an index?  Using an indexed access method to generate
   TPS #s will be simply stupid.  It isn't required.  Use a hashed
   access method.  The fastest you can imagine for a file of just this
   _fixed_ size and composition.  Spend as long as you need to find
   the perfect scheme for this particular file. It might take a week 
   to add a new record?  No problem.  Not too usable for any known
   customer? That's ok.
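The contrast in point 1 can be sketched as follows. This is a toy illustration, not any vendor's access method: because the file's size and key distribution are fixed and known in advance, a hash tailored to exactly that file finds a record in one probe, while an index pays a tree walk (approximated here by a binary search).

```python
# Toy contrast: index walk vs. a hash tuned to one fixed file.
import bisect

N = 100_000                  # fixed number of account records
accounts = list(range(N))    # dense keys 0..N-1, known in advance

# "Indexed" access: binary search over sorted keys, standing in for a
# B-tree walk (several comparisons / page touches per lookup).
def indexed_lookup(key):
    i = bisect.bisect_left(accounts, key)
    return i if i < N and accounts[i] == key else None

# "Hashed" access tuned to this exact file: dense integer keys make
# identity a perfect hash -- one probe, no collisions, no comparisons.
table = list(range(N))
def hashed_lookup(key):
    return table[key] if 0 <= key < N else None

assert indexed_lookup(42_424) == hashed_lookup(42_424) == 42_424
```

Note that inserting a key outside the fixed range would force the whole table to be rebuilt, which is exactly the "might take a week to add a new record" trade-off being described.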

2) None of the transactions used to calculate TPS will ever abort.
   Ah ha.  Does this mean I can use an extremely optimistic logging
   scheme which would take five minutes to straighten up the mess
   if a transaction actually _did_ abort?  You betcha.  The rules say 
   that such a feature must be available to customers, but not how 
   many of them might use it.
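A toy sketch of the trade-off in point 2, purely hypothetical and not any real DBMS's scheme: if aborts "never" happen, you can write in place immediately and make commit free, pushing all the cost onto the abort path.

```python
# Optimistic trade-off: cheap commit, expensive (rarely-taken) abort.

db = {"acct:1": 100, "acct:2": 100}
history = []  # global append-only log of (txn_id, key, old_value)

def txn_update(txn_id, key, new_value):
    history.append((txn_id, key, db[key]))  # cheap in-memory append
    db[key] = new_value                     # write in place, optimistically

def commit(txn_id):
    pass  # nothing to do: the data is already in place

def abort(txn_id):
    # The expensive path: scan the whole history backwards, undoing
    # this transaction's writes.  Fine -- if transactions never abort.
    for tid, key, old in reversed(history):
        if tid == txn_id:
            db[key] = old

txn_update("t1", "acct:1", 90)
txn_update("t1", "acct:2", 110)
abort("t1")
assert db == {"acct:1": 100, "acct:2": 100}
```

Since the benchmark's measured transactions never take the abort path, only the cheap half of the scheme ever shows up in the TPS number.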

3) The transaction simulates the modification of a customer bank
   account record, but no provision need be made for a customer
   who might exceed his authority to withdraw the specified amount.
   This permits an optimization in which the backend completes the
   entire transaction without any communication with the front end.
   There's no if(reasonable) 
   part of the txn.  How many transaction applications do you know
   that submit one request to the backend, get the answer,
   and have completed an entire transaction which modified three
   files and appended to a fourth, never having checked anything?
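The shape being criticized in point 3 can be contrasted in code. Everything here is a hypothetical stub (the backend just counts messages): the DC-style transaction is one request with no check anywhere, while an application that validates the withdrawal needs at least two round trips.

```python
# One-message benchmark transaction vs. a checked real-world one.

class FakeBackend:
    """Stand-in that counts round trips instead of touching a database."""
    def __init__(self):
        self.round_trips = 0
        self.balances = {"A-1": 50}

    def execute(self, ops):
        self.round_trips += 1  # one message = one round trip
        if len(ops) == 1 and ops[0][0] == "read":
            return self.balances[ops[0][1]]
        return "done"

def dc_style_txn(be, acct, teller, branch, delta):
    # Modify three files, append to a fourth, check nothing:
    # the whole transaction is a single request to the backend.
    return be.execute([
        ("update-account", acct, delta),
        ("update-teller", teller, delta),
        ("update-branch", branch, delta),
        ("append-history", (acct, teller, branch, delta)),
    ])

def checked_txn(be, acct, delta):
    balance = be.execute([("read", acct)])   # round trip 1: look first
    if balance + delta < 0:                  # the if(reasonable) step
        return "rejected"
    return be.execute([("update-account", acct, delta)])  # round trip 2

be = FakeBackend()
dc_style_txn(be, "A-1", "T-1", "B-1", -10)
assert be.round_trips == 1
checked_txn(be, "A-1", -10)
assert be.round_trips == 3
```

Halving the round trips per transaction is a large win on the benchmark, but few real applications can skip the check.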

4) SQL required? Are you kidding?  Any DBMS at all?  Define
   DBMS.  The TPC didn't even try.  Almost any ad hoc access method will do.

This isn't intended to be critical of the TPC.  Their charter
was to define a benchmark which was runnable on many different types
of machines and not to preclude as-yet-unknown DBMS architectures.
The benchmark is far better than "Debit-Credit"  or "TP1" (which have the
above problems and much, much more).  The largest file _does_ have
to be too big to be cacheable (10 MB / TPS, implying 1GB for 100 TPS).
The Unit Under Test must receive input from and send output to
_somewhere_. Real "ACID" transactions must be used.  "Full disclosure"
is required.
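The sizing rule quoted above is simple arithmetic, sketched here with a hypothetical function name:

```python
# Largest-file floor under the stated scaling rule: 10 MB per claimed
# TPS, so the file cannot simply be cached away at high claimed rates.
def min_account_file_mb(claimed_tps, mb_per_tps=10):
    return claimed_tps * mb_per_tps

assert min_account_file_mb(100) == 1000   # 100 TPS -> 1 GB, as above
```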

It's just that benchmarking for publication isn't any more than a game.  
It cannot be used to reliably compare the predicted performance of two 
systems on a given application load.  And the error isn't just small, 
which is where mistakes are easy to make: "Well, we'll knock that estimate
down by 50% because our txns are a little tougher."  Clever, 
not-all-that-valuable-in-the-real-world techniques can increase a score 
by an order of magnitude.

This isn't a claim that all published numbers will use non-general
techniques, but it is a suggestion to view the high TPS numbers we'll
see with skepticism, doubting that the system described could
run _your_ don't-seem-that-big transactions even 1/10 that fast.

Dan Hepner

Disclaimer: I don't work on any of HP's DBMS products at all, and 
            certainly don't speak for HP with this opinion.