john@geac.UUCP (John Henshaw) (08/18/87)
Every time I try to have a meaningful discussion regarding the relative performance of various RDBMSs, the issue of comparing disk/memory performance arises. In particular, benchmarks such as the "De Witt benchmark" or "TP1" are well defined [:-)] in terms of what they are attempting to do; however, the *incredible* multitude of machine configurations often defeats the task of meaningful comparison. At best, vendors are able to provide "best case" numbers, which are useless to most potential customers. Even given two identical hardware configurations (ha!), one must be careful to insist that the O/S is compiled identically, providing the same amount of cache, etc., to executing processes. Whether every RDBMS benchmarked always (or never) uses the cache, raw partitions, etc., matters for comparison purposes.

So folks, how do we compare RDBMS products meaningfully? What are the *important* issues, and how can one compensate for the *differences*? Has anyone attempted to catalogue a "fundamental list of prerequisites" to benchmarking?

-john-
-- 
John Henshaw, (mnetor, yetti, utgpu !geac!john)
Geac Computers Ltd.     "Try to fit the social norm... and be a
Markham, Ontario         good man in a storm..."
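[For readers who haven't seen it: TP1-style workloads are usually described as one small debit/credit banking transaction repeated at a high rate. Below is a minimal sketch of that transaction, assuming hypothetical table and column names; Python's sqlite3 stands in purely for illustration, not for any of the RDBMSs under discussion.

    # Minimal sketch of a TP1/debit-credit style transaction.
    # Table and column names are hypothetical.
    import sqlite3

    def tp1_transaction(conn, account_id, teller_id, branch_id, delta):
        """Apply one debit/credit: update account, teller, and branch
        balances, then record the change in a history table."""
        cur = conn.cursor()
        cur.execute("UPDATE accounts SET balance = balance + ? WHERE account_id = ?",
                    (delta, account_id))
        cur.execute("UPDATE tellers  SET balance = balance + ? WHERE teller_id = ?",
                    (delta, teller_id))
        cur.execute("UPDATE branches SET balance = balance + ? WHERE branch_id = ?",
                    (delta, branch_id))
        cur.execute("INSERT INTO history (account_id, teller_id, branch_id, delta) "
                    "VALUES (?, ?, ?, ?)",
                    (account_id, teller_id, branch_id, delta))
        conn.commit()   # the commit (and its log write) is usually the expensive part
]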
larry@xanadu.uucp (Larry Rowe) (08/19/87)
In article <1170@geac.UUCP> john@geac.UUCP (John Henshaw) writes:
>Every time I try to have a meaningful discussion regarding the relative
>performance of various RDBMSs, the issue of comparing disk/memory performance
>arises. ...

unfortunately, it is much worse than that. i've noticed that the load factor and distribution of free pages on a disk system can have a dramatic impact on the avg i/o time, hence the benchmark time. my sense is that the systems are roughly comparable if there isn't a major difference. for example, suppose system A does .5 TP1's a second and system B does 5 TP1's a second. this is probably a measurable difference attributable to differences in the implementations (e.g., how good is the query optimizer, is reading/writing the log optimized, does the system do group commits, etc.). however, even with this big a difference it may be that system A isn't installed properly (i.e., not enough memory is allocated for buffers, or locks, or ...).

in general, my advice is:

1. pick a representative benchmark you want to compare the systems on. (choose this carefully; don't let a vendor define it for you, since the vendor will slant the benchmark to what he knows his system will do well.)
2. run the benchmark.
3. show it to both vendors and ask them to help you improve it.

then, throw out the results from the benchmark and ask yourself:

1. which system was easiest to use?
2. which vendor helped you the most? (remember, the vendor had better help you before you spend your money, because he's not likely to give you more help after he's got your money.)
3. which system gave you twice the maximum performance you think you need?

these questions will help you figure out which system/vendor is likely to help you solve your problem. lastly, find another user of the system that has a nearly identical application and workload to your environment and ask him which system solves his problem. life's not easy, but you don't have to find the absolutely optimal performing system.

	larry
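[As a rough illustration of larry's step 2 ("run the benchmark"): the single-stream driver below just runs some transaction routine in a loop and reports transactions per second. The connection object and the run_one callable (e.g., the tp1_transaction sketch above) are assumptions; a single stream like this says nothing about concurrency, buffer sizing, or log placement, which are exactly the configuration issues this thread is about.

    # Sketch of a single-stream throughput measurement; run_one(conn) is
    # assumed to execute one complete benchmark transaction.
    import time

    def measure_tps(conn, n_transactions, run_one):
        """Run run_one(conn) n_transactions times; return transactions/second."""
        start = time.time()
        for _ in range(n_transactions):
            run_one(conn)
        elapsed = time.time() - start
        return n_transactions / elapsed

    # Hypothetical usage:
    #   tps = measure_tps(conn, 1000, lambda c: tp1_transaction(c, 1, 1, 1, +100))
    #   print("%.2f TP1's a second" % tps)
]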
pavlov@hscfvax.UUCP (840033@G.Pavlov) (08/20/87)
In article <1170@geac.UUCP>, john@geac.UUCP (John Henshaw) writes:
> Every time I try to have a meaningful discussion regarding the relative
> performance of various RDBMSs, the issue of comparing disk/memory performance
> arises....
> the *incredible* multitude of machine configurations often
> defeats the task of meaningful comparison. At best, vendors are able to
> provide "best" numbers, useless to most potential customers...
> So folks, how do we compare RDBMS products meaningfully? What are the
> *important* issues, and how can one compensate for the *differences*?
> Has anyone attempted to catalogue a "fundamental list of prerequisites"
> to benchmarking?

It depends on the reasons one wishes to benchmark. If it is an intellectual exercise, then one may as well pretend to be able to interpolate for op sys, i/o, and cpu differences (one should also toss in the factor of new releases of the dbms systems involved, since these typically have improved performance at the rate of 30 to 40 percent with each iteration).

If one already has a system and an application and wishes to obtain the "best" dbms for it, then there is a straightforward (tho labor-intensive) way to accomplish it: obtain copies of the dbms's one wishes to consider, implement a subset of the application under each one, and benchmark the performance. In doing so, vary the quantity of data involved and the number of simulated simultaneous users, to get a sense of degradation under load and/or size. Implement the application using any and all optimizations, features, etc., available in each dbms (unless contrary to prudent operation). When all done, compare the results in light of the overall functionality of each dbms, and the methods each dbms used to achieve its performance (e.g., if dbms a and dbms b appear to be "equal" in speed, but dbms b's file structure requires 150-megabyte restores if something should fail, then a may be the better choice).

In summary: yes, the total number of combinations from which benchmark results may be garnered is as good as infinite. But one can make useful comparisons if one has a useful purpose. But no, there are no shortcuts (meaningful ones, anyway).

   greg pavlov, fstrf, amherst, ny
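[As a rough sketch of greg's suggestion to vary the number of simulated simultaneous users, the load driver below starts one thread per simulated user and reports aggregate throughput; the connect() factory and run_one() routine are assumptions standing in for whatever embedded-SQL or terminal-simulator interface each dbms actually provides, and threads stand in for a real TP driver.

    # Sketch of a multi-user load sweep; connect() returns a fresh connection,
    # run_one(conn) executes one complete benchmark transaction.
    import threading, time

    def run_load(connect, run_one, n_users, n_per_user):
        """Run n_users worker threads, each doing n_per_user transactions on
        its own connection; return aggregate transactions per second."""
        def worker():
            conn = connect()
            for _ in range(n_per_user):
                run_one(conn)

        threads = [threading.Thread(target=worker) for _ in range(n_users)]
        start = time.time()
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        elapsed = time.time() - start
        return (n_users * n_per_user) / elapsed

    # Hypothetical sweep, repeated at each database size you load:
    #   for n_users in (1, 2, 4, 8, 16):
    #       print(n_users, "users:", run_load(connect, run_one, n_users, 100), "tps")
]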