[comp.windows.x] Problems with XSTONES calculations in xbench

adamsc@shark.WV.TEK.COM (Chuck Adams) (08/26/89)

About two weeks ago I tried to contact Claus Gittinger with
the following concerns about the XSTONES calculations
in xbench.  Because I am concerned that others may be
using this program, I am posting this mail while awaiting
an answer from Claus.

--- chuck adams
adamsc@orca.wv.tek.com
{decvax ucbvax hplabs}!tektronix!orca!adamsc
Interactive Technologies Division/Visual Systems Group
Tektronix, Inc.
P.O. Box 1000, M/S 61-049
Wilsonville, OR 97070
(503) 685-2589

------------------------------------------------------------------------------

To: sinix!claus@unido.UUCP
Cc: adamsc
Subject: Problems with XSTONES calculations in xbench

I have been using the xbench program and have a fairly major
problem with the algorithm used to compute xstones.  I believe
the file xstones.awk is incorrect in the following way.

The current algorithm uses the following mathematics (k is the
xstone rating of the Sun reference, 10000):

	ratio_n = measured_value / sun_value

	rt = sum of (weight x sun_value / measured_value)
	   = sum of (weight / ratio_n)

	                    sum of (weight)
	ratio_w = ---------------------------------------------
	          sum of (weight x sun_value / measured_value)

	xstones = k x ratio_w

The algorithm should be changed to use the following mathematics:

	ratio_n = measured_value / sun_value

	rt = sum of (weight x measured_value / sun_value)
	   = sum of (weight x ratio_n)

	          sum of (weight x measured_value / sun_value)
	ratio_w = ---------------------------------------------
	                    sum of (weight)

	xstones = k x ratio_w


In theory, the more heavily something is weighted, the more it should affect
the calculated xstone. I believe the following test cases illustrate the
nature of the problem:

Test case a:

	measured_value[0] = 100 weight[0] = 300 sun_value[0] = 100
	measured_value[1] = 10  weight[1] = 600 sun_value[1] = 10

	xstone by old algorithm = 10000
	xstone should be       = 10000

Test case b:

	measured_value[0] = 50  weight[0] = 300 sun_value[0] = 100
	measured_value[1] = 20  weight[1] = 600 sun_value[1] = 10

	xstone by old algorithm = 10000
	xstone should be       = 15000

Test case c:

	measured_value[0] = 100 weight[0] = 300 sun_value[0] = 100
	measured_value[1] = 20  weight[1] = 600 sun_value[1] = 10

	xstone by old algorithm = 15000
	xstone should be       = 16666

A diagram of the correct results for these three test cases would be:

	case a		case b		case c
xstone  10000		15000		16666
			 __		 __
			|  |		|  |
			|  |		|  |
			|  |		|  |
			|  |		|  |
			|  |		|  |
	 __		|  |		|  |
	|  |		|  |		|  |
	|  |		|  |		|  |
	|  |		|  |		|  |
	|  |		|  |		|  |
weight	|  |		|  |		|  |
600	|  |		|  |		|  |
	----		----		----
weight	|  |		|  |		|  |
300	|  |		----		|  |
	|__|				|__|

The context diffs at the end of this message should fix the problem.

If you have any questions, please contact me at your earliest convenience.

Thanks for your help.



---- chuck adams


*** xstones.awk	Fri Aug 11 13:59:23 1989
--- xstones.awk.orig	Fri Aug 11 12:38:41 1989
***************
*** 128,134
  /rate =/             {
  		       if ( x != "dummy" ) {
  			 ratio = $3 / sunValue[x];
! 			 runtime["all"] = runtime["all"] + w*ratio;
  			 countedWeight["all"] = countedWeight["all"] + w;
  
  			 runtime[g] = runtime[g] + w*ratio;

--- 128,134 -----
  /rate =/             {
  		       if ( x != "dummy" ) {
  			 ratio = $3 / sunValue[x];
! 			 runtime["all"] = runtime["all"] + w/ratio;
  			 countedWeight["all"] = countedWeight["all"] + w;
  
  			 runtime[g] = runtime[g] + w/ratio;
***************
*** 131,137
  			 runtime["all"] = runtime["all"] + w*ratio;
  			 countedWeight["all"] = countedWeight["all"] + w;
  
! 			 runtime[g] = runtime[g] + w*ratio;
  			 countedWeight[g] = countedWeight[g] + w;
  			 x = "dummy"; w = 1
  		       }

--- 131,137 -----
  			 runtime["all"] = runtime["all"] + w/ratio;
  			 countedWeight["all"] = countedWeight["all"] + w;
  
! 			 runtime[g] = runtime[g] + w/ratio;
  			 countedWeight[g] = countedWeight[g] + w;
  			 x = "dummy"; w = 1
  		       }
***************
*** 154,159
  		       if (cw == 0) {
  			   print "TOTAL ? lineStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 154,160 -----
  		       if (cw == 0) {
  			   print "TOTAL ? lineStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 160,166
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f lineStones",text,stones);
  			   print t;

--- 161,167 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f lineStones",text,stones);
  			   print t;
***************
*** 173,178
  		       if (cw == 0) {
  			   print "TOTAL ? fillStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 174,180 -----
  		       if (cw == 0) {
  			   print "TOTAL ? fillStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 179,185
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f fillStones",text,stones);
  			   print t;

--- 181,187 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f fillStones",text,stones);
  			   print t;
***************
*** 192,197
  		       if (cw == 0) {
  			   print "TOTAL ? blitStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 194,200 -----
  		       if (cw == 0) {
  			   print "TOTAL ? blitStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 198,204
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f blitStones",text,stones);
  			   print t;

--- 201,207 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f blitStones",text,stones);
  			   print t;
***************
*** 211,216
  		       if (cw == 0) {
  			   print "TOTAL ? arcStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 214,220 -----
  		       if (cw == 0) {
  			   print "TOTAL ? arcStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 217,223
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f arcStones",text,stones);
  			   print t;

--- 221,227 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f arcStones",text,stones);
  			   print t;
***************
*** 230,235
  		       if (cw == 0) {
  			   print "TOTAL ? textStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 234,240 -----
  		       if (cw == 0) {
  			   print "TOTAL ? textStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 236,242
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f textStones",text,stones);
  			   print t;

--- 241,247 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f textStones",text,stones);
  			   print t;
***************
*** 249,254
  		       if (cw == 0) {
  			   print "TOTAL ? complexStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 254,260 -----
  		       if (cw == 0) {
  			   print "TOTAL ? complexStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 255,261
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f complexStones",text,stones);
  			   print t;

--- 261,267 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  			   t = sprintf("TOTAL %s %8.0f complexStones",text,stones);
  			   print t;
***************
*** 268,273
  		       if (cw == 0) {
  			   print "TOTAL ? xStones"
  		       } else {
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {

--- 274,280 -----
  		       if (cw == 0) {
  			   print "TOTAL ? xStones"
  		       } else {
+ 			   rt = (rt*allWeight)/cw;
  			   if (mw > 0) {
  			       text = "expected ";
  		           } else {
***************
*** 274,280
  			       text = "";
  		           }
  
! 			   ratio = rt / cw;
  			   stones = int(allWeight * ratio);
  		           t = sprintf("TOTAL %s %8.0f xStones",text,stones);
  			   print t;

--- 281,287 -----
  			       text = "";
  		           }
  
! 			   ratio = allWeight/rt;
  			   stones = int(allWeight * ratio);
  		           t = sprintf("TOTAL %s %8.0f xStones",text,stones);
  			   print t;

-------

montnaro@sprite.crd.ge.com (Skip Montanaro) (08/26/89)

I also have some problems with the Xstones data. The author uses results
from a Sun-3/50 as the base (10,000 *stones) for each category. I tried
xbench on a Sun-3/50 with 68881 & 8 megs (diskless) and only got about a
7500 Xstone rating out of it. It's no big deal for me, since I use the
relative magnitudes, not the absolute numbers. It wasn't paging at all, at
least as far as our Excelan LANalyzer was concerned (essentially no network
traffic at all from the machine during the test).

--
Skip Montanaro (montanaro@sprite.crd.ge.com)

david@ms.uky.edu (David Herron -- One of the vertebrae) (08/26/89)

Funny, I was about to post a couple of questions on xbench & xstones ..

We've got an evaluation copy of an NCD-16 and I'm evaluating
it against a VaxStation 2000 that is the average-to-low end
workstation here.  I just happen to have one in our office, see.

My perception -- and I spent two full days using both and switching
back and forth -- is that they are equal in speed.  And the NCD
is faster at some things.  But maybe I'm not measuring the same
sort of things with my eyes that the benchmarks are.  I'm looking at
things like iconizing/deiconizing windows, window refreshes, and
so forth.

Then I run xbench and get (old is the original calculation, new is with the patches):

Xstones:
			  old	  new
	Vs2000 (unix:0)	27907	131994
	Vs2000 (ether)	17301	101474
	NCD-16		 6657	 12390

Anybody else measured these terminals?  Get similar numbers?  Have
comments on xbench itself?  Have I possibly made any mistakes?  (I
did follow the directions in the README ...)

BTW, the textstone numbers are very close (15863 for Vs2000 & ether,
14549 for NCD-16), which is probably the basis for my opinion that they're
fairly equal.  Also, I realize that the on-board processors are very
different -- the NCD only has a 68000 -- so I'm not *completely* surprised
at the differences.

Don't get me wrong.  Performance comparable to a diskless Sun-3/50
at much less network load (no swapping over ether!) for half (or less)
the price is a good deal in my book.

Is anybody collecting Xstones numbers?
-- 
<- David Herron; an MMDF guy                              <david@ms.uky.edu>
<- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<- 
<- "So raise your right hand if you thought that was a Russian water tentacle."

rgs@jeff.megatek.uucp (Rusty Sanders) (08/31/89)

From article <4344@orca.WV.TEK.COM>, by adamsc@shark.WV.TEK.COM (Chuck Adams):
> About two weeks ago I tried to contact Claus Gittinger with
> the following concerns I have about the XSTONES calculations
> in xbench.  Because I have concerns that someone may be 
> using this program I will post this mail awaiting 
> an answer from Claus.

[description of problem and patches to fix it deleted]

I noticed when the benchmark first came in that the way it actually calculates
the synthetic stones numbers and the way the documentation describes that
calculation don't quite agree. Without looking too deeply at your patch, it
does appear to fix this discrepancy.

However, I'm not sure that the actual problem is in the code. My impression is
that the actual algorithm used is what it should be, and the text description
should be changed to reflect the code, not the other way around.

Remember that the Xstones number is synthetic, and doesn't need to actually
represent any real comparisons. As long as different Xstones numbers can be
compared in some predictable fashion then all is well.

With the current algorithm, servers are rewarded if they have consistent
performance across all tested areas. Bad performance in any one area can
seriously affect the Xstones number.

With the modified code, servers are rewarded if any significant areas perform
very well, even if others perform absolutely abysmally. For a general benchmark
I suspect the first behaviour is better than the second.
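Another way to put this: the original code computes a weighted harmonic mean
of the per-test speed ratios, while the patch computes a weighted arithmetic
mean. A sketch with purely hypothetical ratios shows why the harmonic mean
punishes a single slow area while the arithmetic mean rewards a single fast one:

```python
# Hypothetical server: 10x the baseline on one test, 0.1x on another,
# with equal weights. Ratios are measured/sun speed ratios.
ratios = [10.0, 0.1]
weights = [450, 450]

arithmetic = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
harmonic = sum(weights) / sum(w / r for w, r in zip(weights, ratios))

print(round(arithmetic, 3))  # 5.05  -> the fast area dominates
print(round(harmonic, 3))    # 0.198 -> the slow area dominates
```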

It doesn't help that the benchmark system (Sun 3/50, R3, no fpu) runs arcs very
slowly. This allows any decent server to get arcStones in the hundreds of
thousands, if not the millions. Even though arcStones makes up a small
percentage of the final Xstone, a small percentage of a HUGE number is still a
large number. This seriously skews the Xstones values for such machines.
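The arc effect is easy to quantify under the arithmetic weighting; the weights
and the 1000x ratio below are hypothetical, chosen only to show the shape of
the problem:

```python
# Hypothetical: a server at baseline speed on everything except arcs,
# which run 1000x faster than the fpu-less Sun base. Arcs carry only
# about 1% of the total weight, yet dominate the weighted arithmetic mean.
weights = [9000, 100]        # everything else vs. arcs
ratios  = [1.0, 1000.0]      # measured/sun speed ratios

rating = 10000 * sum(w * r for w, r in zip(weights, ratios)) / sum(weights)
print(int(rating))  # 119780 -- a "12x" server, from arcs alone
```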

These problems could be mitigated by using a better benchmark base. But I
really feel that the current algorithm gives a better comparison base than
the algorithm described in the text and implemented in your patch.

Of course, none of this is to imply that xbench is really a great benchmark. Its
biggest asset is that it comes up with one final number, which can be used as
a quick "general estimation" of a server's speed. A server could have a quite
low Xstones rating and still be the best price/performance solution for a
particular application. Likewise, a server with a high Xstones rating could be
a real dog for some applications. But, for a quick reference, I believe the
Xstones number is usable. It at least sets a scale as a basis for further
benchmarking efforts.
----
Rusty Sanders, Megatek Corp. --> rgs@megatek or...
         ...ucsd!    ...hplabs!hp-sdd!    ...ames!scubed!   ...uunet!

rws@EXPO.LCS.MIT.EDU (Bob Scheifler) (08/31/89)

    Its biggest asset is that it comes up with one final number, which can be
    used as a quick "general estimation" of a server's speed.

Single-number benchmarks are absurd.

adamsc@shark.WV.TEK.COM (Chuck Adams) (09/01/89)

> However, I'm not sure that the actual problem is in the code. My impression is
> that the actual algorithm used is what it should be, and the text description
> should be changed to reflect the code, not the other way around.
> 
> Remember that the Xstones number is synthetic, and doesn't need to actually
> represent any real comparisons. As long as different Xstones numbers can be
> compared in some predictable fashion then all is well.

I realize xstones are synthetic, but as such they are supposed to convey a
warm and fuzzy feeling.  In my case they do not, because the
results are exactly the opposite of what the documentation states.

Claus explicitly stated "the weights are based on our experience ... "
Claus seems to have spent a lot of effort working out the
proposed weights, because much of the documentation goes over them.
I think Claus intended the weights to reward performance in certain areas
and overlook minor deficiencies in other areas.

> With the current algorithm, servers are rewarded if they have consistent
> performance across all tested areas. Bad performance in any one area can
> seriously effect the Xstones number.

This is not the case.  The current algorithm rewards exceptional performance
in areas that are weighted less than others.  Take, for instance,

Test case d:

	measured_value[0] = 50 weight[0] = 300 sun_value[0] = 100
	measured_value[1] =  5 weight[1] = 600 sun_value[1] = 10

	xstone by old algorithm = 15000
	xstone should be        = 10000


         __
        |  |
        |  |
        |  |             __
        |  |            |  |
weight  |  |            |  |
600     |  |            |  |
        ----            ----
weight  |  |            |  |
300     |  |            |  |
        |__|            |  |
                        |  |
                        |  |
                        |__|

> With the modified code, servers are rewarded if any significant areas perform
> very well, even if others perform absolutely abysmally. For a general benchmark
> I suspect the first behavior is better than the second.

If you really want the latter, then you will have to use a third algorithm
to compute xstones.

> It doesn't help that the benchmark system (Sun 3/50, R3, no fpu) runs arcs very
> slowly. This allows any decent server to get arcStones in the hundreds of
> thousands, if not the millions. Even though arcStones makes up a small
> percentage of the final Xstone, a small percentage of a HUGE number is still a
> large number. This seriously skews the Xstones values for such machines.

The point is that xstones are seriously skewed by either algorithm.  The old
algorithm rewards elements of the test that have low weights; the new
algorithm rewards elements of the test that have high weights.  But the
latter is what the documentation states is intended.

> These problems could be mitigated by using a better benchmark base. But I
> really feel that the effect of the current algorithm gives a better comparison
> base then using the algorithm described in the text, and implemented in your
> patch.

Again, the old algorithm is biased to favor machines that perform better at
the less heavily weighted elements of the test.  This is contrary to what the
documentation states.

> Of course, none of this is to imply the xbench is really a great benchmark. Its
> biggest asset is that it comes up with one final number, which can be used as
> a quick "general estimation" of a server's speed. A server could have a quite
> low Xstones rating, and still be the best price/performance solution to a
> particular application. Likewise, a server with a high Xstones rating could be
> a real dog for some applications. But, for a quick reference, I believe the
> Xstones number is usable. It at least sets a scale as a basis for further
> benchmarking efforts.

For all you golfers out there: Xstones is about as useful as standing on
the tee box and throwing grass up in the air to tell how to play the hole.
I rather doubt that it sets any kind of scale for benchmarking X.

---- chuck adams
adamsc@orca.wv.tek.com
{decvax ucbvax hplabs}!tektronix!orca!adamsc
Interactive Technologies Division/Visual Systems Group
Tektronix, Inc.
P.O. Box 1000, M/S 61-049
Wilsonville, OR 97070
(503) 685-2589

jim@EXPO.LCS.MIT.EDU (Jim Fulton) (09/01/89)

    Claus explicitly stated "the weights are based on our experience ... "
    Claus seemed to have spent a lot of effort on working out the 
    proposed weights because a lot of the documentation goes over them.
    I think Claus intended the weights to reward performance in certain areas
    and overlook minor deficiencies in other areas.  

I'll even go further and say that this is why any single number is useless
without knowing the context in which it was generated.

I find it easiest to think of the rating as the cross-product of the various
request timings  (including things like clipping, whether or not software
cursors are used, number of subwindows, etc.) and the weighted profile of the
application to be modeled (i.e. relative importance of each element in the
set of server timings).  By plugging in different application profiles,
you'll get radically different ratings for a single server.

In other words, a server that is acceptable for software development may be
completely unusable for CAD, imaging, wysiwyg, etc.
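Jim's cross-product view is easy to sketch. The timings and the two profiles
below are entirely hypothetical, but the point survives any choice of numbers:
the same server timings get very different ratings under different
application weightings.

```python
# One fixed set of server speed ratios (measured/sun, hypothetical).
timings = {"line": 2.0, "text": 1.0, "blit": 0.25, "arc": 4.0}

# Two hypothetical application profiles (relative weights per category).
profiles = {
    "software-dev": {"line": 1, "text": 6, "blit": 2, "arc": 1},
    "cad":          {"line": 5, "text": 1, "blit": 1, "arc": 3},
}

for name, weights in profiles.items():
    total = sum(weights.values())
    rating = 10000 * sum(weights[t] * timings[t] for t in weights) / total
    print(name, int(rating))  # software-dev 12500, cad 23250
```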

adamsc@shark.WV.TEK.COM (Chuck Adams) (09/01/89)

In article <8909011324.AA03731@expo.lcs.mit.edu>, jim@EXPO.LCS.MIT.EDU (Jim Fulton) writes:

> 
> I'll even go further and say that this is why any single number is useless
> without knowing the context in which it was generated.
> 

Right on.  I totally agree.

> I find it easiest to think of the rating as the cross-product of the various
> request timings  (including things like clipping, whether or not software
> cursors are used, number of subwindows, etc.) and the weighted profile of the
> application to be modeled (i.e. relative importance of each element in the
> set of server timings).  By plugging in different application profiles,
> you'll get radically different ratings for a single server.

But as of now we only have one profile.  Even if it is poorly
documented, it is documented, and it seems that people out there are
using it with many misconceptions.  As I stated before, it is not
an unbiased weighting scheme, and as such the bias should match the
documentation.

> 
> In other words, a server that is acceptable for software development may be 
> completely unable for CAD, imaging, wysiwyg, etc.

Exactly, thanks for the clarity.