[comp.benchmarks] X benchmarks

rwtucker@starbase.mitre.org (Richard W. Tucker) (05/31/91)

I'd appreciate any and all info about X performance benchmarks.  Are
there any in the SPEC suite?  I'd like to see a list of X benchmarks
for a variety of platforms.  Also, what are xstones and why do they
only apply to X terminals?  What's X/Perf?  Thanks.

 - Rick Tucker
   rwtucker@mitre.org

tohanson@gonzo.lerc.nasa.gov (Jeff Hanson) (05/31/91)

> I'd appreciate any and all info about X performance benchmarks.  
> Are there any in the SPEC suite?

No.  Currently SPEC is CPU only.  Note that SPEC just announced a new
suite (actually two programs) for benchmarking multiuser applications.
SPEC and GPC (Graphics Performance Characterization) were looking at some
joint work, but now that NCGA is publishing GPC's benchmark results, I doubt
that SPEC will offer any graphics benchmarks.

> I'd like to see a list of X benchmarks for a variety of platforms.  
> Also, what are xstones and why do they only apply to X terminals?  

Xstones is a measure of several primitives' performance relative to a Sun
3/50.  This benchmark is worthless (IMHO), since failure to perform a certain
task results in getting the same score as the Sun.  Xstones apply to more
than just X terminals, but the benchmark is being replaced by x11perf (see
below).
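
Very roughly, an xstones-style figure boils down to something like the sketch
below.  The reference rates, the 10000-stone scale, and the way the ratios
are combined are all invented for illustration -- this is not xbench's actual
code -- but it shows why a test that can't be run ends up scoring the same as
the reference Sun.

#include <stdio.h>

struct test {
    const char *name;
    double sun350_rate;   /* reference ops/sec on a Sun 3/50 (invented)  */
    double measured_rate; /* ops/sec measured on the system under test   */
};

int main(void)
{
    struct test tests[] = {
        { "lines",   4000.0, 16000.0 },
        { "text",    9000.0, 30000.0 },
        { "blit500",   40.0,   200.0 },
    };
    int i, n = sizeof tests / sizeof tests[0];
    double sum = 0.0;

    for (i = 0; i < n; i++) {
        /* A test that could not be run falls back to a ratio of 1.0,
         * i.e. the same score as the Sun -- the flaw mentioned above. */
        double ratio = tests[i].measured_rate > 0.0
                     ? tests[i].measured_rate / tests[i].sun350_rate
                     : 1.0;
        sum += ratio;
    }
    /* 10000 stones for the reference machine is this sketch's convention. */
    printf("%.0f xstones\n", 10000.0 * sum / n);
    return 0;
}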

> What's X/Perf?

x11perf is a test of the majority of X operations (drawing, pixel, window, 
etc.).  It has been organized into 4 sections by Digital Review.  A script
to run the benchmark and organize the results is available from 
uunet.uu.net in unix-today/benchmarks.  Enclosed is the README from this
directory.  To see an excellent example of standards oriented benchmarking,
get the HP 700 series benchmark report from HP.

Contents of the Unix Today! benchmarks directory.

README           -- this file.
PROCEDURE        -- Procedure used to benchmark X terminals for the 4/1/91 
                    issue.
x11perfcompDR    -- Bourne shell script used to massage x11perf data. Used in
                    4/1/91 issue.
4191.bench.tar.Z -- Raw x11perf output files and x11perfcompDR output files 
                    used in Unix Today! 4/1/91 review. Includes PROCEDURE file 
                    above.

"Unix Today!" is published twice a month by 

CMP Publications
600 Community Drive
Manhasset, New York 11030
(516) 562-5000


> Thanks.

Sure.  Hope this helps.

-- 
---------------------------------------------------------------------------
 Jeff Hanson - Scientific Graphics Programmer and Workstation Administrator
 NASA Lewis Research Center, MS 86-4, Cleveland, Ohio 44135
 Telephone - (216) 433-2284  Fax - (216) 433-2182
 tohanson@gonzo.lerc.nasa.gov	-   ViSC: Better Science Through Pictures

roell@informatik.tu-muenchen.de (Thomas Roell) (05/31/91)

>x11perf is a test of the majority of X operations (drawing, pixel, window, 
>etc.).  It has been organized into 4 sections by Digital Review.  A script
>to run the benchmark and organize the results is available from 
>uunet.uu.net in unix-today/benchmarks.  Enclosed is the README from this
>directory.  To see an excellent example of standards oriented benchmarking,
>get the HP 700 series benchmark report from HP.

I think you have something mixed up.  x11perf is a good analysis tool, but
for server implementors, not for users.  Normally you cannot judge the
speed of a specific server implementation from this bulk of numbers.  Also,
x11perfcompDR is COMPLETELY USELESS, since it compares the numbers of two
tests WITHOUT weighting the results.  It is obvious that painting a point
is quite unimportant compared to scrolling a 500x500 area, so the direct
comparison is misleading and totally wrong.

xbench is much better, because it does not test every primitive, only those
that are *very* important for everyday work, and its results are much more
balanced.  From the raw xstones number you can estimate how fast your server
will be in actual use.  And don't make fun of the comparison to a Sun 3/50;
I think it is quite sensible to pick a level 0 baseline for all the other
tests.  That way you can tell what a figure of 20000 xstones means if you
have already worked with another X server with a well-known rating.

- Thomas


--
_______________________________________________________________________________
E-Mail (domain):	 roell@lan.informatik.tu-muenchen.de
UUCP (if above fails):   roell@tumult.{uucp | informatik.tu-muenchen.de}
famous last words: "diskspace - the final frontier..."

jason@cs.utexas.edu (Jason Martin Levitt) (06/03/91)

In article <1991May31.151431.9127@Informatik.TU-Muenchen.DE>, roell@informatik.tu-muenchen.de (Thomas Roell) writes:

>tohanson@gonzo.lerc.nasa.gov writes:
>>x11perf is a test of the majority of X operations (drawing, pixel, window, 
>>etc.).  It has been organized into 4 sections by Digital Review.  A script
>>to run the benchmark and organize the results is available from 
>>uunet.uu.net in unix-today/benchmarks.  [stuff deleted]
> 
> I think you have something mixed up.  x11perf is a good analysis tool, but
> for server implementors, not for users.  Normally you cannot judge the
> speed of a specific server implementation from this bulk of numbers.  Also,
> x11perfcompDR is COMPLETELY USELESS, since it compares the numbers of two
> tests WITHOUT weighting the results.  [rest of posting deleted]

   I'll let someone else fight the x11perfcompDR vs. xbench battle.  IMHO,
neither provides very useful X performance numbers, but neither is
"COMPLETELY USELESS" either.  There is simply nothing else available in the
public domain yet except equally mediocre tests and personal opinions.
   A good example of how confusing these types of numbers can be is
revealed in the table on page 57 of the June 1991 issue of Unix Review.

     ---Jason
-----
   Jason Martin Levitt                          email: jason@cs.utexas.edu

Recent X Terminal Reviews:
   "All of the X terminals...are viable contenders for desktop use."
                   --David Wilson, Unix Review, 6/91
   "It's difficult to choose an overall winner from this group." 
                   --Tom Yager, BYTE Magazine, 5/91.
   "It is difficult, if not unfair, to decide that one of the 10 
    terminals reviewed is significantly better overall."
                   --Jason Levitt, Unix Today!, 4/1/91

lonnie@hpcvlx.cv.hp.com (Lonnie Mandigo) (06/04/91)

Jason writes...

>    I'll let someone else fight the x11perfcompDR vs. xbench battle.  IMHO,
> neither provides very useful X performance numbers, but neither is
> "COMPLETELY USELESS" either.  There is simply nothing else available in the
> public domain yet except equally mediocre tests and personal opinions.

I agree with Jason: there really isn't anything very good out there
for measuring X performance.  Those of us who are in the business
of publishing numbers in this area are unfortunately forced to work
with what we've got.

But, rather than cry on your collective shoulders, I offer the
following comments for your dining pleasure.  Take them for what
they're worth... [This is moderately long, so it's possibly a good
time to move on to the next note :-)]

Reference diagram...


                     Single Operation Tests (SOT)
                       /                    \
                      /                      \               Frequency data
   Multi-operation Tests . . . . . .>  Summary of SOT        from real use
             |                              |                 (via xscope?)
   Pseudo Application/               Weighted Summary              |
        Environment                       of SOT  <----------------/
             |
   Real Application/
     Environment w/script
             |
         Real Use


The above (nearly impossible to read) diagram describes my method for
categorizing X performance tests.  It's probably not a heck of a lot
different from what might be used for any other kind of performance
testing.

The raw data produced by most of the tests in the x11perf and xbench
suites falls into the Single Operation Test category.  In other words,
they pick a particular X operation, execute it many times in a
particular X environment, and then calculate how long that operation
takes to execute on average.  As has been pointed out earlier, this
is really great for tuning up an X server, but tells an end user
almost nothing about how his application will perform.
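
As a bare-bones sketch (this is not x11perf's code; the window setup and
repetition count are simplified), a single operation test amounts to
something like this:

#include <X11/Xlib.h>
#include <stdio.h>
#include <sys/time.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);          /* talk to the default server */
    Window win;
    GC gc;
    struct timeval t0, t1;
    const int reps = 10000;
    int i, scr;
    double secs;

    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }
    scr = DefaultScreen(dpy);
    win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0, 500, 500, 0,
                              BlackPixel(dpy, scr), WhitePixel(dpy, scr));
    gc  = XCreateGC(dpy, win, 0, NULL);
    XSetForeground(dpy, gc, BlackPixel(dpy, scr));
    XMapWindow(dpy, win);
    XSync(dpy, False);                   /* window is up before timing starts */

    gettimeofday(&t0, NULL);
    for (i = 0; i < reps; i++)
        XDrawLine(dpy, win, gc, 0, 0, 499, 499);
    XSync(dpy, False);                   /* wait until the server has really drawn them */
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d lines in %.3f s = %.1f lines/sec\n", reps, secs, reps / secs);
    XCloseDisplay(dpy);
    return 0;
}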

A few other factors that are important at this level are the
techniques used by the benchmark suite to ensure the quality of the
data.  These include strategies for knowing when an operation has
actually completed (i.e., did that line really get drawn, or was it
sitting in a queue somewhere waiting to get drawn when my Xlib call
returned?), and thoroughness in specifying the test environment (is
the screen saver turned off, etc.).  X11perf is very good here.  I
have been told by other investigators (and have some experience) that
xbench is not as thorough here.  This influenced our decision to focus
on x11perf.
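
To make the completion question concrete, here is a hypothetical helper in
the same style as the sketch above.  Timed without the final XSync, the loop
mostly measures how fast requests can be stuffed into Xlib's output buffer;
with it, the call blocks until the server has actually processed everything.

#include <X11/Xlib.h>
#include <sys/time.h>

/* Time `reps` point-drawing requests.  With wait_for_server nonzero, XSync
 * blocks until the X server has processed every request, so the elapsed
 * time covers the actual drawing; otherwise XFlush only pushes the requests
 * onto the wire, and the number reflects client-side buffering. */
static double time_points(Display *dpy, Window win, GC gc, int reps, int wait_for_server)
{
    struct timeval t0, t1;
    int i;

    gettimeofday(&t0, NULL);
    for (i = 0; i < reps; i++)
        XDrawPoint(dpy, win, gc, i % 500, i / 500);
    if (wait_for_server)
        XSync(dpy, False);      /* round trip: all requests executed */
    else
        XFlush(dpy);            /* requests sent, not necessarily executed */
    gettimeofday(&t1, NULL);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
}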

Both x11perfcompDR and xbench follow the right hand path in the above
diagram.  X11perfcompDR stops at the summary level.  Xbench provides a
weighted summary.

X11perfcompDR is modelled after the technique used by Digital Review
Magazine for evaluating X performance.  It makes some effort to inject
reality into its summary (it eliminates all 1-pixel and 500-pixel
tests).  Our experience in using x11perfcompDR is that you can
generally trust the sign of the difference when making a comparison
(if it says that one system is faster than another, it probably is for
most applications).  To a lesser degree you can trust the magnitude of
the difference (if it says that one system is A LOT faster than
another system, it probably is for most applications).  NEVER use the
difference as a multiplier for your particular application; it will
ALWAYS be wrong (but you can't always be sure which way it will be
wrong).
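
To make the sign-versus-magnitude point concrete, here is a toy comparison
in the style of the numbers x11perfcompDR reports.  The rates are invented,
and the real tool is a Bourne shell script rather than C; the point is only
that the per-test ratios carry the sign reliably while the aggregate is not
a per-application speedup.

#include <stdio.h>

int main(void)
{
    const char *test[] = { "lines", "text", "copyarea", "arcs" };
    double rate_a[]    = { 16000.0, 30000.0, 210.0, 5200.0 };  /* ops/sec, system A (invented) */
    double rate_b[]    = { 12000.0, 34000.0, 150.0, 2600.0 };  /* ops/sec, system B (invented) */
    int i, n = 4, a_wins = 0;
    double sum_ratio = 0.0;

    for (i = 0; i < n; i++) {
        double ratio = rate_a[i] / rate_b[i];
        sum_ratio += ratio;
        if (ratio > 1.0)
            a_wins++;
        printf("%-10s  A/B = %.2f\n", test[i], ratio);
    }
    printf("A faster on %d of %d tests; mean ratio %.2f (not a per-application speedup)\n",
           a_wins, n, sum_ratio / n);
    return 0;
}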

Xbench attempts to make the reliability of a comparison somewhat
better by weighting the results of the individual tests.  Sometimes
this can help, but it can also make the problem worse.  Xbench uses
(intuitively derived) weights that are biased towards text.  If your
application doesn't happen to be text intensive (e.g. some CAE
application), or doesn't happen to use X's text facilities (e.g. some
document generation applications), then the numbers provided by xbench
may lead you astray.  (This doesn't imply that the "unweighted"
x11perfcompDR is better.  It is implicitly weighted by the
distribution of different types of tests.)  In general, the same
things can be said about xbench as were said about x11perfcompDR:
most of the time it's meaningful; sometimes it's not.

A better solution for "right path" performance characterizations would
be to use something like xscope to find out what real applications
really do in a real environment.  From this information you could
(hopefully) identify various classes of applications.  Once the
classes were identified, you could weight the measurements
appropriately and possibly come up with something that is more likely
to be meaningful than what we have now.
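
As a sketch of what that kind of weighting might look like (the operation
classes, rates, and counts below are invented, not taken from any real
trace), the sensible way to combine per-operation rates is total operations
over total time, rather than a straight average of the rates:

#include <stdio.h>

int main(void)
{
    const char *op[] = { "text", "lines", "copyarea" };
    double rate[]    = { 30000.0, 16000.0, 200.0 };  /* measured ops/sec (invented)          */
    double count[]   = { 50000.0, 8000.0,  300.0 };  /* ops per session, e.g. from an xscope */
                                                     /* trace of one application class       */
    int i, n = 3;
    double total_ops = 0.0, total_time = 0.0;

    for (i = 0; i < n; i++) {
        printf("%-10s %8.0f ops at %8.0f ops/sec = %6.2f s\n",
               op[i], count[i], rate[i], count[i] / rate[i]);
        total_ops  += count[i];
        total_time += count[i] / rate[i];  /* seconds this class of operations would need */
    }
    printf("estimated %.2f s for the mix, %.0f ops/sec overall\n",
           total_time, total_ops / total_time);
    return 0;
}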

The "left path" offers some advantages over the "right path".  A
multi-operation test contains a short (but realistic) sequence of X
operations which is executed many times to determine how long it takes
to execute that sequence.  This is necessary because the state of the
display server left by a previous operation can affect the performance
of the next operation to be executed.  Xbench contains one test which
addresses this (complex1).  I wish x11perf had some tests like this,
but I don't have time to write them.  These kinds of tests can be
summarized in a fashion similar to single operation tests (xbench
does this).
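
For illustration only (this is not the complex1 source, just a sketch in its
spirit), a multi-operation test might time a text-draw/scroll/clear sequence
as a unit, so that each request runs against the server state left by the
previous one:

#include <X11/Xlib.h>
#include <stdio.h>
#include <string.h>
#include <sys/time.h>

int main(void)
{
    Display *dpy = XOpenDisplay(NULL);
    Window win;
    GC gc;
    struct timeval t0, t1;
    const char *msg = "the quick brown fox jumps over the lazy dog";
    const int reps = 2000;
    int i, scr;
    double secs;

    if (!dpy) { fprintf(stderr, "cannot open display\n"); return 1; }
    scr = DefaultScreen(dpy);
    win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0, 500, 500, 0,
                              BlackPixel(dpy, scr), WhitePixel(dpy, scr));
    gc  = XCreateGC(dpy, win, 0, NULL);
    XSetForeground(dpy, gc, BlackPixel(dpy, scr));
    XMapWindow(dpy, win);
    XSync(dpy, False);

    gettimeofday(&t0, NULL);
    for (i = 0; i < reps; i++) {
        XDrawString(dpy, win, gc, 4, 496, msg, (int)strlen(msg)); /* new bottom line        */
        XCopyArea(dpy, win, win, gc, 0, 16, 500, 484, 0, 0);      /* scroll up 16 pixels    */
        XClearArea(dpy, win, 0, 484, 500, 16, False);             /* clear the exposed strip */
    }
    XSync(dpy, False);                  /* let the server finish the whole sequence */
    gettimeofday(&t1, NULL);

    secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
    printf("%d scroll sequences in %.2f s = %.1f per second\n", reps, secs, reps / secs);
    XCloseDisplay(dpy);
    return 0;
}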

A Pseudo Application/Environment test is some public domain piece of
code that attempts to simulate at least the X portion of a particular
kind of real application.  These pseudo applications may also include
other factors which may impact an application's performance, such as
disk I/O, intensive computation, or interaction with other simultaneously
executing processes (e.g. a window manager).  I'm not aware of any
X-specific tests that fall into this category.  The GPC benchmarks for
measuring graphics performance might be in this category (the graphics
may be done through X calls, but not necessarily).

A Real Application/Environment with a fixed script is even better than
a Pseudo Application when only the numbers that are generated are
considered.  Unfortunately, since the code is not public domain, other
problems creep in: "Does this application run on the platforms that
I'm interested in comparing?" or "If I want this to be an officially
sanctioned standard, am I going to have to pay royalties or require a
purchase?" or "Which real application performance numbers should be
published in everybody's data sheet?", etc.

Real Use is, of course, the ultimate benchmark.  A real user gets to
use a real application in a real environment for a reasonable amount
of time, so that he can either say "Hey, this is great!  We really should
buy 1000 of these!" or "This sucks!  Get it out of here."

----------------------------------
Lonnie Mandigo
Hewlett-Packard Co.
Interface Technology Operation
Corvallis, OR.
lonnie@cv.hp.com

exudnw@exud1.ericsson.se (Dave Williams) (06/04/91)

In article <1991May31.151431.9127@Informatik.TU-Muenchen.DE> roell@informatik.tu-muenchen.de (Thomas Roell) writes:
>>x11perf is a test of the majority of X operations (drawing, pixel, window,
>>etc.).  [stuff deleted]
>
>I think you have something mixed up.  x11perf is a good analysis tool, but
>for server implementors, not for users.  [stuff deleted]  xbench is much
>better, because it does not test every primitive, only those that are *very*
>important for everyday work.  [rest of posting deleted]

IMHO, after running x11perf, xbench, and a "killer" application on a *wide*
variety of X servers and hosts, both x11perf and xbench gave misleading results
that were poor indicators of application performance.

If you have a primary application (mine was mechanical CAD/CAE), use it as your
benchmark.  If you are doing general office automation junk, then either of these
standard benchmarks will tell you something, but neither will tell you everything
you want to know.

--
= exudnw@exurchn1.ericsson.se || dnw@ponder.csci.unt.edu  (214)907-7928 =
= David Williams                                                        =
= Ericsson Network Systems                                              =
= Richardson, TX 75081                These opinions are my own.        =