rwtucker@starbase.mitre.org (Richard W. Tucker) (05/31/91)
I'd appreciate any and all info about X performance benchmarks. Are there
any in the SPEC suite? I'd like to see a list of X benchmarks for a variety
of platforms. Also, what are xstones, and why do they only apply to X
terminals? What's X/Perf?

Thanks.

- Rick Tucker
  rwtucker@mitre.org
tohanson@gonzo.lerc.nasa.gov (Jeff Hanson) (05/31/91)
> I'd appreciate any and all info about X performance benchmarks.
> Are there any in the SPEC suite?

No. Currently SPEC is CPU only. Note that SPEC just announced a new suite
(actually two programs) for benchmarking multiuser applications. SPEC and
GPC (Graphics Performance Characterization) were looking at some joint
work, but now that NCGA is publishing GPC's benchmark results, I doubt
SPEC will offer any graphics benchmarks.

> I'd like to see a list of X benchmarks for a variety of platforms.
> Also, what are xstones and why do they only apply to X terminals?

Xstones is a measure of several primitives versus Sun 3/50 performance.
This benchmark is worthless (IMHO), since failure to perform a certain
task results in getting the same score as the Sun. Xstones apply to more
than just X terminals, but the benchmark is being replaced by x11perf
(see below).

> What's X/Perf?

x11perf is a test of the majority of X operations (drawing, pixel, window,
etc.). It has been organized into 4 sections by Digital Review. A script
to run the benchmark and organize the results is available from
uunet.uu.net in unix-today/benchmarks. Enclosed is the README from this
directory. To see an excellent example of standards-oriented benchmarking,
get the HP 700 series benchmark report from HP. (For the curious, a toy
sketch of what one of these single-operation tests looks like follows at
the end of this article.)

Contents of the Unix Today! benchmarks directory:

README           -- this file.
PROCEDURE        -- Procedure used to benchmark X terminals for the
                    4/1/91 issue.
x11perfcompDR    -- Bourne shell script used to massage x11perf data.
                    Used in 4/1/91 issue.
4191.bench.tar.Z -- Raw x11perf output files and x11perfcompDR output
                    files used in the Unix Today! 4/1/91 review.
                    Includes the PROCEDURE file above.

"Unix Today!" is published twice a month by:

    CMP Publications
    600 Community Drive
    Manhasset, New York 11030
    (516) 562-5000

> Thanks.

Sure. Hope this helps.

--
---------------------------------------------------------------------------
Jeff Hanson - Scientific Graphics Programmer and Workstation Administrator
      NASA Lewis Research Center, MS 86-4, Cleveland, Ohio 44135
         Telephone - (216) 433-2284   Fax - (216) 433-2182
   tohanson@gonzo.lerc.nasa.gov - ViSC: Better Science Through Pictures
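For anyone who has not seen x11perf, the flavor of a single-operation test
is easy to sketch. The following is a toy Xlib program, illustrative only
and not code from x11perf itself (whose repetition, calibration, and error
handling are considerably more careful): it draws one primitive many
times, forces the server to finish with XSync(), and reports a rate.

    /* sot.c -- toy single-operation test, x11perf style.
     * Illustrative sketch only; compile with: cc sot.c -lX11 */
    #include <stdio.h>
    #include <sys/time.h>
    #include <X11/Xlib.h>

    static double now_seconds(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec + tv.tv_usec / 1e6;
    }

    int main(void)
    {
        Display *dpy;
        Window win;
        GC gc;
        int scr, i, reps = 10000;
        double t0, t1;

        dpy = XOpenDisplay(NULL);          /* uses $DISPLAY */
        if (dpy == NULL) {
            fprintf(stderr, "cannot open display\n");
            return 1;
        }
        scr = DefaultScreen(dpy);
        win = XCreateSimpleWindow(dpy, RootWindow(dpy, scr), 0, 0,
                                  600, 600, 0, BlackPixel(dpy, scr),
                                  WhitePixel(dpy, scr));
        XMapWindow(dpy, win);
        gc = DefaultGC(dpy, scr);
        XSync(dpy, False);                 /* window is up before timing */

        t0 = now_seconds();
        for (i = 0; i < reps; i++)
            XDrawLine(dpy, win, gc, 0, i % 600, 599, (i * 7) % 600);
        XSync(dpy, False);                 /* wait until all lines drawn */
        t1 = now_seconds();

        printf("%d lines in %.3f sec (%.0f lines/sec)\n",
               reps, t1 - t0, reps / (t1 - t0));
        XCloseDisplay(dpy);
        return 0;
    }

The XSync() before starting and before stopping the clock is the whole
game; without it you mostly measure how fast Xlib can fill its request
buffer (more on this later in the thread).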
roell@informatik.tu-muenchen.de (Thomas Roell) (05/31/91)
>x11perf is a test of the majority of X operations (drawing, pixel, window,
>etc.). It has been organized into 4 sections by Digital Review. A script
>to run the benchmark and organize the results is available from
>uunet.uu.net in unix-today/benchmarks. Enclosed is the README from this
>directory. To see an excellent example of standards-oriented benchmarking,
>get the HP 700 series benchmark report from HP.

I think you messed something up. x11perf is a good analyzing tool, but for
server implementors, not for users. Normally you cannot guess the speed of
a specific server implementation from this bulk of numbers. Also,
x11perfcompDR is COMPLETELY USELESS, since it compares the numbers of two
test runs WITHOUT weighting the results. It is obvious that painting a
point is quite unimportant compared to scrolling a 500x500 area; thus the
direct comparison is misleading and totally wrong. (A toy illustration of
how weighting can change the verdict follows at the end of this article.)

xbench is much better, because it does not test all primitives, only those
which are *very* important for everyday work, and the results are much
more balanced. From the raw number of xstones you can guess how fast your
server will be at work. And don't joke about the comparison to a Sun 3/50;
I think it's quite wise to select it as LEVEL 0 for all other tests. That
way you can guess what 20000 xstones means, provided you have already
worked with another X server with a well-known rating.

- Thomas
--
_______________________________________________________________________________
E-Mail (domain): roell@lan.informatik.tu-muenchen.de
UUCP (if above fails): roell@tumult.{uucp | informatik.tu-muenchen.de}
famous last words: "diskspace - the final frontier..."
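To make the weighting argument concrete, here is a toy calculation. The
tests, rates, and weights below are invented for illustration; they are
NOT xbench's actual weights. The point is that with a plausible mix, an
unweighted average and a weighted average can disagree about which server
is faster.

    /* weight.c -- toy illustration of weighted vs. unweighted summaries.
     * All numbers are invented; compile with: cc weight.c */
    #include <stdio.h>

    struct test {
        const char *name;
        double rate_a, rate_b;  /* ops/sec on server A and server B */
        double weight;          /* invented importance in real use  */
    };

    int main(void)
    {
        /* Server A wins big on points; server B wins on scrolling. */
        struct test tests[] = {
            { "1-pixel point",  80000.0, 20000.0, 0.05 },
            { "text, per char", 30000.0, 25000.0, 0.60 },
            { "500x500 scroll",    80.0,   120.0, 0.35 },
        };
        double plain_b = 0.0, wtd_b = 0.0;
        int i;

        for (i = 0; i < 3; i++) {
            /* Normalize to server A, so A scores 1.0 on every test. */
            double norm_b = tests[i].rate_b / tests[i].rate_a;
            plain_b += norm_b / 3.0;
            wtd_b   += tests[i].weight * norm_b;
        }
        /* Unweighted, B scores ~0.86 (looks slower than A);
         * weighted, B scores ~1.04 (looks faster than A). */
        printf("unweighted: A=1.00  B=%.2f\n", plain_b);
        printf("weighted:   A=1.00  B=%.2f\n", wtd_b);
        return 0;
    }

The verdict flips solely because the weights say the point test barely
matters; that is exactly the information an unweighted comparison throws
away.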
jason@cs.utexas.edu (Jason Martin Levitt) (06/03/91)
In article <1991May31.151431.9127@Informatik.TU-Muenchen.DE>,
roell@informatik.tu-muenchen.de (Thomas Roell) writes:
>tohanson@gonzo.lerc.nasa.gov writes:
>>x11perf is a test of the majority of X operations (drawing, pixel, window,
>>etc.). It has been organized into 4 sections by Digital Review. A script
>>to run the benchmark and organize the results is available from
>>uunet.uu.net in unix-today/benchmarks. [stuff deleted]
>
> I think you messed something up. x11perf is a good analyzing tool, but
> for server implementors, not for users. Normally you cannot guess the
> speed of a specific server implementation from this bulk of numbers.
> Also, x11perfcompDR is COMPLETELY USELESS, since it compares the numbers
> of two test runs WITHOUT weighting the results. [rest of posting deleted]

I'll let someone else fight x11perfcompDR vs. xbench. IMHO, neither
provides very useful X performance numbers, but neither is "COMPLETELY
USELESS" either. There simply is nothing else available in the public
domain yet except equally mediocre tests and personal opinions. A good
example of how confusing these types of numbers can be is revealed in the
table on page 57 of the June 1991 issue of Unix Review.

                                        ---Jason
-----
Jason Martin Levitt                     email: jason@cs.utexas.edu

Recent X Terminal Reviews:

"All of the X terminals...are viable contenders for desktop use."
                            --David Wilson, Unix Review, 6/91
"It's difficult to choose an overall winner from this group."
                            --Tom Yager, BYTE Magazine, 5/91
"It is difficult, if not unfair, to decide that one of the 10 terminals
reviewed is significantly better overall."
                            --Jason Levitt, Unix Today!, 4/1/91
lonnie@hpcvlx.cv.hp.com (Lonnie Mandigo) (06/04/91)
> / hpcvlx:comp.benchmarks / jason@cs.utexas.edu (Jason Martin Levitt) / 3:17 pm  Jun 2, 1991 /

Jason writes...

> I'll let someone else fight x11perfcompDR vs. xbench. IMHO, neither
> provides very useful X performance numbers, but neither is "COMPLETELY
> USELESS" either. There simply is nothing else available in the public
> domain yet except equally mediocre tests and personal opinions.

I agree with Jason: there really isn't anything very good out there for
measuring X performance. Those of us who are in the business of publishing
numbers in this area are unfortunately forced to work with what we've got.
But, rather than cry on your collective shoulders, I offer the following
comments for your dining pleasure. Take them for what they're worth...

[This is moderately long, so it's possibly a good time to move on to the
next note :-)]

Reference diagram...

              Single Operation Tests (SOT)
                 |                  |
      /---------/                    \--------\
      |                                       |
  Multi-operation      . . . . .>      Summary of SOT
  Tests                                       |
      |                                       |        Frequency data
  Pseudo Application/               Weighted Summary   from real use
  Environment                       of SOT   <------   (via xscope?)
      |
  Real Application/
  Environment w/script
      |
  Real Use

The above (nearly impossible to read) diagram describes my method for
categorizing X performance tests. It's probably not a heck-of-a-lot
different from what might be used for any other kind of performance
testing.

The raw data produced by most of the tests in the x11perf and xbench
suites falls into the Single Operation Test category. In other words, they
pick a particular X operation, execute it many times in a particular X
environment, and then calculate how long that operation takes to execute
on average. As has been pointed out earlier, this is really great for
tuning up an X server, but it tells an end user almost nothing about how
his application will perform.

A few other factors that are important at this level are the techniques
used by the benchmark suite to ensure the quality of the data. These
include strategies for knowing when an operation has actually completed
(i.e., did that line really get drawn, or was it sitting in a queue
somewhere waiting to get drawn when my Xlib call returned?), and the
thoroughness in specifying the test environment (is the screen saver
turned off, etc.). X11perf is very good here. I have been told by other
investigators (and have some experience) that xbench is not as thorough
here. This influenced our decision to focus on x11perf. (A toy sketch of
the queuing pitfall appears below.)

Both x11perfcompDR and xbench follow the right-hand path in the above
diagram. X11perfcompDR stops at the summary level. Xbench provides a
weighted summary.

X11perfcompDR is modelled after the technique used by Digital Review
magazine for evaluating X performance. It makes some effort to inject
reality into its summary (it eliminates all 1-pixel and 500-pixel tests).
Our experience in using x11perfcompDR is that you can generally trust the
sign of the difference when making a comparison (if it says that one
system is faster than another, it probably is for most applications). To a
lesser degree you can trust the magnitude of the difference (if it says
that one system is A LOT faster than another system, it probably is for
most applications). NEVER use the difference as a multiplier for your
particular application; it will ALWAYS be wrong (but you can't always be
sure which way it will be wrong).
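To make the queuing pitfall concrete, here is a toy sketch. It is not
x11perf's actual code, and the window/GC setup and now_seconds() timer are
as in the single-operation sketch earlier in this thread. The point: Xlib
buffers protocol requests, so timing around the Xlib calls alone mostly
measures how fast the client can fill a local buffer, not how fast the
server can draw.

    /* Toy sketch of the completion problem.  Not x11perf's code. */
    #include <stdio.h>
    #include <X11/Xlib.h>

    extern double now_seconds(void);   /* gettimeofday() wrapper, as in
                                          the earlier sketch */

    void time_fill_rects(Display *dpy, Window win, GC gc, int reps)
    {
        double t0, t_calls, t_done;
        int i;

        XSync(dpy, False);         /* start from a quiet server */
        t0 = now_seconds();
        for (i = 0; i < reps; i++)
            XFillRectangle(dpy, win, gc, 10, 10, 100, 100);
        t_calls = now_seconds();   /* WRONG endpoint: requests may still
                                      be queued or in flight */
        XSync(dpy, False);         /* round trip: server has finished */
        t_done = now_seconds();    /* honest endpoint */

        printf("call time:  %.3f sec (misleading)\n", t_calls - t0);
        printf("drawn time: %.3f sec (what the user waits for)\n",
               t_done - t0);
    }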
Xbench attempts to make the reliability of a comparison somewhat better by
weighting the results of the individual tests. Sometimes this can help,
but it can also make the problem worse. Xbench uses (intuitively derived)
weights that are biased towards text. If your application doesn't happen
to be text intensive (e.g. some CAE application) or doesn't happen to use
X's text facilities (e.g. some document generation applications), then the
numbers provided by xbench may lead you astray. (This doesn't imply that
the "unweighted" x11perfcompDR is better. It is implicitly weighted by the
distribution of different types of tests.) In general, the same things can
be said about xbench as were said about x11perfcompDR. Most of the time
it's meaningful. Sometimes it's not.

A better solution for "right path" performance characterizations would be
to use something like xscope to find out what real applications really do
in a real environment. From this information you could (hopefully)
identify various classes of applications. Once the classes were
identified, you could weight the measurements appropriately and possibly
come up with something that is more likely to be meaningful than what we
have now.

The "left path" offers some advantages over the "right path". A
multi-operation test contains a short (but realistic) sequence of X
operations which is executed many times to determine how long it takes to
execute that sequence. This is necessary because the state of the display
server left by a previous operation can affect the performance of the next
operation to be executed. Xbench contains one test which addresses this
(complex1). I wish x11perf had some tests like this, but I don't have time
to write them. These kinds of tests can be summarized in a fashion similar
to single operation tests (xbench does this). (A toy sketch of such a
sequence follows at the end of this article.)

A Pseudo Application/Environment test is some public domain piece of code
that attempts to simulate at least the X portion of a particular kind of
real application. These pseudo applications may also include other factors
which may impact an application's performance, such as disk I/O, intensive
computation, or interaction with other simultaneously executing processes
(e.g. a window manager). I'm not aware of any X-specific tests that fall
into this category. The GPC benchmarks for measuring graphics performance
might be in this category. (The graphics may be done through X calls, but
not necessarily.)

A Real Application/Environment with a fixed script is even better than a
Pseudo Application when only the numbers that are generated are
considered. Unfortunately, since the code is not public domain, other
problems creep in: "Does this application run on the platforms that I'm
interested in comparing?" or "If I want this to be an officially
sanctioned standard, am I going to have to pay royalties or require
purchase?" or "Which real application performance numbers should be
published in everybody's data sheet?", etc.

Real Use is, of course, the ultimate benchmark. A real user gets to use a
real application in a real environment for a reasonable amount of time, so
that he can either say "Hey, this is great! We really should buy 1000 of
these!" or "This sucks! Get it out of here."

----------------------------------
Lonnie Mandigo
Hewlett-Packard Co.
Interface Technology Operation
Corvallis, OR
lonnie@cv.hp.com
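As promised above, here is a toy sketch of a multi-operation sequence in
the spirit of xbench's complex1. The sequence is invented for
illustration; it is not the actual complex1 code.

    /* Toy multi-operation sequence: a text draw, a menu-ish box, and a
     * scroll, timed as a unit so each call runs in the server state the
     * previous call left behind.  Invented example, not complex1. */
    #include <string.h>
    #include <X11/Xlib.h>

    void one_sequence(Display *dpy, Window win, GC gc)
    {
        const char *msg = "hello, world";

        XClearWindow(dpy, win);                    /* expose-style clear */
        XDrawString(dpy, win, gc, 20, 20, msg, (int)strlen(msg));
        XFillRectangle(dpy, win, gc, 50, 50, 200, 100);
        XCopyArea(dpy, win, win, gc, 0, 16, 400, 400, 0, 0); /* scroll up */
    }

    /* A driver would call one_sequence() in a loop bracketed by XSync()
     * and a timer, exactly as in the single-operation sketch, and report
     * sequences/second rather than primitives/second. */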
exudnw@exud1.ericsson.se (Dave Williams) (06/04/91)
In article <1991May31.151431.9127@Informatik.TU-Muenchen.DE>
roell@informatik.tu-muenchen.de (Thomas Roell) writes:
>>x11perf is a test of the majority of X operations (drawing, pixel, window,
>>etc.). [rest of quoted description deleted]
>
>I think you messed something up. x11perf is a good analyzing tool, but for
>server implementors, not for users. Normally you cannot guess the speed of
>a specific server implementation from this bulk of numbers. Also,
>x11perfcompDR is COMPLETELY USELESS, since it compares the numbers of two
>test runs WITHOUT weighting the results. It is obvious that painting a
>point is quite unimportant compared to scrolling a 500x500 area; thus the
>direct comparison is misleading and totally wrong.
>
>xbench is much better, because it does not test all primitives, only those
>which are *very* important for everyday work, and the results are much
>more balanced. From the raw number of xstones you can guess how fast your
>server will be at work. And don't joke about the comparison to a Sun 3/50;
>I think it's quite wise to select it as LEVEL 0 for all other tests. That
>way you can guess what 20000 xstones means, provided you have already
>worked with another X server with a well-known rating.
>
>- Thomas

IMHO, after running x11perf, xbench, and a "killer" application on a
*wide* variety of X servers and hosts, both x11perf and xbench gave
misleading results that were poor indicators of application performance.
If you have a primary application (mine was mechanical CAD/CAE), use it as
your benchmark. If you are doing general office automation junk, then
either of these standard benchmarks will tell you something, but neither
will tell you everything you want to know.
--
= exudnw@exurchn1.ericsson.se || dnw@ponder.csci.unt.edu  (214) 907-7928 =
= David Williams                                                         =
= Ericsson Network Systems                                               =
= Richardson, TX 75081                   These opinions are my own.      =